cert-manager: cainjector failing frequently
Describe the bug: cainjector 0.11.0 fails frequently with message:
I1114 12:05:33.699752 1 controller.go:242] cert-manager/controller-runtime/controller "level"=1 "msg"="Successfully Reconciled" "controller"="apiservice" "request"={"Namespace":"","Name":"v1.networking.k8s.io"}
I1114 12:05:33.699805 1 controller.go:242] cert-manager/controller-runtime/controller "level"=1 "msg"="Successfully Reconciled" "controller"="apiservice" "request"={"Namespace":"","Name":"v1alpha1.kubeapps.com"}
I1114 12:05:33.699814 1 controller.go:242] cert-manager/controller-runtime/controller "level"=1 "msg"="Successfully Reconciled" "controller"="customresourcedefinition" "request"={"Namespace":"","Name":"issuers.cert-manager.io"}
I1114 12:05:33.699916 1 controller.go:242] cert-manager/controller-runtime/controller "level"=1 "msg"="Successfully Reconciled" "controller"="apiservice" "request"={"Namespace":"","Name":"v1beta1.metrics.k8s.io"}
I1114 12:05:33.700003 1 controller.go:242] cert-manager/controller-runtime/controller "level"=1 "msg"="Successfully Reconciled" "controller"="apiservice" "request"={"Namespace":"","Name":"v1.apiextensions.k8s.io"}
E1114 12:25:09.735275 1 leaderelection.go:365] Failed to update lock: etcdserver: request timed out
I1114 12:25:10.923019 1 leaderelection.go:287] failed to renew lease kube-system/cert-manager-cainjector-leader-election: failed to tryAcquireOrRenew context deadline exceeded
F1114 12:25:10.923143 1 start.go:127] error running manager: leader election lost
$ kubectl -n cert-manager get pod cert-manager-cainjector-576978ffc8-mtg7b -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/podIP: 10.2.0.178/32
  creationTimestamp: "2019-11-07T11:02:50Z"
  generateName: cert-manager-cainjector-576978ffc8-
  labels:
    app: cainjector
    app.kubernetes.io/instance: cert-manager
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: cainjector
    helm.sh/chart: cainjector-v0.11.0
    pod-template-hash: 576978ffc8
  name: cert-manager-cainjector-576978ffc8-mtg7b
  namespace: cert-manager
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: cert-manager-cainjector-576978ffc8
    uid: 4f7bbfc9-dfda-4264-bd06-b722bd88d21b
  resourceVersion: "6158194385"
  selfLink: /api/v1/namespaces/cert-manager/pods/cert-manager-cainjector-576978ffc8-mtg7b
  uid: 052447f0-59af-4a93-bfef-c09c8727610e
spec:
  containers:
  - args:
    - --v=2
    - --leader-election-namespace=kube-system
    env:
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    image: quay.io/jetstack/cert-manager-cainjector:v0.11.0
    imagePullPolicy: Always
    name: cainjector
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: cert-manager-cainjector-token-vbfw7
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kubernetes-internal-node0
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: cert-manager-cainjector
  serviceAccountName: cert-manager-cainjector
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: cert-manager-cainjector-token-vbfw7
    secret:
      defaultMode: 420
      secretName: cert-manager-cainjector-token-vbfw7
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-11-07T11:02:50Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-11-14T12:25:14Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-11-14T12:25:14Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-11-07T11:02:50Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://5b2521cd7d0fe3c6a5516aaf90e20a5589e47ef8c04134e51974b724ea7aaff3
    image: quay.io/jetstack/cert-manager-cainjector:v0.11.0
    imageID: docker-pullable://quay.io/jetstack/cert-manager-cainjector@sha256:cf77d14d1c825190a38ac6b593f591998e3b34464f626f24479eb3e21dd589b3
    lastState:
      terminated:
        containerID: docker://dcba9c63711b87cd1ce348577546ee9d2cde557b0c993866997fb8f24028c9bc
        exitCode: 255
        finishedAt: "2019-11-14T12:25:10Z"
        reason: Error
        startedAt: "2019-11-14T12:05:16Z"
    name: cainjector
    ready: true
    restartCount: 323
    started: true
    state:
      running:
        startedAt: "2019-11-14T12:25:13Z"
  hostIP: 51.83.15.9
  phase: Running
  podIP: 10.2.0.178
  podIPs:
  - ip: 10.2.0.178
  qosClass: BestEffort
  startTime: "2019-11-07T11:02:50Z"
This happens quite frequently (323 restarts in about 7 days):
$ kubectl get pods -A|grep cert-manager
cert-manager cert-manager-55c44f98f-dghd7 1/1 Running 0 7d1h
cert-manager cert-manager-cainjector-576978ffc8-mtg7b 1/1 Running 323 7d1h
cert-manager cert-manager-webhook-c67fbc858-gbn8w 1/1 Running 1 7d1h
Expected behaviour: cainjector container should run without frequent failures/restarts.
Steps to reproduce the bug: Install cert-manager following the official documentation, configure a Let's Encrypt issuer, and use it for some ingresses.
Environment details:
- Kubernetes version (e.g. v1.10.2):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:23:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:09:08Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
- Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): OVH Managed Kubernetes on public cloud (https://www.ovh.com/world/public-cloud/kubernetes/)
- cert-manager version (e.g. v0.4.0): 0.11.0
- Install method (e.g. helm or static manifests): manifests (CustomResourceDefinition, namespace) + helm
/kind bug
About this issue
- State: closed
- Created 5 years ago
- Reactions: 17
- Comments: 31 (7 by maintainers)
We saw the same issue. When we set the --leader-elect flag to false, the error doesn't occur. We are running cainjector with a replica count of 1. The relevant part from the values file for the Helm chart looks as follows:
Maybe this helps someone. Does anyone have an idea how to solve this for a higher replica count?
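A minimal sketch of such a Helm values fragment, assuming the chart exposes cainjector.replicaCount and cainjector.extraArgs (key names are an assumption here; check the values.yaml of your chart version). The --leader-elect flag is the one named in the comment above.

```yaml
# Sketch only: assumed cert-manager chart keys cainjector.replicaCount and
# cainjector.extraArgs; verify against your chart version's values.yaml.
cainjector:
  # Keep a single replica: with leader election off, there is no lock
  # deciding which replica runs the injection controllers.
  replicaCount: 1
  extraArgs:
    # Disable leader election so a failed lease renewal (e.g. an etcd
    # timeout) can no longer end in "leader election lost" and a restart.
    - --leader-elect=false
```

With leader election disabled, every replica runs the controllers as soon as it starts, so this workaround only makes sense with a single replica; it does not address the higher replica count case.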
I have the same issue with cert-manager v0.13.1 when etcd has long timeouts. Turning off leader election seems to fix the issue, but is there any downside to doing that?
Just throwing it out there: I am running v1.5.3 straight from the Helm chart and I hit this crash too.
The strange thing is that I only discovered this issue today, but it appears this pod crashed ~6 months ago without my noticing, and it did not restart properly either. I'm NOT running more than one pod of anything:
I'm going to upgrade to the current version and cross my fingers that this is solved. But from what I've seen, most of the time restarting is enough to get things running again (for a while).
Thanks!
I also noticed some other error messages in the cainjector container logs.