cert-manager: Leader election timeout (?) causes exit

I’ve been seeing this across a few clusters (cert-manager v1.6.1):

E0304 04:54:00.561791       1 leaderelection.go:367] Failed to update lock: Put "https://10.128.0.1:443/api/v1/namespaces/kube-system/configmaps/cert-manager-controller": context deadline exceeded
I0304 04:54:00.564601       1 leaderelection.go:283] failed to renew lease kube-system/cert-manager-controller: timed out waiting for the condition
E0304 04:54:00.841843       1 leaderelection.go:306] Failed to release lock: Operation cannot be fulfilled on configmaps "cert-manager-controller": the object has been modified; please apply your changes to the latest version and try again
I0304 04:54:00.843278       1 controller.go:126] cert-manager/controller/certificaterequests-issuer-ca "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.843603       1 controller.go:126] cert-manager/controller/certificaterequests-issuer-acme "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.843694       1 controller.go:126] cert-manager/controller/certificates-request-manager "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.843746       1 controller.go:126] cert-manager/controller/certificates-issuing "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.843912       1 controller.go:126] cert-manager/controller/certificaterequests-issuer-vault "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.843968       1 controller.go:126] cert-manager/controller/certificates-trigger "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.844003       1 controller.go:126] cert-manager/controller/certificaterequests-issuer-selfsigned "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.844055       1 controller.go:126] cert-manager/controller/certificates-revision-manager "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.844112       1 controller.go:126] cert-manager/controller/certificates-metrics "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.846530       1 controller.go:126] cert-manager/controller/issuers "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.846724       1 controller.go:126] cert-manager/controller/challenges "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.846835       1 controller.go:126] cert-manager/controller/orders "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.846897       1 controller.go:126] cert-manager/controller/ingress-shim "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.846945       1 controller.go:126] cert-manager/controller/certificates-key-manager "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.847006       1 controller.go:126] cert-manager/controller/certificates-readiness "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.847049       1 controller.go:126] cert-manager/controller/certificaterequests-approver "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.847207       1 controller.go:126] cert-manager/controller/clusterissuers "msg"="shutting down queue as workqueue signaled shutdown"
I0304 04:54:00.847269       1 controller.go:126] cert-manager/controller/certificaterequests-issuer-venafi "msg"="shutting down queue as workqueue signaled shutdown"
E0304 04:54:00.852721       1 main.go:39] cert-manager "msg"="error while executing" "error"="error starting controller: leader election lost"

It seems to correspond to periods when the kube API server is taking longer than usual to respond. These usually last a couple of minutes at most. People using managed clusters (LKE, GKE, etc.) don’t tend to have any control over this, as the control plane is provided by their cloud vendor.
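
One mitigation I’m tempted to try while those windows are unavoidable is lengthening the leader-election timings, so a short apiserver brownout doesn’t cost the controller its lease. A rough sketch of what that might look like via the Helm chart follows — this assumes the controller accepts the standard --leader-election-* flags (with the usual 60s/40s/15s defaults), that the chart’s extraArgs value is passed straight through to the controller, and that the release name/namespace match your install:

cat > leader-election-values.yaml <<'EOF'
# Assumption: the chart's top-level extraArgs is forwarded to the cert-manager controller.
# Roughly doubling the defaults so a 1-2 minute apiserver slowdown is less likely to end
# in "leader election lost". renew-deadline must stay <= lease-duration.
extraArgs:
  - --leader-election-lease-duration=120s
  - --leader-election-renew-deadline=110s
  - --leader-election-retry-period=30s
EOF
helm upgrade cert-manager jetstack/cert-manager -n cert-manager --reuse-values -f leader-election-values.yaml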

There seem to have been a few other issues reporting this (or something similar), most notably #2362.

I suspect a portion of the API server’s high load is caused by #3766 (amongst other things that should be resolved in v1.18 😃).

Is there anything further that a more experienced (with cert-manager) eye spots here?

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 26 (3 by maintainers)

Most upvoted comments

I’m receiving a similar error on GKE 1.22 using cert-manager v1.8.2:

I1029 18:39:59.076049       1 conditions.go:261] Setting lastTransitionTime for CertificateRequest "config-manager-web-tls-q9x5l" condition "Ready" to 2022-10-29 18:39:59.076038893 +0000 UTC m=+439.763801627
I1029 18:40:00.846227       1 acme.go:216] cert-manager/certificaterequests-issuer-acme/sign "msg"="certificate issued" "related_resource_kind"="Order" "related_resource_name"="config-manager-web-tls-q9x5l-630819962" "related_resource_namespace"="config-manager" "related_resource_version"="v1" "resource_kind"="CertificateRequest" "resource_name"="config-manager-web-tls-q9x5l" "resource_namespace"="config-manager" "resource_version"="v1" 
I1029 18:40:00.846405       1 conditions.go:250] Found status change for CertificateRequest "config-manager-web-tls-q9x5l" condition "Ready": "False" -> "True"; setting lastTransitionTime to 2022-10-29 18:40:00.846397214 +0000 UTC m=+441.534159942
I1029 18:40:00.932350       1 controller.go:161] cert-manager/certificates-issuing "msg"="re-queuing item due to optimistic locking on resource" "key"="config-manager/config-manager-web-tls" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"config-manager-web-tls\": the object has been modified; please apply your changes to the latest version and try again"
I1029 18:40:01.001631       1 controller.go:161] cert-manager/certificates-key-manager "msg"="re-queuing item due to optimistic locking on resource" "key"="config-manager/config-manager-web-tls" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"config-manager-web-tls\": the object has been modified; please apply your changes to the latest version and try again"
W1031 10:44:00.583847       1 reflector.go:442] k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W1031 10:44:00.583843       1 reflector.go:442] k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W1031 10:44:00.583848       1 reflector.go:442] k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W1031 10:44:00.583842       1 reflector.go:442] k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: watch of *v1.Ingress ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W1031 10:44:00.583923       1 reflector.go:442] k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: watch of *v1.Challenge ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W1031 10:44:00.583925       1 reflector.go:442] k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: watch of *v1.Issuer ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W1031 10:44:00.583931       1 reflector.go:442] k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: watch of *v1.Order ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W1031 10:44:00.583944       1 reflector.go:442] k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: watch of *v1.Certificate ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W1031 10:44:00.583943       1 reflector.go:442] k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: watch of *v1.ClusterIssuer ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W1031 10:44:00.583965       1 reflector.go:442] k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: watch of *v1.Pod ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W1031 10:44:00.583967       1 reflector.go:442] k8s.io/client-go@v0.23.4/tools/cache/reflector.go:167: watch of *v1.CertificateRequest ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
E1031 10:44:00.584202       1 leaderelection.go:330] error retrieving resource lock kube-system/cert-manager-controller: Get "https://10.92.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cert-manager-controller": http2: client connection lost
I1031 10:44:10.583734       1 leaderelection.go:283] failed to renew lease kube-system/cert-manager-controller: timed out waiting for the condition
E1031 10:44:10.583898       1 leaderelection.go:306] Failed to release lock: resource name may not be empty
I1031 10:44:10.584307       1 controller.go:126] cert-manager/ingress-shim "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584362       1 controller.go:126] cert-manager/certificates-metrics "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584381       1 controller.go:126] cert-manager/certificaterequests-issuer-ca "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584401       1 controller.go:126] cert-manager/certificaterequests-approver "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584468       1 controller.go:126] cert-manager/certificates-trigger "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584489       1 controller.go:126] cert-manager/certificates-revision-manager "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584494       1 controller.go:126] cert-manager/certificates-key-manager "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584511       1 controller.go:126] cert-manager/orders "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584522       1 controller.go:126] cert-manager/certificaterequests-issuer-selfsigned "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584621       1 controller.go:126] cert-manager/certificates-readiness "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584512       1 controller.go:126] cert-manager/certificaterequests-issuer-vault "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584642       1 controller.go:126] cert-manager/certificates-issuing "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584646       1 controller.go:126] cert-manager/certificaterequests-issuer-acme "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584669       1 controller.go:126] cert-manager/issuers "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584679       1 controller.go:126] cert-manager/challenges "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584672       1 controller.go:126] cert-manager/certificates-request-manager "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584691       1 controller.go:126] cert-manager/clusterissuers "msg"="shutting down queue as workqueue signaled shutdown"  
I1031 10:44:10.584309       1 controller.go:126] cert-manager/certificaterequests-issuer-venafi "msg"="shutting down queue as workqueue signaled shutdown"  
E1031 10:44:10.585376       1 main.go:39] cert-manager "msg"="error while executing" "error"="error starting controller: leader election lost"  
k logs --previous cert-manager-cainjector-5ff98c66d-zl2h7 -n cert-manager

I1031 10:47:02.512488       1 start.go:126] "starting" version="v1.8.2" revision="f1943433be7056804e4f628ff0d6685a132c407b"
E1031 10:47:12.517504       1 logr.go:265] cert-manager "msg"="Failed to get API Group-Resources" "error"="Get \"https://10.92.0.1:443/api?timeout=32s\": net/http: TLS handshake timeout"  
Error: error creating manager: Get "https://10.92.0.1:443/api?timeout=32s": net/http: TLS handshake timeout
Usage:
  ca-injector [flags]

Flags:
      --add_dir_header                            If true, adds the file directory to the header of the log messages
      --alsologtostderr                           log to standard error as well as files
      --enable-profiling                          Enable profiling for cainjector
      --feature-gates mapStringBool               A set of key=value pairs that describe feature gates for alpha/experimental features. Options are:
                                                  AllAlpha=true|false (ALPHA - default=false)
                                                  AllBeta=true|false (BETA - default=false)
  -h, --help                                      help for ca-injector
      --kubeconfig string                         Paths to a kubeconfig. Only required if out-of-cluster.
      --leader-elect                              If true, cainjector will perform leader election between instances to ensure no more than one instance of cainjector operates at a time (default true)
      --leader-election-lease-duration duration   The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively the maximum duration that a leader can be stopped before it is replaced by another candidate. This is only applicable if leader election is enabled. (default 1m0s)
      --leader-election-namespace string          Namespace used to perform leader election. Only used if leader election is enabled (default "kube-system")
      --leader-election-renew-deadline duration   The interval between attempts by the acting master to renew a leadership slot before it stops leading. This must be less than or equal to the lease duration. This is only applicable if leader election is enabled. (default 40s)
      --leader-election-retry-period duration     The duration the clients should wait between attempting acquisition and renewal of a leadership. This is only applicable if leader election is enabled. (default 15s)
      --log-flush-frequency duration              Maximum number of seconds between log flushes (default 5s)
      --log_backtrace_at traceLocation            when logging hits line file:N, emit a stack trace (default :0)
      --log_dir string                            If non-empty, write log files in this directory
      --log_file string                           If non-empty, use this log file
      --log_file_max_size uint                    Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
      --logtostderr                               log to standard error instead of files (default true)
      --namespace string                          If set, this limits the scope of cainjector to a single namespace. If set, cainjector will not update resources with certificates outside of the configured namespace.
      --one_output                                If true, only write logs to their native severity level (vs also writing to each lower severity level)
      --profiler-address string                   Address of the Go profiler (pprof) if enabled. This should never be exposed on a public interface. (default "localhost:6060")
      --skip_headers                              If true, avoid header prefixes in the log messages
      --skip_log_headers                          If true, avoid headers when opening log files
      --stderrthreshold severity                  logs at or above this threshold go to stderr (default 2)
  -v, --v Level                                   number for the log level verbosity (default 0)
      --vmodule moduleSpec                        comma-separated list of pattern=N settings for file-filtered logging

error creating manager: Get "https://10.92.0.1:443/api?timeout=32s": net/http: TLS handshake timeout

I’m using k3s 1.23.x and have been experiencing the same issue since v1.6.0. I’ve tried setting extraArgs and related hotfixes, and I’ve since updated to v1.8.0, without success.

Can I provide any more info to help investigate this issue?
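
For example, the next time it happens I could capture the state of the lock object and a quick apiserver health read from inside the cluster. A rough sketch of what I’d grab (object names are taken from the logs above; pod labels may differ per install):

# Who currently holds the lock and when it was last renewed
kubectl -n kube-system get lease cert-manager-controller -o yaml        # the v1.8.x logs above show a Lease lock
kubectl -n kube-system get configmap cert-manager-controller -o yaml    # the v1.6.1 logs show a ConfigMap lock
# Rough check of apiserver responsiveness around the restart window
kubectl get --raw='/readyz?verbose'
# Previous logs from the restarted controller pods
kubectl -n cert-manager logs -l app.kubernetes.io/name=cert-manager --previous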