autoscaler: failed to renew lease kube-system/cluster-autoscaler: failed to tryAcquireOrRenew
I am running Kubernetes 12.5 with etcd3 and cluster-autoscaler v1.2.2 (on AWS), and my cluster is healthy with everything operational. After some scaling activity, cluster-autoscaler goes into a crash loop with the following error:
F0205 23:32:52.241542 1 main.go:384] lost master
goroutine 1 [running]:
k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog.stacks(0xc000022100, 0xc000574000, 0x37, 0xee)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/klog.go:828 +0xd4
k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog.(*loggingT).output(0x4333560, 0xc000000003, 0xc00056e000, 0x429c819, 0x7, 0x180, 0x0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/klog.go:779 +0x306
k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog.(*loggingT).printf(0x4333560, 0x3, 0x26f2036, 0xb, 0x0, 0x0, 0x0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/klog.go:678 +0x14b
k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog.Fatalf(0x26f2036, 0xb, 0x0, 0x0, 0x0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/klog.go:1207 +0x67
main.main.func3()
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:384 +0x47
k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run.func1(0xc000668000)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:163 +0x40
k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run(0xc000668000, 0x29c4b00, 0xc000591dc0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:172 +0x112
k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection.RunOrDie(0x29c4b40, 0xc000046040, 0x29cbd20, 0xc0001e6a20, 0x37e11d600, 0x2540be400, 0x77359400, 0xc00001f030, 0x27baac0, 0x0, ...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:184 +0x99
main.main()
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:372 +0x5cf
I0205 23:32:52.241724 1 factory.go:33] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"cluster-autoscaler", UID:"e78ccdca-2440-11e9-8514-0a1153ba0cc4", APIVersion:"v1", ResourceVersion:"6949892", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' cluster-autoscaler-57f79874cf-c45xb stopped leading
I0205 23:32:52.745013 1 auto_scaling_groups.go:124] Registering ASG XXXX
Everything in the cluster seems to work perfectly fine, and the masters, the cluster, and etcd are all healthy.
Is there any way to resurrect/resolve this issue?
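For context, the lease the autoscaler fails to renew is stored as an annotation on the kube-system/cluster-autoscaler Endpoints object referenced in the event above. A rough sketch of what that record looks like (holder identity, timestamps, and counters below are placeholders, not values from my cluster):

```yaml
# Sketch of the Endpoints object that holds the leader-election record, as returned by
#   kubectl -n kube-system get endpoints cluster-autoscaler -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  annotations:
    # Placeholder record: identity, times, and transition count are illustrative only.
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"cluster-autoscaler-57f79874cf-c45xb","leaseDurationSeconds":15,"acquireTime":"2019-02-05T22:00:00Z","renewTime":"2019-02-05T23:32:52Z","leaderTransitions":3}'
```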
About this issue
- State: closed
- Created 5 years ago
- Comments: 43 (6 by maintainers)
Commits related to this issue
- cluster-autoscaler-autodiscover.yaml sets --leader-elect=false in cluster-autoscaler Deployment (https://github.com/kubernetes/autoscaler/issues/1653) — committed to mr3project/mr3-run-k8s by deleted user 2 years ago
I had a similar issue on my cluster (using EKS):
Then the pod died and restarted. It seems to be a hiccup, but I would like to know why it happened.
I have the same problem:
I am running autoscaler version 1.15.6
For what it's worth, if I do the following, it crashes less often. I think it really cuts down on the Kubernetes API calls, so there is less chance of crashing.
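The specific settings from this comment are not shown above. As an illustration only, with values that are assumptions for this sketch rather than the commenter's configuration, slowing the scan loop and relaxing the leader-election timing reduces API-server traffic and gives the renew call more headroom:

```yaml
# Illustrative args for the cluster-autoscaler container; values are assumptions.
containers:
  - name: cluster-autoscaler
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --scan-interval=30s                # default 10s; fewer loops means fewer API calls
      - --leader-elect-lease-duration=60s  # default 15s
      - --leader-elect-renew-deadline=40s  # default 10s
      - --leader-elect-retry-period=10s    # default 2s
```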
I have also seen that most people run the CA with replicas: 1 and forget to check the default value of leader-elect=true, per the FAQ.
If this is set to false, as @tkbrex replied, the election process is disabled and this lost master error will not appear.
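For reference, the change is a single flag on the Deployment, matching the cluster-autoscaler-autodiscover.yaml commit linked above. A trimmed sketch (other args as in the stock AWS example):

```yaml
# Fragment of the cluster-autoscaler Deployment with leader election disabled.
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --leader-elect=false   # no kube-system/cluster-autoscaler lock, so no "lost master"
```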
We’re seeing this on EKS 1.21.
Seeing a weird behaviour with cluster-autoscaler, not sure what exactly is causing this… Autoscaler version: 1.21.1. Noticed a few restarts; no resource limits/requests set for CPU.
Describing the cluster-autoscaler pod shows:
Is disabling leader election really recommended? All of the official examples I'm aware of specify replicas: 1 but keep the default value for leader-elect. Even when running replicas: 1, wouldn't leader election be necessary during rolling updates of the CA deployment? Otherwise, I would think there'd be periods where you could have multiple CA pods stepping on each other.
I had this issue with the autoscaler with a CPU limit set to 100m.
Setting the limit to 1 CPU solved the issue (it needs more CPU when it starts), so in my case it was CPU throttling, and it slowed down the autoscaler itself.
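In manifest terms, that fix is just the resources block on the cluster-autoscaler container. A minimal sketch mirroring the comment (the request value is an assumption):

```yaml
# cluster-autoscaler container resources; only the CPU limit changed.
resources:
  requests:
    cpu: 100m    # assumed unchanged from the original manifest
  limits:
    cpu: "1"     # raised from 100m; throttling at startup was starving the lease-renew loop
```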
I have the same issue on EKS 1.19.