autoscaler: My pod is in the CrashLoopBackOff state after configuring cluster-autoscaler
Which component are you using?: cluster-autoscaler
What version of the component are you using?: v1.17.3
What k8s version are you using (kubectl version)?:
kubectl -n kube-system version --short
Client Version: v1.21.1
Server Version: v1.17.17-eks-c5067d
WARNING: version difference between client (1.21) and server (1.17) exceeds the supported minor version skew of +/-1
What environment is this in?:
aws eks
What did you expect to happen?:
I expected my first cluster-autoscaler deployment to come up and start working, i.e. scaling my ASG.
What happened instead?:
I am getting exactly the error described at https://aws.amazon.com/premiumsupport/knowledge-center/eks-pod-status-troubleshooting/:
$ kubectl describe po crash-app-6847947bf8-28rq6
Name:               crash-app-6847947bf8-28rq6
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               ip-192-168-6-51.us-east-2.compute.internal/192.168.6.51
Start Time:         Wed, 22 Jan 2020 08:42:20 +0200
Labels:             pod-template-hash=6847947bf8
                    run=crash-app
Annotations:        kubernetes.io/psp: eks.privileged
Status:             Running
IP:                 192.168.29.73
Controlled By:      ReplicaSet/crash-app-6847947bf8
Containers:
  main:
    Container ID:  docker://6aecdce22adf08de2dbcd48f5d3d8d4f00f8e86bddca03384e482e71b3c20442
    Image:         alpine
    Image ID:      docker-pullable://alpine@sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d
    Port:          80/TCP
    Host Port:     0/TCP
    Command:       /bin/sleep 1
    State:         Waiting
      Reason:      CrashLoopBackOff
…
Events:
  Type     Reason     Age                From                                                 Message
  Normal   Scheduled  47s                default-scheduler                                    Successfully assigned default/crash-app-6847947bf8-28rq6 to ip-192-168-6-51.us-east-2.compute.internal
  Normal   Pulling    28s (x3 over 46s)  kubelet, ip-192-168-6-51.us-east-2.compute.internal  Pulling image "alpine"
  Normal   Pulled     28s (x3 over 46s)  kubelet, ip-192-168-6-51.us-east-2.compute.internal  Successfully pulled image "alpine"
  Normal   Created    28s (x3 over 45s)  kubelet, ip-192-168-6-51.us-east-2.compute.internal  Created container main
  Normal   Started    28s (x3 over 45s)  kubelet, ip-192-168-6-51.us-east-2.compute.internal  Started container main
  Warning  BackOff    12s (x4 over 42s)  kubelet, ip-192-168-6-51.us-east-2.compute.internal  Back-off restarting failed container
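The describe output above is for the generic crash-app example from the AWS article; to pull the same kind of information for the cluster-autoscaler pod itself, something like the following should work (a sketch that assumes the app=cluster-autoscaler label used by the official example manifest — adjust the selector if your deployment labels differ):

# Show state, last termination reason, and events for the autoscaler pod itself
kubectl -n kube-system describe pod -l app=cluster-autoscaler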
How to reproduce it (as minimally and precisely as possible): You need the same EKS, kubectl, and cluster-autoscaler versions:
kubectl -n kube-system version --short
Client Version: v1.21.1
Server Version: v1.17.17-eks-c5067d
WARNING: version difference between client (1.21) and server (1.17) exceeds the supported minor version skew of +/-1
kubectl -n kube-system get node -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion
NAME                                         VERSION
ip-10-44-17-206.us-west-2.compute.internal   v1.17.12-eks-7684af
ip-10-44-20-171.us-west-2.compute.internal   v1.17.12-eks-7684af
About this issue
- State: closed
- Created 3 years ago
- Reactions: 6
- Comments: 16 (1 by maintainers)
Commits related to this issue
- fix: autoscaler now requires 500MiB instead of 300MiB * Fixes: #88 * Original issue: https://github.com/kubernetes/autoscaler/issues/4220 — committed to acidtango/terraform-aws-eks by deleted user 3 years ago
- fix: autoscaler now requires 500MiB instead of 300MiB (#89) * Fixes: #88 * Original issue: https://github.com/kubernetes/autoscaler/issues/4220 Co-authored-by: Abel Garcia Dorta <abelgarcia@acidt... — committed to Young-ook/terraform-aws-eks by mercuriete 3 years ago
- Fix memory limit and request for cluster autoscaler Update default values for memory consumption to fix this issue: https://github.com/kubernetes/autoscaler/issues/4220 — committed to provectus/sak-scaling by viatcheslavmogilevsky 3 years ago
- add cluster autoscaler. but requirements of resource limit had been changed. See https://github.com/kubernetes/autoscaler/issues/4220 — committed to asprin107/k8s-sandbox by asprin107 2 years ago
Hi, I was in the same situation. Do you use requests/limits settings for cluster-autoscaler? About two weeks ago a 300Mi RAM limit was enough; now 500Mi is required. If you use a lower number, OOM kills appear.
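For anyone hitting the OOM kills, a quick way to bump the memory in place is something like the following — a sketch that assumes the deployment is named cluster-autoscaler in kube-system and carries the app=cluster-autoscaler label; adjust names to your setup:

# Raise both the memory request and limit to 500Mi so the kubelet stops OOM-killing the pod
kubectl -n kube-system set resources deployment cluster-autoscaler \
  --requests=memory=500Mi --limits=memory=500Mi

# Confirm the last restart was actually an OOM kill and not something else
kubectl -n kube-system get pod -l app=cluster-autoscaler \
  -o jsonpath='{.items[0].status.containerStatuses[0].lastState.terminated.reason}'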
This was caused by an incorrect AWS role trust policy - it would've been a bit easier to debug if there were helpful error messages, but that's my fault for not following the AWS instructions carefully.
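For others debugging the trust-policy angle: the two things to compare are the role ARN annotated on the service account and the assume-role (trust) policy on that role. A rough check, assuming the service account is named cluster-autoscaler in kube-system and using placeholder role/cluster names:

# The role ARN the pod tries to assume comes from the eks.amazonaws.com/role-arn annotation
kubectl -n kube-system get sa cluster-autoscaler -o yaml

# The trust policy must name the cluster's OIDC provider as the Federated principal and scope
# the "sub" condition to system:serviceaccount:kube-system:cluster-autoscaler
aws iam get-role --role-name <CLUSTER_AUTOSCALER_ROLE_NAME> \
  --query 'Role.AssumeRolePolicyDocument' --output json

# The cluster's OIDC issuer, for comparison against the provider in the trust policy
aws eks describe-cluster --name <CLUSTER_NAME> \
  --query 'cluster.identity.oidc.issuer' --output text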
Same issue here with version v1.20. Everything was working fine a week ago, until I redeployed a new cluster and cluster-autoscaler started crashing for no apparent reason (even the logs on the deployment/pod don't give any clue about what's happening).
Hi, I am facing the same issue with EKS Kubernetes version 1.21:
cluster-autoscaler-5dd6459897-mpqf8 0/1 CrashLoopBackOff 7 13m
This is the log I see. I applied the memory change from 300Mi to 500Mi but am still getting the same error, and the OpenID Connect provider is also in the role's trust relationships:
goroutine 285 [sync.Cond.Wait]:
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:310
sync.runtime_notifyListWait(0xc000b34b40, 0xc000000000)
	/usr/local/go/src/runtime/sema.go:513 +0xf8
sync.(*Cond).Wait(0xc000b34b30)
	/usr/local/go/src/sync/cond.go:56 +0x9d
golang.org/x/net/http2.(*pipe).Read(0xc000b34b28, 0xc00134b200, 0x200, 0x200, 0x0, 0x0, 0x0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/golang.org/x/net/http2/pipe.go:65 +0x8f
golang.org/x/net/http2.transportResponseBody.Read(0xc000b34b00, 0xc00134b200, 0x200, 0x200, 0x0, 0x0, 0x0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/golang.org/x/net/http2/transport.go:2108 +0xaf
encoding/json.(*Decoder).refill(0xc0007f9760, 0xc0009ec2a0, 0x0)
	/usr/local/go/src/encoding/json/stream.go:165 +0xeb
encoding/json.(*Decoder).readValue(0xc0007f9760, 0x0, 0x0, 0x3652400)
	/usr/local/go/src/encoding/json/stream.go:140 +0x1e8
encoding/json.(*Decoder).Decode(0xc0007f9760, 0x36f3f00, 0xc0009ec2a0, 0x437aa1, 0x3cd95e0)
	/usr/local/go/src/encoding/json/stream.go:63 +0x79
k8s.io/apimachinery/pkg/util/framer.(*jsonFrameReader).Read(0xc000c2de30, 0xc00067e400, 0x400, 0x400, 0xc000061e10, 0xc000061000, 0x38)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/util/framer/framer.go:152 +0x1a1
k8s.io/apimachinery/pkg/runtime/serializer/streaming.(*decoder).Decode(0xc000357720, 0x0, 0x44fd580, 0xc0014600c0, 0xc0010f6dc8, 0x41e0d8, 0xc0010f6db0, 0x0, 0x0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/runtime/serializer/streaming/streaming.go:77 +0x89
k8s.io/client-go/rest/watch.(*Decoder).Decode(0xc0009ec260, 0xc001226d80, 0x0, 0x0, 0x0, 0x0, 0x0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/rest/watch/decoder.go:49 +0x6e
k8s.io/apimachinery/pkg/watch.(*StreamWatcher).receive(0xc001460080)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:105 +0xe5
created by k8s.io/apimachinery/pkg/watch.NewStreamWatcher
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:76 +0xea
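The goroutine dump above is just the client-go watch machinery and on its own doesn't say why the process died; checking the container's termination state and the previous container's logs narrows it down. A sketch — substitute your actual pod name:

# Was the container OOMKilled, or did the process exit with an error code?
kubectl -n kube-system get pod <AUTOSCALER_POD_NAME> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'

# Logs from the crashed instance, which usually contain the real error
# (for example, failed AssumeRoleWithWebIdentity calls when the trust policy is wrong)
kubectl -n kube-system logs <AUTOSCALER_POD_NAME> --previous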
Still seeing this problem here with no limit set (also tried with a 600MiB limit as suggested by the AWS docs).
Container logs show this repeated goroutine:
Does anyone have any idea what’s going on here?