autoscaler: My pod is in the CrashLoopBackOff state after configuring cluster-autoscaler
Which component are you using?: cluster-autoscaler
What version of the component are you using?: v1.17.3
What k8s version are you using (kubectl version)?:
kubectl -n kube-system version --short
Client Version: v1.21.1
Server Version: v1.17.17-eks-c5067d
WARNING: version difference between client (1.21) and server (1.17) exceeds the supported minor version skew of +/-1
What environment is this in?:
aws eks
What did you expect to happen?:
I expected my first cluster-autoscaler deployment to come up and start working, i.e. scaling my ASG.
What happened instead?:
I am getting exactly the error described at https://aws.amazon.com/premiumsupport/knowledge-center/eks-pod-status-troubleshooting/:
$ kubectl describe po crash-app-6847947bf8-28rq6
Name:               crash-app-6847947bf8-28rq6
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               ip-192-168-6-51.us-east-2.compute.internal/192.168.6.51
Start Time:         Wed, 22 Jan 2020 08:42:20 +0200
Labels:             pod-template-hash=6847947bf8
                    run=crash-app
Annotations:        kubernetes.io/psp: eks.privileged
Status:             Running
IP:                 192.168.29.73
Controlled By:      ReplicaSet/crash-app-6847947bf8
Containers:
  main:
    Container ID:  docker://6aecdce22adf08de2dbcd48f5d3d8d4f00f8e86bddca03384e482e71b3c20442
    Image:         alpine
    Image ID:      docker-pullable://alpine@sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d
    Port:          80/TCP
    Host Port:     0/TCP
    Command:       /bin/sleep 1
    State:         Waiting
      Reason:      CrashLoopBackOff
…
Events:
  Type     Reason     Age                From                                                 Message
  Normal   Scheduled  47s                default-scheduler                                    Successfully assigned default/crash-app-6847947bf8-28rq6 to ip-192-168-6-51.us-east-2.compute.internal
  Normal   Pulling    28s (x3 over 46s)  kubelet, ip-192-168-6-51.us-east-2.compute.internal  Pulling image "alpine"
  Normal   Pulled     28s (x3 over 46s)  kubelet, ip-192-168-6-51.us-east-2.compute.internal  Successfully pulled image "alpine"
  Normal   Created    28s (x3 over 45s)  kubelet, ip-192-168-6-51.us-east-2.compute.internal  Created container main
  Normal   Started    28s (x3 over 45s)  kubelet, ip-192-168-6-51.us-east-2.compute.internal  Started container main
  Warning  BackOff    12s (x4 over 42s)  kubelet, ip-192-168-6-51.us-east-2.compute.internal  Back-off restarting failed container
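The describe output above is for the generic crash-app example from the AWS article; to pull the same kind of information for the cluster-autoscaler pod itself, something like the following should work (a sketch that assumes the app=cluster-autoscaler label used by the official example manifest — adjust the selector if your deployment labels differ):

# Show state, last termination reason, and events for the autoscaler pod itself
kubectl -n kube-system describe pod -l app=cluster-autoscaler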
How to reproduce it (as minimally and precisely as possible): You need the same EKS, kubectl, and cluster-autoscaler versions:
kubectl -n kube-system version --short
Client Version: v1.21.1
Server Version: v1.17.17-eks-c5067d
WARNING: version difference between client (1.21) and server (1.17) exceeds the supported minor version skew of +/-1
kubectl -n kube-system get node -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion
NAME                                         VERSION
ip-10-44-17-206.us-west-2.compute.internal   v1.17.12-eks-7684af
ip-10-44-20-171.us-west-2.compute.internal   v1.17.12-eks-7684af
About this issue
- State: closed
- Created 3 years ago
- Reactions: 6
- Comments: 16 (1 by maintainers)
Commits related to this issue
- fix: autoscaler now requires 500MiB instead of 300MiB * Fixes: #88 * Original issue: https://github.com/kubernetes/autoscaler/issues/4220 — committed to acidtango/terraform-aws-eks by deleted user 3 years ago
- fix: autoscaler now requires 500MiB instead of 300MiB (#89) * Fixes: #88 * Original issue: https://github.com/kubernetes/autoscaler/issues/4220 Co-authored-by: Abel Garcia Dorta <abelgarcia@acidt... — committed to Young-ook/terraform-aws-eks by mercuriete 3 years ago
- Fix memory limit and request for cluster autoscaler Update default values for memory consumption to fix this issue: https://github.com/kubernetes/autoscaler/issues/4220 — committed to provectus/sak-scaling by viatcheslavmogilevsky 3 years ago
- add cluster autoscaler. but requirements of resource limit had been changed. See https://github.com/kubernetes/autoscaler/issues/4220 — committed to asprin107/k8s-sandbox by asprin107 2 years ago
Hi, I was in the same situation. Do you use requests/limits settings for cluster-autoscaler? About two weeks ago a 300Mi RAM limit was enough; now 500Mi is required. If you use a lower number, OOM kills appear.
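For anyone hitting the OOM kills, a quick way to bump the memory in place is something like the following — a sketch that assumes the deployment is named cluster-autoscaler in kube-system and carries the app=cluster-autoscaler label; adjust names to your setup:

# Raise both the memory request and limit to 500Mi so the kubelet stops OOM-killing the pod
kubectl -n kube-system set resources deployment cluster-autoscaler \
  --requests=memory=500Mi --limits=memory=500Mi

# Confirm the last restart was actually an OOM kill and not something else
kubectl -n kube-system get pod -l app=cluster-autoscaler \
  -o jsonpath='{.items[0].status.containerStatuses[0].lastState.terminated.reason}'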
This was caused by an incorrect AWS role trust policy - it would've been a bit easier to debug if there were helpful error messages, but that's my fault for not following the AWS instructions carefully.
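For others debugging the trust-policy angle: the two things to compare are the role ARN annotated on the service account and the assume-role (trust) policy on that role. A rough check, assuming the service account is named cluster-autoscaler in kube-system and using placeholder role/cluster names:

# The role ARN the pod tries to assume comes from the eks.amazonaws.com/role-arn annotation
kubectl -n kube-system get sa cluster-autoscaler -o yaml

# The trust policy must name the cluster's OIDC provider as the Federated principal and scope
# the "sub" condition to system:serviceaccount:kube-system:cluster-autoscaler
aws iam get-role --role-name <CLUSTER_AUTOSCALER_ROLE_NAME> \
  --query 'Role.AssumeRolePolicyDocument' --output json

# The cluster's OIDC issuer, for comparison against the provider in the trust policy
aws eks describe-cluster --name <CLUSTER_NAME> \
  --query 'cluster.identity.oidc.issuer' --output text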
Same issue here with version v1.20. Everything was working fine a week ago, until I redeployed a new cluster and cluster-autoscaler started crashing for no apparent reason (even the logs on the deployment/pod don't give any clue about what's happening).
Hi, I am facing the same issue with EKS Kubernetes version 1.21:
cluster-autoscaler-5dd6459897-mpqf8 0/1 CrashLoopBackOff 7 13m
This is the log I see. I applied the memory change from 300Mi to 500Mi but am still getting the same error, and the OpenID Connect provider is also in the role's trust relationships:
goroutine 285 [sync.Cond.Wait]:
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:310
sync.runtime_notifyListWait(0xc000b34b40, 0xc000000000)
	/usr/local/go/src/runtime/sema.go:513 +0xf8
sync.(*Cond).Wait(0xc000b34b30)
	/usr/local/go/src/sync/cond.go:56 +0x9d
golang.org/x/net/http2.(*pipe).Read(0xc000b34b28, 0xc00134b200, 0x200, 0x200, 0x0, 0x0, 0x0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/golang.org/x/net/http2/pipe.go:65 +0x8f
golang.org/x/net/http2.transportResponseBody.Read(0xc000b34b00, 0xc00134b200, 0x200, 0x200, 0x0, 0x0, 0x0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/golang.org/x/net/http2/transport.go:2108 +0xaf
encoding/json.(*Decoder).refill(0xc0007f9760, 0xc0009ec2a0, 0x0)
	/usr/local/go/src/encoding/json/stream.go:165 +0xeb
encoding/json.(*Decoder).readValue(0xc0007f9760, 0x0, 0x0, 0x3652400)
	/usr/local/go/src/encoding/json/stream.go:140 +0x1e8
encoding/json.(*Decoder).Decode(0xc0007f9760, 0x36f3f00, 0xc0009ec2a0, 0x437aa1, 0x3cd95e0)
	/usr/local/go/src/encoding/json/stream.go:63 +0x79
k8s.io/apimachinery/pkg/util/framer.(*jsonFrameReader).Read(0xc000c2de30, 0xc00067e400, 0x400, 0x400, 0xc000061e10, 0xc000061000, 0x38)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/util/framer/framer.go:152 +0x1a1
k8s.io/apimachinery/pkg/runtime/serializer/streaming.(*decoder).Decode(0xc000357720, 0x0, 0x44fd580, 0xc0014600c0, 0xc0010f6dc8, 0x41e0d8, 0xc0010f6db0, 0x0, 0x0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/runtime/serializer/streaming/streaming.go:77 +0x89
k8s.io/client-go/rest/watch.(*Decoder).Decode(0xc0009ec260, 0xc001226d80, 0x0, 0x0, 0x0, 0x0, 0x0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/rest/watch/decoder.go:49 +0x6e
k8s.io/apimachinery/pkg/watch.(*StreamWatcher).receive(0xc001460080)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:105 +0xe5
created by k8s.io/apimachinery/pkg/watch.NewStreamWatcher
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:76 +0xea
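The goroutine dump above is just the client-go watch machinery and on its own doesn't say why the process died; checking the container's termination state and the previous container's logs narrows it down. A sketch — substitute your actual pod name:

# Was the container OOMKilled, or did the process exit with an error code?
kubectl -n kube-system get pod <AUTOSCALER_POD_NAME> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'

# Logs from the crashed instance, which usually contain the real error
# (for example, failed AssumeRoleWithWebIdentity calls when the trust policy is wrong)
kubectl -n kube-system logs <AUTOSCALER_POD_NAME> --previous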
Still seeing this problem here with no limit set (also tried with a 600MiB limit as suggested by the AWS docs).
Container logs show this repeated goroutine:
Does anyone have any idea what’s going on here?