autoscaler: My pod is in the CrashLoopBackOff state after configuring cluster-autoscaler

Which component are you using?: cluster-autoscaler

What version of the component are you using?:

Component version: v1.17.3

What k8s version are you using (kubectl version)?:

kubectl -n kube-system version --short
Client Version: v1.21.1
Server Version: v1.17.17-eks-c5067d
WARNING: version difference between client (1.21) and server (1.17) exceeds the supported minor version skew of +/-1

What environment is this in?:

AWS EKS

What did you expect to happen?:

I’m expecting my first cluster-autoscaler to be set up and working, meaning it takes over scaling my ASG.

What happened instead?:

I am getting exactly the error described at https://aws.amazon.com/premiumsupport/knowledge-center/eks-pod-status-troubleshooting/:

$ kubectl describe po crash-app-6847947bf8-28rq6

Name:               crash-app-6847947bf8-28rq6
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               ip-192-168-6-51.us-east-2.compute.internal/192.168.6.51
Start Time:         Wed, 22 Jan 2020 08:42:20 +0200
Labels:             pod-template-hash=6847947bf8
                    run=crash-app
Annotations:        kubernetes.io/psp: eks.privileged
Status:             Running
IP:                 192.168.29.73
Controlled By:      ReplicaSet/crash-app-6847947bf8
Containers:
  main:
    Container ID:  docker://6aecdce22adf08de2dbcd48f5d3d8d4f00f8e86bddca03384e482e71b3c20442
    Image:         alpine
    Image ID:      docker-pullable://alpine@sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d
    Port:          80/TCP
    Host Port:     0/TCP
    Command:       /bin/sleep 1
    State:         Waiting
      Reason:      CrashLoopBackOff
…
Events:
  Type     Reason     Age                From                                                 Message
  ----     ------     ----               ----                                                 -------
  Normal   Scheduled  47s                default-scheduler                                    Successfully assigned default/crash-app-6847947bf8-28rq6 to ip-192-168-6-51.us-east-2.compute.internal
  Normal   Pulling    28s (x3 over 46s)  kubelet, ip-192-168-6-51.us-east-2.compute.internal  Pulling image "alpine"
  Normal   Pulled     28s (x3 over 46s)  kubelet, ip-192-168-6-51.us-east-2.compute.internal  Successfully pulled image "alpine"
  Normal   Created    28s (x3 over 45s)  kubelet, ip-192-168-6-51.us-east-2.compute.internal  Created container main
  Normal   Started    28s (x3 over 45s)  kubelet, ip-192-168-6-51.us-east-2.compute.internal  Started container main
  Warning  BackOff    12s (x4 over 42s)  kubelet, ip-192-168-6-51.us-east-2.compute.internal  Back-off restarting failed container

How to reproduce it (as minimally and precisely as possible): Use the same EKS, kubectl, and cluster-autoscaler versions:

kubectl -n kube-system version --short
Client Version: v1.21.1
Server Version: v1.17.17-eks-c5067d
WARNING: version difference between client (1.21) and server (1.17) exceeds the supported minor version skew of +/-1

kubectl -n kube-system get node -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion

NAME                                         VERSION
ip-10-44-17-206.us-west-2.compute.internal   v1.17.12-eks-7684af
ip-10-44-20-171.us-west-2.compute.internal   v1.17.12-eks-7684af
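
One more thing worth double-checking in this setup: the cluster-autoscaler image tag has to track the cluster's Kubernetes minor version (1.17.x here). A minimal sketch for verifying that, assuming the deployment is named cluster-autoscaler in kube-system (Helm installs may use a longer name):

# Print the cluster-autoscaler image actually running; the tag's minor version
# should match the server/kubelet minor version shown above (1.17).
kubectl -n kube-system get deployment cluster-autoscaler \
  -o jsonpath='{.spec.template.spec.containers[0].image}'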

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 6
  • Comments: 16 (1 by maintainers)

Most upvoted comments

Hi, I was in the same situation. Are you setting requests/limits for cluster-autoscaler? About two weeks ago a 300Mi memory limit was enough; now 500Mi is required. If you use a lower value, the container gets OOM-killed.
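
If memory is the problem, a minimal sketch for confirming and bumping the values, assuming the deployment is named cluster-autoscaler in kube-system and the pod carries the app=cluster-autoscaler label (Helm releases often use different names and labels):

# Was the last restart an OOM kill? "OOMKilled" means memory; an empty result or "Error" points elsewhere.
kubectl -n kube-system get pod -l app=cluster-autoscaler \
  -o jsonpath='{.items[0].status.containerStatuses[0].lastState.terminated.reason}'

# Raise the memory request/limit (500Mi per the comment above; tune for your cluster).
kubectl -n kube-system set resources deployment cluster-autoscaler \
  --requests=memory=500Mi --limits=memory=500Mi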

In my case this was caused by an incorrect AWS IAM role trust policy. It would have been easier to debug with more helpful error messages, but it was my fault for not following the AWS instructions carefully.
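
For anyone else landing here, a rough sketch for sanity-checking the IRSA/trust-policy side; the service account name, cluster name, and role name below are placeholders, not values from this issue:

# Which IAM role the service account is annotated with (IRSA)
kubectl -n kube-system get serviceaccount cluster-autoscaler \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'

# The cluster's OIDC issuer URL, which the trust policy must reference
aws eks describe-cluster --name <cluster-name> \
  --query 'cluster.identity.oidc.issuer' --output text

# The trust policy itself: the Federated principal should be the OIDC provider above,
# and the sub condition should allow system:serviceaccount:kube-system:cluster-autoscaler
aws iam get-role --role-name <cluster-autoscaler-role-name> \
  --query 'Role.AssumeRolePolicyDocument'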

Same issue here with version v1.20. Everything was working fine a week ago until I redeployed a new cluster; now cluster-autoscaler starts crashing for no apparent reason (even the logs on the deployment/pod don’t give any clue about what’s happening).

Hi, I am facing the same issue with EKS Kubernetes version 1.21:

cluster-autoscaler-5dd6459897-mpqf8 0/1 CrashLoopBackOff 7 13m

This is the log I see. I raised the memory from 300Mi to 500Mi but am still getting the same error, and the OIDC provider is in the role’s trust relationships:

goroutine 285 [sync.Cond.Wait]:
runtime.goparkunlock(…)
        /usr/local/go/src/runtime/proc.go:310
sync.runtime_notifyListWait(0xc000b34b40, 0xc000000000)
        /usr/local/go/src/runtime/sema.go:513 +0xf8
sync.(*Cond).Wait(0xc000b34b30)
        /usr/local/go/src/sync/cond.go:56 +0x9d
golang.org/x/net/http2.(*pipe).Read(0xc000b34b28, 0xc00134b200, 0x200, 0x200, 0x0, 0x0, 0x0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/golang.org/x/net/http2/pipe.go:65 +0x8f
golang.org/x/net/http2.transportResponseBody.Read(0xc000b34b00, 0xc00134b200, 0x200, 0x200, 0x0, 0x0, 0x0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/golang.org/x/net/http2/transport.go:2108 +0xaf
encoding/json.(*Decoder).refill(0xc0007f9760, 0xc0009ec2a0, 0x0)
        /usr/local/go/src/encoding/json/stream.go:165 +0xeb
encoding/json.(*Decoder).readValue(0xc0007f9760, 0x0, 0x0, 0x3652400)
        /usr/local/go/src/encoding/json/stream.go:140 +0x1e8
encoding/json.(*Decoder).Decode(0xc0007f9760, 0x36f3f00, 0xc0009ec2a0, 0x437aa1, 0x3cd95e0)
        /usr/local/go/src/encoding/json/stream.go:63 +0x79
k8s.io/apimachinery/pkg/util/framer.(*jsonFrameReader).Read(0xc000c2de30, 0xc00067e400, 0x400, 0x400, 0xc000061e10, 0xc000061000, 0x38)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/util/framer/framer.go:152 +0x1a1
k8s.io/apimachinery/pkg/runtime/serializer/streaming.(*decoder).Decode(0xc000357720, 0x0, 0x44fd580, 0xc0014600c0, 0xc0010f6dc8, 0x41e0d8, 0xc0010f6db0, 0x0, 0x0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/runtime/serializer/streaming/streaming.go:77 +0x89
k8s.io/client-go/rest/watch.(*Decoder).Decode(0xc0009ec260, 0xc001226d80, 0x0, 0x0, 0x0, 0x0, 0x0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/rest/watch/decoder.go:49 +0x6e
k8s.io/apimachinery/pkg/watch.(*StreamWatcher).receive(0xc001460080)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:105 +0xe5
created by k8s.io/apimachinery/pkg/watch.NewStreamWatcher
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:76 +0xea
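
That goroutine dump is printed on shutdown and usually isn’t the real error. A small sketch for getting at the actual failure, using the pod name from the listing above:

# The tail of the previous (crashed) container normally shows the real error message
kubectl -n kube-system logs cluster-autoscaler-5dd6459897-mpqf8 --previous --tail=50

# Check the last termination state: exit code 137 / OOMKilled means memory,
# exit code 1 with an AWS auth error in the logs points at the IAM/OIDC setup
kubectl -n kube-system describe pod cluster-autoscaler-5dd6459897-mpqf8 | grep -A 5 'Last State'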

Still seeing this problem here with no limit set (also tried with a 600MiB limit, as suggested by the AWS docs).

Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  10m                  default-scheduler  Successfully assigned kube-system/cluster-autoscaler-release-aws-cluster-autoscaler-6d58fb855g84x to ip-10-0-16-212.us-west-1.compute.internal
  Normal   Pulled     10m                  kubelet            Successfully pulled image "k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.1" in 368.005529ms
  Normal   Pulled     10m                  kubelet            Successfully pulled image "k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.1" in 350.720347ms
  Normal   Pulled     9m43s                kubelet            Successfully pulled image "k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.1" in 368.7982ms
  Normal   Created    9m (x4 over 10m)     kubelet            Created container aws-cluster-autoscaler
  Normal   Pulling    9m (x4 over 10m)     kubelet            Pulling image "k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.1"
  Normal   Pulled     9m                   kubelet            Successfully pulled image "k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.1" in 359.88954ms
  Normal   Started    8m59s (x4 over 10m)  kubelet            Started container aws-cluster-autoscaler
  Warning  BackOff    31s (x44 over 10m)   kubelet            Back-off restarting failed container

Container logs show this repeated goroutine:

goroutine 301 [sync.Cond.Wait]:
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:310
sync.runtime_notifyListWait(0xc000908b40, 0x0)
        /usr/local/go/src/runtime/sema.go:513 +0xf8
sync.(*Cond).Wait(0xc000908b30)
        /usr/local/go/src/sync/cond.go:56 +0x9d
golang.org/x/net/http2.(*pipe).Read(0xc000908b28, 0xc000266400, 0x200, 0x200, 0x0, 0x0, 0x0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/golang.org/x/net/http2/pipe.go:65 +0x8f
golang.org/x/net/http2.transportResponseBody.Read(0xc000908b00, 0xc000266400, 0x200, 0x200, 0x0, 0x0, 0x0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/golang.org/x/net/http2/transport.go:2108 +0xaf
encoding/json.(*Decoder).refill(0xc000908f20, 0xc00103e9e0, 0x0)
        /usr/local/go/src/encoding/json/stream.go:165 +0xeb
encoding/json.(*Decoder).readValue(0xc000908f20, 0x0, 0x0, 0x36423c0)
        /usr/local/go/src/encoding/json/stream.go:140 +0x1e8
encoding/json.(*Decoder).Decode(0xc000908f20, 0x36e3ec0, 0xc00103e9e0, 0x3c0c180, 0x0)
        /usr/local/go/src/encoding/json/stream.go:63 +0x79
k8s.io/apimachinery/pkg/util/framer.(*jsonFrameReader).Read(0xc00105daa0, 0xc00095a400, 0x400, 0x400, 0x10, 0x37d1a20, 0x38)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/util/framer/framer.go:152 +0x1a1
k8s.io/apimachinery/pkg/runtime/serializer/streaming.(*decoder).Decode(0xc001056730, 0x0, 0x44ebba0, 0xc0008279c0, 0xc000ed1f98, 0x15b916f, 0xc000ed1e60, 0x453c8e0, 0xc000058018)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/runtime/serializer/streaming/streaming.go:77 +0x89
k8s.io/client-go/rest/watch.(*Decoder).Decode(0xc00103e9c0, 0x3deeb27, 0x1, 0x0, 0x0, 0x0, 0x1f4)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/rest/watch/decoder.go:49 +0x6e
k8s.io/apimachinery/pkg/watch.(*StreamWatcher).receive(0xc000827980)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:105 +0xe5
created by k8s.io/apimachinery/pkg/watch.NewStreamWatcher
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:76 +0xea

Does anyone have any idea what’s going on here?