autoscaler: Cluster Autoscaler on AWS is OOM killed on startup in GenerateEC2InstanceTypes
We noticed our cluster autoscaler occasionally getting OOM killed on startup or when elected as leader. The memory usage spike on startup is fairly consistent even when it isn’t OOM killed, sitting at around 250Mi, just below the default limit. When it doesn’t OOM, this memory is eventually garbage collected and the autoscaler stabilizes at well under 100Mi used:

After a pprof trace (requiring an ad-hoc upgrade to cluster-autoscaler v1.18.2 to get the --profiling flag) we noticed a large chunk of memory allocated in the GenerateEC2InstanceTypes function. We were able to trace this back to PR #2249 which fetches an updated list of EC2 instance types from an AWS-hosted JSON file. Surprisingly, this file is 94 MiB, the entirety of which is fetched onto the heap before parsing. The data extracted is fairly small (under 43KiB per ec2_instance_types.go) but unfortunately the allocations sometimes live long enough to push the autoscaler over the (default) memory limit.
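To make the shape of the problem concrete, here is a minimal sketch of the fetch-then-parse pattern the profile points at. This is not the autoscaler's actual code; `pricingDocument` and the URL below are placeholders I made up. The point is that the whole response body is buffered, and the decoded structures are built while that buffer is still reachable, so peak heap is roughly "raw JSON + decoded copy":

```go
// Simplified illustration only (not the real GenerateEC2InstanceTypes code).
package main

import (
	"encoding/json"
	"fmt"
	"io/ioutil"
	"net/http"
)

// pricingDocument is a placeholder; the real offer file has far more fields.
type pricingDocument struct {
	Products map[string]json.RawMessage `json:"products"`
}

func fetchAllAtOnce(pricingURL string) (*pricingDocument, error) {
	resp, err := http.Get(pricingURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// The entire ~94 MiB document lands in this one buffer...
	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}

	// ...and the decoded structures are built while that buffer is still
	// live, so peak heap usage is roughly "raw JSON + decoded copy".
	var doc pricingDocument
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, err
	}
	return &doc, nil
}

func main() {
	doc, err := fetchAllAtOnce("https://example.invalid/offer.json") // placeholder URL
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println("products:", len(doc.Products))
}
```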
Additionally, with the --aws-use-static-instance-list=true flag set, the memory spike disappears:

Is there some solution that could fetch the updated list without requiring an otherwise unnecessary memory limit increase? Given the autoscaler’s special priority class, raising the limit well beyond what it actually needs at runtime feels a bit wrong.
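One direction that might avoid the spike is sketched below purely as an illustration. I haven't tested it against the real offer file, and `instanceAttrs`, `product`, and the URL are placeholder names, not the autoscaler's real types; it assumes the relevant data sits under a top-level "products" key. The idea is to stream-decode the response with json.Decoder tokens and keep only the small per-instance attributes, so the full document never has to sit on the heap at once:

```go
// Rough sketch of a streaming alternative; types and URL are placeholders.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Placeholder subset of per-product data, not the autoscaler's real schema.
type instanceAttrs struct {
	InstanceType string `json:"instanceType"`
	VCPU         string `json:"vcpu"`
	Memory       string `json:"memory"`
}

type product struct {
	Attributes instanceAttrs `json:"attributes"`
}

func streamInstanceTypes(pricingURL string) (map[string]instanceAttrs, error) {
	resp, err := http.Get(pricingURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	out := map[string]instanceAttrs{}
	dec := json.NewDecoder(resp.Body)

	// Consume the opening '{' of the top-level object.
	if _, err := dec.Token(); err != nil {
		return nil, err
	}
	for dec.More() {
		keyTok, err := dec.Token()
		if err != nil {
			return nil, err
		}
		if key, ok := keyTok.(string); !ok || key != "products" {
			// Skip top-level values we don't care about without retaining them.
			var skip json.RawMessage
			if err := dec.Decode(&skip); err != nil {
				return nil, err
			}
			continue
		}
		// Consume the opening '{' of the "products" object, then decode one
		// product at a time; only the small attributes structs are kept.
		if _, err := dec.Token(); err != nil {
			return nil, err
		}
		for dec.More() {
			if _, err := dec.Token(); err != nil { // product key (SKU)
				return nil, err
			}
			var p product
			if err := dec.Decode(&p); err != nil {
				return nil, err
			}
			if p.Attributes.InstanceType != "" {
				out[p.Attributes.InstanceType] = p.Attributes
			}
		}
		// Stop here so any large trailing sections are never buffered.
		return out, nil
	}
	return out, nil
}

func main() {
	types, err := streamInstanceTypes("https://example.invalid/offer.json") // placeholder URL
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println("instance types found:", len(types))
}
```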
Additional information:
- autoscaler image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.16.6
- Kubernetes version:
  Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:58:59Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.13-eks-2ba888", GitCommit:"2ba888155c7f8093a1bc06e3336333fbdb27b3da", GitTreeState:"clean", BuildDate:"2020-07-17T18:48:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
- pprof svg: cluster-autoscaler-pprof.tar.gz (.svg in a tarball to satisfy GitHub)
- kubectl describe pod output:
  Name:                 cluster-autoscaler-7b9c56647d-9v8pr
  Namespace:            kube-system
  Priority:             2000000000
  Priority Class Name:  system-cluster-critical
  Node:                 ip-192-168-125-18.us-west-2.compute.internal/192.168.125.18
  Start Time:           Tue, 08 Sep 2020 05:18:08 -0600
  Labels:               app=cluster-autoscaler
                        app.kubernetes.io/instance=cluster-autoscaler
                        app.kubernetes.io/name=cluster-autoscaler
                        pod-template-hash=7b9c56647d
  Annotations:          cluster-autoscaler.kubernetes.io/safe-to-evict: false
                        kubernetes.io/psp: psp.privileged
                        prometheus.io/path: /metrics
                        prometheus.io/port: 8085
                        prometheus.io/scrape: true
  Status:               Running
  IP:                   192.168.121.122
  IPs:
    IP:  192.168.121.122
  Controlled By:  ReplicaSet/cluster-autoscaler-7b9c56647d
  Containers:
    cluster-autoscaler:
      Container ID:  docker://cbdbb11a7c20b042d79744edbb5dd0c6fde71303be697a1a773307c9d5ac442c
      Image:         k8s.gcr.io/autoscaling/cluster-autoscaler:v1.16.6
      Image ID:      docker-pullable://k8s.gcr.io/autoscaling/cluster-autoscaler@sha256:cbbe98dd8f325bef54557bc2854e48983cfc706aba126bedb0c52d593e869072
      Port:          8085/TCP
      Host Port:     0/TCP
      Command:
        ./cluster-autoscaler
        --v=4
        --stderrthreshold=info
        --cloud-provider=aws
        --skip-nodes-with-local-storage=false
        --expander=least-waste
        --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<snip>1
        --balance-similar-node-groups
        --skip-nodes-with-system-pods=false
      State:          Running
        Started:      Wed, 09 Sep 2020 16:56:37 -0600
      Last State:     Terminated
        Reason:       OOMKilled
        Exit Code:    137
        Started:      Wed, 09 Sep 2020 16:52:52 -0600
        Finished:     Wed, 09 Sep 2020 16:56:21 -0600
      Ready:          True
      Restart Count:  2
      Limits:
        cpu:     100m
        memory:  300Mi
      Requests:
        cpu:     100m
        memory:  300Mi
      Environment:  <none>
      Mounts:
        /etc/ssl/certs/ca-certificates.crt from ssl-certs (ro)
        /var/run/secrets/kubernetes.io/serviceaccount from cluster-autoscaler-token-bwpc6 (ro)
  Conditions:
    Type              Status
    Initialized       True
    Ready             True
    ContainersReady   True
    PodScheduled      True
  Volumes:
    ssl-certs:
      Type:          HostPath (bare host directory volume)
      Path:          /etc/ssl/certs/ca-bundle.crt
      HostPathType:
    cluster-autoscaler-token-bwpc6:
      Type:        Secret (a volume populated by a Secret)
      SecretName:  cluster-autoscaler-token-bwpc6
      Optional:    false
  QoS Class:       Guaranteed
  Node-Selectors:  <none>
  Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                   node.kubernetes.io/unreachable:NoExecute for 300s
  Events:          <none>
About this issue
- State: closed
- Created 4 years ago
- Reactions: 49
- Comments: 31 (10 by maintainers)
Commits related to this issue
- Fix memory limit CA has a known bug https://github.com/kubernetes/autoscaler/issues/3506 The container consumes more memory than it is limited to. This fix will prevent issues with OOMKill errors... — committed to georgio-sd/amazon-eks-user-guide by georgio-sd 3 years ago
- fix(cluster-autoscaler): memory requests/limits See https://github.com/argoflow/argoflow-aws/pull/178/files and https://github.com/kubernetes/autoscaler/issues/3506 — committed to honestbank/argoflow-aws by jai 3 years ago
I can reproduce the issue, even though, contrary to the initial ticket, the default limit is now set at 300 Mi rather than 250 Mi.
Sometimes, the cluster-autoscaler pod needs a lot more, like below (572 Mi). Increasing this limit accordingly solves the issue on my side.
I’ve deployed v1.22.1 into a cluster which was previously seeing an oomkill with a memory limit of 300Mi. It’s fixed the problem for us.
Awesome, thanks for letting us know, all credit to @aidy for doing the hard work.
I’ll leave this issue open for a bit longer to see if anyone’s still seeing these issues with the new patch releases that include the streaming change, but if not will close it off in a week.
Is there any chance we could grab this list from the local filesystem? In combination with an initContainer or a static ConfigMap, we could keep the “controller” memory limits closer to the requests. We have to configure requests: 96Mi and limits: 512Mi… I bet that list is going to keep growing and eventually crash the pods 😓

Currently #4199 has made it into the default branch and has been cherry-picked back to the 1.19, 1.20, 1.21 and 1.22 release branches, so it will make it into the next patch releases of all those versions. #4251 is the issue to watch for the cutting of those releases.
Our largest cluster has 250 job objects at the moment, which I’d hope isn’t nearly large enough to cause any trouble.
For what it’s worth, we’ve been using --aws-use-static-instance-list=true since September and have not seen any unexpected restarts. 😆
I thought I was safe because we were using ttlSecondsAfterFinished… but that’s an alpha feature, and per @ellistarn, “[EKS runs] feature gates that are in Beta.” So, I had thousands of months-old jobs.