autoscaler: CA Scale Down Fails because of Daemonset utilization

Hello,

Is there a way to disregard DaemonSets (or certain pods) when the autoscaler calculates a node's utilization for scale-down?

I have tried adding the annotation: cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

but the cluster autoscaler still does not seem to remove nodes whose utilization is high because of DaemonSet pods:

<node name> is not suitable for removal - utilization too big (0.631092)
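The log line is easier to read with a simplified model of the check behind it. The assumptions here: a node's utilization is the larger of its CPU and memory ratios (sum of pod requests divided by node allocatable), every pod on the node is counted, DaemonSet pods included, and a node at or above the autoscaler's `--scale-down-utilization-threshold` (0.5 by default) is skipped as "utilization too big". The `safe-to-evict` annotation governs whether a pod may be evicted when the node is drained, not this number, which is why it did not help. The classes, function names, and request values below are hypothetical and only illustrate the arithmetic; this is not the autoscaler's Go code.

```python
from dataclasses import dataclass, field
from typing import List

MI = 1024 ** 2  # bytes per MiB


@dataclass
class Pod:
    name: str
    cpu_millicores: int  # CPU request
    memory_bytes: int    # memory request


@dataclass
class Node:
    name: str
    allocatable_cpu_millicores: int
    allocatable_memory_bytes: int
    pods: List[Pod] = field(default_factory=list)


def node_utilization(node: Node) -> float:
    """Simplified model: the larger of the CPU and memory request ratios.

    Every pod's requests are counted, DaemonSet pods included; the
    safe-to-evict annotation plays no part in this number.
    """
    cpu = sum(p.cpu_millicores for p in node.pods) / node.allocatable_cpu_millicores
    mem = sum(p.memory_bytes for p in node.pods) / node.allocatable_memory_bytes
    return max(cpu, mem)


def is_scale_down_candidate(node: Node, threshold: float = 0.5) -> bool:
    # A node at or above the threshold is skipped ("utilization too big").
    return node_utilization(node) < threshold


# Hypothetical node running only two DaemonSet pods (illustrative requests).
node = Node("example-node", 1000, 4096 * MI, [
    Pod("logging-agent", 300, 512 * MI),
    Pod("mesh-proxy", 350, 256 * MI),
])
print(round(node_utilization(node), 3), is_scale_down_candidate(node))  # 0.65 False
```

With no user workload at all, the two DaemonSet pods alone put this hypothetical node at 65% CPU, so it is never considered for removal.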

About this issue

State: closed
Created 6 years ago
Comments: 17 (8 by maintainers)

Commits related to this issue

add flag to ignore daemonsets when calculating resource utilization of a node Adds the flag `--ignore-daemon-set-pods-in-utilization` (defaults to false) and when enabled, factors DaemonSet pods when... — committed to awprice/autoscaler by awprice 6 years ago
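The commit above gates the behaviour behind `--ignore-daemon-set-pods-in-utilization`, which defaults to false. What such a switch changes can be sketched in Python, assuming the same simplified utilization model (max of CPU and memory request ratios) and assuming DaemonSet pods are recognised by the kind of their owner reference. The function `utilization` and its `ignore_daemonset_pods` parameter are hypothetical names, not the autoscaler's code, and the request values are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional

MI = 1024 ** 2  # bytes per MiB


@dataclass
class Pod:
    name: str
    cpu_millicores: int
    memory_bytes: int
    # Kind of the controlling owner, as in metadata.ownerReferences[].kind.
    owner_kind: Optional[str] = None


def utilization(pods: List[Pod], alloc_cpu_millicores: int, alloc_memory_bytes: int,
                ignore_daemonset_pods: bool = False) -> float:
    """Max of CPU and memory request ratios, optionally skipping DaemonSet pods."""
    counted = [
        p for p in pods
        if not (ignore_daemonset_pods and p.owner_kind == "DaemonSet")
    ]
    cpu = sum(p.cpu_millicores for p in counted) / alloc_cpu_millicores
    mem = sum(p.memory_bytes for p in counted) / alloc_memory_bytes
    return max(cpu, mem)


pods = [
    Pod("logging-agent", 300, 512 * MI, "DaemonSet"),
    Pod("mesh-proxy", 350, 256 * MI, "DaemonSet"),
    Pod("web-7d4f", 100, 256 * MI, "ReplicaSet"),
]
with_ds = utilization(pods, 1000, 4096 * MI)
without_ds = utilization(pods, 1000, 4096 * MI, ignore_daemonset_pods=True)
print(with_ds, without_ds)  # 0.75 0.1
```

Defaulting the switch to off keeps existing clusters' scale-down decisions unchanged, which addresses the backwards-compatibility concern raised later in the thread.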

Most upvoted comments

I’ve raised https://github.com/kubernetes/autoscaler/pull/1407 which will add a flag to ignore DaemonSets when performing the resource utilization calculations. Wasn’t sure how to test it, but happy for pointers on what tests to add.

awprice on Nov 16, 2018

The proposal looks very reasonable to me, and easy to implement. That said, I am not sure we will have time to address it. PRs are very welcome.

losipiuk on Nov 15, 2018

We are currently seeing the same issue. We have a large number of nodes running only DaemonSet pods that persist in the cluster and are not terminated by CA. These DaemonSets provide metrics and system functionality; they are not user-scheduled pods.

The solution should be either a flag to ignore DaemonSet pods in the utilisation calculation or an annotation that can be placed on DaemonSet pods to ignore them.

awprice on Nov 15, 2018

So, would subtracting the utilization of DaemonSet-originating pods be a viable solution?

WebSpider on Sep 19, 2018
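One way to state what "subtracting" would mean, assuming a node's utilization is the larger of its CPU and memory request ratios: let P(n) be the pods on node n, D(n) the subset created by DaemonSets, req_r(p) pod p's request for resource r, and alloc_r(n) the node's allocatable amount of r. DaemonSet requests are then dropped per resource before the maximum is taken, with the denominator left unchanged. Keeping allocatable fixed is itself an assumption; treating DaemonSet requests as node overhead and shrinking the denominator instead would be another reasonable design.

```latex
u(n) = \max_{r \in \{\mathrm{cpu},\,\mathrm{mem}\}}
       \frac{\sum_{p \in P(n)} \mathrm{req}_r(p)}{\mathrm{alloc}_r(n)}
\qquad
u_{\setminus D}(n) = \max_{r \in \{\mathrm{cpu},\,\mathrm{mem}\}}
       \frac{\sum_{p \in P(n) \setminus D(n)} \mathrm{req}_r(p)}{\mathrm{alloc}_r(n)}
```

Because the subtraction happens inside the maximum, u_{\setminus D}(n) is in general not u(n) minus the DaemonSet pods' own utilization: if DaemonSets dominate CPU while user pods dominate memory, the binding resource can switch.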

Seeing a similar thing. This largely affects our development staging clusters, where scale-to-zero is an appealing way to make sure we at least provision the various MIGs that back the different flavors of node groups we use, while ensuring we don't continuously run 16-node clusters with one userland pod.

With smaller nodes, e.g. n1-standard-1, our logging and service mesh pods bring utilization over 50%. These pods exist only to provide a common substrate; apart from these DaemonSet pods, the node would be neither utilized nor needed.

I think in general the heuristic mentioned by @WebSpider is a good one. DaemonSet pods generally exist to provide this kind of baseline functionality for every node in the cluster and are typically not user-scheduled workloads.

That’s quite a backwards-incompatible change. Perhaps a new flag to ignore DaemonSet-induced node utilization is the way to go? I can’t think of a case right now that would justify the added complexity, but a more granular solution could be a pod annotation that explicitly tells the autoscaler to ignore the utilization induced by a given pod.

jacobstr on Oct 22, 2018
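The per-pod annotation idea suggested in this thread can be sketched as a filter applied before requests are summed. The annotation key `example.com/ignore-in-utilization` is hypothetical and not a real Cluster Autoscaler annotation; the pod records are simplified dictionaries rather than full Kubernetes objects, and only CPU is modelled to keep the example short.

```python
from typing import Dict, List

# Hypothetical annotation key: NOT a real Cluster Autoscaler annotation.
IGNORE_ANNOTATION = "example.com/ignore-in-utilization"


def is_ignored(pod: Dict) -> bool:
    """True if the pod has opted out of utilization via the annotation."""
    annotations = pod.get("annotations") or {}
    return annotations.get(IGNORE_ANNOTATION) == "true"


def cpu_utilization(pods: List[Dict], alloc_cpu_millicores: int) -> float:
    """CPU-only simplification: sum the requests of pods that did not opt out."""
    requested = sum(p["cpu_millicores"] for p in pods if not is_ignored(p))
    return requested / alloc_cpu_millicores


pods = [
    # The logging agent opts out; the mesh proxy (also a DaemonSet) is still counted.
    {"name": "logging-agent", "cpu_millicores": 300,
     "annotations": {IGNORE_ANNOTATION: "true"}},
    {"name": "mesh-proxy", "cpu_millicores": 350},
    {"name": "web-7d4f", "cpu_millicores": 100},
]
print(cpu_utilization(pods, 1000))  # 0.45
```

Unlike a global flag, an annotation lets operators exempt individual DaemonSets (here the logging agent) while still counting others, at the cost of having to annotate each workload.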

Yes, I think that would solve our issue.

a200462790 on Sep 19, 2018