karpenter: Cordon node with a do-not-evict pod when ttlSecondsUntilExpired is met

Tell us about your request

Regarding the do-not-evict annotation, it currently prevents some nodes from being deprovisioned on our clusters.

I know that’s the goal and I’m fine with it, but would it be possible to at least cordon the node when ttlSecondsUntilExpired is met?

Tell us about the problem you’re trying to solve. What are you trying to do, and why is it hard?

We provision expensive nodes as a fallback for some workloads with a TTL of 3 hours, and we saw some still running with 6 hours of uptime. The annotation was preventing the drain/deletion because cronjobs kept being scheduled on them, but a manual cordon fixed it within minutes.
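For reference, the two settings interacting here look roughly like this (the resource and pod names are illustrative, not taken from the actual cluster):

```yaml
# Illustrative only: a Provisioner that should expire nodes after 3 hours,
# and a pod whose do-not-evict annotation blocks the drain.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: expensive-fallback          # hypothetical name
spec:
  ttlSecondsUntilExpired: 10800     # 3 hours
---
apiVersion: v1
kind: Pod
metadata:
  name: nightly-cronjob-pod         # hypothetical name
  annotations:
    karpenter.sh/do-not-evict: "true"   # blocks Karpenter from draining the node
spec:
  containers:
    - name: job
      image: busybox                # placeholder image
      command: ["sleep", "14400"]   # placeholder workload
```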

Are you currently working around this issue?

No

Additional Context

No response

Attachments

No response

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave “+1” or “me too” comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Reactions: 19
  • Comments: 15 (8 by maintainers)

Most upvoted comments

Related to this, I think I’d like to see a counter metric for the amount of time that a node has exceeded its TTL. Whilst that’s climbing, there’s a problem. If it keeps climbing, there’s a bigger problem.

I’m not sure if I’d want a series per provisioner or per (provisioner,node name) tuple.

Overall, that lets me as an operator manually intervene to implement Forceful, so I don’t end up with a very stale node by accident.
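To make that concrete, here is a rough sketch of how such a metric might be consumed, assuming a hypothetical gauge named karpenter_node_seconds_past_expiry with a provisioner label (neither the metric nor the label exists in Karpenter today):

```yaml
# Hypothetical Prometheus alerting rule; karpenter_node_seconds_past_expiry
# is an assumed metric name, not something Karpenter currently exposes.
groups:
  - name: karpenter-expiry
    rules:
      - alert: KarpenterNodePastTTL
        # Aggregating by provisioner keeps cardinality low; adding a node
        # label would pinpoint the offending node at a higher cardinality cost.
        expr: max by (provisioner) (karpenter_node_seconds_past_expiry) > 3600
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Nodes in provisioner {{ $labels.provisioner }} are more than an hour past their TTL"
```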

This is tricky, since there are a number of termination cases, and the desired behavior isn’t always clear. One of the ideas we’ve come to is thinking in terms of voluntary and involuntary disruptions.

Voluntary Disruptions:

  • Drift (AMI is out of date, requirements no longer match provisioner, etc)
  • Consolidation (emptiness, replacement, etc)

Involuntary Disruption:

  • Spot Interruption
  • Node Health
  • Expiration (this one is debatably voluntary)

Potential termination behaviors:

  • [Opportunistic] Don’t cordon the node until we think we can successfully drain all of its pods
  • [Eventual] Cordon the node, but don’t drain it until we’re sure we can drain all of them
  • [Graceful] Cordon the node, drain it, respect PDBs/do-not-evict
  • [Forceful] Cordon the node, drain it, don’t respect PDBs/do-not-evict

We want to “do the right thing” wherever possible, and have this be something that users simply don’t think about.

  • Is it possible for us to pick one behavior for each type of disruption?
  • Do users need to be able to configure different behaviors for different types of disruption?
  • If configurable, should it be per provisioner?

If I were to guess at the right set of defaults (a hypothetical per-provisioner encoding is sketched after this list), it might be something like:

  • Drift -> Opportunistic
  • Consolidation -> Opportunistic
  • Spot Interruption -> Forceful
  • Node Health -> Forceful
  • Expiration -> ???
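One purely hypothetical way to encode those defaults per provisioner; none of these behavior fields exist in the Provisioner API today:

```yaml
# Hypothetical sketch only: terminationBehavior is not a real Provisioner field.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: example
spec:
  ttlSecondsUntilExpired: 10800
  terminationBehavior:              # hypothetical field
    drift: Opportunistic
    consolidation: Opportunistic
    spotInterruption: Forceful
    nodeHealth: Forceful
    expiration: Eventual            # one possible answer to the "???" above
```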

@jukie sorry for the unfortunate timing, as I just came off vacation. Just noticed you asked here, but we just had a WG an hour ago. You can check when and where these meetings are happening here: https://github.com/kubernetes-sigs/karpenter#community-discussion-contribution-and-support. You can message me or @jonathan-innis in the kubernetes slack as well if you want to discuss more.

Have there been any more discussions on this?

I think we should consider discussing this in the next working group. I’d like to see a design on this one, considering the impact this could have on the cluster, given that, if we’re not careful, there could be cases where all the nodes in the cluster get tainted at once.

As an example, if we have a set of nodes that all expire at once and we cordon them all at once, is that a reasonable stance to take, knowing that those nodes will eventually get disrupted? Probably, assuming that our disruption logic can handle the disruption quickly enough that having underutilization on those nodes isn’t a problem, but what if a node has a fully-blocking PDB? What if a node has a do-not-evict pod and that pod never goes away? Do we need some forceful termination that happens after a node has been cordoned with a blocking pod for some amount of time (sketched below)? There might be some interactions here that we should consider with #743
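If we did go that way, it might look like a hypothetical per-provisioner timeout (not an existing field) after which an expired, cordoned node is drained even past PDBs and do-not-evict pods:

```yaml
# Hypothetical sketch only: ttlSecondsAfterCordonForceDrain is not a real field.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: example
spec:
  ttlSecondsUntilExpired: 10800           # expire (and cordon) after 3 hours
  ttlSecondsAfterCordonForceDrain: 3600   # hypothetical: stop honoring PDBs/do-not-evict 1 hour later
```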