karpenter: Cordon node with a do-not-evict pod when ttlSecondsUntilExpired is met
Tell us about your request
Regarding the do-not-evict annotation: it currently prevents some nodes on our clusters from being deprovisioned.
I know that's the goal, and I'm fine with it, but would it be possible to at least cordon the node when ttlSecondsUntilExpired is met?
Tell us about the problem you’re trying to solve. What are you trying to do, and why is it hard?
We provision expensive nodes as a fallback for some workloads, with a TTL of 3 hours, and we have seen some still running after 6 hours of uptime. The annotation was preventing the drain/deletion because CronJobs kept being scheduled on the node, but a manual cordon fixed it within minutes.
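For reference, the manual fix amounts to `kubectl cordon <node>`, which just sets `spec.unschedulable` on the Node object. Below is a minimal client-go sketch of that same operation (the kubeconfig path and node-name handling are illustrative assumptions, not anything Karpenter ships):

```go
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	nodeName := os.Args[1]

	// Load the local kubeconfig (illustrative; in-cluster config works too).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Setting spec.unschedulable is exactly what `kubectl cordon` does:
	// new pods (including fresh CronJob runs) stop landing on the node,
	// so the existing pods can finish and the node can finally drain.
	patch := []byte(`{"spec":{"unschedulable":true}}`)
	if _, err := client.CoreV1().Nodes().Patch(
		context.TODO(), nodeName, types.StrategicMergePatchType,
		patch, metav1.PatchOptions{},
	); err != nil {
		panic(err)
	}
	fmt.Printf("cordoned node %s\n", nodeName)
}
```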
Are you currently working around this issue?
No
Additional Context
No response
Attachments
No response
About this issue
- Original URL
- State: open
- Created 2 years ago
- Reactions: 19
- Comments: 15 (8 by maintainers)
Related to this, I think I'd like to see a counter metric for the amount of time that a node has exceeded its TTL. Whilst that's climbing, there's a problem. If it keeps climbing, there's a bigger problem.
I'm not sure if I'd want a series per provisioner or per (provisioner, node name) tuple.
Overall, that lets me as an operator manually intervene to implement Forceful, so I don't end up with a very stale node by accident.
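A rough sketch of what that metric could look like, modelled as a Prometheus gauge rather than a counter since the exceedance drops back to zero once the node is finally removed. The metric and label names here are made up for illustration; this is not an existing Karpenter metric:

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical metric: seconds a node has lived past its
// ttlSecondsUntilExpired. The open question above is whether to key the
// series per provisioner or per (provisioner, node); per-node is shown
// here, at the cost of higher cardinality.
var nodeTTLExceededSeconds = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Namespace: "karpenter",
		Name:      "node_ttl_exceeded_seconds",
		Help:      "Seconds a node has lived past its ttlSecondsUntilExpired (0 while unexpired).",
	},
	[]string{"provisioner", "node"},
)

func init() {
	prometheus.MustRegister(nodeTTLExceededSeconds)
}

// record is expected to run from a periodic reconcile loop, once per node.
func record(provisioner, node string, created time.Time, ttl time.Duration) {
	exceeded := time.Since(created) - ttl
	if exceeded < 0 {
		exceeded = 0
	}
	nodeTTLExceededSeconds.WithLabelValues(provisioner, node).Set(exceeded.Seconds())
}
```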
This is tricky, since there are a number of termination cases, and the desired behavior isn’t always clear. One of the ideas we’ve come to is thinking in terms of voluntary and involuntary disruptions.
Voluntary Disruptions:
Involuntary Disruptions:
Potential termination behaviors:
We want to “do the right thing” wherever possible, and have this be something that users simply don’t think about.
If I were to guess at the right set of defaults, it might be something like:
@jukie sorry for the unfortunate timing, as I just came off vacation. I only just noticed you asked here, but we had a working group meeting an hour ago. You can check when and where these meetings happen here: https://github.com/kubernetes-sigs/karpenter#community-discussion-contribution-and-support. You can also message me or @jonathan-innis on the Kubernetes Slack if you want to discuss more.
I think we should consider discussing this in the next working group. I'd like to see a design for this one, considering the impact it could have on the cluster: if we're not careful, there could be cases where all nodes on the cluster get tainted at once.
As an example, if we have a set of nodes that expire all at once and we cordon them all at once, is that a reasonable stance to take, knowing that those nodes will eventually get disrupted? Probably, assuming our disruption logic can handle the disruption quickly enough that underutilization on those nodes isn't a problem. But what if a node has a fully-blocking PDB? What if it has a do-not-evict pod and that pod never goes away? Do we need some forceful termination that happens after a node has been cordoned with a blocking pod for some amount of time? There might be some interactions here that we should consider with #743
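For concreteness, here is a rough sketch of the behavior being requested, assuming a reconcile loop that already knows each node's TTL. This is not Karpenter's actual code, and it deliberately leaves open the forceful-termination and blast-radius questions raised above:

```go
package expiration

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// cordonIfExpired cordons a node once its TTL has elapsed, without
// evicting anything. A do-not-evict pod still blocks the actual drain,
// but new pods (e.g. CronJob runs) can no longer keep the node alive.
// Note that applying this naively could cordon every expired node at
// once, which is the concern discussed above.
func cordonIfExpired(ctx context.Context, c kubernetes.Interface, node *corev1.Node, ttl time.Duration) error {
	expired := time.Since(node.CreationTimestamp.Time) >= ttl
	if !expired || node.Spec.Unschedulable {
		return nil // not expired yet, or already cordoned
	}
	patch := []byte(`{"spec":{"unschedulable":true}}`)
	_, err := c.CoreV1().Nodes().Patch(
		ctx, node.Name, types.StrategicMergePatchType,
		patch, metav1.PatchOptions{},
	)
	return err
}
```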