descheduler: deschedule pods that fail to start or restart too often
It is not uncommon for pods to get scheduled on nodes that are unable to start them. For example, a node may have network issues and be unable to mount a networked persistent volume, or it may fail to pull a docker image, or it may have a docker configuration issue that only shows up on container startup.
Another common issue is a container being restarted by its liveness probe because of a local node issue (e.g. a wrong routing table, slow storage, network latency, or packet drop). In that case, the pod is unhealthy most of the time and hangs in a restart loop forever without a chance of being migrated to another node.
As of now, there is no way to re-schedule pods with faulty containers. It may be helpful to introduce two new strategies (a configuration sketch follows the list):
- container-restart-rate: re-schedule a pod if it has been unhealthy for `$notReadyPeriod` seconds and one of its containers was restarted `$maxRestartCount` times.
- pod-startup-failure: a pod was scheduled on a node but has been unable to start all of its containers for `$maxStartupTime` seconds.
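For concreteness, here is a sketch of how the proposed strategies might be configured. The strategy names and the `notReadyPeriod`, `maxRestartCount`, and `maxStartupTime` parameters come from the proposal above and do not exist in the descheduler:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  # hypothetical strategy from this proposal: evict pods that restart too often
  "container-restart-rate":
    enabled: true
    params:
      notReadyPeriod: 600   # pod has been unhealthy for at least 600s
      maxRestartCount: 5    # and one of its containers restarted at least 5 times
  # hypothetical strategy from this proposal: evict pods that never fully start
  "pod-startup-failure":
    enabled: true
    params:
      maxStartupTime: 300   # seconds since scheduling without all containers started
```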
A similar issue is filed against kubernetes: https://github.com/kubernetes/kubernetes/issues/13385
About this issue
- State: closed
- Created 7 years ago
- Comments: 44 (27 by maintainers)
Commits related to this issue
- Merge pull request #62 from ingvagabund/sync-with-upstream bug 1950026: Sync with upstream — committed to damemi/descheduler by openshift-merge-robot 3 years ago
@lixiang233 this feature enhancement is all yours. Thanks!
Yeah, with such a short period of time, it makes sense to limit the phase. Though maybe not to every phase. `Pending` is the first phase when a pod is accepted. I can't find any field in a pod's status saying when the pod transitioned into a given phase. Also, other phases are completely ignored (`Failed`, `Succeeded`), which leaves only `Running` and `Unknown`. `Running` is the default one in most cases. The `podStatusPhase` field is fine, though I would just limit it to `Pending` and `Running` right now.

@kabakaev thanks for the info. How about using the PodLifeTime strategy? We would need to add an additional strategy parameter to handle `status.phase != Running`. Maybe something like this …
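A minimal sketch of what that could look like, assuming a new `podStatusPhases` parameter on the PodLifeTime strategy (the field name and placement were still being discussed at this point):

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "PodLifeTime":
    enabled: true
    params:
      podLifeTime:
        maxPodLifeTimeSeconds: 300
        # assumed parameter: only evict pods stuck in these phases
        podStatusPhases:
        - "Pending"
```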
@damemi @ingvagabund @lixiang233 please add any additional ideas you have. Thanks!
Seems like a reasonable ask. @kabakaev I am planning to defer this to the 0.6 release or later. Hope you are ok with that.