kubernetes: Internal PreStartContainer hook failed: not enough cpus available to satisfy request

/kind bug

What happened: Sometimes when a deployment creates new pods, they are scheduled onto a node successfully, but the containers fail to start with:

Warning  Failed                 14m (x8 over 15m)    kubelet, ip-172-20-162-142.ec2.internal  Internal PreStartContainer hook failed: not enough cpus available to satisfy request

What you expected to happen: The scheduler shouldn’t put pods on nodes without enough resources.

How to reproduce it (as minimally and precisely as possible): I don’t know how the node got into this situation. Deleting the pods and having the ReplicaSet recreate them works around the issue.

Anything else we need to know?: We’re using the CPUManager with static policy which might contribute to this problem.
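For context on why the static policy matters here: under it, only containers in Guaranteed pods that request a whole number of CPUs receive exclusive cores, so only those containers can hit an exclusive-CPU shortage. The following is a minimal sketch of that eligibility rule, not the kubelet's code. The dictionary layout and the names `is_guaranteed`, `gets_exclusive_cpus`, `cpu_millis`, and `memory_bytes` are assumptions made for illustration.

```python
def is_guaranteed(containers):
    """Simplified Guaranteed-QoS check (hypothetical data layout).

    Each container is a dict with 'requests' and 'limits' dicts. Every
    container must set CPU and memory limits, and any request that is
    set must equal its limit.
    """
    if not containers:
        return False
    for container in containers:
        for resource in ("cpu_millis", "memory_bytes"):
            limit = container["limits"].get(resource)
            # An unset request is treated as equal to the limit.
            request = container["requests"].get(resource, limit)
            if limit is None or request != limit:
                return False
    return True


def gets_exclusive_cpus(containers, index):
    """True if containers[index] would qualify for exclusive CPUs."""
    if not is_guaranteed(containers):
        return False
    container = containers[index]
    cpu = container["requests"].get(
        "cpu_millis", container["limits"]["cpu_millis"])
    # Only whole-CPU requests (multiples of 1000 millicores) qualify.
    return cpu >= 1000 and cpu % 1000 == 0
```

A pod whose only container requests and limits 2000 millicores qualifies under this sketch, while one requesting 1500 millicores stays in the shared pool.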

Environment:

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.0-beta.2", GitCommit:"63dad40a0391b7af32c34fdbf41fa199c3b247ad", GitTreeState:"clean", BuildDate:"2018-03-07T20:42:18Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:44:10Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider or hardware configuration: AWS
OS (e.g. from /etc/os-release): coreos

/sig scheduling

About this issue

State: closed
Created 6 years ago
Comments: 24 (24 by maintainers)

Commits related to this issue

Merge pull request #67430 from choury/cpumanager Automatic merge from submit-queue (batch tested with PRs 67430, 67550). If you want to cherry-pick this change to another branch, please follow the in... — committed to kubernetes/kubernetes by deleted user 6 years ago
Merge branch 'shuokong/1.10.5-tke.9' into 'qcloud/v1.10.5' cpumanager: rollback state if updateContainerCPUSet failed cherry pick: h... — committed to honkiko/kubernetes by deleted user 5 years ago
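The commit title "cpumanager: rollback state if updateContainerCPUSet failed" suggests one plausible way a node could run out of exclusive CPUs: cores are reserved in the manager's state, the cpuset update on the container then fails, and the reservation is never released. The toy model below illustrates that reading only. It is not the kubelet's implementation, and `ToyCPUManager`, `add_container`, `apply_cpuset`, and `failing_apply` are hypothetical names.

```python
class CPUAllocationError(Exception):
    """Raised when the free pool cannot satisfy a request."""


class ToyCPUManager:
    """Toy bookkeeping for exclusive CPUs (illustrative, not kubelet code)."""

    def __init__(self, cpu_ids):
        self.free = set(cpu_ids)   # CPUs not yet assigned exclusively
        self.assignments = {}      # container ID -> set of CPU IDs

    def allocate(self, container_id, count):
        if count > len(self.free):
            raise CPUAllocationError(
                "not enough cpus available to satisfy request")
        chosen = set(sorted(self.free)[:count])
        self.free -= chosen
        self.assignments[container_id] = chosen
        return chosen

    def release(self, container_id):
        self.free |= self.assignments.pop(container_id, set())

    def add_container(self, container_id, count, apply_cpuset, rollback=True):
        """Reserve CPUs, then hand them to apply_cpuset.

        apply_cpuset stands in for the runtime call that writes the
        cpuset and may raise. With rollback=False, a failure leaves the
        CPUs reserved, which models the suspected leak.
        """
        cpus = self.allocate(container_id, count)
        try:
            apply_cpuset(container_id, cpus)
        except Exception:
            if rollback:
                self.release(container_id)
            raise
        return cpus


def failing_apply(container_id, cpus):
    """Stand-in for a runtime that rejects the cpuset update."""
    raise RuntimeError("runtime rejected cpuset update")
```

On a four-CPU manager, two failed `add_container` calls for two CPUs each with `rollback=False` leave the free pool empty, so every later request fails with the error from this issue even though no container is running. With `rollback=True`, the pool returns to four CPUs after each failure.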

Most upvoted comments

Note that this is fixed in 1.12, but backports to 1.11 and 1.10 are not merged yet:

clkao on Dec 18, 2018

@balajismaniam / @ConnorDoyle Do you have any update/ETA on this?

discordianfish on Aug 3, 2018

@balajismaniam / @ConnorDoyle: Is there some way I can help you debug/fix this? Seems unrelated to anything special in our setup from what I can tell. We had to disable this for all our deployments because it was causing severe issues.

discordianfish on Jun 14, 2018