kubernetes: Internal PreStartContainer hook failed: not enough cpus available to satisfy request
/kind bug
What happened: Sometimes when a deployment creates new pods, they are scheduled on a node successfully, but the containers fail to start with:
Warning Failed 14m (x8 over 15m) kubelet, ip-172-20-162-142.ec2.internal Internal PreStartContainer hook failed: not enough cpus available to satisfy request
What you expected to happen: The scheduler shouldn’t put pods on nodes without enough resources.
How to reproduce it (as minimally and precisely as possible): I don’t know how the node got into this state. Deleting the pods and letting the ReplicaSet recreate them works around the issue.
Anything else we need to know?: We’re using the CPUManager with the static policy, which might contribute to this problem.
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.0-beta.2", GitCommit:"63dad40a0391b7af32c34fdbf41fa199c3b247ad", GitTreeState:"clean", BuildDate:"2018-03-07T20:42:18Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:44:10Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: AWS
- OS (e.g. from /etc/os-release): coreos
/sig scheduling
About this issue
- State: closed
- Created 6 years ago
- Comments: 24 (24 by maintainers)
Commits related to this issue
- Merge pull request #67430 from choury/cpumanager Automatic merge from submit-queue (batch tested with PRs 67430, 67550). If you want to cherry-pick this change to another branch, please follow the in... — committed to kubernetes/kubernetes by deleted user 6 years ago
- Merge branch 'shuokong/1.10.5-tke.9' into 'qcloud/v1.10.5' cpumanager: rollback state if updateContainerCPUSet failed cpumanager: rollback state if updateContainerCPUSet failed cherry pick: h... — committed to honkiko/kubernetes by deleted user 5 years ago
Note that this is fixed in 1.12, but backports to 1.11 and 1.10 are not merged yet:
@balajismaniam / @ConnorDoyle Do you have any update/ETA on this?
@balajismaniam / @ConnorDoyle: Is there some way I can help you debug or fix this? From what I can tell, it seems unrelated to anything special in our setup. We had to disable this for all our deployments because it was causing severe issues.