kubernetes: Kubernetes Pods not scheduled due to "Insufficient CPU" when CPU resources are available

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.7", GitCommit:"a2cba278cba1f6881bb0a7704d9cac6fca6ed435", GitTreeState:"clean", BuildDate:"2016-09-12T23:15:30Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.7", GitCommit:"a2cba278cba1f6881bb0a7704d9cac6fca6ed435", GitTreeState:"clean", BuildDate:"2016-09-12T23:08:43Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: AWS, masters (Count: 3, Size: m3.medium), minions (Count 5, Size m4.xlarge)
  • OS (e.g. from /etc/os-release): Ubuntu 14.04.5 LTS (Trusty Tahr)
  • Kernel (e.g. uname -a): Master: 3.13.0-95-generic Minion: 4.4.0-38-generic
  • Install tools: Ansible using modified contrib playbooks: https://github.com/kubernetes/contrib/tree/master/ansible
  • Others:

What happened: When scheduling pods with a low CPU resource request (15m), we receive the message “Insufficient CPU” for every node the scheduler tries. We are using multi-container pods, and running a describe shows nodes with resources available to schedule the pods. However, Kubernetes refuses to schedule them on any node.
kubectl_output.txt

What you expected to happen:

How to reproduce it (as minimally and precisely as possible): Below is a sample manifest we can use to reproduce the problem.
manifest.txt

We can schedule roughly 10-14 pods before we run into this problem. See the graph below.

[screenshot: graph of scheduled pod count]

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 17
  • Comments: 73 (6 by maintainers)

Most upvoted comments

We are having the same problem and cannot restart the master since we are in GKE.

The Kubernetes dashboard says that only 0.05 CPU is in use. Why can't the pod be scheduled?


I just removed the resource limit and request specs; that works for now…


Another pod can't be scheduled this time either…

   ...
    spec:
      containers:
        - name: default-http-backend
          image: gcr.io/xxx/default-http-backend:latest
          ports:
            - containerPort: 8000
              name: http
          resources:
            requests:
              cpu: 10m

Requesting a very small amount of CPU made it work this time.

I know this won't solve all the problems mentioned on this page, but in my case it was a typo caused by copy-and-paste.

As you can see in the documentation:

The expression 0.1 is equivalent to the expression 100m, which can be read as “one hundred millicpu”.

But some people, myself included, accidentally copy CPU limit values from memory limits and end up with the wrong syntax.

So the solution in this case is to replace “Mi” with “m”.

Wrong: cpu: 100Mi. Correct: cpu: 100m or cpu: 0.1.
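
For reference, a minimal snippet showing the two unit families side by side (the values are illustrative, not from the original manifest):

    resources:
      requests:
        cpu: 100m      # CPU uses millicores ("m") or a plain decimal such as 0.1
        memory: 100Mi  # memory uses binary suffixes such as Mi or Gi
      limits:
        cpu: 250m
        memory: 200Mi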

I had this same issue. GKE has a default LimitRange that sets the default CPU request to 100m; you can check this by running kubectl get limitrange -o=yaml -n default (or -n your-namespace).

This default is applied to every container. So, for instance, on a 4-core node, and assuming each pod has 2 containers, it allows only around ~20 pods to be scheduled; at least, that is how I understood it.

The workaround is to change the default by editing or removing the LimitRange and deleting old pods so they are recreated with the new defaults, or to set explicit requests/limits in your pod spec, which override the defaults (see the sketch below).
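
As a sketch (the object name and values below are illustrative, not GKE's actual defaults), a replacement LimitRange that lowers the per-container default CPU request could look like this:

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: cpu-defaults        # illustrative name
      namespace: default
    spec:
      limits:
        - type: Container
          defaultRequest:
            cpu: 50m            # request applied when a container specifies none
          default:
            cpu: 200m           # limit applied when a container specifies none

Apply it with kubectl apply -f, then delete and recreate existing pods so they pick up the new defaults; any request or limit set explicitly in a pod spec overrides these defaults.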

Some reading material:

  • https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/#specify-a-cpu-request-and-a-cpu-limit
  • https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/#create-a-limitrange-and-a-pod
  • https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#how-pods-with-resource-limits-are-run
  • https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-resource-requests-and-limits

Having a similar issue: there are almost 4 whole CPUs available within the cluster, a new Pod requests 500m (half a core), and the scheduler reports insufficient CPU on all nodes. 😱 Working with GKE, Kubernetes master version 1.9.2-gke.1.

@nsidhaye So, it turns out my problem was: we had 30 cores to use, and every deployment had a default of 1 core request and 2 cores limit. Even though the apps weren't consuming more than 50m CPU, each one was “locking” 1 whole core, meaning we hit the limit of 30 apps pretty quickly.

We had to redeploy most of our apps based on real resource consumption: using Prometheus/Grafana, we checked the average CPU (and memory) consumption of each pod, calculated how much it should request, and updated those values.

If you run kubectl describe nodes you can see how much has already been requested on each node, which should point you in the right direction for fixing the issue.

Is there any official update / feedback on this topic?

This isn’t really an ‘issue’ and I think we should close this question. See here for a further explanation: https://stackoverflow.com/a/45585916/1663462

The main issue is probably that it's not very intuitive why one gets the error message, even though it is the intended and correct behavior.

In this scenario the AWS/Azure/whatever console can report misleading CPU usage. Use kubectl describe node xxxx to check each node. You'll probably find that the CPU allocation on the node is too high (see image below; it shows a healthy state, but in your case it might be e.g. 80%). You may need to delete some resources from the node (e.g. any unused pods that aren't required) in order to successfully schedule new pods onto it.

[screenshot: kubectl describe node output]

Hi,

the issue was submitted in 2016. Any idea when it will be fixed? I have OpenShift Origin 3.7 and this is killing me…

We have the same problem: unable to create more pods (“Insufficient cpu”) while all nodes are at ~5-10% CPU load and 60-70% CPU limits (per kubectl describe node). Restarting the master node seems to get the pods scheduled.

I had the same issue, but after spending some time debugging I found the cause. It is not a bug in my case. Try checking kubectl describe nodes {node name} and summing the total CPU requests of the other pods.

In my case, there genuinely were insufficient CPU resources left. Even though overall CPU utilisation is low and free cycles are available, the CPU request of each pod is reserved and dedicated on its node. Try reducing the CPU requests of the other pods, and the pending pods will be scheduled automatically…
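
To illustrate the arithmetic (numbers made up): the scheduler compares the sum of existing pod requests against the node's allocatable CPU, not against live utilisation. Suppose kubectl describe node shows:

Allocatable:
 cpu:  4
Allocated resources:
  CPU Requests  CPU Limits
  ------------  ----------
  3900m (97%)   6200m (155%)

A new pod requesting 200m is rejected with “Insufficient cpu”, because 3900m + 200m exceeds the 4000m allocatable, even if actual CPU usage on the node is close to zero.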

Nothing?

How do you restart the nodes? I’m using Google Cloud platform… Would I SSH into the compute instances and restart?

I think the reason is that no node has enough unreserved CPU to satisfy the pod's CPU request. Use kubectl describe nodes {NodeName} to check each node's CPU requests. If the pod's requested CPU plus the currently requested CPU exceeds 100%, kube-scheduler emits an event like "Kubernetes Pods not scheduled due to “Insufficient CPU”".

This issue is coming up on two years old now; we're seeing this and it's holding us back from going to production.

Same issue on GKE: I just added a fresh new instance (micro), but it won't schedule even the smallest pod on it. E.g.:

   Requests:
     cpu:		1Mi
     memory:		64Mi
...
 26m		1s		95	default-scheduler			Warning		FailedScheduling	No nodes are available that match all of the following predicates:: Insufficient cpu (5).

Though on the fresh node there is enough cpu available:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests	CPU Limits	Memory Requests	Memory Limits
  ------------	----------	---------------	-------------
  320m (34%)	100m (10%)	262Mi (44%)	414Mi (69%)

Other nodes are pretty packed at ~95% cpu allocated on each node, though even there it should schedule a 1m cpu pod.

Seeing this too on k8s bare-metal

For me, creating all the deployments and services in a different namespace (other than default) fixed this issue. On GKE

@k82cn

Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  31s (x637 over 3h)  default-scheduler  No nodes are available that match all of the predicates: Insufficient cpu (7), Insufficient memory (3), PodToleratesNodeTaints (5)

My resources on one of my nodes (they are all pretty much the same)

Capacity:
 cpu:     32
 memory:  65690484Ki
 pods:    110
Allocatable:
 cpu:     32
 memory:  65588084Ki
 pods:    110

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits    Memory Requests  Memory Limits
  ------------  ----------    ---------------  -------------
  26817m (83%)  30617m (95%)  12784Mi (19%)    32756Mi (51%)

It doesn't make sense to use the pod limit, which should kill my pod in case of a memory leak or abnormal CPU usage, to constrain pod scheduling. If I have 100 pods, each with its own limit, it is very unlikely that they will all be running at their peak limits at the same time. Kubernetes should take the actual resource usage on my nodes into account for that.

So either I have to create pods with very small requests that won't be able to handle any burst of traffic, or I waste a huge amount of resources.
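
One common compromise (the values below are illustrative) is to set a small request, which is all the scheduler reserves, and a higher limit the container may burst into; the pod then runs in the Burstable QoS class:

    resources:
      requests:
        cpu: 100m    # what the scheduler reserves on the node
      limits:
        cpu: 1       # ceiling the container may burst up to

Scheduling is driven only by the request, so nodes are not "locked" by limits that are rarely reached.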

This seems to be a really old issue; are there any updates on it?

Did anyone resolve this issue?

I'm trying to install Elasticsearch on my AKS cluster using Helm. I get an error telling me there isn't enough CPU. I have Prometheus on my cluster, but there are no CPU spikes anywhere near 100%.

Could someone help me understand the output of kubectl describe node? What is the relationship between requests and limits, and what should the numbers look like?

My problem was caused by CPU quota limits under IAM & admin -> Quotas -> Compute Engine API (CPU). I had to request more quota for my environment; with the upgraded limits my pods scale up easily.

Also seeing this issue. We’re on GKE.

Same issue, running 1.9.6-gke.1

I might be seeing this as well (on GKE):

I have a deployment with:

[...]
        resources:
          requests:
            cpu: 1
            memory: 3G
[...]
        resources:
          requests:
            cpu: 9G
            memory: 52G
[...]

I'm trying to deploy to a cluster that has 3 nodes with 15.89 CPU allocatable and 57.65 GB memory allocatable, but I get Insufficient cpu (3), Insufficient memory (6) when scheduling.

Doing stuff like bumping the second container down to:

   requests:
      cpu:        4G
      memory:     22G

results in the same scheduling issue.

Thanks @chrissound. I have another cluster with high-CPU instances, and indeed it hasn't shown any issues.

Not sure if related, but I recently got an email from GCP pointing to these docs: https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/. GKE will start doing this after the 1.7.6 upgrade.
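
Those docs cover the kubelet's kube-reserved and system-reserved settings, which subtract CPU and memory from a node's capacity before computing its allocatable resources, so a node can offer less schedulable CPU than its raw core count. A sketch of the relevant KubeletConfiguration fields (values illustrative):

    # KubeletConfiguration excerpt (values illustrative)
    kubeReserved:
      cpu: 100m
      memory: 256Mi
    systemReserved:
      cpu: 100m
      memory: 256Mi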