kubernetes: Scheduling fails with "Insufficient Memory" until restart of apiserver/master

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.): No

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): memory scheduler / memory scheduling


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Kubernetes version (use kubectl version):

    Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-25T14:48:12Z", GoVersion:"go1.8.1", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1+coreos.0", GitCommit:"9212f77ed8c169a0afa02e58dce87913c6387b3e", GitTreeState:"clean", BuildDate:"2017-04-04T00:32:53Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: Custom/mixed

  • OS (e.g. from /etc/os-release): CoreOS 1353.7.0 (stable)

  • Kernel (e.g. uname -a): Linux coreos04.kub.do.modio.se 4.9.24-coreos #1 SMP Wed Apr 26 21:44:23 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5-2650L v3 @ 1.80GHz GenuineIntel GNU/Linux

  • Install tools: Manual guide / ansible

  • Others:

What happened: Scheduling of pods that carry a memory limit gradually starts to fail after a few pod deployments, and does not recover until the master node is restarted, at which point it starts working again.

Pods are configured with:

          resources:
            limits:
                memory: "1Gi"
                cpu: "1"
            requests:
                cpu: "100m"
                memory: "30Mi"

While the node reports:

    Capacity:
     cpu:		2
     memory:	2052872Ki
     pods:		110
    Allocatable:
     cpu:		2
     memory:	1950472Ki
     pods:		110

And further down:

    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      CPU Requests  CPU Limits  Memory Requests  Memory Limits
      ------------  ----------  ---------------  -------------
      150m (7%)     1 (50%)     12Mi (0%)        128Mi (6%)
    Events:

The numbers don't add up. The scheduler fits pods against Allocatable based on their requests, so at 30Mi per pod roughly 63 of them should fit into ~1905Mi (1950472Ki), and the 1Gi limits should not affect scheduling at all. Yet manually stepping through sizes shows that pods schedule up to ~300Mi and fail above that. This behaviour is consistently reproducible for us.
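A minimal sketch of that stepping, assuming kubectl run still accepts the --requests/--limits flags it had in the 1.6 era (the pod names and the size ladder are illustrative):

    # Step through limit sizes; pods that stay Pending with
    # "Insufficient memory" mark where scheduling starts to fail.
    for mem in 100Mi 200Mi 300Mi 400Mi 500Mi 1Gi; do
      name="memtest-$(echo "$mem" | tr '[:upper:]' '[:lower:]')"
      kubectl run "$name" --image=busybox --restart=Never \
        --requests=memory=30Mi --limits="memory=$mem" \
        --command -- sleep 3600
    done
    kubectl get pods | grep memtest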

Attached: node.description.txt, pod.description.txt

What you expected to happen: The memory requirements to limit the number of jobs on the machine to ~3 due to memory pressure, rather than scheduling failing outright.

How to reproduce it (as minimally and precisely as possible): Schedule a lot of pods with memory limits and delete them / let them complete.
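A minimal churn loop along those lines (a sketch with illustrative names, reusing the requests/limits from the report above):

    # Create short-lived pods that run to completion, then delete
    # them; repeat until fresh pods start failing to schedule with
    # "Insufficient memory".
    for i in $(seq 1 50); do
      kubectl run "churn-$i" --image=busybox --restart=Never \
        --requests=cpu=100m,memory=30Mi --limits=cpu=1,memory=1Gi \
        --command -- true
    done
    kubectl get pods | grep churn            # wait for Completed
    for i in $(seq 1 50); do kubectl delete pod "churn-$i"; done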

For us, the trigger is a GitLab CI runner that connects and creates pods; after a while all our build machines stand empty while jobs wait in the scheduling queue forever.
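While jobs are stuck Pending, it is worth comparing what the node is actually committed to against its Allocatable figure. One way to list every pod on the node with its memory request and limit (a sketch; --field-selector on kubectl get requires a newer client than the 1.6 one shown above, and the node name is taken from the uname output):

    # List each pod on the node with its memory request and limit,
    # to compare the totals against the node's Allocatable memory.
    kubectl get pods --all-namespaces \
      --field-selector spec.nodeName=coreos04.kub.do.modio.se \
      -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,MEM_REQ:.spec.containers[*].resources.requests.memory,MEM_LIM:.spec.containers[*].resources.limits.memory'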

Anything else we need to know: pass.

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 26 (12 by maintainers)

Most upvoted comments

I'm seeing this with AKS. I have nodes with 8 GB of RAM, and I schedule one pod per node with limits and requests of 6.5 GB memory. Sometimes it works fine; other times it says "insufficient memory" when there is clearly enough. Unfortunately, I don't think I can restart the kube-apiserver on an AKS-managed cluster.

Similar issues: #33777, #34920 (the latter with Insufficient CPU)