kubernetes: Failed to update Node Allocatable Limits on 1.8.2

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened: We’ve deployed Kubernetes 1.8.2 and are seeing the following errors:

Failed to update Node Allocatable Limits "": failed to set supported cgroup subsystems for cgroup : Failed to set config for supported subsystems : failed to write 8201408512 to memory.limit_in_bytes: write /var/lib/docker/devicemapper/mnt/69641d63999364ce6bed9ff9a37e18922d1738df76d84db691a854f0e931e435/rootfs/sys/fs/cgroup/memory/memory.limit_in_bytes: invalid argument

kubelet: E1116 09:26:19.176810 10132 helpers.go:138] readString: Failed to read "/sys/fs/cgroup/memory/system.slice/docker-d0b90be4bc0f7a2dd8669b5955b3355c16140eaff3df8264cd6f3c9236218067.scope/memory.limit_in_bytes": read /sys/fs/cgroup/memory/system.slice/docker-d0b90be4bc0f7a2dd8669b5955b3355c16140eaff3df8264cd6f3c9236218067.scope/memory.limit_in_bytes: no such device

 kubelet: E1116 09:20:02.379813 15458 helpers.go:138] readString: Failed to read "/sys/fs/cgroup/memory/user.slice/user-997.slice/session-192267.scope/memory.soft_limit_in_bytes": read /sys/fs/cgroup/memory/user.slice/user-997.slice/session-192267.scope/memory.soft_limit_in_bytes: no such device
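For reference, the values the kubelet is failing to write can be inspected directly on the node. The snippet below is only a sketch and assumes a cgroup v1 hierarchy mounted at /sys/fs/cgroup with the cgroupfs driver; the kubepods path only exists once --cgroups-per-qos is active:

# Current limit on the root memory cgroup (the empty "" cgroup named in the error)
cat /sys/fs/cgroup/memory/memory.limit_in_bytes

# Limit on the kubepods hierarchy, if the kubelet has created it
cat /sys/fs/cgroup/memory/kubepods/memory.limit_in_bytes 2>/dev/null

# Compare against the node's physical memory
grep MemTotal /proc/meminfo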

This appears to be related to https://github.com/kubernetes/kubernetes/issues/42701, which we thought was fixed in 1.8.

The cgroup driver we use is cgroupfs.
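To rule out a driver mismatch between Docker and the kubelet, a quick check (sketch only; how the kubelet flags are passed depends on the install):

# Cgroup driver Docker is using (cgroupfs vs systemd)
docker info 2>/dev/null | grep -i 'cgroup driver'

# Cgroup driver the running kubelet was started with
ps -ef | grep [k]ubelet | tr ' ' '\n' | grep -- --cgroup-driver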

What you expected to happen:

No errors

How to reproduce it (as minimally and precisely as possible):

Use kubespray to install a multi-master Kubernetes cluster.

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.2+coreos.0", GitCommit:"4c0769e81ab01f47eec6f34d7f1bb80873ae5c2b", GitTreeState:"clean", BuildDate:"2017-10-25T16:24:46Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.2+coreos.0", GitCommit:"4c0769e81ab01f47eec6f34d7f1bb80873ae5c2b", GitTreeState:"clean", BuildDate:"2017-10-25T16:24:46Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): CentOS7
  • Kernel (e.g. uname -a): 3.10.0-693.2.2.el7.x86_64
  • Install tools: kubespray

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 36 (4 by maintainers)

Most upvoted comments

We have the same issue here, but on AKS. We are running a 5-node cluster and get the following error on all 5 nodes. Here is the system info:

System Info:
 Kernel Version:                      4.13.0-1007-azure
 OS Image:                            Debian GNU/Linux 8 (jessie)
 Operating System:                    linux
 Architecture:                        amd64
 Container Runtime Version:           docker://1.12.6
 Kubelet Version:                     v1.8.7
 Kube-Proxy Version:                  v1.8.7

and the error we get after updating from 1.8.1 to 1.8.7:

Events:
  Type     Reason                            Age                  From                               Message
  ----     ------                            ----                 ----                               -------
  Warning  FailedNodeAllocatableEnforcement  2m (x1001 over 16h)  kubelet, aks-agentpool-22604214-0  Failed to update Node Allocatable Limits "": failed to set supported cgroup subsystems for cgroup : Failed to set config for supported subsystems : failed to write 7285047296 to memory.limit_in_bytes: write /var/lib/docker/overlay2/5650a1aadf9c758946073fefa1558446ab582148ddd3ee7e7cb9d269fab20f72/merged/sys/fs/cgroup/memory/memory.limit_in_bytes: invalid argument
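If it helps anyone confirm whether their nodes are affected, this is roughly how the warning can be surfaced (sketch; the node name is just the one from the output above):

# Look for the enforcement failure across recent events
kubectl get events --all-namespaces | grep FailedNodeAllocatableEnforcement

# Or check a single node's events and resource figures
kubectl describe node aks-agentpool-22604214-0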

I had the same issue. According to the docs, cgroups-per-qos is supposed to be enabled by default. This is how I resolved the issue:

Add or change --cgroups-per-qos=true --enforce-node-allocatable=pods on your KUBELET_ARGS line inside /etc/kubernetes/kubelet so it looks something like this:

KUBELET_ARGS="$(/etc/kubernetes/get_require_kubeconfig.sh) --pod-manifest-path=/etc/kubernetes/manifests --cadvisor-port=0 --kubeconfig /etc/kubernetes/kubelet-config.yaml --hostname-override=k8s-fa27-mqf5nkvarpmq-minion-0 --address=10.0.0.5 --port=10250 --read-only-port=0 --anonymous-auth=false --authorization-mode=Webhook --authentication-token-webhook=true --cluster_dns=10.254.0.10 --cluster_domain=cluster.local  --pod-infra-container-image=gcr.io/google_containers/pause:3.0 --client-ca-file=/etc/kubernetes/certs/ca.crt --tls-cert-file=/etc/kubernetes/certs/kubelet.crt --tls-private-key-file=/etc/kubernetes/certs/kubelet.key  --cgroup-driver=systemd --cgroups-per-qos=true --enforce-node-allocatable=pods"

Then run sudo systemctl restart kubelet.service on your worker nodes.
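After the restart, a rough sanity check that the flags took effect and the warning stops recurring (replace <node-name> with one of your workers; the exact journal unit name may differ per distro):

# Confirm the kubelet picked up the new flags
ps -ef | grep [k]ubelet | tr ' ' '\n' | grep -E 'cgroups-per-qos|enforce-node-allocatable'

# Watch the kubelet log for further allocatable errors
journalctl -u kubelet.service -f | grep -i allocatable

# The node should report its Allocatable resources as expected
kubectl describe node <node-name> | grep -A 6 Allocatable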

I have the same issue on AKS (1.8.7) in westeurope. It is working again after restarting the nodes. Thanks to @otaviosoares.
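For anyone else restarting AKS nodes to clear this, a rough sketch of the usual drain/reboot/uncordon cycle (the node name is a placeholder; reboot the VM from the Azure portal or CLI in between):

# Move workloads off the node before rebooting it
kubectl drain aks-agentpool-22604214-0 --ignore-daemonsets --delete-local-data

# ... reboot the VM ...

# Allow scheduling on the node again
kubectl uncordon aks-agentpool-22604214-0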