kubernetes: Failed to start ContainerManager failed to initialise top level QOS containers
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT
Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:36:33Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6+", GitVersion:"v1.6.0-1+c0b74ebf3ce26e-dirty", GitCommit:"c0b74ebf3ce26e46b5397ffb5ce71cd02f951130", GitTreeState:"dirty", BuildDate:"2017-03-30T08:19:21Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Cloud provider or hardware configuration: openstack
- OS (e.g. from /etc/os-release): NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)"
- Kernel (e.g. uname -a): Linux kube-dev-master-2-0 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: ansible
- Others:
What happened: When creating a new cluster with Kubernetes 1.6.0, two of the three masters always go into NotReady state.
kubectl get nodes
NAME                  STATUS    AGE  VERSION
kube-dev-master-1-0   Ready     2m   v1.6.0
kube-dev-master-1-1   NotReady  2m   v1.6.0
kube-dev-master-2-0   NotReady  2m   v1.6.0
kube-dev-node-1-0     Ready     1m   v1.6.0
kube-dev-node-2-0     Ready     1m   v1.6.0
The kubelet log is spamming:
Mar 30 15:20:44 centos7 kubelet: I0330 12:20:44.297149 8724 kubelet.go:1752] skipping pod synchronization - [Failed to start ContainerManager failed to initialise top level QOS containers: failed to create top level Burstable QOS cgroup : Unit kubepods-burstable.slice already exists.]
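To confirm that a stale slice is what the kubelet is tripping over, you can look for leftover kubepods slices on the affected node (an illustrative check, not part of the original report; the grep patterns simply follow the error message above):

# any leftover QoS slices from a previous kubelet run
systemctl list-units --type=slice --all | grep kubepods
# watch the kubelet log for the QOS cgroup error
journalctl -u kubelet -f | grep -i "QOS containers"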
What you expected to happen: I expect the masters to join the cluster.
How to reproduce it (as minimally and precisely as possible): kubelet args
ExecStart=/usr/bin/kubelet \
  --kubeconfig=/tmp/kubeconfig \
  --require-kubeconfig \
  --register-node=true \
  --hostname-override=kube-dev-master-2-0 \
  --allow-privileged=true \
  --cgroup-driver=systemd \
  --cluster-dns=10.254.0.253 \
  --cluster-domain=cluster.local \
  --pod-manifest-path=/etc/kubernetes/manifests \
  --v=4 \
  --cloud-provider=openstack \
  --cloud-config=/etc/kubernetes/cloud-config
After a restart of the machines everything works normally, but restarting a machine after installation was not needed before 1.6.0.
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 37 (22 by maintainers)
Commits related to this issue
- fix https://github.com/kubernetes/kubernetes/issues/43856#issuecomment-294735655 — committed to rootsongjc/follow-me-install-kubernetes-cluster by rootsongjc 7 years ago
- fix https://github.com/kubernetes/kubernetes/issues/43856#issuecomment-294735655 — committed to rootsongjc/kubernetes-handbook by rootsongjc 7 years ago
- Re-specify Docker to use the systemd cgroup driver; Kubernetes issue https://github.com/kubernetes/kubernetes/issues/43856 — committed to rootsongjc/follow-me-install-kubernetes-cluster by rootsongjc 7 years ago
- Merge pull request #44940 from sjenning/bump-runc Automatic merge from submit-queue Bump runc to d223e2a Fixes https://github.com/kubernetes/kubernetes/issues/43856 @derekwaynecarr — committed to kubernetes/kubernetes by deleted user 7 years ago
- bug fix https://github.com/kubernetes/kubernetes/issues/43856 — committed to rootsongjc/kubernetes-handbook by rootsongjc 7 years ago
- Merge pull request #48117 from sjenning/bump-runc-1.6 Automatic merge from submit-queue bump runc to d223e2a cherry-pick https://github.com/kubernetes/kubernetes/pull/44940 by user request https... — committed to kubernetes/kubernetes by deleted user 7 years ago
- Allow ability to run Docker-in-Docker-in-OpenShift for testing A bit of a complicated PR, but essentially, after much trial-and-error, this is required in order to get OpenShift running within Docker... — committed to cdrage/container-pipeline-service by cdrage 6 years ago
You are really looking to stop anything matching kubepod*.slice, which is how I resolved my issue:
Try this:
for i in $(/usr/bin/systemctl list-unit-files --no-legend --no-pager -l | grep --color=never -o '.*\.slice' | grep kubepod); do systemctl stop "$i"; done
Yes, I’ve seen this before if the kubelet doesn’t clean up completely.
QoS level cgroups were enabled by default in 1.6 which explains why this may have not been observed earlier.
The QoS cgroups are created as transient systemd slices when using the systemd cgroup driver. If those slices already exist, the container manager currently has an issue with that.
As I recall, a workaround that doesn't involve rebooting is systemctl stop kubepods-burstable.slice. I believe this deletes the slice and allows the kubelet to recreate it. @derekwaynecarr can we add code to check whether the QoS-level slices already exist and just move along in that case?
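Spelled out as commands, that no-reboot workaround would look roughly like this (a sketch based on the comment above; the kubepods-besteffort.slice name is an assumption, mirroring the burstable slice from the error message):

# stop the stale QoS slices so the kubelet can recreate them
systemctl stop kubepods-burstable.slice kubepods-besteffort.slice
# then restart the kubelet and check that the node goes Ready
systemctl restart kubelet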
@ReSearchITEng I have no further good solution by now. These two methods work for me:
- Set --native.cgroupdriver=systemd in your docker.service file, and make sure it is the same as the kubelet cgroup driver.
- Maybe you can try this as @youngdev mentioned above.

Seeing the same issue on Red Hat Enterprise Linux Server 7.3, Kernel: 3.10.0-514.10.2.el7.x86_64 (same as OP, so I'm suspecting the kernel might have something to do with it).
A reboot of the machine, however, seems to fix it.
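For reference, a minimal sketch of the Docker-side cgroup-driver alignment mentioned above, written as a systemd drop-in (the drop-in path, the dockerd location, and the --exec-opt spelling of the flag are assumptions; adjust them to your Docker packaging):

# /etc/systemd/system/docker.service.d/10-cgroup-driver.conf (assumed path)
[Service]
# clear the packaged ExecStart, then start dockerd with the systemd cgroup driver
ExecStart=
ExecStart=/usr/bin/dockerd --exec-opt native.cgroupdriver=systemd

After adding the drop-in, run systemctl daemon-reload and restart docker and the kubelet so both sides use the systemd cgroup driver.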