kubernetes: Failed to start ContainerManager failed to initialise top level QOS containers
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT
Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:36:33Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6+", GitVersion:"v1.6.0-1+c0b74ebf3ce26e-dirty", GitCommit:"c0b74ebf3ce26e46b5397ffb5ce71cd02f951130", GitTreeState:"dirty", BuildDate:"2017-03-30T08:19:21Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Cloud provider or hardware configuration: openstack
- OS (e.g. from /etc/os-release): NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)"
- Kernel (e.g. uname -a): Linux kube-dev-master-2-0 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: ansible
- Others:
What happened: When creating a new cluster with Kubernetes 1.6.0, two of the three masters always go into NotReady state.
kubectl get nodes
NAME                  STATUS    AGE  VERSION
kube-dev-master-1-0   Ready     2m   v1.6.0
kube-dev-master-1-1   NotReady  2m   v1.6.0
kube-dev-master-2-0   NotReady  2m   v1.6.0
kube-dev-node-1-0     Ready     1m   v1.6.0
kube-dev-node-2-0     Ready     1m   v1.6.0
The kubelet log is spamming:
Mar 30 15:20:44 centos7 kubelet: I0330 12:20:44.297149 8724 kubelet.go:1752] skipping pod synchronization - [Failed to start ContainerManager failed to initialise top level QOS containers: failed to create top level Burstable QOS cgroup : Unit kubepods-burstable.slice already exists.]
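To confirm that a stale slice is what the kubelet is tripping over, you can look for leftover kubepods slices on the affected node (an illustrative check, not part of the original report; the grep patterns simply follow the error message above):

# any leftover QoS slices from a previous kubelet run
systemctl list-units --type=slice --all | grep kubepods
# watch the kubelet log for the QOS cgroup error
journalctl -u kubelet -f | grep -i "QOS containers"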
What you expected to happen: I expect the masters to join the cluster.
How to reproduce it (as minimally and precisely as possible): kubelet args
ExecStart=/usr/bin/kubelet \
  --kubeconfig=/tmp/kubeconfig \
  --require-kubeconfig \
  --register-node=true \
  --hostname-override=kube-dev-master-2-0 \
  --allow-privileged=true \
  --cgroup-driver=systemd \
  --cluster-dns=10.254.0.253 \
  --cluster-domain=cluster.local \
  --pod-manifest-path=/etc/kubernetes/manifests \
  --v=4 \
  --cloud-provider=openstack \
  --cloud-config=/etc/kubernetes/cloud-config
After a restart of the machines everything works normally, but restarting a machine after installation was not needed before 1.6.0.
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 37 (22 by maintainers)
Commits related to this issue
- fix https://github.com/kubernetes/kubernetes/issues/43856#issuecomment-294735655 — committed to rootsongjc/follow-me-install-kubernetes-cluster by rootsongjc 7 years ago
- fix https://github.com/kubernetes/kubernetes/issues/43856#issuecomment-294735655 — committed to rootsongjc/kubernetes-handbook by rootsongjc 7 years ago
- Re-specify Docker to use the systemd cgroup driver; Kubernetes issue https://github.com/kubernetes/kubernetes/issues/43856 — committed to rootsongjc/follow-me-install-kubernetes-cluster by rootsongjc 7 years ago
- Merge pull request #44940 from sjenning/bump-runc Automatic merge from submit-queue Bump runc to d223e2a Fixes https://github.com/kubernetes/kubernetes/issues/43856 @derekwaynecarr — committed to kubernetes/kubernetes by deleted user 7 years ago
- bug fix https://github.com/kubernetes/kubernetes/issues/43856 — committed to rootsongjc/kubernetes-handbook by rootsongjc 7 years ago
- Merge pull request #48117 from sjenning/bump-runc-1.6 Automatic merge from submit-queue bump runc to d223e2a cherry-pick https://github.com/kubernetes/kubernetes/pull/44940 by user request https... — committed to kubernetes/kubernetes by deleted user 7 years ago
- Allow ability to run Docker-in-Docker-in-OpenShift for testing A bit of a complicated PR, but essentially, after much trial-and-error, this is required in order to get OpenShift running within Docker... — committed to cdrage/container-pipeline-service by cdrage 6 years ago
You are really looking to stop anything matching kubepod*.slice, which is how I resolved my issue:
Try this:
for i in $(/usr/bin/systemctl list-unit-files --no-legend --no-pager -l | grep --color=never -o '.*\.slice' | grep kubepod); do systemctl stop "$i"; done
Yes, I’ve seen this before if the kubelet doesn’t clean up completely.
QoS level cgroups were enabled by default in 1.6 which explains why this may have not been observed earlier.
The QoS cgroups are created as transient systemd slices when using the systemd cgroup driver. If those slices already exist, the container manager currently has an issue with that.
As I recall, a workaround that doesn't involve rebooting is systemctl stop kubepods-burstable.slice. I believe this deletes the slice and allows the kubelet to recreate it. @derekwaynecarr can we add code to check whether the QoS-level slices already exist and just move along in that case?
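Spelled out as commands, that no-reboot workaround would look roughly like this (a sketch based on the comment above; the kubepods-besteffort.slice name is an assumption, mirroring the burstable slice from the error message):

# stop the stale QoS slices so the kubelet can recreate them
systemctl stop kubepods-burstable.slice kubepods-besteffort.slice
# then restart the kubelet and check that the node goes Ready
systemctl restart kubelet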
@ReSearchITEng I have no further good solution by now. These two methods work for me:
- Set --native.cgroupdriver=systemd in your docker.service file, and make sure it is the same as the kubelet cgroup driver.
- Maybe you can try this as @youngdev mentioned above.

Seeing the same issue on Red Hat Enterprise Linux Server 7.3, Kernel: 3.10.0-514.10.2.el7.x86_64 (same as OP, so I'm suspecting the kernel might have something to do with it).
A reboot of the machine, however, seems to fix it.
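For reference, a minimal sketch of the Docker-side cgroup-driver alignment mentioned above, written as a systemd drop-in (the drop-in path, the dockerd location, and the --exec-opt spelling of the flag are assumptions; adjust them to your Docker packaging):

# /etc/systemd/system/docker.service.d/10-cgroup-driver.conf (assumed path)
[Service]
# clear the packaged ExecStart, then start dockerd with the systemd cgroup driver
ExecStart=
ExecStart=/usr/bin/dockerd --exec-opt native.cgroupdriver=systemd

After adding the drop-in, run systemctl daemon-reload and restart docker and the kubelet so both sides use the systemd cgroup driver.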