kubernetes: kubelet fails starting CRI-O containers (Ubuntu 18.04 + systemd cgroups driver)
What happened:
kubelet no longer seems able to manage containers started with CRI-O.
I’m testing kubelet + CRI-O on an Ubuntu 18.04 server installation, inside a qemu/kvm VM managed by Vagrant (see https://github.com/olberger/vagrant-k8s-kubevirt-katacontainers/tree/539c9d06ffa611aa2d08da6cfa617d42fe218a40), in order to test k8s + KubeVirt + KataContainers, which are all compatible with CRI-O.
kubelet keeps trying to start new containers in what looks like an endless loop, and the cluster never becomes ready.
For instance, here are some syslog entries:
May 8 13:32:44 ubuntu1804 systemd[1]: Started crio-conmon-93673e8dc9ee02e7855aa616de720d15d8c009013dc9577ad303ea38c00c67a6.scope.
May 8 13:32:44 ubuntu1804 systemd[1]: Created slice libcontainer_16535_systemd_test_default.slice.
May 8 13:32:44 ubuntu1804 systemd[1]: Removed slice libcontainer_16535_systemd_test_default.slice.
May 8 13:32:44 ubuntu1804 kubelet[9399]: W0508 13:32:44.377844 9399 raw.go:87] Error while processing event ("/sys/fs/cgroup/cpu,cpuacct/libcontainer_16535_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/libcontainer_16535_systemd_test_default.slice: no such file or directory
May 8 13:32:44 ubuntu1804 kubelet[9399]: W0508 13:32:44.377890 9399 raw.go:87] Error while processing event ("/sys/fs/cgroup/blkio/libcontainer_16535_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/libcontainer_16535_systemd_test_default.slice: no such file or directory
May 8 13:32:44 ubuntu1804 kubelet[9399]: W0508 13:32:44.377913 9399 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/libcontainer_16535_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/libcontainer_16535_systemd_test_default.slice: no such file or directory
May 8 13:32:44 ubuntu1804 kubelet[9399]: W0508 13:32:44.377993 9399 raw.go:87] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_16535_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/libcontainer_16535_systemd_test_default.slice: no such file or directory
May 8 13:32:44 ubuntu1804 kubelet[9399]: W0508 13:32:44.380535 9399 container.go:409] Failed to create summary reader for "/libcontainer_16535_systemd_test_default.slice": none of the resources are being tracked.
May 8 13:32:44 ubuntu1804 kubelet[9399]: W0508 13:32:44.381180 9399 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/libcontainer_16535_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): readdirent: no such file or directory
May 8 13:32:44 ubuntu1804 kubelet[9399]: W0508 13:32:44.381358 9399 raw.go:87] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_16535_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/libcontainer_16535_systemd_test_default.slice: no such file or directory
May 8 13:32:44 ubuntu1804 systemd[1]: Created slice libcontainer_16535_systemd_test_default.slice.
May 8 13:32:44 ubuntu1804 systemd[1]: Removed slice libcontainer_16535_systemd_test_default.slice.
May 8 13:32:44 ubuntu1804 systemd[1]: Started libcontainer container 93673e8dc9ee02e7855aa616de720d15d8c009013dc9577ad303ea38c00c67a6.
May 8 13:32:44 ubuntu1804 kubelet[9399]: E0508 13:32:44.391247 9399 kubelet.go:2244] node "kubernetes-vagrant-01" not found
May 8 13:32:44 ubuntu1804 kubelet[9399]: E0508 13:32:44.433374 9399 fsHandler.go:118] failed to collect filesystem stats - rootDiskErr: could not stat "/var/lib/containers/storage/overlay/93f8d93979e19d2d23dcccf4cf66d66ad7eced76f53871ef010de26816c821ac/diff" to get inode usage: stat /var/lib/containers/storage/overlay/93f8d93979e19d2d23dcccf4cf66d66ad7eced76f53871ef010de26816c821ac/diff: no such file or directory, extraDiskErr: <nil>
May 8 13:32:44 ubuntu1804 crio[8875]: open /dev/stderr: no such device or address
May 8 13:32:44 ubuntu1804 crio[8875]: NAME:
May 8 13:32:44 ubuntu1804 crio[8875]: runc - Open Container Initiative runtime
May 8 13:32:44 ubuntu1804 crio[8875]: runc is a command line client for running applications packaged according to
May 8 13:32:44 ubuntu1804 crio[8875]: the Open Container Initiative (OCI) format and is a compliant implementation of the
May 8 13:32:44 ubuntu1804 crio[8875]: Open Container Initiative specification.
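The excerpt above interleaves messages from systemd, kubelet and crio, which makes the one crio error easy to miss among the kubelet warnings. Here is a minimal sketch of how the lines could be grouped by process so the crio output stands out. The regular expression and the helper name `group_by_process` are illustrative assumptions, not part of any tool mentioned in this report, and the sample lines are copied from the log above.

```python
import re
from collections import defaultdict

# Assumed layout, matching the excerpt: "May 8 13:32:44 host proc[pid]: message".
SYSLOG_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]:\s(?P<msg>.*)$"
)


def group_by_process(lines):
    """Return a dict mapping process name to its messages, in log order."""
    grouped = defaultdict(list)
    for line in lines:
        match = SYSLOG_LINE.match(line.strip())
        if match:  # lines that do not fit the assumed layout are skipped
            grouped[match.group("proc")].append(match.group("msg"))
    return dict(grouped)


# A few lines copied from the syslog excerpt above.
sample = [
    "May 8 13:32:44 ubuntu1804 systemd[1]: Created slice libcontainer_16535_systemd_test_default.slice.",
    'May 8 13:32:44 ubuntu1804 kubelet[9399]: E0508 13:32:44.391247 9399 kubelet.go:2244] node "kubernetes-vagrant-01" not found',
    "May 8 13:32:44 ubuntu1804 crio[8875]: open /dev/stderr: no such device or address",
    "May 8 13:32:44 ubuntu1804 crio[8875]: NAME:",
]

by_proc = group_by_process(sample)
for message in by_proc.get("crio", []):
    print(message)
```

Run against the full /var/log/syslog, the same grouping would show that the crio output consists of the /dev/stderr error followed by what appears to be the runc usage text.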
What you expected to happen:
It used to work a few days ago (see the asciinema recording at https://www-public.imtbs-tsp.eu/~berger_o/weblog/2019/04/26/testing-kubevirt-for-running-vms-inside-kubernetes-in-a-vagrant-qemu-vm/), but I fear some recent changes broke it.
How to reproduce it (as minimally and precisely as possible):
Run vagrant up --provider=libvirt on Linux from a clone of that commit: https://github.com/olberger/vagrant-k8s-kubevirt-katacontainers/tree/539c9d06ffa611aa2d08da6cfa617d42fe218a40
Then vagrant ssh into the VM and run ps -edf f or tail -f /var/log/syslog to observe the behaviour.
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
kubernetes-vagrant-01: [config/images] Pulled k8s.gcr.io/kube-apiserver:v1.14.1
kubernetes-vagrant-01: [config/images] Pulled k8s.gcr.io/kube-controller-manager:v1.14.1
kubernetes-vagrant-01: [config/images] Pulled k8s.gcr.io/kube-scheduler:v1.14.1
kubernetes-vagrant-01: [config/images] Pulled k8s.gcr.io/kube-proxy:v1.14.1
kubernetes-vagrant-01: [config/images] Pulled k8s.gcr.io/pause:3.1
kubernetes-vagrant-01: [config/images] Pulled k8s.gcr.io/etcd:3.3.10
kubernetes-vagrant-01: [config/images] Pulled k8s.gcr.io/coredns:1.3.1
- Cloud provider or hardware configuration: qemu/kvm VM +
crictl version
Version: 0.1.0
RuntimeName: cri-o
RuntimeVersion: 1.13.1-dev
RuntimeApiVersion: v1alpha1
- OS (e.g: cat /etc/os-release): Ubuntu 18.04.2 LTS
- Kernel (e.g. uname -a): Linux kubernetes-vagrant-01 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: kubeadm as in https://github.com/olberger/vagrant-k8s-kubevirt-katacontainers/blob/539c9d06ffa611aa2d08da6cfa617d42fe218a40/kubernetes.sh
- Network plugin and version (if this is a network-related bug):
- Others:
About this issue
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 35 (11 by maintainers)
Commits related to this issue
- main: not reopen /dev/stderr commit a1460818288b8addfe9b70c8931da83864251f7a introduced a change to write to /dev/stderr by default. Do not reopen the file in this case, but use directly the fd 2. ... — committed to giuseppe/runc by giuseppe 5 years ago
- main: not reopen /dev/stderr commit a1460818288b8addfe9b70c8931da83864251f7a introduced a change to write to /dev/stderr by default. Do not reopen the file in this case, but use directly the fd 2. ... — committed to kinvolk/runc by giuseppe 5 years ago
- main: not reopen /dev/stderr commit a1460818288b8addfe9b70c8931da83864251f7a introduced a change to write to /dev/stderr by default. Do not reopen the file in this case, but use directly the fd 2. ... — committed to adrianreber/runc by giuseppe 5 years ago
- update runc to v1.0.0-rc8-92-g84373aaa (CVE-2019-16884) full diff: https://github.com/opencontainers/runc/compare/v1.0.0-rc8...3e425f80a8c931f88e6d94a8c831b9d5aa481657 - opencontainers/runc#2010 c... — committed to thaJeztah/docker by thaJeztah 5 years ago
- bump runc vendor v1.0.0-rc8-92-g84373aaa full diff: https://github.com/opencontainers/runc/compare/v1.0.0-rc8...3e425f80a8c931f88e6d94a8c831b9d5aa481657 - opencontainers/runc#2010 criu image path ... — committed to thaJeztah/docker by thaJeztah 5 years ago
- update runc to v1.0.0-rc8-92-g84373aaa (CVE-2019-16884) full diff: https://github.com/opencontainers/runc/compare/v1.0.0-rc8...3e425f80a8c931f88e6d94a8c831b9d5aa481657 - opencontainers/runc#2010 c... — committed to docker/docker-ce by thaJeztah 5 years ago
- bump runc vendor v1.0.0-rc8-92-g84373aaa full diff: https://github.com/opencontainers/runc/compare/v1.0.0-rc8...3e425f80a8c931f88e6d94a8c831b9d5aa481657 - opencontainers/runc#2010 criu image path ... — committed to docker/docker-ce by thaJeztah 5 years ago
- update runc to v1.0.0-rc8-92-g84373aaa (CVE-2019-16884) full diff: https://github.com/opencontainers/runc/compare/v1.0.0-rc8...3e425f80a8c931f88e6d94a8c831b9d5aa481657 - opencontainers/runc#2010 c... — committed to burnMyDread/moby by thaJeztah 5 years ago
PR here: opencontainers/runc#2057
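
The runc commit message quoted above says the fix is to stop reopening /dev/stderr and to use fd 2 directly. The following is a rough sketch of why that distinction can matter. It assumes that fd 2 is a Unix socket, as is common for services logging to the journal; this report does not confirm what crio's stderr actually was. On Linux, opening /dev/stderr in that situation fails with ENXIO ("no such device or address", the same text as the crio log line), while writing to the already-open descriptor still works. This is an illustration in Python, not runc's Go code, and the names `CHILD` and `run_with_socket_stderr` are made up for the example.

```python
import socket
import subprocess
import sys

# Child program: first try to reopen /dev/stderr by path, then write to fd 2.
CHILD = r'''
import errno, os
try:
    os.close(os.open("/dev/stderr", os.O_WRONLY))
    print("reopen ok")
except OSError as exc:
    print("reopen failed: " + errno.errorcode.get(exc.errno, str(exc.errno)))
os.write(2, b"direct write ok\n")
'''


def run_with_socket_stderr():
    """Run CHILD with a Unix socket as its stderr (an assumption about the setup).

    Returns (what the child printed on stdout, what arrived on the socket).
    """
    parent_end, child_end = socket.socketpair()
    try:
        proc = subprocess.run(
            [sys.executable, "-c", CHILD],
            stdout=subprocess.PIPE,
            stderr=child_end.fileno(),  # the child's fd 2 becomes the socket
            text=True,
            check=True,
        )
    finally:
        child_end.close()
    received = parent_end.recv(1024)
    parent_end.close()
    return proc.stdout.strip(), received


if __name__ == "__main__":
    reopen_result, received = run_with_socket_stderr()
    print(reopen_result)
    print(received)
```

The point of the sketch is only that an inherited descriptor remains usable even when the path that names it cannot be opened again, which is consistent with the approach described in the commit message.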