kubernetes: kubelet fails starting CRI-O containers (Ubuntu 18.04 + systemd cgroups driver)

What happened:

kubelet no longer seems able to manage containers started with CRI-O.

I’m testing kubelet + CRI-O on an Ubuntu 18.04 server installation, inside a qemu/kvm VM managed by Vagrant (see https://github.com/olberger/vagrant-k8s-kubevirt-katacontainers/tree/539c9d06ffa611aa2d08da6cfa617d42fe218a40), in order to test k8s + KubeVirt + KataContainers, which are all compatible with CRI-O.

kubelet keeps trying to start new containers in what looks like an endless loop, and the cluster never becomes ready.

For instance, here are some syslog entries:

May  8 13:32:44 ubuntu1804 systemd[1]: Started crio-conmon-93673e8dc9ee02e7855aa616de720d15d8c009013dc9577ad303ea38c00c67a6.scope.
May  8 13:32:44 ubuntu1804 systemd[1]: Created slice libcontainer_16535_systemd_test_default.slice.
May  8 13:32:44 ubuntu1804 systemd[1]: Removed slice libcontainer_16535_systemd_test_default.slice.
May  8 13:32:44 ubuntu1804 kubelet[9399]: W0508 13:32:44.377844    9399 raw.go:87] Error while processing event ("/sys/fs/cgroup/cpu,cpuacct/libcontainer_16535_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/libcontainer_16535_systemd_test_default.slice: no such file or directory
May  8 13:32:44 ubuntu1804 kubelet[9399]: W0508 13:32:44.377890    9399 raw.go:87] Error while processing event ("/sys/fs/cgroup/blkio/libcontainer_16535_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/libcontainer_16535_systemd_test_default.slice: no such file or directory
May  8 13:32:44 ubuntu1804 kubelet[9399]: W0508 13:32:44.377913    9399 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/libcontainer_16535_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/libcontainer_16535_systemd_test_default.slice: no such file or directory
May  8 13:32:44 ubuntu1804 kubelet[9399]: W0508 13:32:44.377993    9399 raw.go:87] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_16535_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/libcontainer_16535_systemd_test_default.slice: no such file or directory
May  8 13:32:44 ubuntu1804 kubelet[9399]: W0508 13:32:44.380535    9399 container.go:409] Failed to create summary reader for "/libcontainer_16535_systemd_test_default.slice": none of the resources are being tracked.
May  8 13:32:44 ubuntu1804 kubelet[9399]: W0508 13:32:44.381180    9399 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/libcontainer_16535_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): readdirent: no such file or directory
May  8 13:32:44 ubuntu1804 kubelet[9399]: W0508 13:32:44.381358    9399 raw.go:87] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_16535_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/libcontainer_16535_systemd_test_default.slice: no such file or directory
May  8 13:32:44 ubuntu1804 systemd[1]: Created slice libcontainer_16535_systemd_test_default.slice.
May  8 13:32:44 ubuntu1804 systemd[1]: Removed slice libcontainer_16535_systemd_test_default.slice.
May  8 13:32:44 ubuntu1804 systemd[1]: Started libcontainer container 93673e8dc9ee02e7855aa616de720d15d8c009013dc9577ad303ea38c00c67a6.
May  8 13:32:44 ubuntu1804 kubelet[9399]: E0508 13:32:44.391247    9399 kubelet.go:2244] node "kubernetes-vagrant-01" not found
May  8 13:32:44 ubuntu1804 kubelet[9399]: E0508 13:32:44.433374    9399 fsHandler.go:118] failed to collect filesystem stats - rootDiskErr: could not stat "/var/lib/containers/storage/overlay/93f8d93979e19d2d23dcccf4cf66d66ad7eced76f53871ef010de26816c821ac/diff" to get inode usage: stat /var/lib/containers/storage/overlay/93f8d93979e19d2d23dcccf4cf66d66ad7eced76f53871ef010de26816c821ac/diff: no such file or directory, extraDiskErr: <nil>
May  8 13:32:44 ubuntu1804 crio[8875]: open /dev/stderr: no such device or address
May  8 13:32:44 ubuntu1804 crio[8875]: NAME:
May  8 13:32:44 ubuntu1804 crio[8875]:    runc - Open Container Initiative runtime
May  8 13:32:44 ubuntu1804 crio[8875]: runc is a command line client for running applications packaged according to
May  8 13:32:44 ubuntu1804 crio[8875]: the Open Container Initiative (OCI) format and is a compliant implementation of the
May  8 13:32:44 ubuntu1804 crio[8875]: Open Container Initiative specification.
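
To make the pattern in the excerpt above easier to see, here is a minimal Python sketch that tallies the short-lived systemd slices and the kubelet "Error while processing event" warnings per cgroup controller. The names summarize, SLICE_EVENT, EVENT_ERROR and SAMPLE are hypothetical helpers, and the regular expressions are derived only from the lines quoted in this report, so they may need adjusting for other kubelet or systemd versions. It is an aid for reading the log, not a diagnosis of the cause.

```python
import re
from collections import Counter

# Patterns derived only from the syslog lines quoted in this report (assumption:
# other kubelet/systemd versions may format these messages differently).
SLICE_EVENT = re.compile(r"systemd\[\d+\]: (Created|Removed) slice (\S+?)\.$")
EVENT_ERROR = re.compile(
    r"kubelet\[\d+\]: W.*raw\.go:\d+\] Error while processing event "
    r'\("/sys/fs/cgroup/([^/]+)/([^"]+)"'
)


def summarize(lines):
    """Count slice create/remove events and kubelet event errors per controller."""
    slice_events = Counter()   # keyed by (slice name, "Created" or "Removed")
    event_errors = Counter()   # keyed by cgroup controller, e.g. "memory"
    for line in lines:
        line = line.rstrip()
        m = SLICE_EVENT.search(line)
        if m:
            slice_events[(m.group(2), m.group(1))] += 1
            continue
        m = EVENT_ERROR.search(line)
        if m:
            event_errors[m.group(1)] += 1
    return slice_events, event_errors


# Three lines copied from the excerpt above, used as a small self-check.
SAMPLE = [
    "May  8 13:32:44 ubuntu1804 systemd[1]: Created slice libcontainer_16535_systemd_test_default.slice.",
    "May  8 13:32:44 ubuntu1804 systemd[1]: Removed slice libcontainer_16535_systemd_test_default.slice.",
    'May  8 13:32:44 ubuntu1804 kubelet[9399]: W0508 13:32:44.377913    9399 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/libcontainer_16535_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/libcontainer_16535_systemd_test_default.slice: no such file or directory',
]

slices, errors = summarize(SAMPLE)
for (name, action), count in sorted(slices.items()):
    print(f"{action:8s} {count:3d}  {name}")
for controller, count in sorted(errors.items()):
    print(f"event errors for {controller}: {count}")
```

To run it against a full log, pass the open file instead of SAMPLE, for example summarize(open("/var/log/syslog")). A slice that shows matching "Created" and "Removed" counts within the same second, alongside kubelet warnings for the same path, is consistent with kubelet reacting to a directory that has already disappeared by the time it tries to watch it.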

What you expected to happen:

It used to work a few days ago (see the asciinema recording at https://www-public.imtbs-tsp.eu/~berger_o/weblog/2019/04/26/testing-kubevirt-for-running-vms-inside-kubernetes-in-a-vagrant-qemu-vm/), but I fear some recent changes broke it.

How to reproduce it (as minimally and precisely as possible):

Run vagrant up --provider=libvirt on Linux with a clone of that commit: https://github.com/olberger/vagrant-k8s-kubevirt-katacontainers/tree/539c9d06ffa611aa2d08da6cfa617d42fe218a40

Then vagrant ssh into the VM; ps -edf f or tail -f /var/log/syslog would do…

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
    kubernetes-vagrant-01: [config/images] Pulled k8s.gcr.io/kube-apiserver:v1.14.1
    kubernetes-vagrant-01: [config/images] Pulled k8s.gcr.io/kube-controller-manager:v1.14.1
    kubernetes-vagrant-01: [config/images] Pulled k8s.gcr.io/kube-scheduler:v1.14.1
    kubernetes-vagrant-01: [config/images] Pulled k8s.gcr.io/kube-proxy:v1.14.1
    kubernetes-vagrant-01: [config/images] Pulled k8s.gcr.io/pause:3.1
    kubernetes-vagrant-01: [config/images] Pulled k8s.gcr.io/etcd:3.3.10
    kubernetes-vagrant-01: [config/images] Pulled k8s.gcr.io/coredns:1.3.1
  • Cloud provider or hardware configuration: qemu/kvm VM +
crictl version
Version:  0.1.0
RuntimeName:  cri-o
RuntimeVersion:  1.13.1-dev
RuntimeApiVersion:  v1alpha1

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 35 (11 by maintainers)

Most upvoted comments

cri-o-runc (1.0.0-rc6-1~dev~ubuntu18.04~ppa42) bionic; urgency=medium

* autobuilt b9b6cc6

-- Lokesh Mandvekar (Bot) <lsm5+bot@fedoraproject.org>  Wed, 15 May 2019 16:54:40 +0000

PR here: opencontainers/runc#2057