kubernetes: Kubelet not starting with CRI-O on BTRFS (could not find device)
What happened: Kubelet v1.19.0 doesn’t start on nodes running CRI-O 1.17.2 (storage_driver = "btrfs") with BTRFS as the root filesystem.
Related/previous issues: https://github.com/kubernetes/kubernetes/issues/47046 https://github.com/kubernetes/kubernetes/issues/65204
This error shows up in the journal:
07:42.061303 2703 kubelet.go:1296] Failed to start ContainerManager failed to get rootfs info: failed to get device for dir "/var/lib/kubelet": could not find device with major: 0, minor: 16 in cached partitions map
The minor: 16 refers to the btrfs root filesystem mounted at the time.
stat /:
File: /
Size: 162 Blocks: 32 IO Block: 4096 directory
Device: 10h/16d Inode: 256 Links: 1
Access: (0755/drwxr-xr-x) Uid: ( 1000/ UNKNOWN) Gid: ( 1000/ UNKNOWN)
Access: 2020-08-29 21:00:48.000000000 +0000
Modify: 2020-08-10 12:05:17.000000000 +0000
Change: 2020-08-29 21:04:42.683000680 +0000
Birth: -
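To relate this output to the error message: the "Device: 10h/16d" field is the raw device number (0x10 in hex, 16 in decimal), which splits into major 0 and minor 16. A minimal Python sketch of that split follows, assuming the glibc dev_t bit layout used on Linux. The helper name split_dev is made up for illustration.

```python
from typing import Tuple


def split_dev(dev: int) -> Tuple[int, int]:
    """Split a Linux dev_t into (major, minor), following the glibc bit layout."""
    major = ((dev & 0x00000000000FFF00) >> 8) | ((dev & 0xFFFFF00000000000) >> 32)
    minor = (dev & 0x00000000000000FF) | ((dev & 0x00000FFFFFF00000) >> 12)
    return major, minor


# "Device: 10h/16d" from the stat output above
print(split_dev(0x10))
```

On a Linux node, os.major(os.stat("/").st_dev) and os.minor(...) from the standard library should give the same pair without hand-written bit masks.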
cat /proc/$(pidof kubelet)/mountinfo:
12 18 0:11 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
13 18 0:12 / /proc rw,relatime shared:11 - proc proc rw
14 18 0:5 / /dev rw,nosuid,noexec,relatime shared:2 - devtmpfs udev rw,size=32894084k,nr_inodes=8223521,mode=755
15 14 0:13 / /dev/pts rw,nosuid,noexec,relatime shared:3 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
16 18 0:14 / /run rw,nosuid,nodev,noexec,relatime shared:5 - tmpfs tmpfs rw,size=6586408k,mode=755
18 1 0:15 / / rw,noatime shared:1 - btrfs /dev/sda2 rw,degraded,ssd,discard=async,space_cache=v2,autodefrag,subvolid=5,subvol=/
17 14 0:17 / /dev/shm rw,nosuid,nodev shared:4 - tmpfs tmpfs rw
19 16 0:18 / /run/lock rw,nosuid,nodev,noexec,relatime shared:6 - tmpfs tmpfs rw,size=5120k
20 12 0:19 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:8 - cgroup2 cgroup2 rw,nsdelegate
21 12 0:20 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:9 - pstore pstore rw
22 12 0:21 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:10 - bpf none rw,mode=700
23 14 0:10 / /dev/mqueue rw,nosuid,nodev,noexec,relatime shared:12 - mqueue mqueue rw
246 16 0:24 / /run/user/0 rw,nosuid,nodev,relatime shared:148 - tmpfs tmpfs rw,size=6586404k,nr_inodes=1646601,mode=700
189 18 0:15 /var/lib/containers/storage/btrfs /var/lib/containers/storage/btrfs rw,noatime - btrfs /dev/sda2 rw,degraded,ssd,discard=async,space_cache=v2,autodefrag,subvolid=5,subvol=/
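The mismatch behind the error is visible in this listing: the btrfs root mount is recorded as device 0:15, while stat reports 0:16 for the same directory, and no entry carries 0:16. The Python sketch below shows a lookup keyed on major:minor coming up empty. It is only an illustration of the failure mode, not cAdvisor's actual code, and the function name devices_by_number is an assumption.

```python
from typing import Dict

# Three entries copied from the mountinfo listing above.
MOUNTINFO = """\
16 18 0:14 / /run rw,nosuid,nodev,noexec,relatime shared:5 - tmpfs tmpfs rw,size=6586408k,mode=755
18 1 0:15 / / rw,noatime shared:1 - btrfs /dev/sda2 rw,degraded,ssd,discard=async,space_cache=v2,autodefrag,subvolid=5,subvol=/
17 14 0:17 / /dev/shm rw,nosuid,nodev shared:4 - tmpfs tmpfs rw
"""


def devices_by_number(mountinfo: str) -> Dict[str, str]:
    """Map "major:minor" (third field) to the mount source after the "-" separator."""
    table: Dict[str, str] = {}
    for line in mountinfo.splitlines():
        fields = line.split()
        if not fields:
            continue
        sep = fields.index("-")  # optional fields end here
        # After the separator: filesystem type, mount source, super options.
        table.setdefault(fields[2], fields[sep + 2])
    return table


table = devices_by_number(MOUNTINFO)
print(table.get("0:15"))  # the device number mountinfo records for /
print(table.get("0:16"))  # the device number stat reports for /
```

The first lookup finds /dev/sda2, and the second finds nothing, which matches the "could not find device with major: 0, minor: 16" message.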
What you expected to happen:
Kubelet starts and detects the filesystem backing /var/lib/kubelet, even if it is BTRFS.
How to reproduce it (as minimally and precisely as possible):
- Install kubeadm/kubectl 1.19.0
- Install CRI-O, I used 1.17.2 from:
deb [arch=amd64] http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/testing/Debian_Testing /
- Configure CRI-O to use storage_driver = "btrfs"
- Run “kubeadm init” to create a new cluster
- Observe journal (check for kubelet issues)
Anything else we need to know?:
A possible workaround is to make sure a bind mount exists that allows kubelet’s logic to find the backing filesystem. E.g. add the following fstab entry and then run mount /var/lib/kubelet:
/var/lib/kubelet /var/lib/kubelet none defaults,bind,nofail 0 0
The resulting mount should look somewhat like this:
273 18 0:15 /var/lib/kubelet /var/lib/kubelet rw,noatime shared:1 - btrfs /dev/sda2 rw,degraded,ssd,discard=async,space_cache=v2,autodefrag,subvolid=5,subvol=/
The workaround has been tested and seems to make kubelet work as expected.
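One way to check that the bind mount is in place is to look for the deepest mountinfo entry containing /var/lib/kubelet. The Python sketch below does this by mount point path rather than by device number. The function name nearest_mount is hypothetical, and the code makes no claim about how kubelet or cAdvisor resolve the directory internally.

```python
from typing import Optional, Tuple

# Root entry plus the bind mount entry shown above.
MOUNTINFO = """\
18 1 0:15 / / rw,noatime shared:1 - btrfs /dev/sda2 rw,degraded,ssd,discard=async,space_cache=v2,autodefrag,subvolid=5,subvol=/
273 18 0:15 /var/lib/kubelet /var/lib/kubelet rw,noatime shared:1 - btrfs /dev/sda2 rw,degraded,ssd,discard=async,space_cache=v2,autodefrag,subvolid=5,subvol=/
"""


def nearest_mount(mountinfo: str, path: str) -> Optional[Tuple[str, str, str]]:
    """Return (mount_point, fstype, source) for the deepest mount containing path."""
    best: Optional[Tuple[str, str, str]] = None
    for line in mountinfo.splitlines():
        fields = line.split()
        if not fields:
            continue
        sep = fields.index("-")
        mount_point = fields[4]
        contains = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point + "/")
        )
        if contains and (best is None or len(mount_point) > len(best[0])):
            best = (mount_point, fields[sep + 1], fields[sep + 2])
    return best


print(nearest_mount(MOUNTINFO, "/var/lib/kubelet"))
```

With the bind mount present, the result names /var/lib/kubelet itself as the mount point. Without it, the same call falls back to the root entry. On a node, the function could be fed the contents of /proc/self/mountinfo instead of the sample string.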
Environment:
- Kubernetes version: 1.19.0
- OS: Debian Bullseye (Testing)
- Kernel: Linux 5.8.5
- Others: The podman-rootless package has been installed
About this issue
- State: closed
- Created 4 years ago
- Reactions: 10
- Comments: 19 (7 by maintainers)
Commits related to this issue
- workaround btrfs kubelet crashes with bind mount unit https://github.com/kubernetes/kubernetes/issues/94335 — committed to sorah/infra-public by sorah 4 years ago
- Fix kubernetes issue #94335 Try to get MountInfo even if the path is reached to "/" https://github.com/kubernetes/kubernetes/issues/94335 Signed-off-by: Geonju Kim <rjswn042@gmail.com> — committed to gjkim42/cadvisor by gjkim42 4 years ago
- Fix kubernetes issue #94335 - https://github.com/kubernetes/kubernetes/issues/94335 - Try to get MountInfo even if the path is reached to "/" - Add TestMountInfoFromDir Signed-off-by: Geonju Kim <rj... — committed to gjkim42/cadvisor by gjkim42 4 years ago
- Fix kubernetes issue #94335 - Issue: https://github.com/kubernetes/kubernetes/issues/94335 - Fix `GetDirFsDevice` to return correct `*DeviceInfo` when "/" is Btrfs - Refactor to call new function... — committed to gjkim42/cadvisor by gjkim42 4 years ago
- kubelet & containerd to store state on a separate LXC disk this is mainly here because k8s can't deal with BTRFS properly as the last major release (1.22) see: https://github.com/kubernetes/kubernet... — committed to rarescosma/ops.kube by rarescosma 3 years ago
New cAdvisor patch versions have been cut with the BTRFS fix:
/cc @gjkim42
@gjkim42 It sounds like for a full fix we’ll need to cherry-pick https://github.com/google/cadvisor/pull/2752 so that it is included in k8s 1.19 and 1.20. To do that the following is needed:
I’m happy to cut the new cAdvisor release from my side if you want to proceed and think this is a critical enough bug to be worth cherry-picking.
I confirm the same issue shows up with /var/lib/kubelet on btrfs (on LVM on LUKS, Debian buster), running k3s. No btrfs subvols. The workaround works for me; for k3s I also need to bind mount /var/lib/rancher. I did not experience this issue with 1.18.
I am encountering the same issue. @lyind thank you for your workaround. It seems to work.