k3d: [BUG] Cluster fails to start on cgroup v2
What did you do
Start a minimal cluster on Kali Linux 2020.4
* How was the cluster created?
* k3d cluster create
What did you do afterwards?
- I inspected the error and saw it had something to do with cgroups, and I noticed that the latest kernel update to Kali switched the cgroup file hierarchy from v1 to v2.
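A quick way to confirm which hierarchy a host is running is to look at the filesystem type mounted at `/sys/fs/cgroup` (`cgroup2fs` means the unified v2 hierarchy). A small sketch with a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup
# (as printed by `stat -fc %T`) to a human-readable cgroup version.
cgroup_version() {
  case "$1" in
    cgroup2fs) echo "v2 (unified)" ;;
    tmpfs)     echo "v1 (legacy or hybrid)" ;;
    *)         echo "unknown ($1)" ;;
  esac
}

# On the affected Kali host this would report the v2 (unified) hierarchy.
cgroup_version "$(stat -fc %T /sys/fs/cgroup)"
```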
What did you expect to happen
That a minimal cluster would start
Screenshots or terminal output
{"log":"time=\"2021-02-10T15:54:15.154488575Z\" level=info msg=\"Containerd is now running\"\n","stream":"stderr","time":"2021-02-10T15:54:15.154604054Z"}
{"log":"time=\"2021-02-10T15:54:15.276436029Z\" level=info msg=\"Connecting to proxy\" url=\"wss://127.0.0.1:6443/v1-k3s/connect\"\n","stream":"stderr","time":"2021-02-10T15:54:15.276584849Z"}
{"log":"time=\"2021-02-10T15:54:15.344809810Z\" level=info msg=\"Handling backend connection request [k3d-minimal-default-server-0]\"\n","stream":"stderr","time":"2021-02-10T15:54:15.344941507Z"}
{"log":"time=\"2021-02-10T15:54:15.383483103Z\" level=warning msg=\"**Disabling CPU quotas due to missing cpu.cfs_period_us**\"\n","stream":"stderr","time":"2021-02-10T15:54:15.383600244Z"}
{"log":"time=\"2021-02-10T15:54:15.383649950Z\" level=warning msg=\"**Disabling pod PIDs limit feature due to missing cgroup pids support**\"\n","stream":"stderr","time":"2021-02-10T15:54:15.383683752Z"}
{"log":"time=\"2021-02-10T15:54:15.383773636Z\" level=info msg=\"Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --cgroups-per-qos=false --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --cni-bin-dir=/bin --cni-conf-dir=/var/lib/rancher/k3s/agent/etc/cni/net.d --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --container-runtime=remote --containerd=unix:///run/k3s/containerd/containerd.sock --cpu-cfs-quota=false --enforce-node-allocatable= --eviction-hard=imagefs.available\u003c5%,nodefs.available\u003c5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --feature-gates=SupportPodPidsLimit=false --healthz-bind-address=127.0.0.1 --hostname-override=k3d-minimal-default-server-0 --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --node-labels= --pod-manifest-path=/var/lib/rancher/k3s/agent/pod-manifests --read-only-port=0 --resolv-conf=/tmp/k3s-resolv.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key\"\n","stream":"stderr","time":"2021-02-10T15:54:15.383842163Z"}
{"log":"time=\"2021-02-10T15:54:15.384645964Z\" level=info msg=\"Running kube-proxy --cluster-cidr=10.42.0.0/16 --healthz-bind-address=127.0.0.1 --hostname-override=k3d-minimal-default-server-0 --kubeconfig=/var/lib/rancher/k3s/agent/kubeproxy.kubeconfig --proxy-mode=iptables\"\n","stream":"stderr","time":"2021-02-10T15:54:15.38471723Z"}
{"log":"Flag --cloud-provider has been deprecated, will be removed in 1.23, in favor of removing cloud provider code from Kubelet.\n","stream":"stderr","time":"2021-02-10T15:54:15.387483943Z"}
{"log":"Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before being removed.\n","stream":"stderr","time":"2021-02-10T15:54:15.387594058Z"}
{"log":"F0210 15:54:15.387923 7 server.go:181] cannot set feature gate SupportPodPidsLimit to false, feature is locked to true\n","stream":"stderr","time":"2021-02-10T15:54:15.387966646Z"}
{"log":"goroutine 3978 [running]:\n","stream":"stderr","time":"2021-02-10T15:54:15.549704084Z"}
Which OS & Architecture
* Linux Kali 2020.4, amd64 (x86_64)
Which version of k3d
* output of `k3d version`
$ k3d version
k3d version v4.2.0
k3s version v1.20.0-k3s1 (default)
Which version of docker
* output of `docker version` and `docker info`
$ docker version
Client:
Version: 20.10.2+dfsg1
API version: 1.41
Go version: go1.15.6
Git commit: 2291f61
Built: Fri Jan 8 07:08:51 2021
OS/Arch: linux/amd64
Experimental: true
Server:
Engine:
Version: 20.10.2+dfsg1
API version: 1.41 (minimum version 1.12)
Go version: go1.15.6
Git commit: 8891c58
Built: Fri Jan 8 07:08:51 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.3~ds1
GitCommit: 1.4.3~ds1-1+b1
runc:
Version: 1.0.0-rc92+dfsg1
GitCommit: 1.0.0-rc92+dfsg1-5+b1
docker-init:
Version: 0.19.0
GitCommit:
About this issue
- State: closed
- Created 3 years ago
- Reactions: 10
- Comments: 41 (14 by maintainers)
Commits related to this issue
- Logs mj1 for https://github.com/rancher/k3d/issues/493 — committed to mj41-gdc/k3d-debug by mj41-gdc 3 years ago
- mj1 logs for https://github.com/rancher/k3d/issues/493 — committed to mj41-gdc/k3d-debug by mj41-gdc 3 years ago
- mj2 logs https://github.com/rancher/k3d/issues/493 — committed to mj41-gdc/k3d-debug by mj41-gdc 3 years ago
- mj3 for https://github.com/rancher/k3d/issues/493 — committed to mj41-gdc/k3d-debug by mj41-gdc 3 years ago
- mj4 for https://github.com/rancher/k3d/issues/493 — committed to mj41-gdc/k3d-debug by mj41-gdc 3 years ago
Works for me on Arch by executing `grub-mkconfig -o /boot/grub/grub.cfg` after adding the same to my `/etc/default/grub` file.

For anyone running into this on NixOS, setting `systemd.enableUnifiedCgroupHierarchy = false;` in your configuration.nix ought to help. (See https://github.com/NixOS/nixpkgs/issues/111835)

`export K3D_FIX_CGROUPV2=1 ; k3d cluster create default -v /dev/mapper:/dev/mapper` works on Fedora 33 with Docker and cgroupv2. Great work. Thank you @iwilltry42, @AkihiroSuda and others.
Detailed logs https://github.com/mj41-gdc/k3d-debug/tree/k3d-issues-493-mj7
Hello,
I have the same issue on Arch Linux. I also have cgroup v2.
You can give this a try now on cgroupv2:
`k3d cluster create test --image iwilltry42/k3s:dev-20210427.2 --verbose`
The image is custom but only contains the new entrypoint from https://github.com/k3s-io/k3s/pull/3237 . There's a discussion about moving this entrypoint script's functionality into the k3s agent, so we'll have to wait for that. `iwilltry42/k3s:dev-20210427.2` is built from the current `rancher/k3s:latest` (sha256-17d1cc189d289649d309169f25cee5e2c2e6e25ecf5b84026c3063c6590af9c8), which is `v1.21.0+k3s1`. I tested it without issues on Ubuntu 20.10 with cgroupv1 and cgroupv2 (systemd).
For cgroup v2, k3s/k3d needs to have a logic to evacuate the init process from the top-level cgroup to somewhere else, like this: https://github.com/moby/moby/blob/e0170da0dc6e660594f98bc66e7a98ce9c2abb46/hack/dind#L28-L37
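The evacuation logic in the linked moby `hack/dind` script can be sketched roughly as below (assumptions: this runs as root in the container's entrypoint, with cgroup2 mounted at `/sys/fs/cgroup`; the helper name `enable_all_controllers` is made up for illustration):

```shell
#!/bin/sh
# Sketch of root-cgroup evacuation, modeled on moby's hack/dind script.
# cgroup v2 forbids a cgroup from both containing processes and delegating
# controllers, so the init process must move out of the root cgroup first.

# Turn "cpuset cpu io" into "+cpuset +cpu +io", the syntax that
# cgroup.subtree_control expects when enabling controllers.
enable_all_controllers() {
  sed -e 's/ / +/g' -e 's/^/+/'
}

if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
  # Move every process out of the root cgroup into a leaf cgroup "init".
  mkdir -p /sys/fs/cgroup/init 2>/dev/null || :
  xargs -rn1 < /sys/fs/cgroup/cgroup.procs \
    > /sys/fs/cgroup/init/cgroup.procs 2>/dev/null || :
  # With the root cgroup process-free, delegate all controllers to children.
  enable_all_controllers < /sys/fs/cgroup/cgroup.controllers \
    > /sys/fs/cgroup/cgroup.subtree_control 2>/dev/null || :
fi
```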
Issue still persists on k3d v4.4.2 with k3s v1.20.6-k3s1
Would be good if docs listed that `k3d` is not yet compatible with `cgroupv2`, so users would know in advance if they need to adjust kernel opts.

Using Debian Sid, in the meantime, I personally switched back to cgroup v1. I added `systemd.unified_cgroup_hierarchy=0` to my `GRUB_CMDLINE_LINUX_DEFAULT` (`/etc/default/grub`) and then ran `update-grub`.

@iwilltry42 I'm able to confirm that this adds v2 support on my system. Thank you!
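The GRUB change described above amounts to an edit like the following in `/etc/default/grub` (a sketch: `quiet` stands in for whatever default options your file already contains):

```shell
# /etc/default/grub (excerpt) -- append the parameter to the existing defaults
GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=0"
```

Then regenerate the config with `update-grub` (Debian) or `grub-mkconfig -o /boot/grub/grub.cfg` (Arch) and reboot.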
I just created a (temporary) fix/workaround using the entrypoint script that we can use until it is fixed upstream (in k3s). See PR #579. There's a dev release out already: https://github.com/rancher/k3d/releases/tag/v4.4.3-dev.0 Please test it with the environment variable `K3D_FIX_CGROUPV2=1` set to enable the workaround. Feedback welcome 😃

@iwilltry42 I confirm that image works correctly with `cgroupv2` on `archlinux`. EDIT: I also confirm it works correctly with https://github.com/rancher/k3d/releases/tag/v4.4.3-dev.0 using the env var.

@iwilltry42 Yes, thanks
For ArchLinux users that now run systemd v248+ and use systemd-boot, here's how I fixed it for my system:
`vim /boot/loader/entries/arch.conf`
Then I verified with `ls /sys/fs/cgroup` to see if there's a `blkio/` folder among others again, as described by https://wiki.archlinux.org/index.php/cgroups#Switching_to_cgroups_v2

Seems like it started but keeps restarting for some reason.
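The `arch.conf` edit for systemd-boot mentioned above presumably appends the same kernel parameter to the entry's `options` line; a hypothetical excerpt (the `root=` value is a placeholder for your own root device):

```
title   Arch Linux
linux   /vmlinuz-linux
initrd  /initramfs-linux.img
options root=/dev/sda2 rw systemd.unified_cgroup_hierarchy=0
```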
Same issue on Fedora 33:
Logs: