kind: kind create cluster fails on macOS + Docker Desktop

What happened:

 ~ kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.25.3) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✗ Starting control-plane 🕹️
ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged kind-control-plane kubeadm init --skip-phases=preflight --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1
Command Output: I0111 12:08:48.281449     132 initconfiguration.go:254] loading configuration from "/kind/kubeadm.conf"
W0111 12:08:48.282858     132 initconfiguration.go:331] [config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta3, Kind=JoinConfiguration
[init] Using Kubernetes version: v1.25.3
[certs] Using certificateDir folder "/etc/kubernetes/pki"
I0111 12:08:48.289153     132 certs.go:112] creating a new certificate authority for ca
[certs] Generating "ca" certificate and key
I0111 12:08:48.393822     132 certs.go:522] validating certificate period for ca certificate
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kind-control-plane kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost] and IPs [10.96.0.1 172.23.0.2 127.0.0.1]
[certs] Generating "apiserver-kubelet-client" certificate and key
I0111 12:08:48.827200     132 certs.go:112] creating a new certificate authority for front-proxy-ca
[certs] Generating "front-proxy-ca" certificate and key
I0111 12:08:49.033611     132 certs.go:522] validating certificate period for front-proxy-ca certificate
[certs] Generating "front-proxy-client" certificate and key
I0111 12:08:49.219289     132 certs.go:112] creating a new certificate authority for etcd-ca
[certs] Generating "etcd/ca" certificate and key
I0111 12:08:49.319645     132 certs.go:522] validating certificate period for etcd/ca certificate
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kind-control-plane localhost] and IPs [172.23.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kind-control-plane localhost] and IPs [172.23.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
I0111 12:08:49.742862     132 certs.go:78] creating new public/private key files for signing service account users
[certs] Generating "sa" key and public key
I0111 12:08:49.876848     132 kubeconfig.go:103] creating kubeconfig file for admin.conf
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
I0111 12:08:50.089378     132 kubeconfig.go:103] creating kubeconfig file for kubelet.conf
[kubeconfig] Writing "kubelet.conf" kubeconfig file
I0111 12:08:50.161976     132 kubeconfig.go:103] creating kubeconfig file for controller-manager.conf
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
I0111 12:08:50.598000     132 kubeconfig.go:103] creating kubeconfig file for scheduler.conf
[kubeconfig] Writing "scheduler.conf" kubeconfig file
I0111 12:08:50.653232     132 kubelet.go:66] Stopping the kubelet
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
I0111 12:08:50.770349     132 manifests.go:99] [control-plane] getting StaticPodSpecs
I0111 12:08:50.770602     132 certs.go:522] validating certificate period for CA certificate
I0111 12:08:50.770699     132 manifests.go:125] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I0111 12:08:50.770705     132 manifests.go:125] [control-plane] adding volume "etc-ca-certificates" for component "kube-apiserver"
I0111 12:08:50.770709     132 manifests.go:125] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I0111 12:08:50.770712     132 manifests.go:125] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-apiserver"
I0111 12:08:50.770716     132 manifests.go:125] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-apiserver"
I0111 12:08:50.773904     132 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/manifests/kube-apiserver.yaml"
I0111 12:08:50.773964     132 manifests.go:99] [control-plane] getting StaticPodSpecs
[control-plane] Creating static Pod manifest for "kube-controller-manager"
I0111 12:08:50.774644     132 manifests.go:125] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I0111 12:08:50.774687     132 manifests.go:125] [control-plane] adding volume "etc-ca-certificates" for component "kube-controller-manager"
I0111 12:08:50.774693     132 manifests.go:125] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I0111 12:08:50.774697     132 manifests.go:125] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I0111 12:08:50.774701     132 manifests.go:125] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I0111 12:08:50.774705     132 manifests.go:125] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-controller-manager"
I0111 12:08:50.774709     132 manifests.go:125] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
I0111 12:08:50.776035     132 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
I0111 12:08:50.776106     132 manifests.go:99] [control-plane] getting StaticPodSpecs
I0111 12:08:50.776262     132 manifests.go:125] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I0111 12:08:50.776771     132 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
I0111 12:08:50.777457     132 local.go:65] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
I0111 12:08:50.777499     132 waitcontrolplane.go:83] [wait-control-plane] Waiting for the API server to be healthy
I0111 12:08:50.777977     132 loader.go:374] Config loaded from file:  /etc/kubernetes/admin.conf
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
I0111 12:08:50.781500     132 round_trippers.go:553] GET https://kind-control-plane:6443/healthz?timeout=10s  in 1 milliseconds
I0111 12:08:51.282917     132 round_trippers.go:553] GET https://kind-control-plane:6443/healthz?timeout=10s  in 0 milliseconds
I0111 12:08:51.783460     132 round_trippers.go:553] GET https://kind-control-plane:6443/healthz?timeout=10s  in 1 milliseconds
I0111 12:08:52.283225     132 round_trippers.go:553] GET https://kind-control-plane:6443/healthz?timeout=10s  in 0 milliseconds
.....
........
...............
I0111 12:12:50.788384     132 round_trippers.go:553] GET https://kind-control-plane:6443/healthz?timeout=10s  in 0 milliseconds
couldn't initialize a Kubernetes cluster
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init.runWaitControlPlanePhase
	cmd/kubeadm/app/cmd/phases/init/waitcontrolplane.go:108
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
	cmd/kubeadm/app/cmd/phases/workflow/runner.go:234
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
	cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
	cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
	cmd/kubeadm/app/cmd/init.go:154
github.com/spf13/cobra.(*Command).execute
	vendor/github.com/spf13/cobra/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
	vendor/github.com/spf13/cobra/command.go:974
github.com/spf13/cobra.(*Command).Execute
	vendor/github.com/spf13/cobra/command.go:902
k8s.io/kubernetes/cmd/kubeadm/app.Run
	cmd/kubeadm/app/kubeadm.go:50
main.main
	cmd/kubeadm/kubeadm.go:25
runtime.main
	/usr/local/go/src/runtime/proc.go:250
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1594
error execution phase wait-control-plane
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
	cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
	cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
	cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
	cmd/kubeadm/app/cmd/init.go:154
github.com/spf13/cobra.(*Command).execute
	vendor/github.com/spf13/cobra/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
	vendor/github.com/spf13/cobra/command.go:974
github.com/spf13/cobra.(*Command).Execute
	vendor/github.com/spf13/cobra/command.go:902
k8s.io/kubernetes/cmd/kubeadm/app.Run
	cmd/kubeadm/app/kubeadm.go:50
main.main
	cmd/kubeadm/kubeadm.go:25
runtime.main
	/usr/local/go/src/runtime/proc.go:250
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1594

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
	- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'
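
Since the kind node is itself a Docker container, the troubleshooting commands suggested above can be run from the host by exec'ing into it. A minimal sketch, assuming the default control-plane container name kind-control-plane:

 ~ docker exec kind-control-plane crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a
 ~ docker exec kind-control-plane journalctl -xeu kubelet --no-pager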

What you expected to happen: The cluster is created successfully.

How to reproduce it (as minimally and precisely as possible): Run kind create cluster on macOS with Docker Desktop (environment details below).

Anything else we need to know?:

Additional logs are attached to the issue as a .zip archive: kind-control-plane.zip
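
For gathering the equivalent diagnostics locally, kind can export the node logs (kubelet journal, container logs, etc.) into a directory; a sketch, where the output path is just an example:

 ~ kind export logs ./kind-logs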

Environment:

  • kind version: (use kind version):
 ~ kind --version
kind version 0.17.0
  • Runtime info: (use docker info or podman info):
  • OS (e.g. from /etc/os-release): macOS + Docker Desktop 4.15.0 (93002) (currently the newest version available)
  • Kubernetes version: (use kubectl version):
 ~ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.2", GitCommit:"5835544ca568b757a8ecae5c153f317e5736700e", GitTreeState:"clean", BuildDate:"2022-09-21T14:33:49Z", GoVersion:"go1.19.1", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:43:11Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.25) and server (1.23) exceeds the supported minor version skew of +/-1
 ~ docker info
Client:
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.9.1)
  compose: Docker Compose (Docker Inc., v2.13.0)
  dev: Docker Dev Environments (Docker Inc., v0.0.5)
  extension: Manages Docker extensions (Docker Inc., v0.2.16)
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc., 0.6.0)
  scan: Docker Scan (Docker Inc., v0.22.0)

Server:
 Containers: 10
  Running: 1
  Paused: 0
  Stopped: 9
 Images: 26
 Server Version: 20.10.21
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 770bd0108c32f3fb5c73ae1264f7e503fe7b2661
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.15.49-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 7.675GiB
 Name: docker-desktop
 ID: ERYV:IPQQ:OQAQ:GX7W:XYED:ICON:W4GJ:A3V2:45F5:GTRB:OY3H:IFZZ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  hubproxy.docker.internal:5000
  127.0.0.0/8
 Live Restore Enabled: false

  • Any proxies or other special environment settings?: nope

About this issue

  • State: closed
  • Created a year ago
  • Comments: 22 (12 by maintainers)

Most upvoted comments

I’ve just stumbled across the same issue (also on my daily-driver Mac with Docker Desktop) and did some debugging. One thing that immediately caught my eye was that my journal.log was full of error messages like this:

Jan 20 16:19:32 kind-control-plane kubelet[184]: E0120 16:19:32.302684 184 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-scheduler-kind-control-plane_kube-system(6d3dda2cad9846e0d48dbd5d5b9f59fc)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"kube-scheduler-kind-control-plane_kube-system(6d3dda2cad9846e0d48dbd5d5b9f59fc)\\\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \\\"slice:prefix:name\\\" for systemd cgroups, got \\\"/kubelet/kubepods/burstable/pod6d3dda2cad9846e0d48dbd5d5b9f59fc/5bc1337ac55d891c743783740d686e714686e063bb37969ac965f44f2ab091de\\\" instead: unknown\"" pod="kube-system/kube-scheduler-kind-control-plane" podUID=6d3dda2cad9846e0d48dbd5d5b9f59fc

While doing some research on that error, I found this thread, which explained the cause nicely: https://github.com/containerd/containerd/issues/4857#issuecomment-747238907. So next, I did what was suggested in that post: I created a custom KubeletConfiguration that explicitly sets systemd as the cgroupDriver and put that into my kind config file. And now it actually works for me again. Here is the relevant part of my kind config for reference:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: KubeletConfiguration
    cgroupDriver: systemd
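
A minimal sketch of how that snippet can be used, assuming it is saved as kind-config.yaml (the filename is arbitrary), plus a quick check that the kubelet inside the node actually picked up the systemd driver:

 ~ kind create cluster --config kind-config.yaml
 ~ docker exec kind-control-plane grep cgroupDriver /var/lib/kubelet/config.yaml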

Hope this helps you out!

@BenTheElder I can report that I also started running into this same problem, where the health check fails when setting up the control plane.

I recently returned to a project after not touching it for a few months. It had been using Docker Desktop and kind without issue on macOS ARM. Docker Desktop logs were reporting issues with privileged ports (which were enabled).

Tried:

  • podman, same issue
  • cgroup patch, same issue
  • upgrading kind 0.17.x to 0.18.x, same issue (old, I know; it's just what nixpkgs supplies on the latest channels)

Then I went back to Docker Desktop (v4.21.1) and it worked. I suspected updating the nix channel in my project did the trick; it bumped a few dependencies:

  • kubectl client 1.25.4 -> 1.27.1
  • kubectl server 1.25.3 -> 1.26.3
  • kind 0.17.0 go1.19.9 darwin/arm64 -> 0.18.0 go1.20.6 darwin/arm64
  • kindest/node:1.25.3 -> kindest/node:v1.26.3

So I dropped back down to the older versions and it still works. I removed the cgroup patch, and it still works. So I'm a bit clueless as to what actually resolved it. 🤷

Hopefully this is useful info. I’m unstuck for now, so no worries here.

@vallpaper that would be #2718