kind: kind create cluster fails to remove control plane taint
What happened:
I tried to create a cluster with kind create cluster and received the error “failed to remove control plane taint”.
What you expected to happen:
The cluster to be created successfully.
How to reproduce it (as minimally and precisely as possible):
Install kind v0.14.0-arm64 and run kind create cluster.
Anything else we need to know?:
I’m running kind inside a container with the host’s (macOS M1 Max) docker socket mounted, and I’m able to run other containers with docker run.
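Roughly, the setup looks like this (the dev image name below is just a placeholder, not my actual image):
# placeholder dev image; the relevant part is mounting the host's docker socket
$ docker run --rm -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    my-dev-image:latest bash
# then, inside that container:
$ kind create cluster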
Logs:
$ kind create cluster --loglevel=debug
WARNING: --loglevel is deprecated, please switch to -v and -q!
Creating cluster "kind" ...
DEBUG: docker/images.go:58] Image: kindest/node:v1.24.0@sha256:0866296e693efe1fed79d5e6c7af8df71fc73ae45e3679af05342239cdc5bc8e present locally
✓ Ensuring node image (kindest/node:v1.24.0) 🖼
✓ Preparing nodes 📦
DEBUG: config/config.go:96] Using the following kubeadm config for node kind-control-plane:
apiServer:
  certSANs:
  - localhost
  - 127.0.0.1
  extraArgs:
    runtime-config: ""
apiVersion: kubeadm.k8s.io/v1beta3
clusterName: kind
controlPlaneEndpoint: kind-control-plane:6443
controllerManager:
  extraArgs:
    enable-hostpath-provisioner: "true"
kind: ClusterConfiguration
kubernetesVersion: v1.24.0
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/16
scheduler:
  extraArgs: null
---
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- token: abcdef.0123456789abcdef
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.19.0.2
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  kubeletExtraArgs:
    node-ip: 172.19.0.2
    node-labels: ""
    provider-id: kind://docker/kind/kind-control-plane
---
apiVersion: kubeadm.k8s.io/v1beta3
controlPlane:
  localAPIEndpoint:
    advertiseAddress: 172.19.0.2
    bindPort: 6443
discovery:
  bootstrapToken:
    apiServerEndpoint: kind-control-plane:6443
    token: abcdef.0123456789abcdef
    unsafeSkipCAVerification: true
kind: JoinConfiguration
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  kubeletExtraArgs:
    node-ip: 172.19.0.2
    node-labels: ""
    provider-id: kind://docker/kind/kind-control-plane
---
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
cgroupRoot: /kubelet
evictionHard:
  imagefs.available: 0%
  nodefs.available: 0%
  nodefs.inodesFree: 0%
failSwapOn: false
imageGCHighThresholdPercent: 100
kind: KubeletConfiguration
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
conntrack:
  maxPerCore: 0
iptables:
  minSyncPeriod: 1s
kind: KubeProxyConfiguration
mode: iptables
✓ Writing configuration 📜
DEBUG: kubeadminit/init.go:82] I0808 18:27:28.895581 126 initconfiguration.go:255] loading configuration from "/kind/kubeadm.conf"
W0808 18:27:28.896451 126 initconfiguration.go:332] [config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta3, Kind=JoinConfiguration
[init] Using Kubernetes version: v1.24.0
[certs] Using certificateDir folder "/etc/kubernetes/pki"
I0808 18:27:28.900057 126 certs.go:112] creating a new certificate authority for ca
[certs] Generating "ca" certificate and key
I0808 18:27:29.115670 126 certs.go:522] validating certificate period for ca certificate
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kind-control-plane kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost] and IPs [10.96.0.1 172.19.0.2 127.0.0.1]
[certs] Generating "apiserver-kubelet-client" certificate and key
I0808 18:27:29.338086 126 certs.go:112] creating a new certificate authority for front-proxy-ca
[certs] Generating "front-proxy-ca" certificate and key
I0808 18:27:29.421219 126 certs.go:522] validating certificate period for front-proxy-ca certificate
[certs] Generating "front-proxy-client" certificate and key
I0808 18:27:29.554232 126 certs.go:112] creating a new certificate authority for etcd-ca
[certs] Generating "etcd/ca" certificate and key
I0808 18:27:29.615892 126 certs.go:522] validating certificate period for etcd/ca certificate
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kind-control-plane localhost] and IPs [172.19.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kind-control-plane localhost] and IPs [172.19.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
I0808 18:27:30.083897 126 certs.go:78] creating new public/private key files for signing service account users
[certs] Generating "sa" key and public key
I0808 18:27:30.124183 126 kubeconfig.go:103] creating kubeconfig file for admin.conf
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
I0808 18:27:30.254718 126 kubeconfig.go:103] creating kubeconfig file for kubelet.conf
[kubeconfig] Writing "kubelet.conf" kubeconfig file
I0808 18:27:30.362542 126 kubeconfig.go:103] creating kubeconfig file for controller-manager.conf
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
I0808 18:27:30.463815 126 kubeconfig.go:103] creating kubeconfig file for scheduler.conf
[kubeconfig] Writing "scheduler.conf" kubeconfig file
I0808 18:27:30.698207 126 kubelet.go:65] Stopping the kubelet
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
I0808 18:27:30.784998 126 manifests.go:99] [control-plane] getting StaticPodSpecs
I0808 18:27:30.785329 126 certs.go:522] validating certificate period for CA certificate
I0808 18:27:30.785397 126 manifests.go:125] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I0808 18:27:30.785414 126 manifests.go:125] [control-plane] adding volume "etc-ca-certificates" for component "kube-apiserver"
I0808 18:27:30.785417 126 manifests.go:125] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I0808 18:27:30.785420 126 manifests.go:125] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-apiserver"
I0808 18:27:30.785424 126 manifests.go:125] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-apiserver"
I0808 18:27:30.786696 126 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/manifests/kube-apiserver.yaml"
I0808 18:27:30.786710 126 manifests.go:99] [control-plane] getting StaticPodSpecs
[control-plane] Creating static Pod manifest for "kube-controller-manager"
I0808 18:27:30.786809 126 manifests.go:125] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I0808 18:27:30.786818 126 manifests.go:125] [control-plane] adding volume "etc-ca-certificates" for component "kube-controller-manager"
I0808 18:27:30.786821 126 manifests.go:125] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I0808 18:27:30.786823 126 manifests.go:125] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I0808 18:27:30.786826 126 manifests.go:125] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I0808 18:27:30.786828 126 manifests.go:125] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-controller-manager"
I0808 18:27:30.786830 126 manifests.go:125] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-controller-manager"
I0808 18:27:30.787252 126 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
I0808 18:27:30.787273 126 manifests.go:99] [control-plane] getting StaticPodSpecs
[control-plane] Creating static Pod manifest for "kube-scheduler"
I0808 18:27:30.787392 126 manifests.go:125] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I0808 18:27:30.787617 126 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
I0808 18:27:30.787952 126 local.go:65] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
I0808 18:27:30.787989 126 waitcontrolplane.go:83] [wait-control-plane] Waiting for the API server to be healthy
I0808 18:27:30.788334 126 loader.go:372] Config loaded from file: /etc/kubernetes/admin.conf
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
I0808 18:27:30.790085 126 round_trippers.go:553] GET https://kind-control-plane:6443/healthz?timeout=10s in 0 milliseconds
✗ Starting control-plane 🕹️
ERROR: failed to create cluster: failed to remove control plane taint: command "docker exec --privileged kind-control-plane kubectl --kubeconfig=/etc/kubernetes/admin.conf taint nodes --all node-role.kubernetes.io/control-plane- node-role.kubernetes.io/master-" failed with error: exit status 1
Command Output: The connection to the server kind-control-plane:6443 was refused - did you specify the right host or port?
Stack Trace:
sigs.k8s.io/kind/pkg/errors.WithStack
sigs.k8s.io/kind/pkg/errors/errors.go:59
sigs.k8s.io/kind/pkg/exec.(*LocalCmd).Run
sigs.k8s.io/kind/pkg/exec/local.go:124
sigs.k8s.io/kind/pkg/cluster/internal/providers/docker.(*nodeCmd).Run
sigs.k8s.io/kind/pkg/cluster/internal/providers/docker/node.go:146
sigs.k8s.io/kind/pkg/cluster/internal/create/actions/kubeadminit.(*action).Execute
sigs.k8s.io/kind/pkg/cluster/internal/create/actions/kubeadminit/init.go:140
sigs.k8s.io/kind/pkg/cluster/internal/create.Cluster
sigs.k8s.io/kind/pkg/cluster/internal/create/create.go:135
sigs.k8s.io/kind/pkg/cluster.(*Provider).Create
sigs.k8s.io/kind/pkg/cluster/provider.go:182
sigs.k8s.io/kind/pkg/cmd/kind/create/cluster.runE
sigs.k8s.io/kind/pkg/cmd/kind/create/cluster/createcluster.go:80
sigs.k8s.io/kind/pkg/cmd/kind/create/cluster.NewCommand.func1
sigs.k8s.io/kind/pkg/cmd/kind/create/cluster/createcluster.go:55
github.com/spf13/cobra.(*Command).execute
github.com/spf13/cobra@v1.4.0/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/cobra@v1.4.0/command.go:974
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/cobra@v1.4.0/command.go:902
sigs.k8s.io/kind/cmd/kind/app.Run
sigs.k8s.io/kind/cmd/kind/app/main.go:53
sigs.k8s.io/kind/cmd/kind/app.Main
sigs.k8s.io/kind/cmd/kind/app/main.go:35
main.main
sigs.k8s.io/kind/main.go:25
runtime.main
runtime/proc.go:250
runtime.goexit
runtime/asm_arm64.s:1263
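If it helps, I can keep the node container around with --retain after the failure and poke at it or export logs, e.g.:
$ kind create cluster --retain
# check whether the API server endpoint is reachable from inside the retained node:
$ docker exec kind-control-plane \
    kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes
# collect the node logs:
$ kind export logs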
Environment:
- kind version: (use kind version): kind v0.14.0 go1.18.2 linux/arm64
- Kubernetes version: (use kubectl version): Client Version: v1.24.3, Kustomize Version: v4.5.4
- Docker version: (use docker info):
Client:
 Context: default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., 0.8.2+azure-1)
  compose: Docker Compose (Docker Inc., 2.9.0+azure-1)

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 4
 Server Version: 20.10.17
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc version: v1.1.2-0-ga916309
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.10.104-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 5
 Total Memory: 14.62GiB
 Name: docker-desktop
 ID: DWAP:AOR6:N5DU:HCAK:GC35:RRZ6:4YMP:4JVL:UJ66:GKCY:N6RR:VAAL
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5000
  127.0.0.0/8
 Live Restore Enabled: false
- OS (e.g. from /etc/os-release):
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
About this issue
- State: closed
- Created 2 years ago
- Comments: 20 (7 by maintainers)
Tentatively, that’s just patching over one particular symptom of the networking being broken.
Well yes, we can’t very well run a functional cluster with broken networking.
I don’t think that’s reasonable after the API server is up; this is a local API call executed on one of the control-plane nodes itself. We already have an exponential retry in kubeadm waiting for the API server to be ready. Perhaps one retry, but again, this should not flake: it should be a very cheap local call, and if it is failing, that’s a symptom of the cluster being in a bad state of some sort.
I suggested a possible solution above, but I’d like to understand what is actually broken, and why, before I jump into making any changes.
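For the record, the “one retry” idea would amount to something like the following (illustrative shell only; the real step is implemented in kind’s Go code):
# hypothetical single-retry wrapper around the failing taint-removal step
remove_taints() {
  docker exec --privileged kind-control-plane \
    kubectl --kubeconfig=/etc/kubernetes/admin.conf taint nodes --all \
    node-role.kubernetes.io/control-plane- node-role.kubernetes.io/master-
}
remove_taints || { sleep 2; remove_taints; }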
There’s no reason resolving the container names should fail; docker is responsible for this, and I’ve
Yes, but it only seems to fail when the docker socket is mounted while using kind, and it seems to be related to DNS issues. That makes me think mounting the docker socket when creating the cluster is somehow leading to broken DNS in the cluster, which doesn’t make sense given my understanding of how docker implements DNS … but then, none of this makes sense. The DNS response for the node name should come locally from docker and should be quick and reliable ™️
So far we’ve had no reports of this with a standard local docker socket, without containerizing kind itself or using docker over TCP, though I can’t fathom why those would be relevant.
Unfortunately, without a way to replicate this, I’m reliant on you all to identify why docker containers are not reliably able to resolve themselves or what else is making this call fail.
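If anyone hitting this can run a couple of checks from the host while the node container is still up, that would help narrow things down; roughly (assuming getent is available in the node image):
# what resolver is the node container using, and does its own name resolve?
$ docker exec kind-control-plane cat /etc/resolv.conf
$ docker exec kind-control-plane getent hosts kind-control-plane
# for comparison, the address docker itself has assigned to the container:
$ docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' kind-control-plane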
After some investigating: the problem is that name resolution in the control-plane container doesn’t work immediately when the container is started. Adding a sleep before running the remove-taint command works as a hacky fix.
IDK why this is the case with our environments. We are both running docker-from-docker on Apple M1; not sure how much of that is coincidence. Thoughts @BenTheElder?
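For what it’s worth, instead of a fixed sleep, polling until the node can resolve its own name also works around it here; a rough sketch of that idea (not a proposed patch, and assuming getent exists in the node image):
# wait up to ~30s for docker's embedded DNS to answer for the node's own name
for i in $(seq 1 30); do
  if docker exec kind-control-plane getent hosts kind-control-plane >/dev/null 2>&1; then
    break
  fi
  sleep 1
done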