kind: [docker installed with snap] HA Creation - Error: failed to create cluster: failed to copy certificate ca.crt: exit status 1
What happened: Running kind to create an HA cluster like the one found here (except with 2 control planes instead of 3) fails with the error in the title. The config:
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: control-plane
- role: worker
- role: worker
- role: worker
What you expected to happen: Cluster to get created and come up
How to reproduce it (as minimally and precisely as possible):
kind create cluster --retain --loglevel trace --config "./kind-cluster.yaml" --wait 5m;
Anything else we need to know?: Creating a single control-plane cluster works fine on this machine; I deleted and recreated it several times to verify.
Debug logging output:
[addons] Applied essential addon: kube-proxy
I0719 17:37:45.225992 142 loader.go:359] Config loaded from file: /etc/kubernetes/admin.conf
I0719 17:37:45.226805 142 loader.go:359] Config loaded from file: /etc/kubernetes/admin.conf
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join 172.17.0.3:6443 --token <value withheld> \
--discovery-token-ca-cert-hash sha256:e8f007ca6d45412c838744e330cb1516774f0dac8593f1588b90b33d3a248a57 \
--experimental-control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 172.17.0.3:6443 --token <value withheld> \
--discovery-token-ca-cert-hash sha256:e8f007ca6d45412c838744e330cb1516774f0dac8593f1588b90b33d3a248a57
DEBU[12:37:45] Running: /snap/bin/docker [docker inspect -f {{(index (index .NetworkSettings.Ports "6443/tcp") 0).HostPort}} kind-external-load-balancer]
DEBU[12:37:45] Running: /snap/bin/docker [docker exec --privileged kind-control-plane cat /etc/kubernetes/admin.conf]
✓ Starting control-plane 🕹️
DEBU[12:37:45] Running: /snap/bin/docker [docker exec --privileged kind-control-plane cat /kind/manifests/default-cni.yaml]
DEBU[12:37:45] Running: /snap/bin/docker [docker exec --privileged -i kind-control-plane kubectl create --kubeconfig=/etc/kubernetes/admin.conf -f -]
✓ Installing CNI 🔌
DEBU[12:37:46] Running: /snap/bin/docker [docker exec --privileged -i kind-control-plane kubectl --kubeconfig=/etc/kubernetes/admin.conf apply -f -]
✓ Installing StorageClass 💾
DEBU[12:37:46] Running: /snap/bin/docker [docker exec --privileged kind-control-plane2 mkdir -p /etc/kubernetes/pki/etcd]
DEBU[12:37:46] Running: /snap/bin/docker [docker cp kind-control-plane:/etc/kubernetes/pki/ca.crt /tmp/864842991/ca.crt]
✗ Joining more control-plane nodes 🎮
Error: failed to create cluster: failed to copy certificate ca.crt: exit status 1
Environment:
- kind version (use kind version): v0.4.0
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-21T13:09:06Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:32:14Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
- Docker version (use docker info):
Containers: 5
Running: 5
Paused: 0
Stopped: 0
Images: 19
Server Version: 18.06.1-ce
Storage Driver: aufs
Root Dir: /var/snap/docker/common/var-lib-docker/aufs
Backing Filesystem: extfs
Dirs: 129
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: N/A (expected: 69663f0bd4b60df09991c08812a60108003fa340)
init version: 949e6fa (expected: fec3683)
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.0.0-20-generic
Operating System: Ubuntu Core 16
OSType: linux
Architecture: x86_64
CPUs: 64
Total Memory: 62.84GiB
Name: codor
ID: RR2C:GHT4:VNPO:ZXHC:RWW4:YYDG:OPA3:Y53B:WTBZ:23C3:AHLB:UWLN
Docker Root Dir: /var/snap/docker/common/var-lib-docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 53
Goroutines: 62
System Time: 2019-07-19T12:47:49.623878746-05:00
EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
hub.home.local
ubuntu:32000
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
- OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="19.04 (Disco Dingo)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 19.04"
VERSION_ID="19.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=disco
UBUNTU_CODENAME=disco
About this issue
- State: closed
- Created 5 years ago
- Comments: 18 (13 by maintainers)
Ah, yep, setting TMPDIR is super simple, so I’ll just set that up in my scripts… thanks.
And yeah, I had looked at the known issues, but I was fixating on the error message, and given the way Docker installed via snap is documented on the Known Issues page, it didn’t jump out at me. Maybe it would be helpful to add some detail to the error message in this case, like below? In its current form the error doesn’t mention the temp path (though the debug output does).
Thanks for the quick help on this, and feel free to close this issue at your convenience.
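A minimal sketch of that workaround, assuming the snap-confined docker client can read and write non-hidden paths under the user's home directory (the exact writable location may vary by setup):

# Point kind, and the docker cp commands it runs, at a temp directory the docker
# snap can reach instead of the host's /tmp. $HOME/tmp is an assumption here;
# any path visible to the snap-confined client should work.
mkdir -p "$HOME/tmp"
export TMPDIR="$HOME/tmp"
kind create cluster --retain --loglevel trace --config "./kind-cluster.yaml" --wait 5m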
I tracked it down, but I’ll need some guidance on how to fix it correctly if you’d like a pull request. As mentioned, my Go skills are near non-existent, so if it’s easier for one of you to make this change, I won’t be offended; hopefully the below helps…
TL;DR: I’m running docker from a snap so docker doesn’t have access to the host’s /tmp directory that kind uses to copy around certs, etc… so the
docker cp <container>:/... /tmp/...
fails. It looks like kind needs to detect whether docker is installed as a snap and, if so, use a different temp directory. I found some helpful info about snaps and directories here, including this command:
snap run --shell docker.docker
which gives a shell where you can inspect the SNAP variables with env | grep SNAP.
I hacked (er, “skillfully changed”) the temp directory as follows and was able to spin up a cluster with 2 or 3 control planes (though I’m guessing the 2 control planes aren’t really HA, since etcd uses Raft):

I am pretty sure a 2 control-plane, 3 worker kind cluster worked for me recently. It should create fine and work, but etcd cannot make decisions; you need 3 control planes for that.
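The temp-directory change referenced above isn’t reproduced in this thread; purely as an illustration of the detection idea (not the author’s actual change, and again assuming a non-hidden path under $HOME is visible to the snap), a wrapper script might look like:

# Hypothetical wrapper: if the docker client resolves to the snap path,
# export a temp directory that the snap-confined client can write to
# before kind stages its certificates there.
if command -v docker | grep -q '^/snap/'; then
  mkdir -p "$HOME/kind-tmp"
  export TMPDIR="$HOME/kind-tmp"
fi
kind create cluster --config "./kind-cluster.yaml" --wait 5m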