kind: [docker installed with snap] HA Creation - Error: failed to create cluster: failed to copy certificate ca.crt: exit status 1

What happened: Running kind to create an HA cluster like the one found here (except with 2 control planes instead of 3):

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: control-plane
- role: worker
- role: worker
- role: worker

What you expected to happen: Cluster to get created and come up

How to reproduce it (as minimally and precisely as possible): kind create cluster --retain --loglevel trace --config "./kind-cluster.yaml" --wait 5m;

Anything else we need to know?: Creating a single control-plane cluster works fine on this machine; I deleted and recreated it several times to verify.

Debug logging output:

[addons] Applied essential addon: kube-proxy
I0719 17:37:45.225992     142 loader.go:359] Config loaded from file:  /etc/kubernetes/admin.conf
I0719 17:37:45.226805     142 loader.go:359] Config loaded from file:  /etc/kubernetes/admin.conf

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of control-plane nodes by copying certificate authorities 
and service account keys on each node and then running the following as root:

  kubeadm join 172.17.0.3:6443 --token <value withheld> \
    --discovery-token-ca-cert-hash sha256:e8f007ca6d45412c838744e330cb1516774f0dac8593f1588b90b33d3a248a57 \
    --experimental-control-plane          

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 172.17.0.3:6443 --token <value withheld> \
    --discovery-token-ca-cert-hash sha256:e8f007ca6d45412c838744e330cb1516774f0dac8593f1588b90b33d3a248a57  
DEBU[12:37:45] Running: /snap/bin/docker [docker inspect -f {{(index (index .NetworkSettings.Ports "6443/tcp") 0).HostPort}} kind-external-load-balancer] 
DEBU[12:37:45] Running: /snap/bin/docker [docker exec --privileged kind-control-plane cat /etc/kubernetes/admin.conf] 
 ✓ Starting control-plane 🕹️ 
DEBU[12:37:45] Running: /snap/bin/docker [docker exec --privileged kind-control-plane cat /kind/manifests/default-cni.yaml] 
DEBU[12:37:45] Running: /snap/bin/docker [docker exec --privileged -i kind-control-plane kubectl create --kubeconfig=/etc/kubernetes/admin.conf -f -] 
 ✓ Installing CNI 🔌 
DEBU[12:37:46] Running: /snap/bin/docker [docker exec --privileged -i kind-control-plane kubectl --kubeconfig=/etc/kubernetes/admin.conf apply -f -] 
 ✓ Installing StorageClass 💾 
DEBU[12:37:46] Running: /snap/bin/docker [docker exec --privileged kind-control-plane2 mkdir -p /etc/kubernetes/pki/etcd] 
DEBU[12:37:46] Running: /snap/bin/docker [docker cp kind-control-plane:/etc/kubernetes/pki/ca.crt /tmp/864842991/ca.crt] 
 ✗ Joining more control-plane nodes 🎮 
Error: failed to create cluster: failed to copy certificate ca.crt: exit status 1

Environment:

  • kind version: (use kind version): v0.4.0
  • Kubernetes version: (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-21T13:09:06Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:32:14Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version: (use docker info):
Containers: 5
 Running: 5
 Paused: 0
 Stopped: 0
Images: 19
Server Version: 18.06.1-ce
Storage Driver: aufs
 Root Dir: /var/snap/docker/common/var-lib-docker/aufs
 Backing Filesystem: extfs
 Dirs: 129
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: N/A (expected: 69663f0bd4b60df09991c08812a60108003fa340)
init version: 949e6fa (expected: fec3683)
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 5.0.0-20-generic
Operating System: Ubuntu Core 16
OSType: linux
Architecture: x86_64
CPUs: 64
Total Memory: 62.84GiB
Name: codor
ID: RR2C:GHT4:VNPO:ZXHC:RWW4:YYDG:OPA3:Y53B:WTBZ:23C3:AHLB:UWLN
Docker Root Dir: /var/snap/docker/common/var-lib-docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 53
 Goroutines: 62
 System Time: 2019-07-19T12:47:49.623878746-05:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 hub.home.local
 ubuntu:32000
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support
  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="19.04 (Disco Dingo)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 19.04"
VERSION_ID="19.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=disco
UBUNTU_CODENAME=disco

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 18 (13 by maintainers)

Most upvoted comments

Ah, yep, setting TMPDIR is super simple, so I’ll just set that up in my scripts… thanks.
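To make that concrete, here’s a minimal Go sketch of why exporting TMPDIR works, assuming kind’s fs.TempDir ultimately defers to Go’s standard temp-dir lookup (on Unix, os.TempDir returns $TMPDIR when set, else /tmp). The ~/snap/docker/current/tmp path is taken from my workaround further down and is an assumption about the snap’s layout, not anything kind knows about:

package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"path/filepath"
)

func main() {
	// With no TMPDIR set, Go's os.TempDir returns /tmp on Linux.
	fmt.Println("default temp dir:", os.TempDir())

	// Point temp files at a directory the docker snap can reach.
	snapTmp := filepath.Join(os.Getenv("HOME"), "snap", "docker", "current", "tmp")
	if err := os.MkdirAll(snapTmp, 0755); err != nil {
		panic(err)
	}
	os.Setenv("TMPDIR", snapTmp)
	fmt.Println("overridden temp dir:", os.TempDir())

	// ioutil.TempDir("", ...) creates its directory under os.TempDir,
	// so code built on it follows TMPDIR automatically.
	dir, err := ioutil.TempDir("", "kind-certs-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	fmt.Println("transit dir:", dir)
}

The shell route is the same idea: export TMPDIR to a snap-accessible directory before running kind create cluster.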

And yeah, I had looked at the known issues, but I was fixating on the error message, and the way docker-installed-via-snap is written up on the Known Issues page didn’t jump out at me. Maybe it would help to add some detail to the error message for this case, like below? In its current form the error doesn’t mention the temp path (though the debug output does).

Thanks for the quick help on this and feel free to close this issue at your convenience.

                tmpPath := filepath.Join(tmpDir, fileName)
                // copies from bootstrap control plane node to tmp area
                if err := controlPlaneHandle.CopyFrom(containerPath, tmpPath); err != nil {
-                       return errors.Wrapf(err, "failed to copy certificate %s", fileName)
+                       return errors.Wrapf(err, "failed to copy certificate %s:%s from %s to %s", controlPlaneHandle, fileName, containerPath, tmpPath)                                
                }
                // copies from tmp area to joining node
                if err := node.CopyTo(tmpPath, containerPath); err != nil {
-                       return errors.Wrapf(err, "failed to copy certificate %s", fileName)
+                       return errors.Wrapf(err, "failed to copy certificate %s:%s from %s to %s", node, fileName, tmpPath, containerPath)                                              
                }
        }

I tracked it down, but I’ll need some guidance on how to fix it correctly if you’d like a pull request. As mentioned, my Go skills are nearly non-existent, so if it’s easier for one of you to make this change I won’t be offended; hopefully the below helps…

TL;DR: I’m running docker from a snap, so docker doesn’t have access to the host’s /tmp directory that kind uses as a transit area for certs and the like, and the docker cp <container>:/... /tmp/... fails. It looks like kind needs to detect whether docker is installed as a snap and, if so, use a different temp directory.
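To illustrate the detection idea, here’s a heuristic sketch; dockerIsSnap is a hypothetical helper, and the check simply asks whether the docker command on PATH lives under /snap (on Ubuntu, snap-packaged commands are exposed via /snap/bin):

package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
	"strings"
)

// dockerIsSnap reports whether the docker binary on PATH appears to
// come from a snap. /snap/bin entries are symlinks into the snap
// machinery, so check both the raw and the resolved path.
func dockerIsSnap() (bool, error) {
	path, err := exec.LookPath("docker")
	if err != nil {
		return false, err
	}
	resolved, err := filepath.EvalSymlinks(path)
	if err != nil {
		return false, err
	}
	return strings.HasPrefix(path, "/snap/") ||
		strings.HasPrefix(resolved, "/snap/"), nil
}

func main() {
	isSnap, err := dockerIsSnap()
	if err != nil {
		fmt.Println("docker not found:", err)
		return
	}
	fmt.Println("docker installed as a snap:", isSnap)
}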

I found some helpful info about snaps and directories here, including this command: snap run --shell docker.docker, which drops you into a shell where you can inspect the SNAP variables with env | grep SNAP.
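For completeness, a tiny Go equivalent of that env | grep SNAP step; run from inside the snap shell, it just filters the process environment by variable name:

package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// Print every environment variable whose name mentions SNAP,
	// e.g. SNAP, SNAP_DATA, SNAP_USER_COMMON inside the snap shell.
	for _, kv := range os.Environ() {
		name := strings.SplitN(kv, "=", 2)[0]
		if strings.Contains(name, "SNAP") {
			fmt.Println(kv)
		}
	}
}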

I hacked (er, skillfully changed) the temp directory as follows and was able to spin up a cluster with 2 or 3 control planes (though I’m guessing the 2-control-plane cluster isn’t really HA, since etcd uses Raft); a less hardcoded sketch follows the diff:

diff --git a/pkg/cluster/internal/create/actions/kubeadmjoin/join.go b/pkg/cluster/internal/create/actions/kubeadmjoin/join.go                                                          
index 5283d98..b1b26e2 100644
--- a/pkg/cluster/internal/create/actions/kubeadmjoin/join.go
+++ b/pkg/cluster/internal/create/actions/kubeadmjoin/join.go
@@ -31,7 +31,6 @@ import (
        "sigs.k8s.io/kind/pkg/cluster/nodes"
        "sigs.k8s.io/kind/pkg/concurrent"
        "sigs.k8s.io/kind/pkg/exec"
-       "sigs.k8s.io/kind/pkg/fs"
 )

 // Action implements action for creating the kubeadm join
@@ -145,13 +144,9 @@ func runKubeadmJoinControlPlane(

        // creates a temporary folder on the host that should acts as a transit area
        // for moving necessary cluster certificates
-       tmpDir, err := fs.TempDir("", "")
-       if err != nil {
-               return err
-       }
-       defer os.RemoveAll(tmpDir)
+       var tmpDir = "/home/jon/snap/docker/current/tmp"

-       err = os.MkdirAll(filepath.Join(tmpDir, "/etcd"), os.ModePerm)
+       var err = os.MkdirAll(filepath.Join(tmpDir, "/etcd"), os.ModePerm)
        if err != nil {
                return err
        }
@@ -170,11 +165,11 @@ func runKubeadmJoinControlPlane(
                tmpPath := filepath.Join(tmpDir, fileName)
                // copies from bootstrap control plane node to tmp area
                if err := controlPlaneHandle.CopyFrom(containerPath, tmpPath); err != nil {
-                       return errors.Wrapf(err, "failed to copy certificate %s", fileName)
+                       return errors.Wrapf(err, "failed to copy certificate %s:%s from %s to %s", controlPlaneHandle, fileName, containerPath, tmpPath)                                
                }
                // copies from tmp area to joining node
                if err := node.CopyTo(tmpPath, containerPath); err != nil {
-                       return errors.Wrapf(err, "failed to copy certificate %s", fileName)
+                       return errors.Wrapf(err, "failed to copy certificate %s:%s from %s to %s", node, fileName, tmpPath, containerPath)                                              
                }
        }
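
For reference, a less hardcoded sketch of the same idea using only the standard library. The transitDir helper is hypothetical, and the ~/snap/docker/current/tmp layout is an assumption from this thread rather than anything kind provides; ioutil.TempDir("", ...) already honors $TMPDIR, so the snap directory is only tried as a fallback:

package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"path/filepath"
)

// transitDir picks a host directory usable as the cert transit area.
func transitDir() (string, error) {
	// ioutil.TempDir("", ...) resolves to $TMPDIR (else /tmp), so an
	// explicit user override always wins.
	if os.Getenv("TMPDIR") == "" {
		// Fall back to the docker snap's writable area if it exists.
		snapTmp := filepath.Join(os.Getenv("HOME"), "snap", "docker", "current", "tmp")
		if info, err := os.Stat(snapTmp); err == nil && info.IsDir() {
			return ioutil.TempDir(snapTmp, "kind-")
		}
	}
	return ioutil.TempDir("", "kind-")
}

func main() {
	dir, err := transitDir()
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	fmt.Println("cert transit dir:", dir)
}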

I am pretty sure a 2 cp, 3 worker kind cluster worked for me recently. It should create fine and work, but a two-member etcd cannot make decisions once either member is lost, so the second control plane buys you no fault tolerance. You need 3 cp for that.

On Jul 20, 2019 00:13, “Jon Stelly” notifications@github.com wrote:

Ah, I hadn’t seen that in the documentation yet… this was my first attempt to spin up an HA cluster. I’ll try it again with 3 control planes like the example and make sure that works.

Assuming that’s my problem, it would be nice to validate the configuration and throw a friendly error when people try to do this (assuming I’m not the only one ever to do it); see the sketch below. I’m not really a Go guy, but I may see if I can figure something out and submit a pull request as penance for not reading the documentation.

Thanks!
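To make both points concrete, here’s a hypothetical sketch of that validation with the etcd majority arithmetic spelled out (quorum = n/2 + 1, so an even member count tolerates no more failures than n-1 members); none of this is code kind actually ships:

package main

import (
	"fmt"
	"os"
)

// validateControlPlanes warns on even control-plane counts: etcd
// needs a majority (n/2 + 1) of members to make progress, so an even
// count survives exactly as many failures as n-1 members would.
func validateControlPlanes(n int) error {
	if n > 1 && n%2 == 0 {
		quorum := n/2 + 1
		return fmt.Errorf(
			"%d control-plane nodes: quorum is %d, tolerating %d failure(s), no better than %d nodes; use an odd count",
			n, quorum, n-quorum, n-1)
	}
	return nil
}

func main() {
	for _, n := range []int{1, 2, 3, 4} {
		if err := validateControlPlanes(n); err != nil {
			fmt.Fprintln(os.Stderr, "warning:", err)
			continue
		}
		fmt.Printf("%d control-plane node(s): ok\n", n)
	}
}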
