k3d: [BUG] Error: Failed waiting for log message 'Wrote kubeconfig' from node 'k3d-testing-server-1'

What did you do?

  • How was the cluster created?

    • k3d cluster create testing --servers 2 --agents 3
  • What did you do afterwards?

    • wait & observe

What did you expect to happen?

Some info messages similar to the demo video on k3d.io and success.

Screenshots or terminal output

Recording on Asciinema

tobtobxx@MacBook-Air ~ % k3d cluster create testing --servers 2 --agents 3
INFO[0000] Created network 'k3d-testing'                
INFO[0000] Created volume 'k3d-testing-images'          
INFO[0000] Creating initializing server node            
INFO[0000] Creating node 'k3d-testing-server-0'         
INFO[0012] Creating node 'k3d-testing-server-1'         
INFO[0012] Creating node 'k3d-testing-agent-0'          
INFO[0015] Creating node 'k3d-testing-agent-1'          
INFO[0017] Creating node 'k3d-testing-agent-2'          
INFO[0020] Creating LoadBalancer 'k3d-testing-serverlb' 
ERRO[0021] Failed waiting for log message 'Wrote kubeconfig' from node 'k3d-testing-server-1' 
ERRO[0021] Failed to start container                    
ERRO[0021] Failed to create node 'k3d-testing-serverlb' 
ERRO[0021] Failed to create loadbalancer                
ERRO[0021] error during connect: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/01fb21fd8a32b0ae561ea51dfedd44f3a3bb477307ff36fdb62d240fceb32b13/start": context canceled 
ERRO[0021] Failed to create cluster >>> Rolling Back    
INFO[0021] Deleting cluster 'testing'                   
INFO[0022] Deleted k3d-testing-server-0                 
INFO[0022] Deleted k3d-testing-server-1                 
INFO[0022] Deleted k3d-testing-agent-0                  
INFO[0023] Deleted k3d-testing-agent-1                  
INFO[0023] Deleted k3d-testing-agent-2                  
INFO[0023] Deleted k3d-testing-serverlb                 
INFO[0023] Deleting cluster network '84ac3cec99c71cfe2ac12f097bf9f6194bfd20594fa2539b07f0aec6fcede58e' 
FATA[0023] Cluster creation FAILED, all changes have been rolled back! 

Which OS & Architecture?

macOS Catalina (10.15.5)

Which version of k3d?

tobtobxx@MacBook-Air ~ % k3d version
k3d version v3.0.0
k3s version latest (default)

(installed via homebrew)

Which version of docker?

tobtobxx@MacBook-Air ~ % docker version
Client: Docker Engine - Community
 Azure integration  0.1.7
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:41:33 2020
 OS/Arch:           darwin/amd64
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:49:27 2020
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          v1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 5
  • Comments: 38 (28 by maintainers)

Most upvoted comments

I could imagine that the environment comes into play here, as the CPU maxes out even when starting only a single server. So I could imagine that the leader election fails due to resource limitations, making the whole cluster crash đŸ€”

This is a testable statement. So I went ahead and tested it. I have some spare credits on vultr, thus I used this provider.

I made the following install script:

#!/bin/sh

# Install Docker and k3d, then try to create a multi-server cluster.
# `shutdown -h now` only runs if every previous step succeeded, so a
# VPS that powers itself off signals a successful cluster creation.
snap install docker && \
curl -s https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash && \
k3d cluster create testing --servers 3 --agents 3 && \
shutdown -h now

Then I created the following VPSes with the script above as their startup script (✅ = shut down, i.e. cluster creation succeeded; ❌ = still up, i.e. it failed):

  • 1 CPU, 1024 MB – ❌ (was still up)
  • 1 CPU, 2048 MB – ❌
  • 2 CPU, 4096 MB – ✅ (shut down)
  • 4 CPU, 8192 MB – ✅
  • 6 CPU, 16384 MB – ✅

So the assertion might be correct, and if I may stack another hypothesis on top: one CPU isn't enough.
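
If that hypothesis holds, a minimal preflight sketch like the following could guard against it, assuming only the standard .NCPU template field of docker info:

#!/bin/sh
# Hedged sketch: refuse multi-server cluster creation when the Docker
# daemon reports a single CPU (the failure pattern observed above).
cpus=$(docker info --format '{{.NCPU}}')
if [ "$cpus" -lt 2 ]; then
    echo "Docker only sees $cpus CPU; multi-server k3d clusters failed in the tests above." >&2
    exit 1
fi
k3d cluster create testing --servers 3 --agents 3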

Test run on @rogeliodh's server:

I figured that it will also fail sometimes if you only create a single-server cluster (with the --cluster-init flag). However, I found that

  • k3d cluster create test --verbose --image rancher/k3s:v1.17.6-k3s1 --no-rollback --k3s-server-arg '--cluster-init' worked, while
  • k3d cluster create test --verbose --image rancher/k3s:v1.17.6-k3s1 --no-rollback --servers 3 failed.

I could imagine that the environment comes into play here, as the CPU maxes out even when starting only a single server. So I could imagine that the leader election fails due to resource limitations, making the whole cluster crash đŸ€”

@sdghchj Could you test the above scripts, but install Docker not via snap, rather from their PPA or some other native package? Then run the tests again and report whether (and what) turns out different. That would be helpful!
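
For reference, a sketch of the non-snap variant of the test script, using Docker's convenience script as one of several native install paths; everything else is unchanged:

#!/bin/sh
# Same test as above, but with Docker installed natively via
# get.docker.com instead of snap.
curl -fsSL https://get.docker.com | sh && \
curl -s https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash && \
k3d cluster create testing --servers 3 --agents 3 && \
shutdown -h now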

@Filius-Patris I seriously have no clue why you're experiencing this. I just tried your setup step by step and I don't experience this problem on the first try (without rebooting the droplet). Here's my full output: https://pastebin.com/BvA8xmXc

EDIT: just installed kubectl afterwards to verify that the cluster is actually up:

root@ubuntu-s-4vcpu-8gb-fra1-01:~# kubectl get nodes
NAME                   STATUS   ROLES    AGE     VERSION
k3d-testing-server-0   Ready    master   3m8s    v1.18.6+k3s1
k3d-testing-agent-0    Ready    <none>   3m1s    v1.18.6+k3s1
k3d-testing-agent-1    Ready    <none>   3m      v1.18.6+k3s1
k3d-testing-agent-2    Ready    <none>   2m58s   v1.18.6+k3s1
k3d-testing-server-1   Ready    master   2m49s   v1.18.6+k3s1
k3d-testing-server-2   Ready    master   2m45s   v1.18.6+k3s1

I've assigned Docker 4 CPUs and 8 GB RAM, and I get this error with 3 or more servers (--servers 3).
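
To confirm what the Docker VM actually exposes to containers, the daemon can be queried directly; a small sketch using docker info's standard template fields:

# Print the CPU count and memory visible to the Docker daemon.
docker info --format 'CPUs: {{.NCPU}}, Memory: {{.MemTotal}} bytes'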

So I could imagine that the leader election fails due to resource limitations, making the whole cluster crash

@iwilltry42 If you are right with your assumption, this is a bug in k3s, isn't it? đŸ€” Because k3s was designed exactly for edge use cases and shouldn't crash when only limited resources are available. Or is it not related to k3s and merely a Docker issue?

You're right that k3s shouldn't have an issue with limited resources, so I would assume it's the additional layer (Docker) that introduces those issues, though I cannot verify this right now.
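
One way to check this would be to watch per-container resource usage in a second terminal while the cluster comes up, e.g. with a rough loop like this:

# Sample CPU/memory of all k3d containers once per second while
# `k3d cluster create` runs in another terminal; pegged CPU here
# would support the resource-limitation hypothesis.
while true; do
    docker stats --no-stream --format '{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}' | grep k3d-
    sleep 1
done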


Off-topic: @iwilltry42 It looks like you work for @trivago but maintain a project by @rancher. Do you do this as a volunteer, or do you or trivago have contractual agreements with Rancher? Greetings from Australia. 😀

That's right 😁 I started k3d as my own project (inspired by a problem from my work at trivago). It was then taken over on GitHub by Rancher so that it would be closer to its "sister project" k3s and gain more reach. I did that privately for quite a while, but by now Rancher supports me in this role, and going forward I will continue the work as a freelancer. k3d is and will remain a community project, though 👍 Greetings back, all the way around the world 😉 Feel free to message me on Twitter or the like if you want to know more, so we don't spam the issues here 😁

Another piece of information btw:

The Docker daemon logs (or, in my case, the terminal output, since I started the daemon manually) show an error in the Docker engine:

...
INFO[2020-09-10T20:21:46.246721265Z] API listen on /var/run/docker.sock           
INFO[2020-09-10T20:23:20.015412014Z] ignoring event                                module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
ERRO[2020-09-10T20:23:20.359166510Z] Not continuing with pull after error: context canceled 
INFO[2020-09-10T20:23:20.644963854Z] ignoring event                                module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
INFO[2020-09-10T20:23:20.923477965Z] ignoring event                                module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
...

Can't tell if this is cause or symptom, but I figured documenting it doesn't hurt.
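
On systemd-based hosts the same daemon log can be followed without starting dockerd by hand, and the kernel log shows whether the OOM killer was involved; a sketch (the unit name may differ per distro):

# Follow the Docker daemon log during cluster creation.
journalctl -u docker.service -f

# Afterwards, check whether the kernel OOM killer fired.
journalctl -k | grep -i 'out of memory'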

Oh, by the way, I just realized you can run the cluster create command with --verbose. Here are the relevant parts:

INFO[0026] Creating LoadBalancer 'k3d-testing-serverlb' 
DEBU[0026] Creating node from spec
&{Name:k3d-testing-serverlb Role:loadbalancer Image:docker.io/rancher/k3d-proxy:v3.0.0 Volumes:[] Env:[SERVERS=k3d-testing-server-0,k3d-testing-server-1,k3d-testing-server-2 PORTS=6443] Cmd:[] Args:[] Ports:[0.0.0.0:52151:6443/tcp] Restart:false Labels:map[app:k3d k3d.cluster:testing] Network:1bab04850309e79f344f7a08246c12395eb95cc8e63ed9dc3371caf53d09e248 ServerOpts:{IsInit:false ExposeAPI:{Host: HostIP: Port:}} AgentOpts:{}} 
DEBU[0026] Volumes: []                                  
DEBU[0027] Creating docker container with translated config
&{ContainerConfig:{Hostname:k3d-testing-serverlb Domainname: User: AttachStdin:false AttachStdout:false AttachStderr:false ExposedPorts:map[6443/tcp:{}] Tty:false OpenStdin:false StdinOnce:false Env:[SERVERS=k3d-testing-server-0,k3d-testing-server-1,k3d-testing-server-2 PORTS=6443 K3S_KUBECONFIG_OUTPUT=/output/kubeconfig.yaml] Cmd:[] Healthcheck:<nil> ArgsEscaped:false Image:docker.io/rancher/k3d-proxy:v3.0.0 Volumes:map[] WorkingDir: Entrypoint:[] NetworkDisabled:false MacAddress: OnBuild:[] Labels:map[app:k3d k3d.cluster:testing k3d.role:loadbalancer] StopSignal: StopTimeout:<nil> Shell:[]} HostConfig:{Binds:[] ContainerIDFile: LogConfig:{Type: Config:map[]} NetworkMode: PortBindings:map[6443/tcp:[{HostIP:0.0.0.0 HostPort:52151}]] RestartPolicy:{Name: MaximumRetryCount:0} AutoRemove:false VolumeDriver: VolumesFrom:[] CapAdd:[] CapDrop:[] Capabilities:[] CgroupnsMode: DNS:[] DNSOptions:[] DNSSearch:[] ExtraHosts:[] GroupAdd:[] IpcMode: Cgroup: Links:[] OomScoreAdj:0 PidMode: Privileged:true PublishAllPorts:false ReadonlyRootfs:false SecurityOpt:[] StorageOpt:map[] Tmpfs:map[/run: /var/run:] UTSMode: UsernsMode: ShmSize:0 Sysctls:map[] Runtime: ConsoleSize:[0 0] Isolation: Resources:{CPUShares:0 Memory:0 NanoCPUs:0 CgroupParent: BlkioWeight:0 BlkioWeightDevice:[] BlkioDeviceReadBps:[] BlkioDeviceWriteBps:[] BlkioDeviceReadIOps:[] BlkioDeviceWriteIOps:[] CPUPeriod:0 CPUQuota:0 CPURealtimePeriod:0 CPURealtimeRuntime:0 CpusetCpus: CpusetMems: Devices:[] DeviceCgroupRules:[] DeviceRequests:[] KernelMemory:0 KernelMemoryTCP:0 MemoryReservation:0 MemorySwap:0 MemorySwappiness:<nil> OomKillDisable:<nil> PidsLimit:<nil> Ulimits:[] CPUCount:0 CPUPercent:0 IOMaximumIOps:0 IOMaximumBandwidth:0} Mounts:[] MaskedPaths:[] ReadonlyPaths:[] Init:0xc000580790} NetworkingConfig:{EndpointsConfig:map[1bab04850309e79f344f7a08246c12395eb95cc8e63ed9dc3371caf53d09e248:0xc0003ba240]}} 
ERRO[0027] Failed waiting for log message 'Wrote kubeconfig' from node 'k3d-testing-server-2' 
DEBU[0027] FIXME: Got an status-code for which error does not match any expected type!!!: -1  module=api status_code=-1
DEBU[0027] FIXME: Got an status-code for which error does not match any expected type!!!: -1  module=api status_code=-1
ERRO[0027] Failed to pull image 'docker.io/rancher/k3d-proxy:v3.0.0' 
ERRO[0027] Failed to create container 'k3d-testing-serverlb' 
ERRO[0027] Failed to create node 'k3d-testing-serverlb' 
ERRO[0027] Failed to create loadbalancer                
ERRO[0027] Failed to get logs from node 'k3d-testing-server-1' (container '49cdc67b0b25bb7f8795df2f3b10cfc4f6ca57c4a1794dc53d10094b6b6efdaf') 
ERRO[0027] Failed waiting for log message 'Wrote kubeconfig' from node 'k3d-testing-server-1' 
ERRO[0027] error during connect: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.40/images/create?fromImage=rancher%2Fk3d-proxy&tag=v3.0.0": context canceled 
ERRO[0027] Failed to create cluster >>> Rolling Back    
INFO[0027] Deleting cluster 'testing'                   
DEBU[0027] Cluster Details: &{Name:testing Network:{Name:1bab04850309e79f344f7a08246c12395eb95cc8e63ed9dc3371caf53d09e248 External:false} Token:bnCpprfSgmGyeiHthiER Nodes:[0xc000457680 0xc0004577a0 0xc0004578c0 0xc0004579e0 0xc000457b00 0xc000457c20 0xc0007fe6c0] InitNode:0xc000457680 ExternalDatastore:{Endpoint: CAFile: CertFile: KeyFile: Network:} CreateClusterOpts:0xc000375630 ExposeAPI:{Host:0.0.0.0 HostIP:0.0.0.0 Port:52151} ServerLoadBalancer:0xc000457560 ImageVolume:} 
INFO[0028] Deleted k3d-testing-server-0                 
INFO[0028] Deleted k3d-testing-server-1                 
INFO[0028] Deleted k3d-testing-server-2                 
INFO[0029] Deleted k3d-testing-agent-0                  
INFO[0029] Deleted k3d-testing-agent-1                  
INFO[0030] Deleted k3d-testing-agent-2                  
ERRO[0030] Failed to delete container 'k3d-testing-serverlb' 
WARN[0030] Failed to delete node 'k3d-testing-serverlb': Try to delete it manually 
INFO[0030] Deleting cluster network '1bab04850309e79f344f7a08246c12395eb95cc8e63ed9dc3371caf53d09e248' 
ERRO[0030] Failed to delete 1 nodes: Try to delete them manually 
FATA[0030] Cluster creation FAILED, also FAILED to rollback changes!