kind: docker readiness timeout may be too low on overloaded machines

Sometimes a panic occurs when using kind to create an HA cluster.

[zhang@localhost kind]$ kind create cluster  --config kind-config-ha.yaml           
Creating cluster "kind" ...
 βœ“ Ensuring node image (kindest/node:v1.13.4) πŸ–Ό
 βœ— Preparing nodes πŸ“¦πŸ“¦πŸ“¦πŸ“¦πŸ“¦πŸ“¦
ERRO[17:36:55] timed out waiting for docker to be ready on node kind-control-plane                       
panic: send on closed channel

goroutine 11 [running]:
sigs.k8s.io/kind/pkg/cluster/internal/create.createNodeContainers.func1(0xc00008a300, 0xc000397300, 0x1d,
0xc0002c0000, 0xc000117bc0)
        /home/zhang/go/src/sigs.k8s.io/kind/pkg/cluster/internal/create/nodes.go:114 +0x9c             
created by sigs.k8s.io/kind/pkg/cluster/internal/create.createNodeContainers                             
        /home/zhang/go/src/sigs.k8s.io/kind/pkg/cluster/internal/create/nodes.go:105 +0x2e0

Maybe the channel has already been closed by the time the goroutine sends on it.

https://github.com/kubernetes-sigs/kind/blob/master/pkg/cluster/internal/create/nodes.go#L96-L135
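The failure mode can be reproduced in miniature: a late worker goroutine sends its result on a channel that the coordinator has already closed after an earlier error. This is a hypothetical simplification, not the actual nodes.go code:

```go
package main

import "fmt"

// Minimal reproduction of the failure mode: the coordinator bails out
// early (e.g. on a readiness timeout), closes the results channel, and a
// still-running worker then sends on it, which panics.
func main() {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r) // prints "recovered: send on closed channel"
		}
	}()

	ch := make(chan error)
	close(ch) // coordinator gave up and closed the channel
	ch <- nil // the late worker's send now panics
}
```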

/cc @BenTheElder

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 18 (18 by maintainers)

Most upvoted comments

perhaps let’s bump it to 60s for now, add a TODO, and come back to it later 🤔

Hah, I agree with you. Today I may measure the timing on differently configured machines; I hope to provide some advice.

-1 to more flags! πŸ˜›

Setting this in either place is going to be brittle, since the value is not portable. The only reason we have a bound at all is to avoid an indefinite hang; at some point this value becomes quite unreasonable 😅 (e.g. 1 hour would be pretty ridiculous)

do you experience this with single node clusters?

[zhang@localhost ~]$ time kind create cluster --name moelove
Creating cluster "moelove" ...
 βœ“ Ensuring node image (kindest/node:v1.13.4) πŸ–Ό
 βœ“ Preparing nodes πŸ“¦ 
 βœ“ Creating kubeadm config πŸ“œ 
 βœ“ Starting control-plane πŸ•ΉοΈ 
Cluster creation complete. You can now use the cluster with:

export KUBECONFIG="$(kind get kubeconfig-path --name="moelove")"
kubectl cluster-info

real    0m52.172s
user    0m0.729s
sys     0m0.564s

In fact, my intention in filing this issue was the panic, not the timeout, although that is also a problem. 😸

Should not panic anymore at least. Timeout still needs thought / changes.
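One common way to avoid this class of panic (a sketch under assumed names, not necessarily kind's actual fix) is to never close the results channel from the coordinator; instead, signal cancellation on a separate done channel and have each worker select between delivering its result and observing cancellation:

```go
package main

import (
	"errors"
	"fmt"
)

// provisionNode is a hypothetical stand-in for per-node setup work;
// node 1 simulates the docker readiness timeout.
func provisionNode(id int) error {
	if id == 1 {
		return errors.New("timed out waiting for docker to be ready")
	}
	return nil
}

// createNodes fans out n workers. The results channel is never closed;
// closing done (which is always safe, since receives on a closed channel
// succeed) tells late workers to drop their result instead of panicking.
func createNodes(n int) error {
	results := make(chan error)
	done := make(chan struct{})
	defer close(done)

	for i := 0; i < n; i++ {
		go func(id int) {
			err := provisionNode(id)
			select {
			case results <- err: // coordinator is still listening
			case <-done: // coordinator gave up; exit cleanly
			}
		}(i)
	}

	for i := 0; i < n; i++ {
		if err := <-results; err != nil {
			return err // remaining workers hit <-done and exit
		}
	}
	return nil
}

func main() {
	fmt.Println(createNodes(3)) // prints the simulated timeout error
}
```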

What values are working for your usage?

can you please share your config file and the system specification?

The config file:


kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
- role: worker

The system info:

[zhang@localhost kind]$ uname -a
Linux localhost 3.10.0-957.5.1.el7.x86_64 #1 SMP Fri Feb 1 14:54:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[zhang@localhost kind]$ cat /etc/redhat-release 
CentOS Linux release 7.5.1804 (Core)
[zhang@localhost kind]$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.5G        621M        1.9G        121M        5.0G        6.3G
Swap:          7.7G        153M        7.6G
[zhang@localhost kind]$ uptime 
 18:28:02 up 34 days,  4:18,  1 user,  load average: 0.00, 0.05, 0.15
[zhang@localhost kind]$ docker version
Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        6247962
 Built:             Sun Feb 10 04:13:27 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:47:25 2019
  OS/Arch:          linux/amd64
  Experimental:     false