kind: Failed to get IPs for node: kind-control-plane: file should only be one line, got 2 lines

What happened: I’m seeing the same issue as described in #1149. Here is what I’m seeing:


+ kind create cluster --wait 30m --image kindest/node:v1.14.9@sha256:bdd3731588fa3ce8f66c7c22f25351362428964b6bca13048659f68b9e665b72
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.14.9) 🖼
 ✓ Preparing nodes 📦
 ✗ Writing configuration 📜
ERROR: failed to create cluster: failed to get IPs for node: kind-control-plane: file should only be one line, got 2 lines

What you expected to happen: The cluster starts up normally.

How to reproduce it (as minimally and precisely as possible): I haven’t been able to get it to repro reliably, but once it happens, it’s stuck that way.

Anything else we need to know?: The solution proposed in #1149 of blowing away the .docker directory doesn’t work for us. It’s occurring during CI, so babysitting the agents and killing them once this happens is not an option. It seems to be a problem with the construction of the base image.

Environment:

  • kind version: (use kind version): kind v0.6.0 go1.13.4 linux/amd64
  • Kubernetes version: (use kubectl version): Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version: (use docker info):
Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 14
 Server Version: 18.09.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-1052-gcp
 Operating System: Ubuntu 16.04.6 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 58.97GiB
 Name: bk-6dd1738aeec68778a4e320f52a0193781e61e8d2-3vkq
 ID: NWBL:2PDV:FRB2:TXXZ:4LS5:FER2:FOLR:QG5T:4QF2:FEQJ:7ERT:6MTY
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine
  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 18 (12 by maintainers)

Most upvoted comments

We’ve since found that there are circumstances where docker prints an error on ~all commands (e.g. due to bad ownership of the docker config). You should check for that, but we’ve also filed https://github.com/kubernetes-sigs/kind/pull/1415, which is merged into master, so that kind does not read from stderr.
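
A minimal sketch of how a message on stderr can turn one expected line of output into two. It does not call docker and is not kind's code: the stand-in command, the warning text, and the IP address are all made up for illustration, and `output_lines` is a hypothetical helper.

```python
import subprocess
import sys

# Stand-in for a docker command: one warning on stderr, the answer on stdout.
# Both the warning text and the IP address are invented for this example.
FAKE_DOCKER = [
    sys.executable,
    "-c",
    "import sys; "
    "sys.stderr.write('WARNING: hypothetical config error\\n'); "
    "sys.stdout.write('172.17.0.2\\n')",
]


def output_lines(merge_stderr):
    """Run the stand-in command and return its captured output as a list of lines."""
    result = subprocess.run(
        FAKE_DOCKER,
        stdout=subprocess.PIPE,
        # Merging stderr into stdout is what lets the warning leak into the result.
        stderr=subprocess.STDOUT if merge_stderr else subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print(len(output_lines(merge_stderr=True)))   # warning and IP captured together
    print(len(output_lines(merge_stderr=False)))  # stdout only
```

The command exits 0 in both cases, so only the shape of the captured output differs between the two calls.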

@aojea I’ve reverted back to v0.6.0 and added the v7 flag and have been running builds all day. I haven’t hit the issue yet and I don’t think we’ve hit it since updating to v0.7.0. Unfortunately this might be a case of “No Repro”, but I’ll keep trying and let you know if I hit anything. Thanks all for jumping on this.

FWIW I get this error when running kind v0.7.0 in a docker:19.03.8-dind image in a GitHub Actions workflow, which is using this docker version:

Client:
 Version:           3.0.10+azure
 API version:       1.40
 Go version:        go1.12.14
 Git commit:        99c5edceb48d64c1aa5d09b8c9c499d431d98bb9
 Built:             Tue Nov  5 00:55:15 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          3.0.10+azure
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.14
  Git commit:       ea84732a77
  Built:            Fri Jan 24 20:08:11 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.2.11
  GitCommit:        f772c10a585ced6be8f86e8c58c2b998412dd963
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

… installing and running kind in the workflow steps directly and the cluster is created without any issues. There are also no issues when running kind in the dind environment on my PC (Ubuntu 18.04 with docker 19.03.7), so I suspect the issue is with the host’s docker version.

going to close for now as not reproducible but please /reopen with more information if you spot this again!

More specifically, we pass a format to docker that includes no newlines, so multiple lines should not be possible under normal circumstances.

@aojea it’s NOT a docker inspect error in that the command did succeed and exit 0. The output is just unexpected. There shouldn’t be more lines.
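
Since the command can exit 0 and still return unexpected output, the result has to be validated by shape rather than by exit code. A rough sketch of such a check, assuming the caller expects exactly one line; `single_line` is a hypothetical helper, not kind's implementation, and the sample values are invented.

```python
def single_line(output):
    """Return the only line in output, or raise if there is not exactly one."""
    lines = output.splitlines()
    if len(lines) != 1:
        raise ValueError(f"should only be one line, got {len(lines)} lines")
    return lines[0]


if __name__ == "__main__":
    # One line, as expected from a format string without newlines.
    print(single_line("172.17.0.2\n"))

    # An extra line (for example a warning) makes the check fail loudly.
    try:
        single_line("WARNING: hypothetical config error\n172.17.0.2\n")
    except ValueError as err:
        print(err)
```

When this check fails in CI, logging the raw captured output alongside the error would show what the extra line actually was.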