kind: Cluster doesn't restart when docker restarts

When docker restarts or stops/starts (for any reason), the kind node containers remain stopped and aren’t restarted properly. When I tried to run `docker restart <node container id>`, the cluster didn’t start either.

The only solution at this point seems to be to recreate the cluster.

/kind bug

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 84
  • Comments: 97 (58 by maintainers)

Most upvoted comments

👍 for the new restart cluster command!

I’ve been using kind locally (using Docker for Mac) and when docker reboots or stops, the cluster has to be deleted and recreated. I’m perfectly fine with it, just thought this might be something we should look into.

The use case was to keep the cluster around even after I reboot or shut down my machine / docker.

for the impatient, this seems to work for now after docker restarts:

docker start kind-1-control-plane && docker exec kind-1-control-plane sh -c 'mount -o remount,ro /sys; kill -USR1 1'

FixMounts also runs a few `mount --make-shared` commands; not sure if they are really required.

v0.8.0 will ship after follow-up for this; I’m re-targeting for Monday, ideally.

it’s coming! the next PR is out 😃

Hi, I am working on this, but I’ve had to spend the past week on call for the Kubernetes test-infra and handling a few high-impact Kubernetes testing bugs: #1248, #1331

Please use github’s native +1 mechanism to +1 so we can use the issue for discussion of the solution:

(screenshot: GitHub’s reaction picker)

What is the use case for this?

+1 to this question.

docker restart in this case acts like a power-grid restart on a bunch of bare-metal machines. So while those bare-metal machines might come back up, I’m not sure we want to support this for kind. For that to work, I think some sort of state has to be stored somewhere…

FTR: The latest releases should have clusters that come back up on docker restart, always, including multi node.

I will send a PR next week.

/lifecycle active thanks @tao12345666333

the “last” PR is now out. it needs some more cleanup and more validation, but the basic implementation is more or less good enough now and in an open PR.

this will be ready before we ship kind v0.8.0

The restart cluster command will make kind the top of its class. Without it, it’s a painful process to build test envs upon, since restarting the whole process means re-downloading all the docker images from scratch, a lengthy process.

Next batch of PRs will be going out shortly. I had some other disruptions again (especially with kubernetes v1.18 code freeze PR reviews…), but I believe I have a workable approach for docker based nodes (which all current users are using, won’t work with podman though!) inbound.

restart seems to fit well with the other create/delete cluster commands. What’s the idea you had? Wondering if it actually fits the word restart or it’s something more.

@tao12345666333 I think ephemeral clusters are good, but not in 100% of use cases. If you organise, for example, a workshop or a meetup, you would like to prepare everything in advance (some days before) and, at the moment of the event, just spin up the cluster and that’s it. Like I did many times with minikube. Another example would be doing experiments. If I’m working, for example, with Calico, Cilium, Istio or else, I don’t want to deploy them every time I need to run a simple test. It would be way easier to have many clusters at a time, spin up the one you need, and then stop it again. Do my samples make sense?

It should roughly be:

  • list the containers matching the cluster name
  • for each …
    • docker {re}start
    • run the pre-boot fixes (mounts)
    • signal the entrypoint to boot
  • optionally --wait for the control-plane like create

It’ll look similar to create but skip a lot of steps and swap creating the containers for list & {re}start

We can also eventually have a very similar command like kind restart node
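
A rough shell sketch of the steps above (the cluster label, the /sys remount, and the SIGUSR1 signal are taken from the workaround posted earlier in this thread, not from kind’s actual implementation, so treat this as an approximation):

```shell
# restart_kind_nodes: hypothetical sketch of the restart flow described above.
restart_kind_nodes() {
  cluster="${1:-kind}"
  # list the containers matching the cluster name
  for node in $(docker ps -aq --filter "label=io.x-k8s.kind.cluster=${cluster}"); do
    docker restart "$node" || return 1
    # pre-boot fix: remount /sys read-only inside the node
    docker exec "$node" sh -c 'mount -o remount,ro /sys'
    # signal the entrypoint (PID 1) to continue booting
    docker exec "$node" kill -USR1 1
  done
}
```

This skips the `--wait` step; waiting for the control-plane would need a readiness poll on the API server, as create does.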

As I understand it, the project has never supported restarting multi-node clusters (only single nodes), but the documentation should really clearly specify this, so that we aren’t spending a lot of time doing complex multi-node work only to find it doesn’t survive a reboot or restart of docker. https://github.com/kubernetes-sigs/kind/issues/1689#issuecomment-889607041

@BenTheElder – Many thanks! This will make our lives easier! I was troubleshooting a weird Azure issue for the last couple of weeks, so I had no time for anything else. But this is awesome news.

@BenTheElder Is this going to have only internal support for restarting the cluster if the docker daemon restarts, or is it also going to have some type of support from the CLI (e.g. kind stop cluster/kind start cluster or kind pause cluster/kind unpause cluster)?

As a partial workaround to speed up pod creation in a re-created cluster, I mount the containerd directory as a volume on the host machine; it survives cluster recreation, so docker images are not downloaded every time after a restart. E.g. I use the following config for cluster creation:

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
  extraMounts:
  - containerPath: /var/lib/containerd
    hostPath: /home/me/.kind/cache/containerd
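
Assuming the config above is saved as `kind-cache.yaml` (a hypothetical filename), the cluster can then be created with kind’s `--config` flag; the host cache directory should exist first:

```shell
# Hypothetical helper: create a cluster that reuses the containerd cache.
# kind-cache.yaml is the config shown above; the paths are assumptions.
create_cached_cluster() {
  mkdir -p "$HOME/.kind/cache/containerd"
  kind create cluster --config kind-cache.yaml
}
```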

Running docker start minio-demo-control-plane && docker exec minio-demo-control-plane sh -c 'mount -o remount,ro /sys; kill -USR1 1' worked for me 👍

On a recent version (>= 0.3.0) it should just be docker start <node-name>. The rest is handled in the entrypoint.

Please add restart.

We’d like to, but it’s not quite this simple to do correctly. 🙃 That snippet doesn’t work for multi-node clusters (see previous discussion around IP allocation etc.). For single node clusters currently it would just be an alias to docker start $NODE_NAME. It’s being worked on but is a bit lower priority than some Kubernetes testing concerns, ephemeral clusters are still recommended.

We have needed to restart a cluster several times, so I spent some time writing a script to restart the cluster and update the kubeconfig accordingly:

#!/usr/bin/env bash
KIND_CLUSTER="test"
KIND_CTX="kind-${KIND_CLUSTER}"

for container in $(kind get nodes --name ${KIND_CLUSTER}); do
      [[ $(docker inspect -f '{{.State.Running}}' $container) == "true" ]] || docker start $container
done
sleep 1
docker exec ${KIND_CLUSTER}-control-plane sh -c 'mount -o remount,ro /sys; kill -USR1 1'
kubectl config set clusters.${KIND_CTX}.server $(kind get kubeconfig --name ${KIND_CLUSTER} -q | yq read -j - | jq -r '.clusters[].cluster.server')
kubectl config set clusters.${KIND_CTX}.certificate-authority-data $(kind get kubeconfig --name ${KIND_CLUSTER} -q | yq read -j - | jq -r '.clusters[].cluster."certificate-authority-data"')
kubectl config set users.${KIND_CTX}.client-certificate-data $(kind get kubeconfig --name ${KIND_CLUSTER} -q | yq read -j - | jq -r '.users[].user."client-certificate-data"')
kubectl config set users.${KIND_CTX}.client-key-data $(kind get kubeconfig --name ${KIND_CLUSTER} -q | yq read -j - | jq -r '.users[].user."client-key-data"')

The client-cert and client-key shouldn’t change, but since I was already updating the port, which changes whenever the control-plane is restarted, updating all of them was just a safety check.
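
On recent kind releases, a simpler route may be to regenerate the kubeconfig entry wholesale with `kind export kubeconfig` (listed in the help output elsewhere in this thread) rather than patching each field; this wrapper is a made-up convenience, not part of kind:

```shell
# Hypothetical wrapper: refresh the kubeconfig entry for a restarted cluster.
refresh_kind_kubeconfig() {
  kind export kubeconfig --name "${1:-test}"
}
```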

+1, let’s please get this done. We use KIND as a local SDK in a multi-node cluster that has been configured to match our higher environments in terms of setup and security. The process is phenomenal until a developer restarts and the entire cluster is rendered useless. I understand this use case isn’t exactly the one KIND is designed for, but shifting left with such low overhead afforded to us by KIND has been a game-changer, and we would hate to have to revert to a single minikube node.

Local storage is fixed, working on this one again. /assign /lifecycle active

@carlisia As Ben said, we still recommend ephemeral clusters.

#408 is processing the command to add restart command, but before that, we need to deal with some network related issues #484

Thanks for the data point @janwillies. This is definitely not actually supported properly (yet?) and would/will require a number of fixes, some of which are in progress. In the mean time we’ve continued to push to make it cheaper to create / delete and test with clean clusters. When 0.4 releases we expect kubernetes 1.14.X to start in ~20s if the image is warm locally.

#461 removed the SIGUSR1 and mount fix commands, docker start should ~work for single-node clusters, multi-node will require an updated #408 😅

tentatively tracking for 0.3

Was the “restart” functionality ever shipped? I am using version 0.14.0 and don’t see a “restart” option in the help message.

I can’t figure out a way to restart my cluster:

╰─λ kind --help
kind creates and manages local Kubernetes clusters using Docker container 'nodes'

Usage:
kind [command]

Available Commands:
build       Build one of [node-image]
completion  Output shell completion code for the specified shell (bash, zsh or fish)
create      Creates one of [cluster]
delete      Deletes one of [cluster]
export      Exports one of [kubeconfig, logs]
get         Gets one of [clusters, nodes, kubeconfig]
help        Help about any command
load        Loads images into nodes
version     Prints the kind CLI version

Flags:
-h, --help              help for kind
--loglevel string   DEPRECATED: see -v instead
-q, --quiet             silence all stderr output
-v, --verbosity int32   info log verbosity, higher value produces more output
--version           version for kind

Use "kind [command] --help" for more information about a command.

I am on Arch Linux.

@aojea Ok, then we await a better solution. 😃

This works:

$ docker ps -aq --filter 'label=io.x-k8s.kind.cluster' | awk '{print $1}' | xargs docker start
389fbc7f27c0
8234fdc273f5

That works ONLY if the container gets assigned the same IP it had before it was stopped. Docker uses the IPAM implemented in libnetwork, which doesn’t guarantee the container will get the same IP.

To start with I’m focusing solely on having it automatically restart correctly, but once those fixes are in place I expect stop / start / pause / unpause will make sense as a future step.

Just installed kind from the default branch, and a one-node kind cluster works well after a container restart. I have tried kill + start, and a docker daemon restart. Thank you!

This is on my radar, we’ve just had some other pressing changes to tackle (mostly around testing kubernetes, the stated #1 priority and original reason for the project) and nobody has proposed a maintainable solution to the network issues yet. I’ll look at this more this cycle.

The main problem is that the container is not guaranteed to take the same IP that was assigned before the reboot, and that will break the cluster.

However, one user reported a working method in the slack channel https://kubernetes.slack.com/archives/CEKK1KTN2/p1565109268365000

cscetbon 6:34 PM
@Gustavo Sousa what I use :
alias kpause='kind get nodes|xargs docker pause'
alias kunpause='kind get nodes|xargs docker unpause'
(edited)
