kops: OpenStack: etcd-manager does not always mount volumes resulting in invalid cluster creation

1. What kops version are you running? The command kops version will display this information. 1.19.1

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag. 1.19.9

3. What cloud provider are you using? OpenStack

4. What commands did you run? What is the simplest way to reproduce this issue? kops create cluster, kops update cluster --yes

5. What happened after the commands executed? In roughly one out of every two cluster creations, exactly one of the masters does not become healthy. It appears that etcd-manager does not format and mount one of the two etcd volumes on that master node.

6. What did you expect to happen? Every cluster creation should result in each master having both of its etcd volumes mounted, producing a valid cluster.

9. Anything else we need to know? This issue started with kops 1.19; with kops 1.18 it never occurred. I hardcoded other etcd-manager versions into my kops binary and hosted the etcd-manager Docker images myself. This narrowed the start of the issue, which lies between etcd-manager 3.0.20200531 (kops 1.18) and 3.0.20210122 (kops 1.19), down to version 3.0.20201117, the first version that produces failed clusters.

I’m having a hard time finding out which exact commit in etcd-manager 3.0.20201117 introduced the problem, though. I’ve reverted some commits and done some testing; these are the ones I suspected but which turned out not to be the troublemakers:

It’s quite a big list of changes between 3.0.20200531 and 3.0.20201117 😉 Anyone got an idea what’s going on here?

Cheers, kciredor

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 35 (17 by maintainers)

Most upvoted comments

@kciredor There’s a new etcd-manager version that was released last week. Could you use that for further debugging, as it has various updates that may help? https://github.com/kubernetes/kops/pull/11098