kops: etcd is not started on CoreOS-based masters since 1688.4.0

  1. What kops version are you running? The command kops version will display this information.

Version 1.9.0-alpha.3 (git-ad210dc4b)

  1. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

1.9.3

  1. What cloud provider are you using?

AWS

  1. What commands did you run? What is the simplest way to reproduce this issue?

Build two clusters:

Cluster A (CoreOS-stable 1632.3.0)

kops create cluster     \
     --ssh-public-key=~/.ssh/my_ssh_key.pub \
     --authorization RBAC   \
     --node-count 3   \
     --zones "us-east-1a,us-east-1b,us-east-1c" \
     --master-zones "us-east-1a,us-east-1b,us-east-1c" \
     --node-size t2.large   \
     --master-size t2.medium  \
     --topology public  \
     --network-cidr=10.25.0.0/16 \
     --networking canal \
     --name coreos-1632-3-0.us-east-1.kube.redacted.com

Cluster B (CoreOS-stable 1688.4.0)

kops create cluster     \
     --ssh-public-key=~/.ssh/my_ssh_key.pub \
     --authorization RBAC   \
     --node-count 3   \
     --zones "us-east-1a,us-east-1b,us-east-1c" \
     --master-zones "us-east-1a,us-east-1b,us-east-1c" \
     --node-size t2.large   \
     --master-size t2.medium  \
     --topology public  \
     --network-cidr=10.25.0.0/16 \
     --networking canal \
     --name coreos-1688-4-0.us-east-1.kube.redacted.com

Edit the IGs for cluster A’s masters to utilize the 595879546273/CoreOS-stable-1632.3.0-hvm image:

kops edit ig --name=coreos-1632-3-0.us-east-1.kube.redacted.com master-us-east-1a
kops edit ig --name=coreos-1632-3-0.us-east-1.kube.redacted.com master-us-east-1c
kops edit ig --name=coreos-1632-3-0.us-east-1.kube.redacted.com master-us-east-1b

Edit the IGs for cluster B’s masters to utilize the 595879546273/CoreOS-stable-1688.4.0-hvm image:

kops edit ig --name=coreos-1688-4-0.us-east-1.kube.redacted.com master-us-east-1a
kops edit ig --name=coreos-1688-4-0.us-east-1.kube.redacted.com master-us-east-1c
kops edit ig --name=coreos-1688-4-0.us-east-1.kube.redacted.com master-us-east-1b

Instantiate the clusters:

kops update cluster coreos-1632-3-0.us-east-1.kube.redacted.com --yes
kops update cluster coreos-1688-4-0.us-east-1.kube.redacted.com --yes

After about ten minutes, validate both clusters. The 1632.3.0 cluster is healthy, but the 1688.4.0 cluster never validates, because etcd is not started on its masters.
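
For anyone reproducing this, the validation step can be scripted roughly as follows. This is only a sketch: it assumes KOPS_STATE_STORE is already exported and kops is on PATH. The function name validate_all and the KOPS_BIN override are hypothetical names introduced here for illustration; kops validate cluster is the only kops command it runs.

```shell
#!/usr/bin/env bash
# Sketch: run "kops validate cluster" against each named cluster and
# report which ones pass. Assumes KOPS_STATE_STORE is exported.
# KOPS_BIN is a hypothetical override (not a kops feature) so the
# loop can be dry-run with a stand-in binary.
KOPS_BIN="${KOPS_BIN:-kops}"

validate_all() {
    local failed=0 name
    for name in "$@"; do
        if "$KOPS_BIN" validate cluster --name "$name"; then
            echo "OK: $name"
        else
            echo "FAILED: $name"
            failed=1
        fi
    done
    # Non-zero exit status if any cluster failed validation.
    return "$failed"
}
```

Calling validate_all with coreos-1632-3-0.us-east-1.kube.redacted.com and coreos-1688-4-0.us-east-1.kube.redacted.com as arguments should report the first as OK and the second as FAILED if the bug is present.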

  1. What happened after the commands executed?

The 1688.4.0 cluster never validates; the 1632.3.0 cluster comes up and is healthy.

  1. What did you expect to happen?

The 1688.4.0 cluster comes up healthy, just as clusters on previous CoreOS versions did.

  1. Please provide your cluster manifest. Execute

Available upon request.

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 7
  • Comments: 18 (13 by maintainers)

Most upvoted comments

@chrissnell thanks for beating me to it. I ran into the same issue.

Can we get this fix back-ported to 1.8?

This is fixed in master by PR #4849, and the fix made it into the latest release (1.9.0-beta.2). Thanks, @justinsb!

I’ve raised PR #4909 to disable update-engine by default on CoreOS.

A rolling update using the new CoreOS image triggered this bug for our clusters as well.