kops: etcd is not started on CoreOS-based masters since 1688.4.0
- What kops version are you running? The command kops version will display this information.
Version 1.9.0-alpha.3 (git-ad210dc4b)
- What Kubernetes version are you running?
kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
1.9.3
- What cloud provider are you using?
AWS
- What commands did you run? What is the simplest way to reproduce this issue?
Build two clusters:
Cluster A (CoreOS-stable 1632.3.0)
kops create cluster \
--ssh-public-key=~/.ssh/my_ssh_key.pub \
--authorization RBAC \
--node-count 3 \
--zones "us-east-1a,us-east-1b,us-east-1c" \
--master-zones "us-east-1a,us-east-1b,us-east-1c" \
--node-size t2.large \
--master-size t2.medium \
--topology public \
--network-cidr=10.25.0.0/16 \
--networking canal \
--name coreos-1632-3-0.us-east-1.kube.redacted.com
Cluster B (CoreOS-stable 1688.4.0)
kops create cluster \
--ssh-public-key=~/.ssh/my_ssh_key.pub \
--authorization RBAC \
--node-count 3 \
--zones "us-east-1a,us-east-1b,us-east-1c" \
--master-zones "us-east-1a,us-east-1b,us-east-1c" \
--node-size t2.large \
--master-size t2.medium \
--topology public \
--network-cidr=10.25.0.0/16 \
--networking canal \
--name coreos-1688-4-0.us-east-1.kube.redacted.com
Edit the IGs for cluster A’s masters to use the 595879546273/CoreOS-stable-1632.3.0-hvm image:
kops edit ig --name=coreos-1632-3-0.us-east-1.kube.redacted.com master-us-east-1a
kops edit ig --name=coreos-1632-3-0.us-east-1.kube.redacted.com master-us-east-1c
kops edit ig --name=coreos-1632-3-0.us-east-1.kube.redacted.com master-us-east-1b
Edit the IGs for cluster B’s masters to use the 595879546273/CoreOS-stable-1688.4.0-hvm image:
kops edit ig --name=coreos-1688-4-0.us-east-1.kube.redacted.com master-us-east-1a
kops edit ig --name=coreos-1688-4-0.us-east-1.kube.redacted.com master-us-east-1c
kops edit ig --name=coreos-1688-4-0.us-east-1.kube.redacted.com master-us-east-1b
Instantiate the clusters:
kops update cluster coreos-1632-3-0.us-east-1.kube.redacted.com --yes
kops update cluster coreos-1688-4-0.us-east-1.kube.redacted.com --yes
After about ten minutes, validate both clusters and observe that 1632.3.0 is healthy but 1688.4.0 never validates. This is because etcd is not started under 1688.4.0.
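The validation step could be scripted roughly as follows. This is only a sketch: the helper names cluster_name and validate_all are hypothetical, and it assumes KOPS_STATE_STORE is already exported and that kops validate cluster exits non-zero when a cluster fails validation.

```shell
#!/bin/sh
# Hypothetical helper: derive the cluster name used in this report
# from a CoreOS version string (1688.4.0 -> coreos-1688-4-0.<domain>).
cluster_name() {
  printf 'coreos-%s.us-east-1.kube.redacted.com\n' "$(printf '%s' "$1" | tr '.' '-')"
}

# Hypothetical helper: validate both clusters and report which one fails.
# Assumes KOPS_STATE_STORE points at the state store for both clusters.
validate_all() {
  for v in 1632.3.0 1688.4.0; do
    name=$(cluster_name "$v")
    if kops validate cluster --name "$name"; then
      echo "$name: validated"
    else
      echo "$name: NOT validated"
    fi
  done
}

# Only touch real clusters when kops is actually installed.
if command -v kops >/dev/null 2>&1; then
  validate_all
fi
```

With the setup described above, the expectation is that the 1632.3.0 cluster validates and the 1688.4.0 cluster does not.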
- What happened after the commands executed?
The 1688.4.0 cluster never validates; the 1632.3.0 cluster comes up and is healthy.
- What did you expect to happen?
1688.4.0 comes alive just like previous versions did.
- Please provide your cluster manifest. Execute
Available upon request.
About this issue
- State: closed
- Created 6 years ago
- Reactions: 7
- Comments: 18 (13 by maintainers)
@chrissnell thanks for beating me to it. I ran into the same issue.
Can we get this fix back-ported to 1.8?
This is fixed in master following PR #4849 and made it into the latest release (1.9.0-beta.2), thanks @justinsb!
I’ve raised PR #4909 in relation to disabling the update-engine by default for CoreOS.
Rolling update using new CoreOS image triggered this bug for our clusters as well.