kops: Couldn't find key etcd_endpoints in ConfigMap kube-system/calico-config
**1. What kops version are you running? The command `kops version` will display
this information.**
kops 1.12.1
**2. What Kubernetes version are you running? `kubectl version` will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.**
Upgrading from v1.11.10 to 1.12.8
**3. What cloud provider are you using?** AWS
**4. What commands did you run? What is the simplest way to reproduce this issue?**
kops rolling-update cluster --cloudonly --master-interval=1s --node-interval=1s --yes
**5. What happened after the commands executed?**
master not healthy after update, stopping rolling-update: "error validating cluster after removing a node: cluster did not validate within a duration of \"5m0s\""
**6. What did you expect to happen?**
Validation to complete successfully
**9. Anything else do we need to know?**
I clearly messed up the upgrade from v1.11.10 to 1.12.8
I originally ran
kops update...
kops rolling-update cluster --yes
Above failed on first master with
master not healthy after update, stopping rolling-update: "error validating cluster after removing a node: cluster did not validate within a duration of \"5m0s\""
Validation failing due to
Pod kube-system/calico-complete-upgrade-v331-mz6z9 kube-system pod "calico-complete-upgrade-v331-mz6z9" is pending
Warning Failed XXXXX Error: Couldn't find key etcd_endpoints in ConfigMap kube-system/calico-config
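To confirm whether the key really is missing, one way (a hypothetical check, assuming `kubectl` access to the affected cluster) is to inspect the ConfigMap the Calico pods read:

```shell
# Dump the full calico-config ConfigMap from kube-system.
kubectl -n kube-system get configmap calico-config -o yaml

# Or query only the etcd_endpoints key; empty output means the key is absent.
kubectl -n kube-system get configmap calico-config \
  -o jsonpath='{.data.etcd_endpoints}'
```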
I then ran the following as per the official docs
kops rolling-update cluster --cloudonly --master-interval=1s --node-interval=1s --yes
This upgraded all the nodes, but validation still fails due to the error above.
Can I terminate the master which originally failed?
Any help is appreciated
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 3
- Comments: 15 (3 by maintainers)
@cn-5p1ke I still don't think the errors are 100% resolved for me, but I was able to validate the cluster in that given time. PS: I didn't have RBAC enabled on this test cluster.
In my case, I manually added the field
`etcd_endpoints` in the ConfigMap and also changed the `last-applied-configuration`, which was a pain. That got the `calico-kube-controllers` pod working (it had been stuck in a failed/pending state, which was why my cluster was not validating). However, I still see differences between my other clusters and this one. For example:
- `etcd-manager-events-ip` in the test cluster vs `etcd-server-events-ip` in the other clusters
- no `etcd-server-ip` pods at all in the test cluster

The cluster seems to be running okay for now (inter-pod communication works), but I believe I will have to troubleshoot something real soon.
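The manual workaround described above could be sketched like this (hypothetical commands; the etcd endpoint URL and the `k8s-app=calico-kube-controllers` label are assumptions you would need to verify against your own cluster):

```shell
# Add the missing etcd_endpoints key to the calico-config ConfigMap.
# The endpoint value is a placeholder; substitute your cluster's actual
# etcd client URL(s).
kubectl -n kube-system patch configmap calico-config \
  --type merge \
  -p '{"data":{"etcd_endpoints":"http://etcd-a.internal.example.com:4001"}}'

# Delete the stuck pod so its replacement re-reads the updated ConfigMap.
kubectl -n kube-system delete pod -l k8s-app=calico-kube-controllers
```

Note that patching the ConfigMap directly, as the commenter found, can leave the `last-applied-configuration` annotation out of sync, so a later `kubectl apply` may behave unexpectedly.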
It's because of this here. The etcd2-to-etcd3 migration is disruptive to masters (I am on etcd 3). I will try to upgrade to 1.13 to see if this resolves the issue (since it's now stable).