kubernetes: Cluster recovery failure
What happened: after a midnight power-down, I restarted my cluster's host machine and all of its nodes. But it seems the worker nodes WRONGLY tried to reach the master node (and especially the API) through the CNI interface, which was not ready yet. This behavior appeared even in the CNI-plugin pods themselves (I use kube-router because of this Flannel issue).
After that, obviously, none of the spawned pods were able to reach the API or the network at all.
What you expected to happen: a regular cluster start, meaning that all nodes SHOULD reach the API only through their physical network interfaces and not through the virtual interface created by the CNI plugin.
How to reproduce it (as minimally and precisely as possible): execute poweroff on ALL components of any working Kubernetes cluster, then restart them.
Anything else we need to know?:
- My installation is described in this Kubernetes issue. Please note that kube-proxy and conntrack are obviously unrelated to whether the main cluster can run at all (I deleted kube-proxy from my cluster completely and it works normally, as expected).
- And one other thing: I was able to get Kubernetes up and running again by stopping the worker nodes, waiting for some time (5-10 minutes) and then starting them again. That means that somewhere inside the Kubernetes networking subsystem there is a hidden timeout, after which the correct API endpoint is obtained through the physical network, as it should be, instead of the unexpected virtual interface. I suggest eliminating node network interaction through any CNI interface and allowing it only through physical interfaces, because this would guarantee easier disaster recovery.
- In order to confirm all of the above, here are some logs that I retrieved right before my attempts to restart the nodes and their machines (please note that I took these before kube-router and the other system pods started to crash with the same errors as metallb and prometheus; a quick way to check which interface the API service IP is reached through is sketched after the logs):
root@s1-c2:~# kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
app-qr1demo app-qr1demo-deployment-6d757fcc54-295m9 1/1 Running 1 10d 10.244.2.34 s1-c4 <none> <none>
app-qr1demo app-qr1demo-deployment-6d757fcc54-5mvtf 1/1 Running 1 10d 10.244.1.23 s1-c3 <none> <none>
app-qr1demo app-qr1demo-deployment-6d757fcc54-9s84w 1/1 Running 1 10d 10.244.2.32 s1-c4 <none> <none>
gitlab-managed-apps prometheus-kube-state-metrics-6b5764b4-ttftw 0/1 CrashLoopBackOff 5 12d 10.244.1.21 s1-c3 <none> <none>
gitlab-managed-apps prometheus-prometheus-server-7d499465c7-5th9t 0/2 Pending 0 12d <none> <none> <none> <none>
gitlab-managed-apps runner-gitlab-runner-864c6cb898-xg26z 1/1 Running 1 12d 10.244.1.22 s1-c3 <none> <none>
gitlab-managed-apps tiller-deploy-7b74cd5dc5-7btw8 1/1 Running 1 12d 10.244.2.36 s1-c4 <none> <none>
ingress-controller haproxy-ingress-jtc5x 1/1 Running 2 12d 192.168.1.101 s1-c2 <none> <none>
kube-system coredns-66bff467f8-d5rr6 0/1 Running 1 12d 10.244.2.33 s1-c4 <none> <none>
kube-system coredns-66bff467f8-qvp8q 0/1 Running 1 12d 10.244.1.24 s1-c3 <none> <none>
kube-system etcd-s1-c2 1/1 Running 3 13d 192.168.1.101 s1-c2 <none> <none>
kube-system kube-apiserver-s1-c2 1/1 Running 3 13d 192.168.1.101 s1-c2 <none> <none>
kube-system kube-controller-manager-s1-c2 1/1 Running 3 13d 192.168.1.101 s1-c2 <none> <none>
kube-system kube-router-9m76z 1/1 Running 1 12d 192.168.1.103 s1-c4 <none> <none>
kube-system kube-router-b22rx 1/1 Running 1 12d 192.168.1.101 s1-c2 <none> <none>
kube-system kube-router-pp2kf 1/1 Running 1 12d 192.168.1.102 s1-c3 <none> <none>
kube-system kube-scheduler-s1-c2 1/1 Running 3 13d 192.168.1.101 s1-c2 <none> <none>
metallb-system controller-57f648cb96-s9n8h 1/1 Running 1 11d 10.244.2.35 s1-c4 <none> <none>
metallb-system speaker-9k7np 0/1 CrashLoopBackOff 3 11d 192.168.1.102 s1-c3 <none> <none>
metallb-system speaker-dmkxp 1/1 Running 1 11d 192.168.1.101 s1-c2 <none> <none>
metallb-system speaker-lttdf 0/1 CrashLoopBackOff 5 11d 192.168.1.103 s1-c4 <none> <none>
root@s1-c2:~# kubectl logs -n gitlab-managed-apps prometheus-kube-state-metrics-6b5764b4-ttftw
I0429 08:51:51.220896 1 main.go:85] Using default collectors
I0429 08:51:51.220970 1 main.go:93] Using all namespace
I0429 08:51:51.220984 1 main.go:129] metric white-blacklisting: blacklisting the following items:
W0429 08:51:51.221016 1 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0429 08:51:51.222834 1 main.go:169] Testing communication with server
F0429 08:51:51.223271 1 main.go:137] Failed to create client: ERROR communicating with apiserver: Get https://10.96.0.1:443/version?timeout=32s: dial tcp 10.96.0.1:443: connect: connection refused
root@s1-c2:~# kubectl logs -n metallb-system pod/speaker-9k7np
{"branch":"HEAD","caller":"main.go:82","commit":"v0.9.3","msg":"MetalLB speaker starting version 0.9.3 (commit v0.9.3, branch HEAD)","ts":"2020-04-29T08:51:18.505299536Z","version":"0.9.3"}
{"caller":"announcer.go:103","event":"createARPResponder","interface":"docker0","msg":"created ARP responder for interface","ts":"2020-04-29T08:51:18.506740749Z"}
{"caller":"announcer.go:103","event":"createARPResponder","interface":"kube-bridge","msg":"created ARP responder for interface","ts":"2020-04-29T08:51:18.507003658Z"}
{"caller":"announcer.go:112","event":"createNDPResponder","interface":"kube-bridge","msg":"created NDP responder for interface","ts":"2020-04-29T08:51:18.605647653Z"}
{"caller":"main.go:186","msg":"Node event","node addr":"192.168.1.102","node event":"NodeJoin","node name":"s1-c3","ts":"2020-04-29T08:51:18.606305659Z"}
{"caller":"main.go:187","msg":"Call Force Sync","ts":"2020-04-29T08:51:18.606396231Z"}
{"caller":"announcer.go:103","event":"createARPResponder","interface":"eth0","msg":"created ARP responder for interface","ts":"2020-04-29T08:51:18.606571914Z"}
{"caller":"main.go:159","error":"Get https://10.96.0.1:443/api/v1/namespaces/metallb-system/pods?labelSelector=app%3Dmetallb%2Ccomponent%3Dspeaker: dial tcp 10.96.0.1:443: connect: connection refused","msg":"failed to get PodsIPs","op":"startup","ts":"2020-04-29T08:51:18.60671253Z"}
root@s1-c2:~#
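For reference, a quick way to see which interface a node is actually using to reach the in-cluster API service IP is to ask the routing table directly. This is only a diagnostic sketch; the service IP 10.96.0.1 and the kube-bridge interface name are taken from the logs above and may differ on other setups:

# On a worker node (s1-c3 in my case): which route would be used for the
# ClusterIP of the kubernetes API service?
ip route get 10.96.0.1

# If the output shows "dev kube-bridge" (the CNI bridge) instead of the
# physical interface (eth0), API traffic is being sent into the overlay,
# which cannot work until the CNI plugin itself is running.
ip addr show eth0
ip addr show kube-bridge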
Environment:
- Kubernetes version: 1.18.1
- Cloud provider or hardware configuration: LXC containers with Ubuntu 18.04 on a bare-metal server with Ubuntu 18.04
- OS: Ubuntu 18.04
- Kernel: 4.15.0-96-generic #97-Ubuntu SMP Wed Apr 1 03:25:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: lxc, net-tools, bridge-utils
- Network plugin and version: kube-router
- Others: if additional info about my hypervisor or LXC containers is needed, I'll post it in the discussion
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 26 (10 by maintainers)
Hi @prasannjeet! Sorry for the late answer. I tried to add ordering rules to the systemd unit of the kubelet. The newer versions of Kubernetes seem to be delivered with the correct order of the After= rules already.
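Roughly, the kind of ordering rule meant here is a systemd drop-in that delays kubelet until the network is actually online. This is an illustrative sketch, not the verbatim snippet; the drop-in path and file name are assumptions:

# Create a drop-in so kubelet starts only after the network is up,
# instead of racing the interfaces during boot.
mkdir -p /etc/systemd/system/kubelet.service.d
cat <<'EOF' > /etc/systemd/system/kubelet.service.d/20-network-online.conf
[Unit]
Wants=network-online.target
After=network-online.target
EOF
systemctl daemon-reload
systemctl restart kubelet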
I used to maintain the kube-router project, so I might be able to offer some insight here that could be useful.
One of the problems here is that kube-router relies on the addresses in the Node object to properly configure its overlays. In the absence of a cloud provider, kubelet makes a best-effort guess at the node's address on start-up, based on the network interfaces it sees. But sometimes kubelet can guess wrong (rare, in my experience), and it will definitely guess wrong if the desired network interface is not up yet. It sounds like there are two things you should do to address this:
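One concrete way to remove that guess (a sketch assuming a kubeadm-style install; the /etc/default/kubelet path and the example address for s1-c3 are assumptions, not necessarily the exact steps being referred to) is to pin the node IP explicitly on each kubelet:

# Tell kubelet exactly which address to register in the Node object,
# instead of letting it pick one from whatever interfaces are up at boot.
# Use each node's physical IP (here: s1-c3 from the listing above).
echo 'KUBELET_EXTRA_ARGS=--node-ip=192.168.1.102' >> /etc/default/kubelet
systemctl restart kubelet

# Verify which address ended up in the Node object:
kubectl get node s1-c3 -o jsonpath='{.status.addresses}'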