kubernetes: Cluster recovery failure

What happened: after a midnight power down, I restarted my cluster’s host machine and all of its nodes. However, it seems the worker nodes WRONGLY tried to reach the master node (and in particular the API server) through the CNI interface, which was not ready yet. This affected even the CNI-plugin pods themselves (I use kube-router because of this Flannel issue). As a result, none of the spawned pods were able to reach the API or the network.

What you expected to happen: a regular cluster start, meaning that all nodes SHOULD reach the API only through their physical network interfaces and not through the virtual interface created by the CNI plugin.

How to reproduce it (as minimally and precisely as possible): power off ALL components of any working Kubernetes cluster, then restart them.

Anything else we need to know?:

  1. My installation is described in this Kubernetes issue. Please note that kube-proxy and conntrack are unrelated to whether the main cluster can run at all (I deleted kube-proxy from my cluster completely and it works normally, as expected).
  2. One more thing: I was able to get Kubernetes up and running again by stopping the worker nodes, waiting for some time (5-10 minutes) and then starting them again (a minimal sketch of this workaround is shown after the logs below). This suggests that somewhere inside the Kubernetes networking subsystem there is a hidden timeout, after which the correct API endpoint is reached through the physical network, as expected, instead of through the unexpected virtual interface. I suggest eliminating node interaction over any CNI interface and allowing it only through the physical interfaces, because that would make disaster recovery much easier.
  3. To confirm all of the above, here are some logs that I retrieved right before my attempts to restart the nodes and their machines (please note that I took these before kube-router and the other system pods started to crash with the same errors as metallb and prometheus):
root@s1-c2:~# kubectl get pods --all-namespaces -o wide
NAMESPACE             NAME                                            READY   STATUS             RESTARTS   AGE   IP              NODE     NOMINATED NODE   READINESS GATES
app-qr1demo           app-qr1demo-deployment-6d757fcc54-295m9         1/1     Running            1          10d   10.244.2.34     s1-c4    <none>           <none>
app-qr1demo           app-qr1demo-deployment-6d757fcc54-5mvtf         1/1     Running            1          10d   10.244.1.23     s1-c3    <none>           <none>
app-qr1demo           app-qr1demo-deployment-6d757fcc54-9s84w         1/1     Running            1          10d   10.244.2.32     s1-c4    <none>           <none>
gitlab-managed-apps   prometheus-kube-state-metrics-6b5764b4-ttftw    0/1     CrashLoopBackOff   5          12d   10.244.1.21     s1-c3    <none>           <none>
gitlab-managed-apps   prometheus-prometheus-server-7d499465c7-5th9t   0/2     Pending            0          12d   <none>          <none>   <none>           <none>
gitlab-managed-apps   runner-gitlab-runner-864c6cb898-xg26z           1/1     Running            1          12d   10.244.1.22     s1-c3    <none>           <none>
gitlab-managed-apps   tiller-deploy-7b74cd5dc5-7btw8                  1/1     Running            1          12d   10.244.2.36     s1-c4    <none>           <none>
ingress-controller    haproxy-ingress-jtc5x                           1/1     Running            2          12d   192.168.1.101   s1-c2    <none>           <none>
kube-system           coredns-66bff467f8-d5rr6                        0/1     Running            1          12d   10.244.2.33     s1-c4    <none>           <none>
kube-system           coredns-66bff467f8-qvp8q                        0/1     Running            1          12d   10.244.1.24     s1-c3    <none>           <none>
kube-system           etcd-s1-c2                                      1/1     Running            3          13d   192.168.1.101   s1-c2    <none>           <none>
kube-system           kube-apiserver-s1-c2                            1/1     Running            3          13d   192.168.1.101   s1-c2    <none>           <none>
kube-system           kube-controller-manager-s1-c2                   1/1     Running            3          13d   192.168.1.101   s1-c2    <none>           <none>
kube-system           kube-router-9m76z                               1/1     Running            1          12d   192.168.1.103   s1-c4    <none>           <none>
kube-system           kube-router-b22rx                               1/1     Running            1          12d   192.168.1.101   s1-c2    <none>           <none>
kube-system           kube-router-pp2kf                               1/1     Running            1          12d   192.168.1.102   s1-c3    <none>           <none>
kube-system           kube-scheduler-s1-c2                            1/1     Running            3          13d   192.168.1.101   s1-c2    <none>           <none>
metallb-system        controller-57f648cb96-s9n8h                     1/1     Running            1          11d   10.244.2.35     s1-c4    <none>           <none>
metallb-system        speaker-9k7np                                   0/1     CrashLoopBackOff   3          11d   192.168.1.102   s1-c3    <none>           <none>
metallb-system        speaker-dmkxp                                   1/1     Running            1          11d   192.168.1.101   s1-c2    <none>           <none>
metallb-system        speaker-lttdf                                   0/1     CrashLoopBackOff   5          11d   192.168.1.103   s1-c4    <none>           <none>
root@s1-c2:~# kubectl logs -n gitlab-managed-apps prometheus-kube-state-metrics-6b5764b4-ttftw
I0429 08:51:51.220896       1 main.go:85] Using default collectors
I0429 08:51:51.220970       1 main.go:93] Using all namespace
I0429 08:51:51.220984       1 main.go:129] metric white-blacklisting: blacklisting the following items:
W0429 08:51:51.221016       1 client_config.go:549] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0429 08:51:51.222834       1 main.go:169] Testing communication with server
F0429 08:51:51.223271       1 main.go:137] Failed to create client: ERROR communicating with apiserver: Get https://10.96.0.1:443/version?timeout=32s: dial tcp 10.96.0.1:443: connect: connection refused
root@s1-c2:~# kubectl logs -n metallb-system pod/speaker-9k7np
{"branch":"HEAD","caller":"main.go:82","commit":"v0.9.3","msg":"MetalLB speaker starting version 0.9.3 (commit v0.9.3, branch HEAD)","ts":"2020-04-29T08:51:18.505299536Z","version":"0.9.3"}
{"caller":"announcer.go:103","event":"createARPResponder","interface":"docker0","msg":"created ARP responder for interface","ts":"2020-04-29T08:51:18.506740749Z"}
{"caller":"announcer.go:103","event":"createARPResponder","interface":"kube-bridge","msg":"created ARP responder for interface","ts":"2020-04-29T08:51:18.507003658Z"}
{"caller":"announcer.go:112","event":"createNDPResponder","interface":"kube-bridge","msg":"created NDP responder for interface","ts":"2020-04-29T08:51:18.605647653Z"}
{"caller":"main.go:186","msg":"Node event","node addr":"192.168.1.102","node event":"NodeJoin","node name":"s1-c3","ts":"2020-04-29T08:51:18.606305659Z"}
{"caller":"main.go:187","msg":"Call Force Sync","ts":"2020-04-29T08:51:18.606396231Z"}
{"caller":"announcer.go:103","event":"createARPResponder","interface":"eth0","msg":"created ARP responder for interface","ts":"2020-04-29T08:51:18.606571914Z"}
{"caller":"main.go:159","error":"Get https://10.96.0.1:443/api/v1/namespaces/metallb-system/pods?labelSelector=app%3Dmetallb%2Ccomponent%3Dspeaker: dial tcp 10.96.0.1:443: connect: connection refused","msg":"failed to get PodsIPs","op":"startup","ts":"2020-04-29T08:51:18.60671253Z"}
root@s1-c2:~#
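
For reference, a minimal sketch of the workaround from point 2 above, assuming the LXD lxc client and assuming the worker-node containers are named after the nodes s1-c3 and s1-c4 (adjust for your own setup):

# stop only the worker-node containers; the control-plane container s1-c2 keeps running
lxc stop s1-c3
lxc stop s1-c4

# wait long enough for the stale networking state to time out (5-10 minutes in my case)
sleep 600

# start the workers again; this time they reach the API through the physical interface
lxc start s1-c3
lxc start s1-c4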

Environment:

  • Kubernetes version: 1.18.1
  • Cloud provider or hardware configuration: LXC containers with Ubuntu 18.04 on a bare metal server with Ubuntu 18.04
  • OS: Ubuntu 18.04
  • Kernel: 4.15.0-96-generic #97-Ubuntu SMP Wed Apr 1 03:25:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: lxc, net-tools, bridge-utils
  • Network plugin and version: kube-router
  • Others: if additional info about my hypervisor or LXC containers is needed, I’ll post it in the discussion

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 26 (10 by maintainers)

Most upvoted comments

Hi @prasannjeet! Sorry for the late answer. I tried putting the following rules into the systemd unit of the kubelet:

[Unit]
Description=Kubernetes
After=syslog.target
After=network.target

The newer versions of Kubernetes seem to ship with the correct order of the After rules.
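
For anyone trying the same thing, here is a sketch of a systemd drop-in that makes kubelet wait until the network is actually online; the drop-in path and the use of network-online.target are my own choice, not something taken from the default installation:

# /etc/systemd/system/kubelet.service.d/10-wait-for-network.conf (hypothetical drop-in name)
[Unit]
# network-online.target is reached only once the configured interfaces are up,
# unlike network.target, which only orders against the network stack being started
Wants=network-online.target
After=network-online.target

# apply it with:
#   systemctl daemon-reload && systemctl restart kubelet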

I used to maintain the kube-router project so I might be able to offer some insight here that could be useful.

One of the problems here is that kube-router relies on addresses in the Node object to properly configure its overlays. In the absence of a cloud provider, kubelet does a best-effort guess on what the address of the node is on start-up based on the network interfaces it sees. But sometimes kubelet can guess wrong (rare though IME) and it will definitely guess wrong if the desired network interface is not up yet. It sounds like there are two things you should do to address this:

  1. Configure kubelet to start after your networking stack is properly configured. You mentioned this isn’t present in the “default” installation. Which installation are you referring to, exactly?
  2. Set --node-ip on kubelet to the address of the physical network interface you want the node to use. This ensures that the advertise address of the node is set properly, even before the network interface is up. Though ideally the network interface should be up anyways.
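
A sketch of point 2, assuming a kubeadm-style installation where extra kubelet flags are passed through /etc/default/kubelet; the address 192.168.1.102 is just the s1-c3 worker from the listing above, so substitute each node’s own physical address:

# /etc/default/kubelet (picked up by the kubeadm drop-in as KUBELET_EXTRA_ARGS)
KUBELET_EXTRA_ARGS=--node-ip=192.168.1.102

# restart kubelet and check the address the node now advertises:
#   systemctl daemon-reload && systemctl restart kubelet
#   kubectl get node s1-c3 -o jsonpath='{.status.addresses}'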