kube-router: route missing upon pod restart

When a kube-router pod is restarted, it appears to lose at least one and sometimes several routes, either at shutdown or at startup. I have two k8s clusters, both running kube-router v0.1.0, and both have hit this problem over the past few days. CoreOS auto-updates cause the host, and therefore kube-router, to restart; that appears to be what triggers the problem.

The host with the missing routes is ip-172-21-91-59.us-west-2.compute.internal; it auto-updated to CoreOS 1688.5.3. Before the update it was running CoreOS 1632.2.1, the same version as the other hosts.

Here’s the cluster after the node auto-updated (note the last line: that’s the host that updated):

$ kubectl get nodes -o wide
NAME                                          STATUS    ROLES     AGE       VERSION   EXTERNAL-IP   OS-IMAGE                                        KERNEL-VERSION   CONTAINER-RUNTIME
ip-172-21-46-101.us-west-2.compute.internal   Ready     node      1d        v1.8.9    <none>        Container Linux by CoreOS 1632.2.1 (Ladybug)    4.14.16-coreos   docker://17.9.1
ip-172-21-55-253.us-west-2.compute.internal   Ready     node      1d        v1.8.9    <none>        Container Linux by CoreOS 1632.2.1 (Ladybug)    4.14.16-coreos   docker://17.9.1
ip-172-21-58-9.us-west-2.compute.internal     Ready     master    24d       v1.8.9    <none>        Container Linux by CoreOS 1632.3.0 (Ladybug)    4.14.19-coreos   docker://17.9.1
ip-172-21-62-29.us-west-2.compute.internal    Ready     node      1d        v1.8.9    <none>        Container Linux by CoreOS 1632.2.1 (Ladybug)    4.14.16-coreos   docker://17.9.1
ip-172-21-71-225.us-west-2.compute.internal   Ready     node      1d        v1.8.9    <none>        Container Linux by CoreOS 1632.2.1 (Ladybug)    4.14.16-coreos   docker://17.9.1
ip-172-21-91-59.us-west-2.compute.internal    Ready     node      1d        v1.8.9    <none>        Container Linux by CoreOS 1688.5.3 (Rhyolite)   4.14.32-coreos   docker://17.12.1-ce

k8s shows that the kube-router pod on that host has restarted (the first line):

$ kubectl -n kube-system get pods | grep kube-router
kube-router-2gjg5    1/1       Running   1          23h       172.21.91.59    ip-172-21-91-59.us-west-2.compute.internal
kube-router-9qrbn    1/1       Running   0          23h       172.21.62.29    ip-172-21-62-29.us-west-2.compute.internal
kube-router-lqnh5    1/1       Running   0          23h       172.21.71.225   ip-172-21-71-225.us-west-2.compute.internal
kube-router-ntkwq    1/1       Running   0          23h       172.21.46.101   ip-172-21-46-101.us-west-2.compute.internal
kube-router-q4fhq    1/1       Running   0          23h       172.21.58.9     ip-172-21-58-9.us-west-2.compute.internal
kube-router-vfqw2    1/1       Running   0          23h       172.21.55.253   ip-172-21-55-253.us-west-2.compute.internal
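
As an aside, since the RESTARTS count is 1, the pre-restart logs can often still be retrieved from the previous container instance with kubectl's --previous flag:

$ kubectl -n kube-system logs kube-router-2gjg5 --previous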

After the update (and the resulting kube-router restart), the host’s routing table looked like this:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.21.64.1     0.0.0.0         UG    1024   0        0 eth0
100.96.5.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-1722146101
100.96.6.0      172.21.71.225   255.255.255.0   UG    0      0        0 eth0
100.96.7.0      0.0.0.0         255.255.255.0   U     0      0        0 kube-bridge
100.96.8.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-1722155253
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.21.64.0     0.0.0.0         255.255.224.0   U     0      0        0 eth0
172.21.64.1     0.0.0.0         255.255.255.255 UH    1024   0        0 eth0

It’s missing the routes for 100.96.0.0/24 and 100.96.9.0/24.
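
A quick way to spot which pod-subnet routes are missing is to compare the kernel routing table against each node’s pod CIDR. A minimal sketch, assuming pod CIDRs are allocated via node.spec.podCIDR (which is what kube-router advertises):

$ ip route show | grep '^100\.96\.'
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'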

Deleting the kube-router pod causes a new pod to start, and the new pod fixes the routing table, which then looks like this:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.21.64.1     0.0.0.0         UG    1024   0        0 eth0
100.96.0.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-17221589
100.96.5.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-1722146101
100.96.6.0      172.21.71.225   255.255.255.0   UG    0      0        0 eth0
100.96.7.0      0.0.0.0         255.255.255.0   U     0      0        0 kube-bridge
100.96.8.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-1722155253
100.96.9.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-172216229
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.21.64.0     0.0.0.0         255.255.224.0   U     0      0        0 eth0
172.21.64.1     0.0.0.0         255.255.255.255 UH    1024   0        0 eth0
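
For the record, the workaround is simply deleting the affected pod and letting the DaemonSet recreate it, e.g. with the pod name from above:

$ kubectl -n kube-system delete pod kube-router-2gjg5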

I neglected to fetch the BGP peering info when the problem occurred this time; however, when it occurred several days ago I did check BGP and the nodes were peering correctly (there are snippets from that earlier occurrence in #kube-router).

Since this problem occurred in one of our clusters, I disabled auto-updates there but left them enabled in our second cluster for debugging. I also increased kube-router’s log verbosity (--v=3), anticipating that the problem would occur in the second cluster, which it did. The attached logs are from the pod (kube-router-2gjg5) before the host restarted, from the same pod after it restarted, and from the new pod (the old pod was deleted; the new pod fixed the routes).
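
For anyone who wants to reproduce this, here is a rough sketch of the two changes mentioned above; the DaemonSet name kube-router and the Container Linux unit names are assumptions on my part, so adjust to your setup:

# hold back automatic updates/reboots on the nodes of the first cluster
$ sudo systemctl mask --now update-engine.service locksmithd.service

# append --v=3 to the kube-router container args in the DaemonSet
$ kubectl -n kube-system patch ds kube-router --type=json \
    -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--v=3"}]'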

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 24 (13 by maintainers)

Most upvoted comments

@murali-reddy we solved this using @bush-niel’s solution. We are using CoreOS:

Container Linux by CoreOS 1967.6.0 (Rhyolite)
Kernel: 4.14.96-coreos-r1

We created a file 50-kube-router.network in the directory /etc/systemd/network/ with the following contents:

[Match]
Name=tun* kube-bridge kube-dummy-if

[Link]
Unmanaged=yes

It stopped all of the race conditions caused by networkd. You can reproduce the problem by using the AMI referenced in this issue and running kube-router as a DaemonSet. You can use journalctl -u systemd-networkd to see the errors that the race condition produces while networkd is also trying to manage the interfaces. Without the file, running sudo systemctl restart systemd-networkd closes all of the tunnels; after adding the file, restarting networkd no longer closes them. Note that you do need to restart networkd once after adding the file.
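
In case it helps others, applying the drop-in boils down to something like this (the networkctl check at the end is just my suggestion for verifying that networkd has let go of the interfaces):

$ sudo tee /etc/systemd/network/50-kube-router.network <<'EOF'
[Match]
Name=tun* kube-bridge kube-dummy-if

[Link]
Unmanaged=yes
EOF
$ sudo systemctl restart systemd-networkd
$ networkctl list    # the tun*/kube-* interfaces should now show up as "unmanaged"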

So @roffe and I have been testing this issue independently. I was using the image below, which is the latest CoreOS stable release image:

  "us-west-2": {
    "hvm": "ami-b41377cc",
    "pv": "ami-f81c7880"
  }

I ran the ip monitor command while the node was booting up, i.e. the first time kube-router started. I can see that the route gets added but is subsequently deleted:

172.20.89.123 dev tun-1722089123 table 77 scope link
100.96.1.0/24 dev tun-1722089123 proto 17 src 172.20.52.43
Deleted 100.96.1.0/24 dev tun-1722089123 proto 17 src 172.20.52.43
100.96.0.0/24 via 172.20.34.210 dev eth0 proto 17
100.96.2.0/24 via 172.20.33.115 dev eth0 proto 17

Once I restart the kube-router pod on the node, the route gets added and is not deleted:

172.20.33.115 dev eth0 lladdr 06:8d:17:26:d1:f8 REACHABLE
100.96.1.0/24 dev tun-1722089123 proto 17 src 172.20.52.43
172.20.33.115 dev eth0 lladdr 06:8d:17:26:d1:f8 STALE

So clearly there is a netlink call deleting the route. Now the question is: who is removing the entry?

I will investigate further to trace the source.
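
If it helps, one way to narrow down the culprit is to capture route events with timestamps across the next boot and line them up against the systemd-networkd journal; a sketch, with the interface name taken from the ip monitor output above:

$ ip -timestamp monitor route > /tmp/route-events.log 2>&1 &
$ journalctl -u systemd-networkd -f --output=short-iso > /tmp/networkd.log 2>&1 &
$ networkctl status tun-1722089123    # does networkd think it manages this tunnel?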