kube-router: route missing upon pod restart

When a kube-router pod is restarted, it appears to lose at least one and sometimes several routes, either at shutdown or at startup. I have two k8s clusters, both running kube-router v0.1.0, and both have hit this problem over the past few days. CoreOS auto-updates cause the host, and therefore kube-router, to restart; that appears to be what triggers the problem.

The host with the missing routes is ip-172-21-91-59.us-west-2.compute.internal; it auto-updated to CoreOS 1688.5.3. Before the update it was running CoreOS 1632.2.1, the same version as the other hosts.

Here’s the cluster after the node auto-updated (note the last line: that’s the host that updated):

$ kubectl get nodes -o wide
NAME                                          STATUS    ROLES     AGE       VERSION   EXTERNAL-IP   OS-IMAGE                                        KERNEL-VERSION   CONTAINER-RUNTIME
ip-172-21-46-101.us-west-2.compute.internal   Ready     node      1d        v1.8.9    <none>        Container Linux by CoreOS 1632.2.1 (Ladybug)    4.14.16-coreos   docker://17.9.1
ip-172-21-55-253.us-west-2.compute.internal   Ready     node      1d        v1.8.9    <none>        Container Linux by CoreOS 1632.2.1 (Ladybug)    4.14.16-coreos   docker://17.9.1
ip-172-21-58-9.us-west-2.compute.internal     Ready     master    24d       v1.8.9    <none>        Container Linux by CoreOS 1632.3.0 (Ladybug)    4.14.19-coreos   docker://17.9.1
ip-172-21-62-29.us-west-2.compute.internal    Ready     node      1d        v1.8.9    <none>        Container Linux by CoreOS 1632.2.1 (Ladybug)    4.14.16-coreos   docker://17.9.1
ip-172-21-71-225.us-west-2.compute.internal   Ready     node      1d        v1.8.9    <none>        Container Linux by CoreOS 1632.2.1 (Ladybug)    4.14.16-coreos   docker://17.9.1
ip-172-21-91-59.us-west-2.compute.internal    Ready     node      1d        v1.8.9    <none>        Container Linux by CoreOS 1688.5.3 (Rhyolite)   4.14.32-coreos   docker://17.12.1-ce

k8s shows that the kube-router pod on that host has restarted (the first line):

$ kubectl -n kube-system get pods | grep kube-router
kube-router-2gjg5    1/1       Running   1          23h       172.21.91.59    ip-172-21-91-59.us-west-2.compute.internal
kube-router-9qrbn    1/1       Running   0          23h       172.21.62.29    ip-172-21-62-29.us-west-2.compute.internal
kube-router-lqnh5    1/1       Running   0          23h       172.21.71.225   ip-172-21-71-225.us-west-2.compute.internal
kube-router-ntkwq    1/1       Running   0          23h       172.21.46.101   ip-172-21-46-101.us-west-2.compute.internal
kube-router-q4fhq    1/1       Running   0          23h       172.21.58.9     ip-172-21-58-9.us-west-2.compute.internal
kube-router-vfqw2    1/1       Running   0          23h       172.21.55.253   ip-172-21-55-253.us-west-2.compute.internal
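
As an aside, since the RESTARTS count is 1, the pre-restart logs can often still be retrieved from the previous container instance with kubectl's --previous flag:

$ kubectl -n kube-system logs kube-router-2gjg5 --previous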

After the update (and the resulting kube-router restart), the host’s routing table looked like this:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.21.64.1     0.0.0.0         UG    1024   0        0 eth0
100.96.5.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-1722146101
100.96.6.0      172.21.71.225   255.255.255.0   UG    0      0        0 eth0
100.96.7.0      0.0.0.0         255.255.255.0   U     0      0        0 kube-bridge
100.96.8.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-1722155253
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.21.64.0     0.0.0.0         255.255.224.0   U     0      0        0 eth0
172.21.64.1     0.0.0.0         255.255.255.255 UH    1024   0        0 eth0

It’s missing the routes for 100.96.0.0/24 and 100.96.9.0/24.
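
A quick way to spot which pod-subnet routes are missing is to compare the kernel routing table against each node’s pod CIDR. A minimal sketch, assuming pod CIDRs are allocated via node.spec.podCIDR (which is what kube-router advertises):

$ ip route show | grep '^100\.96\.'
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'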

Deleting the kube-router pod causes a new pod to start, and the new pod fixes the routing table, which then looks like this:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.21.64.1     0.0.0.0         UG    1024   0        0 eth0
100.96.0.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-17221589
100.96.5.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-1722146101
100.96.6.0      172.21.71.225   255.255.255.0   UG    0      0        0 eth0
100.96.7.0      0.0.0.0         255.255.255.0   U     0      0        0 kube-bridge
100.96.8.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-1722155253
100.96.9.0      0.0.0.0         255.255.255.0   U     0      0        0 tun-172216229
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.21.64.0     0.0.0.0         255.255.224.0   U     0      0        0 eth0
172.21.64.1     0.0.0.0         255.255.255.255 UH    1024   0        0 eth0
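
For the record, the workaround is simply deleting the affected pod and letting the DaemonSet recreate it, e.g. with the pod name from above:

$ kubectl -n kube-system delete pod kube-router-2gjg5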

I neglected to fetch the BGP peering info when the problem occurred this time; however, when it occurred several days ago I did check BGP and the nodes were peering correctly (there are snippets from that earlier occurrence in #kube-router).

Since this problem occurred in one of our clusters, I disabled auto-updates there but left them enabled in our second cluster for debugging. I also increased kube-router’s log verbosity (--v=3), anticipating that the problem would occur in the second cluster, which it did. The attached logs are from the pod (kube-router-2gjg5) before the host restarted, from the same pod after it restarted, and from the new pod (the old pod was deleted; the new pod fixed the routes).
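
For anyone who wants to reproduce this, here is a rough sketch of the two changes mentioned above; the DaemonSet name kube-router and the Container Linux unit names are assumptions on my part, so adjust to your setup:

# hold back automatic updates/reboots on the nodes of the first cluster
$ sudo systemctl mask --now update-engine.service locksmithd.service

# append --v=3 to the kube-router container args in the DaemonSet
$ kubectl -n kube-system patch ds kube-router --type=json \
    -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--v=3"}]'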

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 24 (13 by maintainers)

Most upvoted comments

@murali-reddy we solved this using @bush-niel’s solution. We are using CoreOS:

Container Linux by CoreOS 1967.6.0 (Rhyolite)
Kernel: 4.14.96-coreos-r1

We created a file 50-kube-router.network in the directory /etc/systemd/network/ with the following contents:

[Match]
Name=tun* kube-bridge kube-dummy-if

[Link]
Unmanaged=yes

It stopped all of the race conditions caused by networkd. You can reproduce the problem by using the AMI referenced in this issue and running kube-router as a DaemonSet. You can use journalctl -u systemd-networkd to see the errors that the race condition produces while networkd is also trying to manage the interfaces. Without the file, running sudo systemctl restart systemd-networkd closes all of the tunnels; after adding the file, restarting networkd no longer closes them. Note that you do need to restart networkd once after adding the file.
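
In case it helps others, applying the drop-in boils down to something like this (the networkctl check at the end is just my suggestion for verifying that networkd has let go of the interfaces):

$ sudo tee /etc/systemd/network/50-kube-router.network <<'EOF'
[Match]
Name=tun* kube-bridge kube-dummy-if

[Link]
Unmanaged=yes
EOF
$ sudo systemctl restart systemd-networkd
$ networkctl list    # the tun*/kube-* interfaces should now show up as "unmanaged"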

So @roffe and I have been testing this issue independently. I was using the image below, which is the latest CoreOS stable release image:

  "us-west-2": {
    "hvm": "ami-b41377cc",
    "pv": "ami-f81c7880"
  }

I ran the ip monitor command while the node was booting up, i.e. the first time kube-router started. I can see that the route gets added but is subsequently deleted:

172.20.89.123 dev tun-1722089123 table 77 scope link
100.96.1.0/24 dev tun-1722089123 proto 17 src 172.20.52.43
Deleted 100.96.1.0/24 dev tun-1722089123 proto 17 src 172.20.52.43
100.96.0.0/24 via 172.20.34.210 dev eth0 proto 17
100.96.2.0/24 via 172.20.33.115 dev eth0 proto 17

Once I restart the kube-router pod on the node, the route gets added and is not deleted:

172.20.33.115 dev eth0 lladdr 06:8d:17:26:d1:f8 REACHABLE
100.96.1.0/24 dev tun-1722089123 proto 17 src 172.20.52.43
172.20.33.115 dev eth0 lladdr 06:8d:17:26:d1:f8 STALE

So clearly there is a netlink call deleting the route. Now the question is: who is removing the entry?

I will investigate further to trace the source.
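
If it helps, one way to narrow down the culprit is to capture route events with timestamps across the next boot and line them up against the systemd-networkd journal; a sketch, with the interface name taken from the ip monitor output above:

$ ip -timestamp monitor route > /tmp/route-events.log 2>&1 &
$ journalctl -u systemd-networkd -f --output=short-iso > /tmp/networkd.log 2>&1 &
$ networkctl status tun-1722089123    # does networkd think it manages this tunnel?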