cilium: systemd 245 breaks cilium pod to out-of-node traffic

Bug report

General Information

Updating systemd 244.2-2 on Arch to systemd 245.2-1 and 245-3 breaks pod to out-of-node IPv4 traffic. Reverting to 244.2-2 and rebooting fixes the problem. (IPv6 keeps working on all versions.)

I diffed sysctl -a output between 244 and 245 with cilium running (ready):

< net.ipv4.conf.all.promote_secondaries = 1
> net.ipv4.conf.all.promote_secondaries = 0
< net.ipv4.conf.cilium_host.accept_source_route = 1
> net.ipv4.conf.cilium_host.accept_source_route = 0
< net.ipv4.conf.cilium_host.promote_secondaries = 0
> net.ipv4.conf.cilium_host.promote_secondaries = 1
< net.ipv4.conf.cilium_host.rp_filter = 0
> net.ipv4.conf.cilium_host.rp_filter = 2
< net.ipv4.conf.cilium_net.accept_source_route = 1
> net.ipv4.conf.cilium_net.accept_source_route = 0
< net.ipv4.conf.cilium_net.promote_secondaries = 0
> net.ipv4.conf.cilium_net.promote_secondaries = 1
< net.ipv4.conf.default.accept_source_route = 1
> net.ipv4.conf.default.accept_source_route = 0
< net.ipv4.conf.default.promote_secondaries = 0
> net.ipv4.conf.default.promote_secondaries = 1
< net.ipv4.conf.default.rp_filter = 0
> net.ipv4.conf.default.rp_filter = 2
< net.ipv4.conf.ens192.accept_source_route = 1
> net.ipv4.conf.ens192.accept_source_route = 0
< net.ipv4.conf.ens192.promote_secondaries = 0
> net.ipv4.conf.ens192.promote_secondaries = 1
< net.ipv4.conf.ens192.rp_filter = 0
> net.ipv4.conf.ens192.rp_filter = 2
< net.ipv4.conf.lo.accept_source_route = 1
> net.ipv4.conf.lo.accept_source_route = 0
< net.ipv4.conf.lo.promote_secondaries = 0
> net.ipv4.conf.lo.promote_secondaries = 1
< net.ipv4.conf.lo.rp_filter = 0
> net.ipv4.conf.lo.rp_filter = 2
  • Cilium version (run cilium version) 1.7.1
  • Kernel version (run uname -a) Linux k8s22 5.5.10-arch1-1 #1 SMP PREEMPT Wed, 18 Mar 2020 08:40:35 +0000 x86_64 GNU/Linux
  • Orchestration system version in use (e.g. kubectl version, Mesos, …) Kubernetes 1.17.4
  • Upload a system dump (run curl -sLO https://github.com/cilium/cilium-sysdump/releases/latest/download/cilium-sysdump.zip && python cilium-sysdump.zip and then attach the generated zip file)

cilium-sysdump-20200319-221054.zip

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 4
  • Comments: 41 (30 by maintainers)

Most upvoted comments

I’ve hit the same problem on Ubuntu 20.04.

For future googlers on hetzner systems: Check /etc/sysctl.d/99-hetzner.conf, they set net.ipv4.conf.all.rp_filter=1 there.
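More generally, a quick way to find which drop-in sets rp_filter on any distro (the directories are the standard sysctl.d(5) search path):

grep -rn rp_filter /etc/sysctl.d/ /run/sysctl.d/ /usr/lib/sysctl.d/ 2>/dev/null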

Same issue with systemd 248 (248.3-1ubuntu8) on Ubuntu 21.10 (Impish) with Cilium 1.10-rc0. It was very hard to debug and it would probably be wise to document the rp_filter setting in the official installation docs so new users don’t run into this issue until there is a proper fix in place.

The breaking change is in /usr/lib/sysctl.d/50-default.conf: https://github.com/systemd/systemd/commit/5d4fc0e665a3639f92ac880896c56f9533441307#diff-7816eed8ca6324f23a690cc5f58e6bf7

A minimal fix for 245 is:

echo 'net.ipv4.conf.lxc*.rp_filter = 0' | sudo tee -a /etc/sysctl.d/90-override.conf && sudo systemctl restart systemd-sysctl

When I added this to the 1.10 feature candidates, I meant that #14955 should resolve this. It’s actually clearer to track #14955 in the project, so I’ve removed this one from 1.10. This doesn’t change the release in which we intend to address this issue more generically.

I have encountered this on OpenShift, which uses CoreOS. I can confirm that the following two solutions worked well.

Either write /etc/sysctl.d/99-override_cilium_rp_filter.conf with the following contents:

net.ipv4.conf.lxc*.rp_filter = 0
net.ipv4.conf.cilium_*.rp_filter = 0

Or use enable-endpoint-routes: "true"; however, if you are using tunnelling mode, you will need either Cilium 1.8.5 or 1.9.0 (both unreleased at the time of writing but due soon; see https://github.com/cilium/cilium/pull/13346).
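For reference, a minimal sketch of turning that option on for an existing install, assuming the standard cilium-config ConfigMap and cilium DaemonSet in kube-system (adjust names for your deployment):

kubectl -n kube-system patch configmap cilium-config --type merge -p '{"data":{"enable-endpoint-routes":"true"}}'
kubectl -n kube-system rollout restart daemonset/cilium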

@kkourt I created cilium/cilium-cli#594 as you suggested. 😃

In case it helps anyone, I can confirm this is still happening with systemd 249 (249.3-1-arch), systemd-networkd enabled, and cilium 1.10.3. The sysctl override workaround fixed the issue for me (after recreating all pods).

Just to weigh in here for anyone else who might be searching for it, this affected my systems on NixOS 21.03 (and may indeed affect previous releases). The sysctl configuration mentioned above can be implemented using the boot.kernel.sysctl configuration option, as sketched below.
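A sketch of what that might look like in configuration.nix (the glob keys are passed through to the generated sysctl.d file; verify against your NixOS release):

boot.kernel.sysctl = {
  "net.ipv4.conf.lxc*.rp_filter" = 0;
  "net.ipv4.conf.cilium_*.rp_filter" = 0;
};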

We could detect the systemd version and write out configs for each interface; how would that sound to people?

Long-term, everything will trend towards newer systemd so unless we expect a significant portion of users to be running non-systemd hosts, I don’t think the “detect systemd version” piece makes any meaningful difference; I’d skip that to keep things simple.

Coordinating with systemd seems like the right solution. At the moment my understanding is that the initial lifecycle of the device, from creation to configuration by {cilium,systemd}, is not coordinated, which means that we end up arguing with each other and the last one to perform configuration wins. Seems like that’s systemd. Therefore, I think we either need to tell systemd how to configure the devices, or we need a mechanism to know when systemd will no longer touch the configuration, at which point we know it’s safe for us to do so. The former approach seems more viable.
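As one sketch of the "tell systemd" direction, a systemd-networkd drop-in can declare Cilium's interfaces unmanaged so networkd leaves their configuration alone (file name and match list are illustrative; Unmanaged= is documented in systemd.network(5); note this addresses networkd's link management, not the sysctl.d re-application discussed below):

/etc/systemd/network/50-cilium-unmanaged.network:

[Match]
Name=cilium_host cilium_net cilium_vxlan lxc*

[Link]
Unmanaged=yes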

If we were to just write one file with a wildcard as documented, there is a good chance it could get overwritten, deleted, or shadowed by another config file. Having a config file for each interface reduces the chances of collision.

As long as we’re happy with the notion of writing up to, say, ~100 of these on a given node, and rewriting them on the filesystem every time a new pod is deployed (perhaps dozens of times per minute). I suspect this is probably not noticeable from a performance perspective, given how small the configuration would be. The alternative is to write one configuration file and then have some background monitor that double-checks that the configuration is still OK (for instance, by validating that a random endpoint’s sysctl configuration matches what we expect), as sketched below. Just an idea; we do similar sorts of things via pkg/controller today.
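A minimal sketch of such a background check in shell (the real implementation would live in Go via pkg/controller; the lxc prefix matches Cilium's host-side endpoint interface names):

# warn if any endpoint interface has drifted from the expected rp_filter value
for conf in /proc/sys/net/ipv4/conf/lxc*/rp_filter; do
  [ -e "$conf" ] || continue   # glob did not match: no endpoints present
  val=$(cat "$conf")
  [ "$val" = "0" ] || echo "warning: $conf is $val, expected 0" >&2
done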

It looks like network interface sysctls are applied to new interfaces as they appear.

From the docs:

The settings configured with sysctl.d files will be applied early on boot. The network interface-specific options will also be applied individually for each network interface as it shows up in the system. (More specifically, net.ipv4.conf.*, net.ipv6.conf.*, net.ipv4.neigh.* and net.ipv6.neigh.*).

I had a look at the code as well; I’m not quite sure how exactly the configuration is applied to each new device, but per the above it’s clear that some part of systemd does it (albeit not systemd-sysctl itself).
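That behaviour is easy to check by hand: create a throwaway interface whose name matches the glob and read the setting back (assumes the lxc* override from earlier in the thread is installed):

sudo ip link add lxctest type dummy
cat /proc/sys/net/ipv4/conf/lxctest/rp_filter   # 0 if the override was applied on hotplug
sudo ip link del lxctest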

It looks like the best fix would be to have either node init or the CNI plugin installer drop a config file, and in addition, every place where Cilium writes a sysctl should check the value after writing and at least log an error/warning in case the intended value doesn’t persist.
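A shell sketch of the write-then-verify idea (the real change would be in Cilium's Go sysctl helpers; set_and_check is a made-up name, and it must run as root):

set_and_check() {
  # $1 = sysctl key in dotted form, $2 = desired value
  # note: naive dot-to-slash mapping breaks for interface names that contain dots
  path="/proc/sys/$(printf '%s' "$1" | tr . /)"
  printf '%s\n' "$2" > "$path"
  actual=$(cat "$path")
  [ "$actual" = "$2" ] || echo "warning: $1 reads back as $actual, expected $2" >&2
}

set_and_check net.ipv4.conf.all.rp_filter 0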

Aside from that, another question to ask is: is there a way to enforce sysctls from an eBPF program? It just seems like that could be a better way to continuously ensure the rp_filter setting is correct.

A good workaround for this is to enable endpoint routes (--enable-endpoint-routes). It enforces symmetric routing. https://github.com/cilium/cilium/pull/13346 is going to fix endpoint routes in combination with tunneling.

@mvisonneau OK great, would you mind filing a separate bug for that to help track fixing the regression? The output from your last couple of comments on this thread would be a great start for such a bug.