cilium: systemd 245 breaks cilium pod to out-of-node traffic
Bug report
General Information
Updating systemd from 244.2-2 to 245.2-1 or 245-3 on Arch breaks pod to out-of-node IPv4 traffic. Reverting to 244.2-2 and rebooting fixes the problem. (IPv6 keeps working on all versions.)
I diffed the output of `sysctl -a` between 244 and 245 with Cilium running (ready):
< net.ipv4.conf.all.promote_secondaries = 1
> net.ipv4.conf.all.promote_secondaries = 0
< net.ipv4.conf.cilium_host.accept_source_route = 1
> net.ipv4.conf.cilium_host.accept_source_route = 0
< net.ipv4.conf.cilium_host.promote_secondaries = 0
> net.ipv4.conf.cilium_host.promote_secondaries = 1
< net.ipv4.conf.cilium_host.rp_filter = 0
> net.ipv4.conf.cilium_host.rp_filter = 2
< net.ipv4.conf.cilium_net.accept_source_route = 1
> net.ipv4.conf.cilium_net.accept_source_route = 0
< net.ipv4.conf.cilium_net.promote_secondaries = 0
> net.ipv4.conf.cilium_net.promote_secondaries = 1
< net.ipv4.conf.default.accept_source_route = 1
> net.ipv4.conf.default.accept_source_route = 0
< net.ipv4.conf.default.promote_secondaries = 0
> net.ipv4.conf.default.promote_secondaries = 1
< net.ipv4.conf.default.rp_filter = 0
> net.ipv4.conf.default.rp_filter = 2
< net.ipv4.conf.ens192.accept_source_route = 1
> net.ipv4.conf.ens192.accept_source_route = 0
< net.ipv4.conf.ens192.promote_secondaries = 0
> net.ipv4.conf.ens192.promote_secondaries = 1
< net.ipv4.conf.ens192.rp_filter = 0
> net.ipv4.conf.ens192.rp_filter = 2
< net.ipv4.conf.lo.accept_source_route = 1
> net.ipv4.conf.lo.accept_source_route = 0
< net.ipv4.conf.lo.promote_secondaries = 0
> net.ipv4.conf.lo.promote_secondaries = 1
< net.ipv4.conf.lo.rp_filter = 0
> net.ipv4.conf.lo.rp_filter = 2
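A comparison like the one above can be reproduced from two saved `sysctl -a` dumps. The following is a minimal sketch, not something taken from the report: the function names `parse_sysctl_dump` and `diff_sysctl` are hypothetical, and it assumes each dump line has the usual `key = value` form.

```python
def parse_sysctl_dump(text):
    """Parse `sysctl -a` output ("key = value" per line) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that are not "key = value"
            settings[key.strip()] = value.strip()
    return settings


def diff_sysctl(before, after, prefix="net.ipv4.conf."):
    """Return {key: (old, new)} for keys under `prefix` whose value changed.

    A key that exists in only one dump is reported with None on the other side.
    """
    old = parse_sysctl_dump(before)
    new = parse_sysctl_dump(after)
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        if key.startswith(prefix) and old.get(key) != new.get(key):
            changes[key] = (old.get(key), new.get(key))
    return changes
```

Feeding it the text of one dump taken under systemd 244 and one taken under 245 yields only the `net.ipv4.conf.*` keys that differ, which makes the `rp_filter` change from 0 to 2 easy to spot.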
- Cilium version (run `cilium version`): 1.7.1
- Kernel version (run `uname -a`): Linux k8s22 5.5.10-arch1-1 #1 SMP PREEMPT Wed, 18 Mar 2020 08:40:35 +0000 x86_64 GNU/Linux
- Orchestration system version in use (e.g. `kubectl version`, Mesos, …): Kubernetes 1.17.4
- Upload a system dump (run `curl -sLO https://github.com/cilium/cilium-sysdump/releases/latest/download/cilium-sysdump.zip && python cilium-sysdump.zip` and then attach the generated zip file)
About this issue
- State: closed
- Created 4 years ago
- Reactions: 4
- Comments: 41 (30 by maintainers)
Commits related to this issue
- cilium: set net.ipv4.conf.*.rp_filter to 0 See https://github.com/cilium/cilium/issues/10645 Signed-off-by: Alexander Trost <galexrt@googlemail.com> — committed to vanillastack/vanillastack by galexrt 4 years ago
- Disable endpoint routes and address rp_filter issue directly instead Due to a systemd issue with rp_filter (see cilium/cilium#10645), it's been decided to use endpoint routes with OpenShift; however... — committed to cilium/cilium-olm by errordeveloper 3 years ago
- enable cilium CNI option (#72) * enable cilium CNI option Signed-off-by: Ryan Holt <ryan@ryanholt.net> * fix some variables Signed-off-by: Ryan Holt <ryan@ryanholt.net> * change variable ... — committed to raspbernetes/k8s-cluster-installation by carpenike 3 years ago
- https://github.com/cilium/cilium/issues/10645 — committed to alvistack/ansible-role-kube_cilium by hswong3i 2 years ago
- https://github.com/cilium/cilium/issues/10645 — committed to alvistack/ansible-role-containerd by hswong3i 2 years ago
- https://github.com/cilium/cilium/issues/10645 — committed to alvistack/ansible-role-docker by hswong3i 2 years ago
- https://github.com/cilium/cilium/issues/10645 — committed to alvistack/ansible-role-cri_o by hswong3i 2 years ago
- https://github.com/cilium/cilium/issues/10645 — committed to alvistack/ansible-role-kube_kubeadm by hswong3i 2 years ago
- Fix cilium issue with systemd > 145 See https://github.com/cilium/cilium/issues/10645 for details — committed to PhilippeChepy/platform by PhilippeChepy 2 years ago
- datapath: Create sysctl `rp_filter` overwrite config on agent init SystemD versions greater than 245 will create sysctl config which sets the `rp_filter` value for all network interfaces to 1. This c... — committed to dylandreimerink/cilium by dylandreimerink 2 years ago
- datapath: Create sysctl `rp_filter` overwrite config on agent init SystemD versions greater than 245 will create sysctl config which sets the `rp_filter` value for all network interfaces to 1. This c... — committed to cilium/cilium by dylandreimerink 2 years ago
- datapath: Create sysctl `rp_filter` overwrite config on agent init [ upstream commit 6432558898aa893d8641cd70bdfdc23b31c6e3ee ] SystemD versions greater than 245 will create sysctl config which sets... — committed to cilium/cilium by dylandreimerink 2 years ago
I’ve hit the same problem on Ubuntu 20.04.
For future googlers on Hetzner systems: check `/etc/sysctl.d/99-hetzner.conf`; they set `net.ipv4.conf.all.rp_filter=1` there.
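Provider images can ship their own `rp_filter` setting, so it can help to list every file that sets it. Below is a rough sketch under stated assumptions: the helper name `find_rp_filter_settings` is made up for illustration, the directory list follows the standard `sysctl.d` search locations, and only plain `key = value` lines are recognized.

```python
import re
from pathlib import Path

# Standard sysctl.d search locations; adjust for your distribution.
SYSCTL_DIRS = (
    "/etc/sysctl.d",
    "/run/sysctl.d",
    "/usr/local/lib/sysctl.d",
    "/usr/lib/sysctl.d",
)


def find_rp_filter_settings(dirs=SYSCTL_DIRS):
    """Return (file, line number, key, value) for every rp_filter assignment."""
    hits = []
    for directory in dirs:
        base = Path(directory)
        if not base.is_dir():
            continue
        for conf in sorted(base.glob("*.conf")):
            text = conf.read_text(errors="replace")
            for lineno, raw in enumerate(text.splitlines(), 1):
                line = raw.strip()
                if not line or line.startswith(("#", ";")):
                    continue  # blank line or comment
                key, sep, value = line.partition("=")
                key = key.strip()
                # Compare the last path component so arp_filter is not matched.
                if sep and re.split(r"[./]", key)[-1] == "rp_filter":
                    hits.append((str(conf), lineno, key, value.strip()))
    return hits
```

Calling `find_rp_filter_settings()` on an affected host should list the systemd default file alongside any provider-specific file such as the Hetzner one mentioned above.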
Same issue with `systemd 248 (248.3-1ubuntu8)` on Ubuntu 21.10 (Impish) with `Cilium 1.10-rc0`. It was very hard to debug, and it would probably be wise to document the `rp_filter` setting in the official installation docs so new users don’t run into this issue until there is a proper fix in place.
The breaking change is in /usr/lib/sysctl.d/50-default.conf https://github.com/systemd/systemd/commit/5d4fc0e665a3639f92ac880896c56f9533441307#diff-7816eed8ca6324f23a690cc5f58e6bf7
A minimal fix for 245 is:
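As an illustration only, an override of this kind could be generated as sketched below. The file name is the one mentioned in this thread; the `lxc*` interface pattern, the helper names, and the reliance on glob support in sysctl keys are assumptions rather than details confirmed by the text here.

```python
from pathlib import Path

# File name as mentioned in this thread.
OVERRIDE_NAME = "99-override_cilium_rp_filter.conf"


def render_rp_filter_override(patterns=("lxc*",), value=0):
    """Render sysctl.d content that sets rp_filter for matching interfaces.

    Assumption: pod-facing interfaces match `lxc*` and the installed
    systemd-sysctl accepts glob patterns in keys.
    """
    lines = ["# Keep rp_filter relaxed on Cilium-managed interfaces."]
    for pattern in patterns:
        lines.append(f"net.ipv4.conf.{pattern}.rp_filter = {value}")
    return "\n".join(lines) + "\n"


def write_override(directory="/etc/sysctl.d", name=OVERRIDE_NAME, **kwargs):
    """Write the override file and return its path (needs root for /etc)."""
    path = Path(directory) / name
    path.write_text(render_rp_filter_override(**kwargs))
    return path
```

After writing the file, the settings still have to be applied, for example with `systemctl restart systemd-sysctl`, and existing pods may need to be recreated before the change is visible on their interfaces.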
When I added this to 1.10 feature candidates, I was meaning #14955 should resolve this. It’s actually clearer to track #14955 in the project so I’ve removed this one from 1.10. This doesn’t change which release we intend to address this issue more generically.
I have encountered this on OpenShift, which uses CoreOS. I can confirm that the following two solutions work well.
Either write `/etc/sysctl.d/99-override_cilium_rp_filter.conf` with the following contents:
Or use `enable-endpoint-routes: "true"`; however, if you are using tunnelling mode, you will need either Cilium 1.8.5 (not yet released, due soon) or 1.9.0 (also due to be released) (see https://github.com/cilium/cilium/pull/13346).
@kkourt I created cilium/cilium-cli#594 as you suggested. 😃
In case it helps anyone, I can confirm this is still happening with `systemd 249 (249.3-1-arch)`, `systemd-networkd` enabled, and `cilium 1.10.3`. The sysctl override workaround fixed the issue for me (after recreating all pods).
Just to weigh in here for anyone else who might be searching for it, this affected my systems on NixOS 21.03 (and may indeed affect previous releases). The `sysctl` configuration mentioned above can be implemented using the `boot.kernel.sysctl` configuration option.
Long-term, everything will trend towards newer systemd, so unless we expect a significant portion of users to be running non-systemd hosts, I don’t think the “detect systemd version” piece makes any meaningful difference; I’d skip that to keep things simple.
Coordinating with systemd seems like the right solution. At the moment, my understanding is that the initial lifecycle of the device, from creation to configuration by {cilium,systemd}, is not coordinated, which means that we end up arguing with each other and the last one to perform configuration wins. That seems to be systemd. Therefore, I think we either need to tell systemd how to configure the devices, or we need a mechanism to know when systemd will no longer touch the configuration, at which point it is safe for us to apply ours. The former approach seems more viable.
That works as long as we’re happy with the notion of writing up to, say, ~100 of these on a given node, and rewriting them on the filesystem every time a new pod is deployed (perhaps dozens of times per minute). I suspect this is not noticeable from a performance perspective, given how small the configuration would be. The alternative is to write one configuration file and have a background monitor that double-checks that the configuration is still OK (for instance, by validating that a random endpoint’s `sysctl` configuration matches what we expect). Just an idea; we do similar sorts of things via `pkg/controller` today.
It looks like network interface sysctls are applied to new interfaces.
From the docs:
I also had a look at the code, and I’m not quite sure how exactly the configuration is applied to each new device, but per the above it’s clear that some part of systemd does it (albeit not `systemd-sysctl` itself).
It looks like the best fix would be to make either node init or the CNI plugin installer drop a config file. In addition, every place where Cilium writes sysctls should check the value after writing and at least log an error/warning if the intended value doesn’t persist.
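The write-then-check idea can be sketched as follows. This is not Cilium's implementation; the helper names are hypothetical, the `root` parameter exists only so the logic can be exercised against a scratch directory, and the dotted-name mapping is naive (it would break for interface names that themselves contain a dot).

```python
import logging
from pathlib import Path

log = logging.getLogger("sysctl-verify")


def sysctl_file(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under `root` (naive split on '.')."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a sysctl as a stripped string."""
    return sysctl_file(name, root).read_text().strip()


def check_sysctl(name, expected, root="/proc/sys"):
    """Log a warning and return False if the sysctl does not hold `expected`."""
    actual = read_sysctl(name, root)
    if actual != str(expected):
        log.warning(
            "sysctl %s is %s, expected %s; it may have been overwritten",
            name, actual, expected,
        )
        return False
    return True


def write_and_verify(name, value, root="/proc/sys"):
    """Write a sysctl, then read it back to confirm the value took effect."""
    sysctl_file(name, root).write_text(f"{value}\n")
    return check_sysctl(name, value, root)
```

An immediate read-back only catches writes that failed outright. To notice another component re-applying its defaults later, `check_sysctl` would have to be called again after a delay or from a periodic check.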
Aside from that, another question to ask is: is there a way to enforce sysctls from an eBPF program? It just seems like it could be a better way to continuously ensure that the `rp_filter` setting is correct.
A good workaround for this is to enable endpoint routes with `--enable-endpoint-routes`. It enforces symmetric routing. https://github.com/cilium/cilium/pull/13346 is going to fix `endpoint-routes` in combination with tunneling.
@mvisonneau OK great, would you mind filing a separate bug for that to help track fixing the regression? The output from your last couple of comments on this thread would be a great start for such a bug.