cilium: Host network broken after one of the underlying interfaces of a bond goes down
Is there an existing issue for this?
- I have searched the existing issues
What happened?
On Equinix Metal the network setup is a bond of two NICs using LACP. When Cilium is used as CNI for Kubernetes on Flatcar Container Linux, and one of the two NIC interfaces goes down, the host network is broken and remains broken even if the underlying interface goes up again.
The network configuration looks normal, but ping 1.1 produces no outgoing packets visible in tcpdump -i bond0 (ping reports 100% packet loss), and ping 127.0.0.1 fails with ping: sendmsg: Operation not permitted.
We could not restore the network even after terminating the Pods and kubelet on the node, flushing nft, and deleting the Cilium interfaces (maybe BPF programs are still loaded and not cleaned up?).
IPv6 is not affected, ping6 2606:4700:4700::1111 works.
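For reference, the broken state can be inspected with ordinary iproute2/tcpdump commands; a sketch of such checks (not literal output from the affected machine), with the routing policy rules being the interesting part given the explanation further down:
ping -c 3 1.1                      # 100% packet loss, nothing visible in tcpdump -i bond0
ping -c 3 127.0.0.1                # fails with "ping: sendmsg: Operation not permitted"
ping6 -c 3 2606:4700:4700::1111    # still works
ip rule show                       # on a broken node the "0: from all lookup local" entry is missing
ip -6 rule show                    # the IPv6 rules are still in place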
Cilium Version
1.9, 1.10, 1.11
Kernel Version
from 5.10.52 to 5.10.84
Kubernetes Version
1.22
Sysdump
🔍 Collecting Kubernetes nodes
failed to create sysdump collector: failed to collect Kubernetes nodes: Get "https://136.144.49.47:6443/api/v1/nodes": dial tcp 136.144.49.47:6443: i/o timeout
Relevant log output
Kernel
Feb 04 15:50:06 kernel: bond0: (slave enp2s0f0np0): link status down for interface, disabling it in 200 ms
Feb 04 15:50:06 kernel: bond0: (slave enp2s0f0np0): link status down for interface, disabling it in 200 ms
Feb 04 15:50:06 kernel: bond0: (slave enp2s0f0np0): link status down for interface, disabling it in 200 ms
Feb 04 15:50:06 kernel: bond0: (slave enp2s0f0np0): link status down for interface, disabling it in 200 ms
Feb 04 15:50:06 kernel: bond0: (slave enp2s0f0np0): invalid new link 1 on slave
Feb 04 15:50:06 kernel: mlx5_core 0000:02:00.0: modify lag map port 1:2 port 2:2
Feb 04 15:50:06 kernel: bond0: (slave enp2s0f0np0): link status definitely down, disabling slave
Feb 04 15:51:09 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): lxc_health: link becomes ready
Feb 04 15:51:10 kernel: lxc_health: Caught tx_queue_len zero misconfig
Feb 04 15:51:33 kernel: mlx5_core 0000:02:00.0 enp2s0f0np0: Link down
Feb 04 15:51:33 kernel: mlx5_core 0000:02:00.0 enp2s0f0np0: Link up
Feb 04 15:51:33 kernel: bond0: (slave enp2s0f0np0): link status up again after 200 ms
Feb 04 15:51:33 kernel: bond0: (slave enp2s0f0np0): link status definitely up, 10000 Mbps full duplex
Feb 04 15:51:35 kernel: mlx5_core 0000:02:00.0: modify lag map port 1:1 port 2:2
Anything else?
Flatcar releases 2905.x.y to 3033.x.y are affected, running systemd from 247 to 249 (maybe relevant because systemd-networkd is used)
Flatcar releases 2764.x.y are not affected (kernel 5.10.43, systemd 247)
Reproduce it by provisioning an Equinix Metal machine with Flatcar Stable (used c3.small.x86).
Ensure it is on the latest version:
update_engine_client -update
sudo rm -f /etc/systemd/system/containerd.service.d/10-use-cgroupfs.conf
sudo sed -i 's/systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller//' /usr/share/oem/grub.cfg
sudo systemctl reboot
Set up a one-node Cilium cluster, using the script contents at the end:
sudo ./install.sh
Then, this action is valid and should not cause any harm, but now it does:
sudo ip link set enp2s0f0np0 down
(and sudo ip link set enp2s0f0np0 up does not help)
The install.sh script used above:
#!/bin/bash
set -xe
systemctl enable --now docker
modprobe br_netfilter
cat <<EOF | tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
CNI_VERSION="v0.8.2"
CRICTL_VERSION="v1.17.0"
RELEASE_VERSION="v0.4.0"
DOWNLOAD_DIR=/opt/bin
RELEASE="$(curl -sSL https://dl.k8s.io/release/stable.txt)"
mkdir -p /opt/cni/bin
mkdir -p /etc/systemd/system/kubelet.service.d
curl() {
  command curl -sSfL "$@"
}
curl "https://github.com/containernetworking/plugins/releases/download/${CNI_VERSION}/cni-plugins-linux-amd64-${CNI_VERSION}.tgz" | tar -C /opt/cni/bin -xz
curl "https://github.com/kubernetes-sigs/cri-tools/releases/download/${CRICTL_VERSION}/crictl-${CRICTL_VERSION}-linux-amd64.tar.gz" | tar -C $DOWNLOAD_DIR -xz
curl "https://raw.githubusercontent.com/kubernetes/release/${RELEASE_VERSION}/cmd/kubepkg/templates/latest/deb/kubelet/lib/systemd/system/kubelet.service" | sed "s:/usr/bin:${DOWNLOAD_DIR}:g" | tee /etc/systemd/system/kubelet.service
curl "https://raw.githubusercontent.com/kubernetes/release/${RELEASE_VERSION}/cmd/kubepkg/templates/latest/deb/kubeadm/10-kubeadm.conf" | sed "s:/usr/bin:${DOWNLOAD_DIR}:g" | tee /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
curl --remote-name-all https://storage.googleapis.com/kubernetes-release/release/${RELEASE}/bin/linux/amd64/{kubeadm,kubelet,kubectl}
curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/latest/download/cilium-linux-amd64.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-amd64.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-amd64.tar.gz /opt/bin
rm cilium-linux-amd64.tar.gz{,.sha256sum}
chmod +x {kubeadm,kubelet,kubectl}
mv {kubeadm,kubelet,kubectl} $DOWNLOAD_DIR/
systemctl enable --now kubelet
#systemctl status kubelet
cat <<EOF | tee kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    flex-volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
networking:
  podSubnet: "192.168.254.0/24"
EOF
# For explicit cgroupdriver selection
# ---
# kind: KubeletConfiguration
# apiVersion: kubelet.config.k8s.io/v1beta1
# cgroupDriver: systemd
# For containerd
# apiVersion: kubeadm.k8s.io/v1beta2
# kind: InitConfiguration
# nodeRegistration:
#   criSocket: "unix:///run/containerd/containerd.sock"
export PATH=$PATH:$DOWNLOAD_DIR
kubeadm config images pull
kubeadm init --config kubeadm-config.yaml
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.9.4/install/kubernetes/quick-install.yaml
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl get pods -A
kubectl get nodes -o wide
kubectl apply -f https://k8s.io/examples/application/deployment.yaml
kubectl expose deployment.apps/nginx-deployment
curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/latest/download/cilium-linux-amd64.tar.gz{,.sha256sum}
tar xzvfC cilium-linux-amd64.tar.gz /opt/bin
rm cilium-linux-amd64.tar.gz{,.sha256sum}
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 44 (15 by maintainers)
Commits related to this issue
- systemd: disable foreign route management While systemd-networkd follows the principle of a declarative network configuration and thus needs a way to ensure that unwanted routes or routing policy rul... — committed to flatcar/init by pothos 2 years ago
- systemd: disable foreign route management While systemd-networkd follows the principle of a declarative network configuration and thus needs a way to ensure that unwanted routes or routing policy rul... — committed to flatcar/init by pothos 2 years ago
- systemd: disable foreign route management While systemd-networkd follows the principle of a declarative network configuration and thus needs a way to ensure that unwanted routes or routing policy rul... — committed to flatcar/init by pothos 2 years ago
- init.sh: install ip rules with proto kernel In order to workaround systemd's bad recent changes where they decided to manage "foreign" rules and to flush them on certain events (e.g. device flap), we... — committed to cilium/cilium by deleted user a year ago
- egressgw: use proto kernel for fib routes and rules Use RTPROT_KERNEL (proto kernel) when installing routes and rules in egress gateway to make sure systemd doesn't play with them. For more informati... — committed to cilium/cilium by deleted user a year ago
- datapath/loader: use proto kernel for ENI fib rules Use RTPROT_KERNEL (proto kernel) when installing ENI fib rules to make sure systemd doesn't play with them. For more information see [1]. [1] http... — committed to cilium/cilium by deleted user a year ago
- datapath/linux/routing: use proto kernel for fib routes and rules Use RTPROT_KERNEL (proto kernel) when installing fib routes and rules to make sure systemd doesn't play with them. The migration code... — committed to cilium/cilium by deleted user a year ago
- datapath/linux/node: use proto kernel for fib rules and routes Use RTPROT_KERNEL (proto kernel) when installing fib rules and routes to make sure systemd doesn't play with them. Note that the patch d... — committed to cilium/cilium by deleted user a year ago
- egressgw: use proto kernel for fib routes and rules Use RTPROT_KERNEL (proto kernel) when installing routes and rules in egress gateway to make sure systemd doesn't play with them. For more informati... — committed to cilium/cilium by deleted user a year ago
- datapath/loader: use proto kernel for ENI fib rules and routes Use RTPROT_KERNEL (proto kernel) when installing ENI fib rules and routes to make sure systemd doesn't play with them. For more informat... — committed to cilium/cilium by deleted user a year ago
- datapath/linux/routing: use proto kernel for fib routes and rules Use RTPROT_KERNEL (proto kernel) when installing fib routes and rules to make sure systemd doesn't play with them. The migration code... — committed to cilium/cilium by deleted user a year ago
- datapath/linux/node: use proto kernel for fib rules and routes Use RTPROT_KERNEL (proto kernel) when installing fib rules and routes to make sure systemd doesn't play with them. Note that the patch d... — committed to cilium/cilium by deleted user a year ago
- egressgw: use proto kernel for fib routes and rules Use RTPROT_KERNEL (proto kernel) when installing routes and rules in egress gateway to make sure systemd doesn't play with them. For more informati... — committed to cilium/cilium by deleted user a year ago
I faced this today on a very simple Ubuntu 22.04 install: simply running sudo netplan apply manually was enough to break the host network after Cilium had been running. Setting ManageForeignRoutingPolicyRules=no in /etc/systemd/networkd.conf did indeed fix it, so I wonder if we should document that?
We also hit the same issue with Flatcar 3033.2.1 and systemd 249.4. It seems that systemd-networkd removed routing policy rules, including the local one, and the host could not recognize localhost because of it. Cilium moves the policy rule for local when L7Proxy is enabled, and systemd-networkd regards this local rule as a foreign routing policy and removes it.
The relevant systemd change is "network: drop unnecessary routing policy rules": https://github.com/systemd/systemd/commit/0b81225e5791f660506f7db0ab88078cf296b771
For Flatcar Stable this works in /etc/systemd/networkd.conf:
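(A minimal sketch of the setting in question, assuming it is the ManageForeignRoutingPolicyRules option discussed in this thread; it goes into the [Network] section of networkd.conf or of a drop-in under /etc/systemd/networkd.conf.d/:)
[Network]
ManageForeignRoutingPolicyRules=no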
Then the additional IgnoreCarrierLoss=yes and KeepConfiguration=yes workarounds mentioned in https://github.com/cilium/cilium/issues/18706#issuecomment-1031470456 are not needed.
This is part of quay.io/cilium/cilium:v1.14.0-snapshot.3 for testing and will go into 1.14.0: https://github.com/cilium/cilium/releases/tag/v1.14.0-snapshot.3
I just ran into this problem, and it broke both IPv4 and IPv6. The solution in https://github.com/cilium/cilium/issues/18706#issuecomment-1031572546 did fix IPv4. It would have saved me a lot of time if this had been documented in an obvious place. Please consider documenting it in the installation manual, or better yet, fix it automatically.
Yes, it seems to have. That's the default configuration in /etc/systemd/networkd.conf. I apply that setting, then set up Cilium again, and there is no more host network breakage during Cilium network device creation.
OK, this manually brought it back to a working state:
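(A minimal sketch of that kind of manual restore, assuming the missing piece is the priority-0 local table rule discussed further down; check ip rule show first and adjust accordingly:)
sudo ip rule add from all lookup local pref 0      # re-add the local table lookup rule
sudo ip -6 rule add from all lookup local pref 0   # only if the IPv6 rule is gone as well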
Not sure if it's easy to create a networkd unit that does the same. (If it is possible, it would be a way to avoid requiring manual tweaking of global networkd settings if Cilium writes a networkd unit into the host's /run to play well with networkd by default.)
You can use ManageForeignRoutingPolicyRules=no to protect the policy rules: https://github.com/systemd/systemd/commit/d94dfe7053d49fa62c4bfc07b7f3fc2227c10aff However, things are a little bit complicated because of this bug; the fix was backported to systemd v249.5. With systemd v249.4, you need to set both IgnoreCarrierLoss=yes and KeepConfiguration=yes as well and never use networkctl reconfigure or networkctl reload. Or you need to create the policy rules before networkd starts; I think the latter is difficult in our case. ManageForeignRoutingPolicyRules=no alone is sufficient if you can use systemd v249.5.
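(A sketch of that v249.4 combination as a drop-in for whatever .network unit matches bond0; the file path is only an example:)
# /etc/systemd/network/50-bond0.network.d/10-cilium.conf
[Network]
IgnoreCarrierLoss=yes
KeepConfiguration=yes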
Having the same problem with 1.14.0-snapshot.2 on Ubuntu Server 22.04.2 LTS: once I stopped k3s with k3s-killall.sh, network access is lost, but the host still replies to pings.
I was thinking of a start-up check. The issue we faced was during switch maintenance: we lost an entire Kubernetes cluster because every node had lost host networking, and restarting was the fastest fix to restore service. In effect, these systemd-networkd + Cilium configurations are dangerous for a production environment where I expected LACP to be up and working. The networkd drop-ins do resolve my particular use case; however, I can't be the only Cilium user on a similar setup. Thanks @ysksuzuki for the findings.
Right, with Unmanaged=yes the rules are ignored, and the only way for Cilium to tell networkd to preserve them is by putting them into an active .network unit (e.g., for a dummy interface)… For reference, this is the translation into the networkd syntax:
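(A sketch of what such a unit could look like, assuming the rule in question is the priority-0 local table lookup mentioned in this thread, attached to a dummy interface as suggested above; exact option support depends on the systemd version:)
# /etc/systemd/network/10-cilium-rules.netdev
[NetDev]
Name=cilium-keep-rules
Kind=dummy

# /etc/systemd/network/10-cilium-rules.network
[Match]
Name=cilium-keep-rules

[RoutingPolicyRule]
Family=ipv4
Table=local
Priority=0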
I think the networkd people had their reasons, already from a "server" profile perspective. The way forward is that Flatcar, as a distro that expects Cilium to be running, would predefine the global setting like we do for similar cases. Maybe it makes sense for Cilium to add a check in the cilium install phase for whether networkd is used and then recommend changing the default (plus having Unmanaged=yes is also a good idea, even though it is only needed if people match too generically in their own networkd units, as happened with Flatcar's default unit).
On the topic of the disappearing lo entry without ManageForeignRoutingPolicyRules=no (on Flatcar Alpha): without Cilium running, the down/up action of the underlying device has no impact and 0: from all lookup local stays in the list. With Cilium the rule is gone… Edit: Now I read "Cilium moves the policy rule for local when L7Proxy is enabled and systemd-networkd regards this local rule as a foreign routing policy and removes it" again, that explains it. These are the local addresses; pinging them gives packet loss, while pinging 127.0.0.1 gives the permission denied issue as mentioned above.
Sure, I realize now that the quick-install.yaml used a hardcoded older version, will do it again.