rancher: Rancher 2.5.9 uses flannel:v0.13.0, which causes iptables to segfault on RHEL 8.4
Rancher Server Setup
- Rancher version: 2.5.9
- Installation option: Helm chart on RKE1 (the cluster where we observe the issue was created entirely in the Rancher UI)
Information about the Cluster
- Kubernetes version: 1.20.9
- Cluster Type (Local/Downstream): Downstream, Custom
Describe the bug
Nodes running RHEL 8.4 (kernel 4.18.0-305.12.1.el8_4.x86_64) show the following in dmesg:
[Wed Sep 1 16:52:49 2021] iptables-save[999039]: segfault at 80 ip 00007ff2b2f4d964 sp 00007ffff48cca18 error 4 in libnftnl.so.11.2.0[7ff2b2f48000+19000]
[Wed Sep 1 16:52:49 2021] Code: 83 c4 08 5b 5d 41 5c 41 5d c3 0f 1f 40 00 48 83 c4 08 31 c0 5b 5d 41 5c 41 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa <48> 8b 87 80 00 00 00 48 83 ef 80 48 39 f8 74 1b 85 f6 75 0c eb 18
These nodes also cannot pass overlay traffic.
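To spot this crash signature across many nodes, a minimal Python sketch can scan captured dmesg text for `iptables`/`iptables-save` segfaults inside `libnftnl`. It assumes the bracketed human-readable timestamps shown above (as printed by `dmesg -T`); `find_nftnl_segfaults` and `Segfault` are hypothetical helpers, not part of Rancher or flannel.

```python
import re
from typing import List, NamedTuple

# Matches kernel lines like the ones above, e.g.
# [Wed Sep 1 16:52:49 2021] iptables-save[999039]: segfault at 80 ... in libnftnl.so.11.2.0[...]
# Assumes dmesg -T style timestamps in square brackets.
SEGFAULT_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\]\s+"                                 # timestamp
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]:\s+segfault at\s+\S+"  # process[pid]: segfault at <addr>
    r".*\bin\s+(?P<lib>libnftnl\.so[\w.]*)\["                   # faulting library
)


class Segfault(NamedTuple):
    when: str
    proc: str
    pid: int
    lib: str


def find_nftnl_segfaults(dmesg_text: str) -> List[Segfault]:
    """Return iptables-family segfaults in libnftnl found in dmesg text (hypothetical helper)."""
    hits = []
    for line in dmesg_text.splitlines():
        m = SEGFAULT_RE.search(line.strip())
        # Keep only iptables, iptables-save, iptables-restore, ip6tables, ...
        if m and m.group("proc").startswith(("iptables", "ip6tables")):
            hits.append(
                Segfault(m.group("when"), m.group("proc"), int(m.group("pid")), m.group("lib"))
            )
    return hits
```

On a node, the input could come from `subprocess.run(["dmesg", "-T"], capture_output=True, text=True).stdout`; any hit suggests the node is affected by this issue.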
To Reproduce
- Use RKE1 to create a local cluster, then use Helm to install Rancher 2.5.9 into the local cluster
- Use the Rancher UI to create a downstream cluster with Canal CNI and Project Isolation enabled, add RHEL 8.4 nodes with a minimal install and Docker CE 20.10.8
- Deploy workloads to the RHEL 8.4 nodes that pass traffic via the overlay
- Observe that nodes cannot communicate via the overlay network
- Observe that you cannot use the Rancher UI to read the logs of the `calico-node` and `kube-flannel` containers on the RHEL 8.4 nodes (the UI says “disconnected”)
- Observe that when using `docker logs k8s_kube-flannel_canal-XXXX` to read the container logs on the RHEL 8.4 nodes, you see:
E0901 16:46:29.953388 1 iptables.go:115] Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t nat -C POSTROUTING ! -s 10.42.0.0/16 -d 10.42.8.0/24 -j RETURN --wait]: exit status -1:
E0901 17:02:11.148676 1 iptables.go:115] Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t nat -C POSTROUTING -s 10.42.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully --wait]: exit status -1:
E0901 17:42:27.498460 1 iptables.go:115] Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t filter -C FORWARD -d 10.42.0.0/16 -j ACCEPT --wait]: exit status -1:
E0901 19:20:11.934311 1 iptables.go:115] Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t filter -C FORWARD -d 10.42.0.0/16 -j ACCEPT --wait]: exit status -1:
E0901 20:15:59.272599 1 iptables.go:115] Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t filter -C FORWARD -s 10.42.0.0/16 -j ACCEPT --wait]: exit status -1:
- Observe in dmesg on the nodes that iptables commands are segfaulting:
[Wed Sep 1 17:10:11 2021] iptables-save[1024483]: segfault at 80 ip 00007f39cff4a964 sp 00007fffd5f6d8a8 error 4 in libnftnl.so.11.2.0[7f39cff45000+19000]
[Wed Sep 1 17:10:11 2021] Code: 83 c4 08 5b 5d 41 5c 41 5d c3 0f 1f 40 00 48 83 c4 08 31 c0 5b 5d 41 5c 41 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa <48> 8b 87 80 00 00 00 48 83 ef 80 48 39 f8 74 1b 85 f6 75 0c eb 18
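When many nodes are affected, the flannel log lines from the reproduction steps can be reduced to the individual rule checks that failed. This Python sketch assumes the klog-style line format shown in the `docker logs` output and flannel's `running [...]: exit status N` wording; `failed_iptables_rules` and `FailedRule` are hypothetical names.

```python
import re
import shlex
from typing import List, NamedTuple

# klog header ("E0901 16:46:29.953388 1 iptables.go:115]") followed by the
# iptables command flannel ran and the exit status it reported.
FAILED_RULE_RE = re.compile(
    r"^(?P<level>[IWEF])(?P<mmdd>\d{4}) (?P<time>[\d:.]+)\s+\d+\s+\S+\]"
    r".*running \[(?P<cmd>[^\]]+)\]: exit status (?P<status>-?\d+)"
)


class FailedRule(NamedTuple):
    mmdd: str        # month and day from the klog header
    time: str
    table: str
    chain: str
    rule: List[str]  # rule arguments, without the trailing --wait
    status: int


def failed_iptables_rules(log_text: str) -> List[FailedRule]:
    """Extract the iptables rule checks that flannel reported as failed (hypothetical helper)."""
    results = []
    for line in log_text.splitlines():
        m = FAILED_RULE_RE.search(line.strip())
        if not m:
            continue
        argv = shlex.split(m.group("cmd"))
        if "-C" not in argv:
            continue  # only "check rule existence" calls are parsed here
        table = argv[argv.index("-t") + 1] if "-t" in argv else "filter"  # iptables default table
        chain_at = argv.index("-C") + 1
        rule = [a for a in argv[chain_at + 1:] if a != "--wait"]
        results.append(
            FailedRule(m.group("mmdd"), m.group("time"), table, argv[chain_at], rule, int(m.group("status")))
        )
    return results
```

Counting the results by `(table, chain)`, for example with `collections.Counter`, shows which chains flannel repeatedly fails to program.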
Result
The overlay network cannot pass traffic correctly because the underlying iptables rules are not put in place.
Expected Result
iptables commands run from within `rancher/coreos-flannel:v0.13.0-rancher1` should not segfault.
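The flannel errors report `exit status -1`, which is consistent with the iptables binary being killed by a signal rather than exiting normally, matching the segfaults in dmesg. A rough way to check the expected result on a node is to run `iptables-save` inside the flannel container and classify how it ended. This Python sketch assumes that the container image ships `iptables-save` and that `docker exec` reports a signal-terminated command as 128 + the signal number; `describe_returncode` and `check_iptables_in_container` are hypothetical helpers.

```python
import signal
import subprocess


def _signal_name(num: int) -> str:
    try:
        return signal.Signals(num).name
    except ValueError:
        return f"signal {num}"


def describe_returncode(rc: int) -> str:
    """Classify a process return code (hypothetical helper)."""
    if rc == 0:
        return "ok"
    if rc < 0:
        # subprocess reports a child killed directly by signal N as -N (POSIX).
        return f"killed by {_signal_name(-rc)}"
    if rc > 128:
        # Assumed convention (shells, docker exec): 128 + signal number.
        return f"likely killed by {_signal_name(rc - 128)}"
    return f"exited with status {rc}"


def check_iptables_in_container(container: str) -> str:
    """Run iptables-save in a container (assumed to ship it) and classify the result."""
    proc = subprocess.run(
        ["docker", "exec", container, "iptables-save"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return describe_returncode(proc.returncode)
```

For example, `check_iptables_in_container("k8s_kube-flannel_canal-XXXX")`, with the real container name in place of `XXXX`, returning `likely killed by SIGSEGV` would reproduce the bug, while `ok` is the expected result.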
Additional context
This issue is discussed in the flannel project here: https://github.com/flannel-io/flannel/issues/1408
A PR was merged into flannel on June 28: https://github.com/flannel-io/flannel/pull/1449
SURE-3096
About this issue
- State: closed
- Created 3 years ago
- Comments: 23 (9 by maintainers)
Confirmed this still appears to be an issue:
- RHEL 8.4, kernel 4.18.0-305.19.1.el8_4.x86_64 #1 SMP Tue Sep 7 07:07:31 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
- Rancher: v2.6.2
- Hyperkube: v1.20.13-rancher1 and v1.21.5-rancher1-1
- Flannel: v0.15.1