rancher: Rancher 2.5.9 uses flannel:v0.13.0 which causes iptables segfault in RHEL 8.4

Rancher Server Setup

  • Rancher version: 2.5.9
  • Installation option: Helm Chart on RKE1; the cluster where we observe the issue was created entirely in the Rancher UI

Information about the Cluster

  • Kubernetes version: 1.20.9
  • Cluster Type (Local/Downstream): Downstream, Custom

Describe the bug

Nodes running RHEL 8.4 (kernel 4.18.0-305.12.1.el8_4.x86_64) show the following in dmesg:

[Wed Sep  1 16:52:49 2021] iptables-save[999039]: segfault at 80 ip 00007ff2b2f4d964 sp 00007ffff48cca18 error 4 in libnftnl.so.11.2.0[7ff2b2f48000+19000]
[Wed Sep  1 16:52:49 2021] Code: 83 c4 08 5b 5d 41 5c 41 5d c3 0f 1f 40 00 48 83 c4 08 31 c0 5b 5d 41 5c 41 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa <48> 8b 87 80 00 00 00 48 83 ef 80 48 39 f8 74 1b 85 f6 75 0c eb 18

As a result, these nodes cannot pass overlay traffic.

To Reproduce

  1. Use RKE1 to create a local cluster, then use Helm to install Rancher 2.5.9 into it
  2. Use the Rancher UI to create a downstream cluster with Canal CNI and Project Isolation enabled, add RHEL 8.4 nodes with a minimal install and Docker CE 20.10.8
  3. Deploy workloads to the RHEL 8.4 nodes that pass traffic via the overlay
  4. Observe that nodes cannot communicate via the overlay network
  5. Observe that you cannot use the Rancher UI to read the logs of the calico-node and kube-flannel containers on the RHEL 8.4 nodes (the UI says “disconnected”)
  6. Observe that when using docker logs k8s_kube-flannel_canal-XXXX to read the container logs on the RHEL 8.4 nodes you see:
E0901 16:46:29.953388       1 iptables.go:115] Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t nat -C POSTROUTING ! -s 10.42.0.0/16 -d 10.42.8.0/24 -j RETURN --wait]: exit status -1:
E0901 17:02:11.148676       1 iptables.go:115] Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t nat -C POSTROUTING -s 10.42.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully --wait]: exit status -1:
E0901 17:42:27.498460       1 iptables.go:115] Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t filter -C FORWARD -d 10.42.0.0/16 -j ACCEPT --wait]: exit status -1:
E0901 19:20:11.934311       1 iptables.go:115] Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t filter -C FORWARD -d 10.42.0.0/16 -j ACCEPT --wait]: exit status -1:
E0901 20:15:59.272599       1 iptables.go:115] Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t filter -C FORWARD -s 10.42.0.0/16 -j ACCEPT --wait]: exit status -1:
  7. Observe in dmesg on the nodes that iptables commands are segfaulting:
[Wed Sep  1 17:10:11 2021] iptables-save[1024483]: segfault at 80 ip 00007f39cff4a964 sp 00007fffd5f6d8a8 error 4 in libnftnl.so.11.2.0[7f39cff45000+19000]
[Wed Sep  1 17:10:11 2021] Code: 83 c4 08 5b 5d 41 5c 41 5d c3 0f 1f 40 00 48 83 c4 08 31 c0 5b 5d 41 5c 41 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa <48> 8b 87 80 00 00 00 48 83 ef 80 48 39 f8 74 1b 85 f6 75 0c eb 18
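
The two dmesg segfault lines in this report can be compared by subtracting the library's load address (the value in brackets) from the faulting instruction pointer. The sketch below does that for both lines; `SEGFAULT_RE` and `parse_segfault` are hypothetical helper names introduced here for illustration, and the pattern only assumes the line format shown above.

```python
import re

# Assumed format, taken from the dmesg lines in this report:
#   <proc>[<pid>]: segfault at <addr> ip <ip> sp <sp> error <n> in <lib>[<base>+<size>]
SEGFAULT_RE = re.compile(
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def parse_segfault(line):
    """Return process, pid, library and the fault offset inside the library."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "library": m.group("lib"),
        "offset": ip - base,  # instruction pointer relative to the load address
    }

# The two segfault lines quoted in this issue.
DMESG_LINES = [
    "[Wed Sep  1 16:52:49 2021] iptables-save[999039]: segfault at 80 ip 00007ff2b2f4d964 "
    "sp 00007ffff48cca18 error 4 in libnftnl.so.11.2.0[7ff2b2f48000+19000]",
    "[Wed Sep  1 17:10:11 2021] iptables-save[1024483]: segfault at 80 ip 00007f39cff4a964 "
    "sp 00007fffd5f6d8a8 error 4 in libnftnl.so.11.2.0[7f39cff45000+19000]",
]

for line in DMESG_LINES:
    info = parse_segfault(line)
    print(info["process"], info["pid"], info["library"], hex(info["offset"]))
# prints:
#   iptables-save 999039 libnftnl.so.11.2.0 0x5964
#   iptables-save 1024483 libnftnl.so.11.2.0 0x5964
```

Both crashes land at the same offset (0x5964) inside libnftnl.so.11.2.0, and the identical Code: bytes in the two dumps point the same way: the same instruction is faulting each time.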

Result

The overlay network is not able to correctly pass traffic because the underlying iptables rules are not properly put into place.
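
To see which rules are affected, the kube-flannel errors from step 6 can be grouped by table and chain. This is a rough sketch: `FLANNEL_ERRORS`, `RUNNING_RE` and `failing_rules` are hypothetical names, and the pattern assumes the `running [...]: exit status N` wording seen in those log lines. The reported exit status of -1 suggests the command did not exit normally, which would fit the segfaults in dmesg, though that reading is an assumption.

```python
import re
import shlex
from collections import Counter

# Error lines from the kube-flannel log in this report, trimmed to start at "running [".
FLANNEL_ERRORS = [
    "running [/sbin/iptables -t nat -C POSTROUTING ! -s 10.42.0.0/16 -d 10.42.8.0/24 -j RETURN --wait]: exit status -1:",
    "running [/sbin/iptables -t nat -C POSTROUTING -s 10.42.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully --wait]: exit status -1:",
    "running [/sbin/iptables -t filter -C FORWARD -d 10.42.0.0/16 -j ACCEPT --wait]: exit status -1:",
    "running [/sbin/iptables -t filter -C FORWARD -d 10.42.0.0/16 -j ACCEPT --wait]: exit status -1:",
    "running [/sbin/iptables -t filter -C FORWARD -s 10.42.0.0/16 -j ACCEPT --wait]: exit status -1:",
]

# Hypothetical pattern: captures the bracketed command and the reported exit status.
RUNNING_RE = re.compile(r"running \[(?P<cmd>[^\]]+)\]: exit status (?P<status>-?\d+)")

def failing_rules(lines):
    """Count failed rule checks per (table, chain) pair."""
    counts = Counter()
    for line in lines:
        m = RUNNING_RE.search(line)
        if m is None:
            continue
        args = shlex.split(m.group("cmd"))
        table = args[args.index("-t") + 1]  # value following -t
        chain = args[args.index("-C") + 1]  # chain named after -C (check)
        counts[(table, chain)] += 1
    return counts

for (table, chain), n in sorted(failing_rules(FLANNEL_ERRORS).items()):
    print(f"{table}/{chain}: {n} failed check(s)")
# prints:
#   filter/FORWARD: 3 failed check(s)
#   nat/POSTROUTING: 2 failed check(s)
```

In the quoted excerpt, the failures cover both the nat POSTROUTING rules (masquerading) and the filter FORWARD rules for the 10.42.0.0/16 pod network.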

Expected Result

iptables commands run from within rancher/coreos-flannel:v0.13.0-rancher1 should not segfault.

Additional context

This is discussed by the flannel project here: https://github.com/flannel-io/flannel/issues/1408

A PR was merged into flannel on June 28: https://github.com/flannel-io/flannel/pull/1449

SURE-3096

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 23 (9 by maintainers)

Most upvoted comments

Confirmed this still appears to be an issue:

  • OS: RHEL 8.4
  • Kernel: 4.18.0-305.19.1.el8_4.x86_64 #1 SMP Tue Sep 7 07:07:31 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Rancher: v2.6.2
  • Hyperkube: v1.20.13-rancher1 and v1.21.5-rancher1-1
  • Flannel: v0.15.1