k3s: pod ipv6 traffic unable to egress from cluster
Environmental Info:
K3s Version:
k3s version v1.23.6+k3s1 (418c3fa8)
go version go1.17.5
Node(s) CPU architecture, OS, and Version:
[craigcabrey@littleboi ~]$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
alto Ready,SchedulingDisabled <none> 24d v1.23.6+k3s1 192.168.1.131 <none> CentOS Stream 8 4.18.0-394.el8.x86_64 containerd://1.5.11-k3s2
nas Ready <none> 23d v1.23.6+k3s1 192.168.1.137 <none> CentOS Stream 8 4.18.0-394.el8.x86_64 containerd://1.5.11-k3s2
nuc-node-1 Ready <none> 5d15h v1.23.6+k3s1 192.168.1.197 <none> CentOS Stream 9 5.14.0-109.el9.x86_64 containerd://1.5.11-k3s2
nuc-node-2 Ready <none> 5d18h v1.23.6+k3s1 192.168.1.210 <none> CentOS Stream 9 5.14.0-109.el9.x86_64 containerd://1.5.11-k3s2
nuc-node-3 Ready <none> 5d19h v1.23.6+k3s1 192.168.1.42 <none> CentOS Stream 9 5.14.0-109.el9.x86_64 containerd://1.5.11-k3s2
pi-node-1 Ready,SchedulingDisabled control-plane,etcd,master 29d v1.23.6+k3s1 192.168.1.48 <none> Debian GNU/Linux 11 (bullseye) 5.15.32-v8+ containerd://1.5.11-k3s2
pi-node-2 Ready,SchedulingDisabled control-plane,etcd,master 29d v1.23.6+k3s1 192.168.1.179 <none> Debian GNU/Linux 11 (bullseye) 5.15.32-v8+ containerd://1.5.11-k3s2
pi-node-3 Ready,SchedulingDisabled control-plane,etcd,master 29d v1.23.6+k3s1 192.168.1.139 <none> Debian GNU/Linux 11 (bullseye) 5.15.32-v8+ containerd://1.5.11-k3s2
pi-node-4 Ready <none> 28d v1.23.6+k3s1 192.168.1.123 <none> Debian GNU/Linux 11 (bullseye) 5.15.32-v8+ containerd://1.5.11-k3s2
pi-node-5 Ready <none> 28d v1.23.6+k3s1 192.168.1.44 <none> Debian GNU/Linux 11 (bullseye) 5.15.32-v8+ containerd://1.5.11-k3s2
pi-node-6 Ready <none> 28d v1.23.6+k3s1 192.168.1.7 <none> Debian GNU/Linux 11 (bullseye) 5.15.32-v8+ containerd://1.5.11-k3s2
Cluster Configuration:
- 11 nodes
- 3 control plane
- dual stack configured
The cluster was initialized as follows (note the flannel IPv6 masq option):
ExecStart=/usr/local/bin/k3s \
server \
'--cluster-init' \
'--flannel-backend=vxlan' \
'--flannel-ipv6-masq' \
'--node-ip' \
'192.168.1.48,2605:[snip]' \
'--cluster-cidr' \
'10.42.0.0/16,2001:cafe:42:0::/56' \
'--service-cidr' \
'10.43.0.0/16,2001:cafe:42:1::/112' \
'--disable' \
'traefik' \
'--disable' \
'servicelb' \
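For reference, the same flags can also be expressed in a k3s config file instead of systemd arguments. The following is a sketch of an equivalent /etc/rancher/k3s/config.yaml (not taken from this cluster, just the flag-to-key mapping k3s documents):

# Equivalent of the ExecStart flags above, as a k3s config file
cluster-init: true
flannel-backend: vxlan
flannel-ipv6-masq: true
node-ip: "192.168.1.48,2605:[snip]"
cluster-cidr: "10.42.0.0/16,2001:cafe:42:0::/56"
service-cidr: "10.43.0.0/16,2001:cafe:42:1::/112"
disable:
  - traefik
  - servicelb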
flannel:
root@pi-node-1:~# cat /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json
{
"Network": "10.42.0.0/16",
"EnableIPv6": true,
"EnableIPv4": true,
"IPv6Network": "2001:cafe:42::/56",
"Backend": {
"Type": "vxlan"
}
}
Note: I stupidly used the example IPv6 CIDR from the docs rather than using a /64 from the /56 I get from my ISP. This may be the problem, but my limited understanding is that IPv6 masquerading (NAT) should shield me from that mistake.
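One way to check whether --flannel-ipv6-masq actually installed IPv6 NAT rules is to dump the nat table on a node (a diagnostic sketch; the exact chain names flannel creates are not shown in this issue):

# List IPv6 NAT rules and look for MASQUERADE entries covering the pod CIDR
ip6tables -t nat -S | grep -i -e flannel -e MASQUERADE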
Describe the bug:
I am unable to get IPv6 traffic out of the cluster. Ingress via an ingress controller with IPv4 and IPv6 addresses assigned by MetalLB works fine; MetalLB is configured to hand out 192.168.100/24 and 2605:[snip]::100-2605:[snip]::ffff. Inter-pod IPv6 traffic (via services) also works fine.
Two pods running to illustrate the issue:
pod-1 1/1 Running 0 42m 10.42.13.138 nuc-node-1 <none> <none>
pod-2 1/1 Running 0 42m 10.42.12.55 nuc-node-2 <none> <none>
Pod 1, showing the pod IP, traffic to pod-2's IPv6 address, and traffic to the node's IPv6 address:
root@pod-1:/# ip -6 a show dev eth0
2: eth0@if881: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default link-netnsid 0
inet6 2001:cafe:42:d::36c/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::4cfe:6aff:fe9d:c53a/64 scope link
valid_lft forever preferred_lft forever
root@pod-1:/# ping6 -c3 2001:cafe:42:c::40c
PING 2001:cafe:42:c::40c(2001:cafe:42:c::40c) 56 data bytes
64 bytes from 2001:cafe:42:c::40c: icmp_seq=1 ttl=62 time=0.634 ms
64 bytes from 2001:cafe:42:c::40c: icmp_seq=2 ttl=62 time=0.568 ms
64 bytes from 2001:cafe:42:c::40c: icmp_seq=3 ttl=62 time=0.643 ms
--- 2001:cafe:42:c::40c ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2032ms
rtt min/avg/max/mdev = 0.568/0.615/0.643/0.033 ms
root@pod-1:/# ping -c3 2605:[snip]::34d
PING 2605:[snip]::34d(2605:[snip]::34d) 56 data bytes
64 bytes from 2605:[snip]::34d: icmp_seq=1 ttl=64 time=0.067 ms
64 bytes from 2605:[snip]::34d: icmp_seq=2 ttl=64 time=0.077 ms
64 bytes from 2605:[snip]::34d: icmp_seq=3 ttl=64 time=0.061 ms
--- 2605:[snip]::34d ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2055ms
rtt min/avg/max/mdev = 0.061/0.068/0.077/0.006 ms
Pod 2:
root@pod-2:/# ip -6 a show dev eth0
2: eth0@if1049: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default link-netnsid 0
inet6 2001:cafe:42:c::40c/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::1c9b:acff:fedf:68d3/64 scope link
valid_lft forever preferred_lft forever
root@pod-2:/# ping6 -c3 2001:cafe:42:d::36c/64
ping6: 2001:cafe:42:d::36c/64: Name or service not known
root@pod-2:/# ping6 -c3 2001:cafe:42:d::36c
PING 2001:cafe:42:d::36c(2001:cafe:42:d::36c) 56 data bytes
64 bytes from 2001:cafe:42:d::36c: icmp_seq=1 ttl=62 time=0.567 ms
64 bytes from 2001:cafe:42:d::36c: icmp_seq=2 ttl=62 time=0.530 ms
64 bytes from 2001:cafe:42:d::36c: icmp_seq=3 ttl=62 time=0.661 ms
Traffic to public IPv6 (Google DNS) from node:
[root@nuc-node-1 ~]# ping -c3 2001:4860:4860::8888
PING 2001:4860:4860::8888(2001:4860:4860::8888) 56 data bytes
64 bytes from 2001:4860:4860::8888: icmp_seq=1 ttl=117 time=7.07 ms
64 bytes from 2001:4860:4860::8888: icmp_seq=2 ttl=117 time=7.14 ms
64 bytes from 2001:4860:4860::8888: icmp_seq=3 ttl=117 time=7.02 ms
--- 2001:4860:4860::8888 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 7.015/7.076/7.140/0.051 ms
Traffic to public IPv6 (Google DNS) from pod-1:
root@pod-1:/# ping6 -c3 2001:4860:4860::8888
PING 2001:4860:4860::8888(2001:4860:4860::8888) 56 data bytes
^C
--- 2001:4860:4860::8888 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2049ms
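To narrow down where these packets are lost, it may help to watch the node's uplink while pinging from the pod (a diagnostic sketch; enp88s0 is the uplink interface seen in the routes of nuc-node-2 below, assumed to be the same on nuc-node-1 where pod-1 runs):

# On nuc-node-1, while pod-1 runs ping6 to Google DNS:
tcpdump -ni enp88s0 'icmp6 and host 2001:4860:4860::8888'
# If echo requests leave with a 2001:cafe:42:d::/64 source address and no replies
# come back, the upstream router has no route back to the pod CIDR, which is what
# IPv6 masquerading is meant to avoid.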
Steps To Reproduce:
- Installed K3s using the configuration noted above
Expected behavior: Traffic can egress from the cluster via the IPv6 stack.
Actual behavior: Pod traffic to external IPv6 destinations is dropped (100% packet loss).
Additional context / logs:
There are flanneld rules in the IPv4 FORWARD chain that have no IPv6 counterparts, which looked very suspicious to me:
[root@nuc-node-2 ~]# iptables -L FORWARD
Chain FORWARD (policy ACCEPT)
target prot opt source destination
KUBE-ROUTER-FORWARD all -- anywhere anywhere /* kube-router netpol - TEMCG2JMHZYE7H7T */
KUBE-FORWARD all -- anywhere anywhere /* kubernetes forwarding rules */
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes externally-visible service portals */
ACCEPT all -- 10.42.0.0/16 anywhere /* flanneld forward */
ACCEPT all -- anywhere 10.42.0.0/16 /* flanneld forward */
[root@nuc-node-2 ~]# ip6tables -L FORWARD
Chain FORWARD (policy ACCEPT)
target prot opt source destination
KUBE-ROUTER-FORWARD all anywhere anywhere /* kube-router netpol - TEMCG2JMHZYE7H7T */
KUBE-FORWARD all anywhere anywhere /* kubernetes forwarding rules */
KUBE-SERVICES all anywhere anywhere ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES all anywhere anywhere ctstate NEW /* kubernetes externally-visible service portals */
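For comparison, the IPv6 counterparts of the two flanneld ACCEPT rules would look roughly like this (a sketch only; flannel normally manages these rules itself, and since the FORWARD policy is already ACCEPT their absence may not be the root cause):

ip6tables -A FORWARD -s 2001:cafe:42::/56 -m comment --comment "flanneld forward" -j ACCEPT
ip6tables -A FORWARD -d 2001:cafe:42::/56 -m comment --comment "flanneld forward" -j ACCEPT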
Routes seem fine.
Example node:
[root@nuc-node-2 ~]# ip -6 r
::1 dev lo proto kernel metric 256 pref medium
2001:cafe:42::/64 via 2001:cafe:42:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:1::/64 via 2001:cafe:42:1:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:2::/64 via 2001:cafe:42:2:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:3::/64 via 2001:cafe:42:3:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:4::/64 via 2001:cafe:42:4:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:5::/64 via 2001:cafe:42:5:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:6::/64 via 2001:cafe:42:6:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:7::/64 via 2001:cafe:42:7:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:a::/64 via 2001:cafe:42:a:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:c:: dev flannel-v6.1 proto kernel metric 256 pref medium
2001:cafe:42:c::/64 dev cni0 proto kernel metric 256 pref medium
2001:cafe:42:d::/64 via 2001:cafe:42:d:: dev flannel-v6.1 metric 1024 onlink pref medium
2605:[snip]::60c dev enp88s0 proto kernel metric 100 pref medium
2605:[snip]::/64 dev enp88s0 proto ra metric 100 pref medium
fe80::/64 dev flannel.1 proto kernel metric 256 pref medium
fe80::/64 dev flannel-v6.1 proto kernel metric 256 pref medium
fe80::/64 dev cni0 proto kernel metric 256 pref medium
fe80::/64 dev vethc7d6fd78 proto kernel metric 256 pref medium
fe80::/64 dev vethd9e89d35 proto kernel metric 256 pref medium
fe80::/64 dev vethcb452314 proto kernel metric 256 pref medium
fe80::/64 dev veth047e29dd proto kernel metric 256 pref medium
fe80::/64 dev veth7f50f117 proto kernel metric 256 pref medium
fe80::/64 dev vethfb072f1b proto kernel metric 256 pref medium
fe80::/64 dev veth180c6d39 proto kernel metric 256 pref medium
fe80::/64 dev veth83a1f152 proto kernel metric 256 pref medium
fe80::/64 dev veth950fa28c proto kernel metric 256 pref medium
fe80::/64 dev vethaf109fd6 proto kernel metric 256 pref medium
fe80::/64 dev enp88s0 proto kernel metric 1024 pref medium
default via fe80::70b0:52ff:fe1e:6ec6 dev enp88s0 proto ra metric 100 pref high
pod-1:
root@pod-1:/# ip -6 r
2001:cafe:42:d::/64 dev eth0 proto kernel metric 256 pref medium
2001:cafe:42::/56 via 2001:cafe:42:d::1 dev eth0 metric 1024 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default via 2001:cafe:42:d::1 dev eth0 metric 1024 pref medium
I tried manually adding forwarding rules on one of the nodes, but it had no effect. I'm not sure whether they would need to be added to all of them.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 17 (8 by maintainers)
Comments:

I finally got my dual-stack setup running. I did the initial dual-stack setup as shown in the k3s.io documentation; that is, I did not use the --flannel-ipv6-masq flag. Instead, I tried to configure flannel after the installation of k3s, without much success. I then ran the k3s setup command again, this time with the --flannel-ipv6-masq flag, and immediately all pods on all nodes could speak IPv4 and IPv6 like a charm.

I built a single-node cluster using Fedora CoreOS and an Ignition file. I continued to have problems until I turned on the --flannel-ipv6-masq flag during installation. It's unclear why this is the case, but I'll use this setting for now.

Sorry, I missed this issue. When you don't set that flag, the traffic leaving the node uses the pod's IP as the source IP. If your internal routing is not well configured, the reply packets will not know how to reach the pod. When using that flag, you get the same behaviour as in IPv4, i.e. the source IP of packets leaving the node is the node's IP.
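In other words, --flannel-ipv6-masq makes the node source-NAT pod traffic that leaves the cluster, conceptually similar to an ip6tables rule like the one below (a simplified sketch, not flannel's exact rule set):

# Masquerade traffic from the pod CIDR that is not destined for the pod CIDR,
# so it leaves the node with the node's own, externally routable IPv6 address.
ip6tables -t nat -A POSTROUTING -s 2001:cafe:42::/56 ! -d 2001:cafe:42::/56 -j MASQUERADE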