k3s: pod ipv6 traffic unable to egress from cluster

Environmental Info: K3s Version:

k3s version v1.23.6+k3s1 (418c3fa8)
go version go1.17.5

Node(s) CPU architecture, OS, and Version:

[craigcabrey@littleboi ~]$ kubectl get nodes -o wide
NAME         STATUS                     ROLES                       AGE     VERSION        INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION          CONTAINER-RUNTIME
alto         Ready,SchedulingDisabled   <none>                      24d     v1.23.6+k3s1   192.168.1.131   <none>        CentOS Stream 8                  4.18.0-394.el8.x86_64   containerd://1.5.11-k3s2
nas          Ready                      <none>                      23d     v1.23.6+k3s1   192.168.1.137   <none>        CentOS Stream 8                  4.18.0-394.el8.x86_64   containerd://1.5.11-k3s2
nuc-node-1   Ready                      <none>                      5d15h   v1.23.6+k3s1   192.168.1.197   <none>        CentOS Stream 9                  5.14.0-109.el9.x86_64   containerd://1.5.11-k3s2
nuc-node-2   Ready                      <none>                      5d18h   v1.23.6+k3s1   192.168.1.210   <none>        CentOS Stream 9                  5.14.0-109.el9.x86_64   containerd://1.5.11-k3s2
nuc-node-3   Ready                      <none>                      5d19h   v1.23.6+k3s1   192.168.1.42    <none>        CentOS Stream 9                  5.14.0-109.el9.x86_64   containerd://1.5.11-k3s2
pi-node-1    Ready,SchedulingDisabled   control-plane,etcd,master   29d     v1.23.6+k3s1   192.168.1.48    <none>        Debian GNU/Linux 11 (bullseye)   5.15.32-v8+             containerd://1.5.11-k3s2
pi-node-2    Ready,SchedulingDisabled   control-plane,etcd,master   29d     v1.23.6+k3s1   192.168.1.179   <none>        Debian GNU/Linux 11 (bullseye)   5.15.32-v8+             containerd://1.5.11-k3s2
pi-node-3    Ready,SchedulingDisabled   control-plane,etcd,master   29d     v1.23.6+k3s1   192.168.1.139   <none>        Debian GNU/Linux 11 (bullseye)   5.15.32-v8+             containerd://1.5.11-k3s2
pi-node-4    Ready                      <none>                      28d     v1.23.6+k3s1   192.168.1.123   <none>        Debian GNU/Linux 11 (bullseye)   5.15.32-v8+             containerd://1.5.11-k3s2
pi-node-5    Ready                      <none>                      28d     v1.23.6+k3s1   192.168.1.44    <none>        Debian GNU/Linux 11 (bullseye)   5.15.32-v8+             containerd://1.5.11-k3s2
pi-node-6    Ready                      <none>                      28d     v1.23.6+k3s1   192.168.1.7     <none>        Debian GNU/Linux 11 (bullseye)   5.15.32-v8+             containerd://1.5.11-k3s2

Cluster Configuration:

  • 11 nodes
  • 3 control plane
  • dual stack configured

The cluster was initialized as follows (note the flannel IPv6 masquerade option):

ExecStart=/usr/local/bin/k3s \
    server \
        '--cluster-init' \
        '--flannel-backend=vxlan' \
        '--flannel-ipv6-masq' \
        '--node-ip' \
        '192.168.1.48,2605:[snip]' \
        '--cluster-cidr' \
        '10.42.0.0/16,2001:cafe:42:0::/56' \
        '--service-cidr' \
        '10.43.0.0/16,2001:cafe:42:1::/112' \
        '--disable' \
        'traefik' \
        '--disable' \
        'servicelb' \
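
For reference, the same flags can also be supplied declaratively. A minimal sketch of an equivalent /etc/rancher/k3s/config.yaml, written via a shell heredoc and assuming the standard flag-to-YAML-key mapping k3s uses (each CLI flag becomes a key of the same name):

# Hypothetical config.yaml equivalent of the ExecStart flags above.
cat <<'EOF' | sudo tee /etc/rancher/k3s/config.yaml
cluster-init: true
flannel-backend: vxlan
flannel-ipv6-masq: true
node-ip: "192.168.1.48,2605:[snip]"
cluster-cidr: "10.42.0.0/16,2001:cafe:42:0::/56"
service-cidr: "10.43.0.0/16,2001:cafe:42:1::/112"
disable:
  - traefik
  - servicelb
EOF
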

flannel:

root@pi-node-1:~# cat /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json 
{
	"Network": "10.42.0.0/16",
	"EnableIPv6": true,
	"EnableIPv4": true,
	"IPv6Network": "2001:cafe:42::/56",
	"Backend": {
	"Type": "vxlan"
}
}

Note: I stupidly used the IPv6 CIDR from the example docs rather than a /64 out of the /56 my ISP delegates. This may be the problem, but my limited understanding is that IPv6 masquerade (NAT) should shield me from the mistake.
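
If masquerading is supposed to provide that shield, one quick sanity check (a sketch only; whether and where flannel inserts an IPv6 MASQUERADE rule is an assumption here) is to dump the IPv6 NAT table on a node and look for a rule covering the pod network:

# Look for a flannel-managed MASQUERADE rule covering 2001:cafe:42::/56.
ip6tables -t nat -S POSTROUTING
ip6tables -t nat -L -n -v | grep -iA3 flannel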

Describe the bug:

I am unable to get traffic out of the cluster via IPv6. Ingress via an ingress controller that has an IPv4 and an IPv6 address assigned by MetalLB works fine. MetalLB is configured to hand out 192.168.100/24 and 2605:[snip]::100-2605:[snip]::ffff. Inter-pod IPv6 traffic (via services) also works fine.

Two pods running to illustrate the issue:

pod-1                                                  1/1     Running     0               42m     10.42.13.138   nuc-node-1   <none>           <none>
pod-2                                                  1/1     Running     0               42m     10.42.12.55    nuc-node-2   <none>           <none>

Pod 1, showing pod IP, traffic to pod-2 IPv6 addr, and traffic to node IPv6 addr:

root@pod-1:/# ip -6 a show dev eth0
2: eth0@if881: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default  link-netnsid 0
    inet6 2001:cafe:42:d::36c/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::4cfe:6aff:fe9d:c53a/64 scope link 
       valid_lft forever preferred_lft forever
root@pod-1:/# ping6 -c3 2001:cafe:42:c::40c   
PING 2001:cafe:42:c::40c(2001:cafe:42:c::40c) 56 data bytes
64 bytes from 2001:cafe:42:c::40c: icmp_seq=1 ttl=62 time=0.634 ms
64 bytes from 2001:cafe:42:c::40c: icmp_seq=2 ttl=62 time=0.568 ms
64 bytes from 2001:cafe:42:c::40c: icmp_seq=3 ttl=62 time=0.643 ms

--- 2001:cafe:42:c::40c ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2032ms
rtt min/avg/max/mdev = 0.568/0.615/0.643/0.033 ms
root@pod-1:/# ping -c3 2605:[snip]::34d    
PING 2605:[snip]::34d(2605:[snip]::34d) 56 data bytes
64 bytes from 2605:[snip]::34d: icmp_seq=1 ttl=64 time=0.067 ms
64 bytes from 2605:[snip]::34d: icmp_seq=2 ttl=64 time=0.077 ms
64 bytes from 2605:[snip]::34d: icmp_seq=3 ttl=64 time=0.061 ms

--- 2605:[snip]::34d ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2055ms
rtt min/avg/max/mdev = 0.061/0.068/0.077/0.006 ms

Pod 2:

root@pod-2:/# ip -6 a show dev eth0
2: eth0@if1049: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default  link-netnsid 0
    inet6 2001:cafe:42:c::40c/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::1c9b:acff:fedf:68d3/64 scope link 
       valid_lft forever preferred_lft forever
root@pod-2:/# ping6 -c3 2001:cafe:42:d::36c/64
ping6: 2001:cafe:42:d::36c/64: Name or service not known
root@pod-2:/# ping6 -c3 2001:cafe:42:d::36c   
PING 2001:cafe:42:d::36c(2001:cafe:42:d::36c) 56 data bytes
64 bytes from 2001:cafe:42:d::36c: icmp_seq=1 ttl=62 time=0.567 ms
64 bytes from 2001:cafe:42:d::36c: icmp_seq=2 ttl=62 time=0.530 ms
64 bytes from 2001:cafe:42:d::36c: icmp_seq=3 ttl=62 time=0.661 ms

Traffic to public IPv6 (Google DNS) from node:

[root@nuc-node-1 ~]# ping -c3 2001:4860:4860::8888
PING 2001:4860:4860::8888(2001:4860:4860::8888) 56 data bytes
64 bytes from 2001:4860:4860::8888: icmp_seq=1 ttl=117 time=7.07 ms
64 bytes from 2001:4860:4860::8888: icmp_seq=2 ttl=117 time=7.14 ms
64 bytes from 2001:4860:4860::8888: icmp_seq=3 ttl=117 time=7.02 ms

--- 2001:4860:4860::8888 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 7.015/7.076/7.140/0.051 ms

Traffic to public IPv6 (Google DNS) from pod-1:

root@pod-1:/# ping6 -c3 2001:4860:4860::8888
PING 2001:4860:4860::8888(2001:4860:4860::8888) 56 data bytes
^C
--- 2001:4860:4860::8888 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2049ms

Steps To Reproduce:

  • Installed K3s using the configuration noted above

Expected behavior: Traffic can egress from the cluster via the IPv6 stack.

Actual behavior: 100% packet loss for pod-originated IPv6 traffic to destinations outside the cluster.

Additional context / logs:

There are flanneld ACCEPT rules in the FORWARD chain for IPv4 that are not present for IPv6. This looked very suspicious to me:

[root@nuc-node-2 ~]# iptables -L FORWARD
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
KUBE-ROUTER-FORWARD  all  --  anywhere             anywhere             /* kube-router netpol - TEMCG2JMHZYE7H7T */
KUBE-FORWARD  all  --  anywhere             anywhere             /* kubernetes forwarding rules */
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
ACCEPT     all  --  10.42.0.0/16         anywhere             /* flanneld forward */
ACCEPT     all  --  anywhere             10.42.0.0/16         /* flanneld forward */
[root@nuc-node-2 ~]# ip6tables -L FORWARD
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
KUBE-ROUTER-FORWARD  all      anywhere             anywhere             /* kube-router netpol - TEMCG2JMHZYE7H7T */
KUBE-FORWARD  all      anywhere             anywhere             /* kubernetes forwarding rules */
KUBE-SERVICES  all      anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES  all      anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
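
For illustration only, the IPv6 counterparts of the two flanneld ACCEPT rules would look roughly like this (a sketch using the pod IPv6 network from net-conf.json; flannel normally installs these itself):

# Hypothetical mirror of the IPv4 "flanneld forward" rules for IPv6.
ip6tables -A FORWARD -s 2001:cafe:42::/56 -m comment --comment "flanneld forward" -j ACCEPT
ip6tables -A FORWARD -d 2001:cafe:42::/56 -m comment --comment "flanneld forward" -j ACCEPT

Since the FORWARD policy is ACCEPT on these nodes, the missing rules may not be the actual drop point, but adding them would make the two stacks directly comparable.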

Routes seem fine:

Example node:

[root@nuc-node-2 ~]# ip -6 r
::1 dev lo proto kernel metric 256 pref medium
2001:cafe:42::/64 via 2001:cafe:42:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:1::/64 via 2001:cafe:42:1:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:2::/64 via 2001:cafe:42:2:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:3::/64 via 2001:cafe:42:3:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:4::/64 via 2001:cafe:42:4:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:5::/64 via 2001:cafe:42:5:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:6::/64 via 2001:cafe:42:6:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:7::/64 via 2001:cafe:42:7:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:a::/64 via 2001:cafe:42:a:: dev flannel-v6.1 metric 1024 onlink pref medium
2001:cafe:42:c:: dev flannel-v6.1 proto kernel metric 256 pref medium
2001:cafe:42:c::/64 dev cni0 proto kernel metric 256 pref medium
2001:cafe:42:d::/64 via 2001:cafe:42:d:: dev flannel-v6.1 metric 1024 onlink pref medium
2605:[snip]::60c dev enp88s0 proto kernel metric 100 pref medium
2605:[snip]::/64 dev enp88s0 proto ra metric 100 pref medium
fe80::/64 dev flannel.1 proto kernel metric 256 pref medium
fe80::/64 dev flannel-v6.1 proto kernel metric 256 pref medium
fe80::/64 dev cni0 proto kernel metric 256 pref medium
fe80::/64 dev vethc7d6fd78 proto kernel metric 256 pref medium
fe80::/64 dev vethd9e89d35 proto kernel metric 256 pref medium
fe80::/64 dev vethcb452314 proto kernel metric 256 pref medium
fe80::/64 dev veth047e29dd proto kernel metric 256 pref medium
fe80::/64 dev veth7f50f117 proto kernel metric 256 pref medium
fe80::/64 dev vethfb072f1b proto kernel metric 256 pref medium
fe80::/64 dev veth180c6d39 proto kernel metric 256 pref medium
fe80::/64 dev veth83a1f152 proto kernel metric 256 pref medium
fe80::/64 dev veth950fa28c proto kernel metric 256 pref medium
fe80::/64 dev vethaf109fd6 proto kernel metric 256 pref medium
fe80::/64 dev enp88s0 proto kernel metric 1024 pref medium
default via fe80::70b0:52ff:fe1e:6ec6 dev enp88s0 proto ra metric 100 pref high

pod-1:

root@pod-1:/# ip -6 r
2001:cafe:42:d::/64 dev eth0 proto kernel metric 256 pref medium
2001:cafe:42::/56 via 2001:cafe:42:d::1 dev eth0 metric 1024 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default via 2001:cafe:42:d::1 dev eth0 metric 1024 pref medium
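
For completeness, a quick check (a sketch, assuming the uplink interface name from the node route output above) that the nodes actually forward IPv6:

# Run on a node: forwarding must be 1 for pod traffic to leave the node,
# and accept_ra=2 keeps the RA-learned default route even with forwarding on.
sysctl net.ipv6.conf.all.forwarding
sysctl net.ipv6.conf.enp88s0.accept_ra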

I tried manually adding forwarding rules on one of the nodes, but it had no effect. I'm not sure whether they would need to be added to all of them.
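
One way to narrow down where the echo requests die (again just a sketch, using the interface names from the outputs above) is to watch both sides of the node while pinging from pod-1:

# Run each capture in its own terminal on nuc-node-1, then ping 2001:4860:4860::8888 from pod-1.
# If the request shows up on cni0 but never on enp88s0, or leaves enp88s0 with a
# 2001:cafe:42:d::/64 source address, node-side forwarding/masquerading is the
# problem rather than upstream routing.
tcpdump -ni cni0 icmp6
tcpdump -ni enp88s0 icmp6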

Most upvoted comments

I finally got my dual-stack setup running. I did the initial dual-stack setup as shown in the k3s.io documentation, which means I did not use the --flannel-ipv6-masq flag. Instead, I tried to configure flannel after installing k3s, without much success. I then ran the k3s setup command again, this time with the --flannel-ipv6-masq flag, and immediately all pods on all nodes could speak IPv4 and IPv6 like a charm.

I built a single-node cluster using Fedora CoreOS and an Ignition file. I continued to have problems until I turned on the flannel-ipv6-masq flag during installation. It's unclear why this is necessary, but I'll use this setting for now.

Sorry, I missed this issue. When you don't set that flag, traffic leaving the node uses the pod's IP as the source IP; if your internal routing is not configured for that, the reply packets will not know how to reach the pod. When you do set the flag, you get the same behaviour as with IPv4, i.e. the source IP of packets leaving the node is the node's IP.
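
To make the non-masquerade mode work, the upstream router needs return routes for the pod prefixes pointing at the owning nodes. Purely as an illustration, if the gateway were a Linux box, that would look something like this (one route per node; the prefix and node address below are taken from the nuc-node-2 outputs above):

# On the upstream router: route nuc-node-2's pod subnet back via that node's
# global address so replies to pod-sourced traffic have a return path.
ip -6 route add 2001:cafe:42:c::/64 via 2605:[snip]::60c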