cilium: Broken ipv6 hostPorts with cilium 1.14
Is there an existing issue for this?
- I have searched the existing issues
What happened?
When I upgraded to Cilium 1.14, all my external IPv6 connectivity broke. (I use hostPorts to map 80 and 443 to a Traefik DaemonSet because the cluster nodes are not in the same networks and there are no load balancers available.)
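For reference, the hostPort mapping can be read back from the DaemonSet like this (the DaemonSet name and namespace are placeholders, since the manifest isn't pasted here):
# DaemonSet name/namespace are assumptions; prints the hostPort mappings for 80 and 443.
kubectl -n kube-system get daemonset traefik -o jsonpath='{.spec.template.spec.containers[0].ports}'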
Cilium Version
1.14.0 & 1.14.1
Kernel Version
Linux k8s-1 5.15.0-78-generic #85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Kubernetes Version
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:20:54Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:14:49Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Sysdump
No response
Relevant log output
The nodes do seem to have the IP addresses in the BPF LB table:
kubectl exec -ti -n kube-system cilium-2cdxp -- cilium bpf lb list | grep HostPort
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
[::]:0 (431) (0) [HostPort, non-routable]
[fdd3:a7d6:38c5::9]:80 [::]:0 (424) (0) [HostPort]
[fdd3:a7d6:38c5::9]:8080 [::]:0 (418) (0) [HostPort]
0.0.0.0:8080 0.0.0.0:0 (416) (0) [HostPort, non-routable]
192.168.178.53:80 0.0.0.0:0 (420) (0) [HostPort]
0.0.0.0:0 (422) (0) [HostPort, non-routable]
0.0.0.0:0 (427) (0) [HostPort]
0.0.0.0:0 (415) (0) [HostPort]
[2a05:f080:0:8ff:3075:39ff:fea6:33dc]:8080 [::]:0 (417) (0) [HostPort]
[2a05:f080:0:8ff:3075:39ff:fea6:33dc]:80 [::]:0 (423) (0) [HostPort]
[::]:80 [::]:0 (425) (0) [HostPort, non-routable]
100.64.0.9:80 0.0.0.0:0 (421) (0) [HostPort]
0.0.0.0:443 0.0.0.0:0 (428) (0) [HostPort, non-routable]
192.168.178.53:8080 0.0.0.0:0 (414) (0) [HostPort]
100.64.0.9:443 0.0.0.0:0 (426) (0) [HostPort]
[::]:0 (419) (0) [HostPort, non-routable]
[2a05:f080:0:8ff:3075:39ff:fea6:33dc]:443 [::]:0 (430) (0) [HostPort]
[::]:0 (429) (0) [HostPort]
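For completeness, the agent-level view of the same frontends can be cross-checked with the service list (standard agent CLI; nothing here is specific to my setup beyond the agent pod name):
# Should show the same HostPort frontends and their Traefik backends.
kubectl exec -ti -n kube-system cilium-2cdxp -- cilium service list | grep HostPort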
The node is unfortunately not responding on any of the opened ports :disappointed:
I’m getting generic timeouts:
curl https://[2a05:f080:0:8ff:3075:39ff:fea6:33dc] -vk
* Trying 2a05:f080:0:8ff:3075:39ff:fea6:33dc:443...
* TCP_NODELAY set
* connect to 2a05:f080:0:8ff:3075:39ff:fea6:33dc port 443 failed: Connection timed out
* Failed to connect to 2a05:f080:0:8ff:3075:39ff:fea6:33dc port 443: Connection timed out
* Closing connection 0
curl: (28) Failed to connect to 2a05:f080:0:8ff:3075:39ff:fea6:33dc port 443: Connection timed out
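To narrow down whether the SYNs even reach the node, a capture on the node's external interface should help (eth0 is a placeholder for the actual uplink interface):
# eth0 is an assumption; replace with the node's external interface.
tcpdump -ni eth0 'ip6 and tcp port 443'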
The session is visible in the connection tracking table, but it only shows a few bytes (and no packets?):
kubectl exec -ti -n kube-system cilium-2cdxp -- cilium bpf ct ls global | grep 2a05:f080:0:8ff:141a:deff:fe9e:4ac6
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
TCP OUT 2a05:f080:0:8ff:3075:39ff:fea6:33dc:80 -> 2a05:f080:0:8ff:141a:deff:fe9e:4ac6:33888 service expires=1031959 RxPackets=0 RxBytes=2179 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x02 LastTxReport=1031898 Flags=0x0000 [ ] RevNAT=423 SourceSecurityID=0 IfIndex=0
TCP IN 2a05:f080:0:8ff:141a:deff:fe9e:4ac6:59370 -> fd00::349:8443 expires=1032065 RxPackets=7 RxBytes=658 RxFlagsSeen=0x02 LastRxReport=1031974 TxPackets=17 TxBytes=1598 TxFlagsSeen=0x12 LastTxReport=1032005 Flags=0x0020 [ NodePort ] RevNAT=0 SourceSecurityID=2 IfIndex=0
TCP IN 2a05:f080:0:8ff:141a:deff:fe9e:4ac6:33888 -> fd00::349:8000 expires=1031990 RxPackets=2 RxBytes=188 RxFlagsSeen=0x02 LastRxReport=1031898 TxPackets=7 TxBytes=658 TxFlagsSeen=0x12 LastTxReport=1031930 Flags=0x0020 [ NodePort ] RevNAT=0 SourceSecurityID=2 IfIndex=0
TCP OUT 2a05:f080:0:8ff:3075:39ff:fea6:33dc:443 -> 2a05:f080:0:8ff:141a:deff:fe9e:4ac6:59370 service expires=1032034 RxPackets=0 RxBytes=2181 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x02 LastTxReport=1031974 Flags=0x0000 [ ] RevNAT=430 SourceSecurityID=0 IfIndex=0
TCP OUT 2a05:f080:0:8ff:141a:deff:fe9e:4ac6:33888 -> fd00::349:8000 expires=1031990 RxPackets=7 RxBytes=658 RxFlagsSeen=0x12 LastRxReport=1031930 TxPackets=2 TxBytes=188 TxFlagsSeen=0x02 LastTxReport=1031898 Flags=0x0020 [ NodePort ] RevNAT=423 SourceSecurityID=2 IfIndex=2
TCP OUT 2a05:f080:0:8ff:141a:deff:fe9e:4ac6:59370 -> fd00::349:8443 expires=1032065 RxPackets=17 RxBytes=1598 RxFlagsSeen=0x12 LastRxReport=1032005 TxPackets=7 TxBytes=658 TxFlagsSeen=0x02 LastTxReport=1031974 Flags=0x0020 [ NodePort ] RevNAT=430 SourceSecurityID=2 IfIndex=2
ICMPv6 IN 2a05:f080:0:8ff:141a:deff:fe9e:4ac6:0 -> fd00::349:0 related expires=1031970 RxPackets=1 RxBytes=94 RxFlagsSeen=0x02 LastRxReport=1031910 TxPackets=0 TxBytes=0 TxFlagsSeen=0x00 LastTxReport=0 Flags=0x0030 [ SeenNonSyn NodePort ] RevNAT=0 SourceSecurityID=2 IfIndex=0
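Since the NodePort entries show traffic reaching the backend while the service entries report no TxPackets, it might be worth watching for drops while reproducing and cross-checking the RevNAT IDs (423/430) seen above. Both are standard agent CLI commands, nothing setup-specific assumed:
# Watch for drop events while re-running the curl from outside.
kubectl exec -ti -n kube-system cilium-2cdxp -- cilium monitor --type drop
# Check the reverse NAT entries for the IDs referenced in the ct table.
kubectl exec -ti -n kube-system cilium-2cdxp -- cilium bpf lb list --revnat | grep -E '423|430'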
Anything else?
The Cilium sysdump is too big for GitHub; contact me if anyone needs it.
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- State: closed
- Created 10 months ago
- Reactions: 2
- Comments: 16 (7 by maintainers)
I have the feeling that - at least in my IPv6 home network - the cilium_host IPv6 change broke IPv6 connectivity between and out of pods. I'm using native routing and autoDirectNodeRoutes: true. In Cilium 1.13.6 and below everything works as expected. After upgrading to 1.14.x, I can't ping the IPv6 gateway address from inside a pod. I'm using a setup similar to https://yolops.net/k8s-dualstack-cilium.html
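A quick way to sanity-check that on an affected node is to compare the cilium_host address with what a pod actually uses for its IPv6 routes (the pod name is a placeholder and assumes the image ships iproute2):
# cilium_host IPv6 address on the node
ip -6 addr show dev cilium_host
# routes/gateway as seen from inside a pod
kubectl exec -ti some-pod -- ip -6 route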
@oscrx We’ve finally made some progress on https://github.com/cilium/cilium/issues/27898, which should be fixed by https://github.com/cilium/cilium/pull/28417. Would you be able to test if this fixes your nodeport connectivity issue as well? I can’t quite infer from the logs you posted if you could be affected.
Alternatively, I’d still be interested in that sysdump if you still have it. There’s an argument you can use to limit it to affected nodes only. For this case in particular, I’m interested in your
node_config.h (please don't censor any DEFINE_IPV6 statements, this matters!) and any bpf_*.o contained in the sysdump. Note that it's possible for some nodes to be affected and not others, depending on the prefix assigned by your ISP/cloud provider and on how you've sliced up the IP space between your nodes.
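If it helps, both can be gathered without a full sysdump as well; --node-list restricts the sysdump to specific nodes, and the node_config.h path below is the usual agent state location (it may differ per install):
# Limit the sysdump to the affected node(s) only.
cilium sysdump --node-list k8s-1
# Dump the generated datapath defines straight from the agent; path is the usual state location.
kubectl -n kube-system exec cilium-2cdxp -c cilium-agent -- grep DEFINE_IPV6 /var/run/cilium/state/globals/node_config.h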
This is probably related to #27898, which is forcing me to stay on 1.13.6.
Can’t reproduce.
My steps:
However, I noticed your IPv6 hostPort broke after upgrading from 1.13 to 1.14. Let me try to reproduce by following your steps: create a hostPort on 1.13, then upgrade to 1.14.
Edit: still can't reproduce, even after upgrading to 1.14.
No problem, will do.