cilium: Broken IPv6 hostPorts with Cilium 1.14

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

When I upgraded to Cilium 1.14, all of my external IPv6 connectivity broke. (I use hostPorts to map ports 80 and 443 to a Traefik DaemonSet, because the cluster nodes are not on the same network and no load balancers are available.)
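
For reference, the relevant part of such a DaemonSet looks roughly like this (a minimal sketch only; the name and image are illustrative, not my actual Traefik manifest):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: traefik                 # hypothetical name, for illustration
spec:
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      containers:
      - name: traefik
        image: traefik:v2.10    # illustrative image tag
        ports:
        - containerPort: 80
          hostPort: 80          # handled by Cilium's HostPort support (kube-proxy replacement)
        - containerPort: 443
          hostPort: 443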

Cilium Version

1.14.0 & 1.14.1

Kernel Version

Linux k8s-1 5.15.0-78-generic #85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:20:54Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:14:49Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}

Sysdump

No response

Relevant log output

The nodes do seem to recognize the IP addresses in the BPF LB table:

kubectl exec -ti -n kube-system cilium-2cdxp -- cilium bpf lb list | grep HostPort
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
                                              [::]:0 (431) (0) [HostPort, non-routable]       
[fdd3:a7d6:38c5::9]:80                        [::]:0 (424) (0) [HostPort]                     
[fdd3:a7d6:38c5::9]:8080                      [::]:0 (418) (0) [HostPort]                     
0.0.0.0:8080                                  0.0.0.0:0 (416) (0) [HostPort, non-routable]    
192.168.178.53:80                             0.0.0.0:0 (420) (0) [HostPort]                  
                                              0.0.0.0:0 (422) (0) [HostPort, non-routable]    
                                              0.0.0.0:0 (427) (0) [HostPort]                  
                                              0.0.0.0:0 (415) (0) [HostPort]                  
[2a05:f080:0:8ff:3075:39ff:fea6:33dc]:8080    [::]:0 (417) (0) [HostPort]                     
[2a05:f080:0:8ff:3075:39ff:fea6:33dc]:80      [::]:0 (423) (0) [HostPort]                     
[::]:80                                       [::]:0 (425) (0) [HostPort, non-routable]       
100.64.0.9:80                                 0.0.0.0:0 (421) (0) [HostPort]                  
0.0.0.0:443                                   0.0.0.0:0 (428) (0) [HostPort, non-routable]    
192.168.178.53:8080                           0.0.0.0:0 (414) (0) [HostPort]                  
100.64.0.9:443                                0.0.0.0:0 (426) (0) [HostPort]                  
                                              [::]:0 (419) (0) [HostPort, non-routable]       
[2a05:f080:0:8ff:3075:39ff:fea6:33dc]:443     [::]:0 (430) (0) [HostPort]                     
                                              [::]:0 (429) (0) [HostPort]     

The node is unfortunately not responding on any of the opened ports :disappointed:

I’m getting generic timeouts:
curl https://[2a05:f080:0:8ff:3075:39ff:fea6:33dc] -vk
*   Trying 2a05:f080:0:8ff:3075:39ff:fea6:33dc:443...
* TCP_NODELAY set
* connect to 2a05:f080:0:8ff:3075:39ff:fea6:33dc port 443 failed: Connection timed out
* Failed to connect to 2a05:f080:0:8ff:3075:39ff:fea6:33dc port 443: Connection timed out
* Closing connection 0
curl: (28) Failed to connect to 2a05:f080:0:8ff:3075:39ff:fea6:33dc port 443: Connection timed out
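
(To check whether the SYN even reaches the node, running something along these lines on the node itself can help; this is just a sketch:)

# does the incoming IPv6 SYN arrive on the node at all?
tcpdump -ni any 'ip6 and tcp port 443'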

The session is visible in the connection-tracking table, but it only shows a few bytes (and no packets?):

kubectl exec -ti -n kube-system cilium-2cdxp -- cilium bpf ct ls global | grep 2a05:f080:0:8ff:141a:deff:fe9e:4ac6

Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
TCP OUT 2a05:f080:0:8ff:3075:39ff:fea6:33dc:80 -> 2a05:f080:0:8ff:141a:deff:fe9e:4ac6:33888 service expires=1031959 RxPackets=0 RxBytes=2179 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x02 LastTxReport=1031898 Flags=0x0000 [ ] RevNAT=423 SourceSecurityID=0 IfIndex=0 
TCP IN 2a05:f080:0:8ff:141a:deff:fe9e:4ac6:59370 -> fd00::349:8443 expires=1032065 RxPackets=7 RxBytes=658 RxFlagsSeen=0x02 LastRxReport=1031974 TxPackets=17 TxBytes=1598 TxFlagsSeen=0x12 LastTxReport=1032005 Flags=0x0020 [ NodePort ] RevNAT=0 SourceSecurityID=2 IfIndex=0 
TCP IN 2a05:f080:0:8ff:141a:deff:fe9e:4ac6:33888 -> fd00::349:8000 expires=1031990 RxPackets=2 RxBytes=188 RxFlagsSeen=0x02 LastRxReport=1031898 TxPackets=7 TxBytes=658 TxFlagsSeen=0x12 LastTxReport=1031930 Flags=0x0020 [ NodePort ] RevNAT=0 SourceSecurityID=2 IfIndex=0 
TCP OUT 2a05:f080:0:8ff:3075:39ff:fea6:33dc:443 -> 2a05:f080:0:8ff:141a:deff:fe9e:4ac6:59370 service expires=1032034 RxPackets=0 RxBytes=2181 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x02 LastTxReport=1031974 Flags=0x0000 [ ] RevNAT=430 SourceSecurityID=0 IfIndex=0 
TCP OUT 2a05:f080:0:8ff:141a:deff:fe9e:4ac6:33888 -> fd00::349:8000 expires=1031990 RxPackets=7 RxBytes=658 RxFlagsSeen=0x12 LastRxReport=1031930 TxPackets=2 TxBytes=188 TxFlagsSeen=0x02 LastTxReport=1031898 Flags=0x0020 [ NodePort ] RevNAT=423 SourceSecurityID=2 IfIndex=2 
TCP OUT 2a05:f080:0:8ff:141a:deff:fe9e:4ac6:59370 -> fd00::349:8443 expires=1032065 RxPackets=17 RxBytes=1598 RxFlagsSeen=0x12 LastRxReport=1032005 TxPackets=7 TxBytes=658 TxFlagsSeen=0x02 LastTxReport=1031974 Flags=0x0020 [ NodePort ] RevNAT=430 SourceSecurityID=2 IfIndex=2 
ICMPv6 IN 2a05:f080:0:8ff:141a:deff:fe9e:4ac6:0 -> fd00::349:0 related expires=1031970 RxPackets=1 RxBytes=94 RxFlagsSeen=0x02 LastRxReport=1031910 TxPackets=0 TxBytes=0 TxFlagsSeen=0x00 LastTxReport=0 Flags=0x0030 [ SeenNonSyn NodePort ] RevNAT=0 SourceSecurityID=2 IfIndex=0
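
One thing that might show where these connections die is watching drop notifications while repeating the curl (a sketch, using the same agent pod as above):

# stream datapath drop events while re-running the curl from outside
kubectl exec -ti -n kube-system cilium-2cdxp -- cilium monitor --type drop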

Anything else?

The Cilium sysdump is too big for GitHub; contact me if anyone needs it.

Code of Conduct

  • I agree to follow this project’s Code of Conduct

About this issue

  • Original URL
  • State: closed
  • Created 10 months ago
  • Reactions: 2
  • Comments: 16 (7 by maintainers)

Most upvoted comments

I have the feeling that - at least in my IPv6 home network - the cilium_host IPv6 change broke IPv6 connectivity between pods and out of pods. I’m using native routing and autoDirectNodeRoutes: true. With Cilium 1.13.6 and below everything works as expected; after upgrading to 1.14.x, I can’t ping the IPv6 gateway address from inside a pod.

I’m using a setup similar to https://yolops.net/k8s-dualstack-cilium.html
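
For reference, a native-routing dual-stack install of that kind boils down to roughly the following (a sketch only; the CIDRs are placeholders, not my actual prefixes):

# illustrative values, adjust the CIDRs to your own pod ranges
helm upgrade --install cilium cilium/cilium --namespace kube-system \
  --set ipv4.enabled=true \
  --set ipv6.enabled=true \
  --set routingMode=native \
  --set autoDirectNodeRoutes=true \
  --set ipv4NativeRoutingCIDR=10.0.0.0/8 \
  --set ipv6NativeRoutingCIDR=fd00::/56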

@oscrx We’ve finally made some progress on https://github.com/cilium/cilium/issues/27898, which should be fixed by https://github.com/cilium/cilium/pull/28417. Would you be able to test if this fixes your nodeport connectivity issue as well? I can’t quite infer from the logs you posted if you could be affected.

Alternatively, I’d still be interested in that sysdump if you still have it. There’s an argument you can use to limit it to affected nodes only. For this case in particular, I’m interested in your node_config.h (please don’t censor any DEFINE_IPV6 statements, this matters!) and any bpf_*.o contained in the sysdump.

Note that it’s possible for some nodes to be affected and not others, depending on the prefix assigned by your ISP/cloud provider and depending on how you’ve sliced up the IP space between your nodes.
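
In case it helps, limiting the sysdump and sanity-checking the defines would look something like this (a sketch; the flag name and paths may differ between cilium-cli and Cilium versions, so check cilium sysdump --help first):

# collect a sysdump from the affected node(s) only
cilium sysdump --node-list <affected-node-name>
# quick look at the IPv6 defines straight from the running agent (path is a guess, adjust as needed)
kubectl exec -ti -n kube-system cilium-2cdxp -- sh -c 'grep DEFINE_IPV6 $(find /var/run/cilium/state -name node_config.h)'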

Re the comment above (the cilium_host IPv6 change breaking connectivity between and out of pods with native routing and autoDirectNodeRoutes: true): this is probably related to #27898, which is forcing me to keep using 1.13.6.

Can’t reproduce.

My steps:

# 1. kind setup. ./contrib/scripts/kind.sh is in the github.com/cilium/cilium repo
./contrib/scripts/kind.sh "" 3 "" "" "iptables" dual

# 2. install cilium 1.14.0, using exactly the same config as https://github.com/bierteam/uber-kubernetes/blob/main/argocd-managed/cilium/values.yaml
helm install cilium cilium/cilium --version 1.14.0 --namespace kube-system --set hubble.enabled=false --set ipv4.enabled=true --set ipv6.enabled=true --set kubeProxyReplacement=strict --set hostPort.enabled=true

# 3. wait until cilium turns healthy
cilium status --wait

# 4. create hostport pods
cat > hostport.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simpleserver2
spec:
  selector:
    matchLabels:
      run: simpleserver2
  replicas: 2
  template:
    metadata:
      labels:
        run: simpleserver2
    spec:
      containers:
      - name: simpleserver2
        image: python:3
        command: ["python"]
        args: ["-mhttp.server", "-b::", "80"]
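        # "-b ::" makes http.server listen on all IPv6 addresses, on port 80 inside the container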
        ports:
        - containerPort: 80
          hostPort: 8000
EOF
kubectl create -f hostport.yaml 

# 5. check cilium lb
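# (ciliumat appears to be a local helper that prints the cilium agent pod name on the given node)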
kubectl -n kube-system exec -it $(ciliumat kind-worker) -- cilium bpf lb list 
# SERVICE ADDRESS       BACKEND ADDRESS (REVNAT_ID) (SLOT)
# 10.96.0.10:9153       10.0.0.57:9153 (3) (2)                        
#                       0.0.0.0:0 (3) (0) [ClusterIP, non-routable]   
#                       10.0.0.24:9153 (3) (1)                        
# 0.0.0.0:8000          10.0.2.7:80 (5) (1)                           
#                       0.0.0.0:0 (5) (0) [HostPort, non-routable]    
# 10.96.0.1:443         0.0.0.0:0 (1) (0) [ClusterIP, non-routable]   
#                       172.25.0.2:6443 (1) (1)                       
# 10.96.0.10:53         10.0.0.24:53 (2) (1)                          
#                       10.0.0.57:53 (2) (2)                          
#                       0.0.0.0:0 (2) (0) [ClusterIP, non-routable]   
# [fc00:c111::3]:8000   [fd00::209]:80 (6) (1)                        
#                       [::]:0 (6) (0) [HostPort]                     
# [::]:8000             [fd00::209]:80 (7) (1)                        
#                       [::]:0 (7) (0) [HostPort, non-routable]       
# 172.25.0.3:8000       0.0.0.0:0 (4) (0) [HostPort]                  
#                       10.0.2.7:80 (4) (1)  

# 6. check ipv6 hostport connectivity
curl [fc00:c111::3]:8000
# success

However, I noticed your IPv6 hostPort broke after upgrading from 1.13 to 1.14. Let me try to reproduce by following your steps: create the hostPort on 1.13, then upgrade to 1.14.

Edit: still can’t reproduce, even after upgrading to 1.14.

Cc @jschwinger233, do you have a chance to take a look? Might be a potential regression.

No problem, will do.