cilium: Missing ICMPv6 time exceeded in-transit from Node
Is there an existing issue for this?
- I have searched the existing issues
What happened?
When running traceroute towards a Pod in our cluster, the hop before the Pod, i.e. the Node, does not respond with an ICMPv6 "time exceeded in-transit" message.
Our cluster is using eBPF for everything. See below in section “Anything else?” for more information.
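For reference, here is a minimal Python sketch of the message the Node is expected to send back when it drops a packet whose hop limit has run out. It follows the general ICMPv6 layout (type 3, code 0, checksum over an IPv6 pseudo-header, then as much of the offending packet as fits within the 1280-byte minimum MTU). It is not how Cilium builds the reply in its datapath. The function names and the sender address `2001:db8:ffff::1` are made up for illustration.

```python
import ipaddress
import struct

ICMPV6_NEXT_HEADER = 58       # IPv6 Next Header value for ICMPv6
ICMPV6_TIME_EXCEEDED = 3      # ICMPv6 type: Time Exceeded
CODE_HOP_LIMIT_EXCEEDED = 0   # code 0: hop limit exceeded in transit
IPV6_MIN_MTU = 1280           # the whole reply must fit in this many bytes
IPV6_HEADER_LEN = 40
ICMPV6_HEADER_LEN = 8         # type, code, checksum, 4 unused bytes


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src: str, dst: str, upper_len: int) -> bytes:
    """IPv6 pseudo-header: src, dst, 32-bit length, 3 zero bytes, next header."""
    return (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + struct.pack("!I3xB", upper_len, ICMPV6_NEXT_HEADER)
    )


def build_time_exceeded(router: str, sender: str, invoking_packet: bytes) -> bytes:
    """Build the ICMPv6 Time Exceeded message a router sends to the sender."""
    # Quote as much of the dropped packet as fits within the minimum MTU.
    max_quote = IPV6_MIN_MTU - IPV6_HEADER_LEN - ICMPV6_HEADER_LEN
    quoted = invoking_packet[:max_quote]
    # First pass with a zero checksum field, then fill in the real value.
    header = struct.pack(
        "!BBHI", ICMPV6_TIME_EXCEEDED, CODE_HOP_LIMIT_EXCEEDED, 0, 0
    )
    length = len(header) + len(quoted)
    csum = internet_checksum(pseudo_header(router, sender, length) + header + quoted)
    header = struct.pack(
        "!BBHI", ICMPV6_TIME_EXCEEDED, CODE_HOP_LIMIT_EXCEEDED, csum, 0
    )
    return header + quoted


if __name__ == "__main__":
    # Hypothetical addresses: the Node as the router, an external traceroute host.
    probe = b"\x60" + bytes(47)  # placeholder bytes standing in for the dropped probe
    reply = build_time_exceeded("2001:db8::1", "2001:db8:ffff::1", probe)
    print(reply[:8].hex())
```

In a working traceroute, a message of this shape arrives from the Node's address for the probe whose hop limit expires there; in our case that reply never shows up.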
Cilium Version
Client: 1.12.4 6eaecaf 2022-11-16T05:45:01+00:00 go version go1.18.8 linux/amd64
Daemon: 1.12.4 6eaecaf 2022-11-16T05:45:01+00:00 go version go1.18.8 linux/amd64
Kernel Version
Linux k8s0-controlplane0 6.0.0-5-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.0.10-2 (2022-12-01) x86_64 Linux
Kubernetes Version
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:51:43Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:51:45Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
Sysdump
cilium-sysdump-20221213-140237.zip
Relevant log output
No response
Anything else?
This bug could affect IPv4 too, but IPv4 addresses are probably too expensive for anyone to run IPv4 clusters without masquerading 😄
Our hosts have CiliumClusterwideNetworkPolicies configured which allow them to respond to ICMPv4 and ICMPv6 echo requests.
cilium status:
KVStore: Ok Disabled
Kubernetes: Ok 1.26 (v1.26.0) [linux/amd64]
Kubernetes APIs: ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement: Strict [internet 192.0.2.1 2001:db8::1 (Direct Routing)]
Host firewall: Enabled [internet]
CNI Chaining: none
Cilium: Ok 1.12.4 (v1.12.4-6eaecaf)
NodeMonitor: Listening for events on 4 CPUs with 64x4096 of shared memory
Cilium health daemon: Ok
IPAM: IPv4: 4/254 allocated from 10.0.2.0/24, IPv6: 4/65534 allocated from 2001:db8:1f::2:0/112
BandwidthManager: EDT with BPF [BBR] [internet]
Host Routing: BPF
Masquerading: BPF [internet] 10.0.0.0/8 [IPv4: Enabled, IPv6: Disabled]
Controller Status: 30/30 healthy
Proxy Status: OK, ip 10.0.2.231, 0 redirects active on ports 10000-20000
Global Identity Range: min 256, max 65535
Hubble: Ok Current/Max Flows: 4095/4095 (100.00%), Flows/s: 45.90 Metrics: Ok
Encryption: Disabled
Cluster health: 3/3 reachable (2022-12-08T00:45:43Z)
Keywords: hops exceeded, TTL, tracert, mtr
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- State: open
- Created 2 years ago
- Comments: 16 (6 by maintainers)
https://github.com/cilium/cilium/pull/26674 might help here (it speaks about DSR-eligible traffic, but actually also skips over non-Service traffic such as ICMP).
Note that with BPF Masquerading enabled, we could still end up applying SNAT to the traffic (and hit the same issue), depending on which IPV6_MASQUERADE is selected: https://github.com/cilium/cilium/blob/fbbca8494eaa2417fdfe201cb4dcc262bb514ad6/bpf/lib/nat.h#L1619