cilium: Missing ICMPv6 time exceeded in-transit from Node

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

When running a traceroute towards a Pod on our cluster, the hop before the Pod, i.e. the Node, does not respond with an ICMPv6 "time exceeded in-transit" message.

Our cluster is using eBPF for everything. See below in section “Anything else?” for more information.
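
For anyone reproducing this from a packet capture: the reply each intermediate hop is expected to send is an ICMPv6 Time Exceeded message with code 0 (type 3, code 0 in RFC 4443), which tcpdump prints as "ICMP6, time exceeded in-transit". The following is a minimal Python sketch for classifying raw ICMPv6 messages. The function name `classify_icmpv6` and the returned labels are hypothetical and are not part of Cilium or any traceroute tool.

```python
import struct

# ICMPv6 type and code values as defined in RFC 4443.
ICMPV6_DEST_UNREACHABLE = 1
ICMPV6_TIME_EXCEEDED = 3
ICMPV6_ECHO_REQUEST = 128
ICMPV6_ECHO_REPLY = 129
CODE_HOP_LIMIT_EXCEEDED = 0  # "hop limit exceeded in transit"


def classify_icmpv6(message: bytes) -> str:
    """Return a short label for a raw ICMPv6 message (hypothetical helper).

    `message` starts at the ICMPv6 header: type (1 byte), code (1 byte),
    checksum (2 bytes). The checksum is not verified here.
    """
    if len(message) < 4:
        raise ValueError("an ICMPv6 header needs at least 4 bytes")
    msg_type, code, _checksum = struct.unpack("!BBH", message[:4])

    if msg_type == ICMPV6_TIME_EXCEEDED:
        if code == CODE_HOP_LIMIT_EXCEEDED:
            # This is the reply that is missing from the Node in this report.
            return "time exceeded in-transit"
        return "time exceeded (other)"
    if msg_type == ICMPV6_DEST_UNREACHABLE:
        return "destination unreachable"
    if msg_type == ICMPV6_ECHO_REQUEST:
        return "echo request"
    if msg_type == ICMPV6_ECHO_REPLY:
        return "echo reply"
    return f"other (type {msg_type}, code {code})"
```

For example, `classify_icmpv6(bytes([3, 0, 0, 0]))` yields `"time exceeded in-transit"`, while an echo reply (type 129) from the Pod itself yields `"echo reply"`.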

Cilium Version

Client: 1.12.4 6eaecaf 2022-11-16T05:45:01+00:00 go version go1.18.8 linux/amd64
Daemon: 1.12.4 6eaecaf 2022-11-16T05:45:01+00:00 go version go1.18.8 linux/amd64

Kernel Version

Linux k8s0-controlplane0 6.0.0-5-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.0.10-2 (2022-12-01) x86_64 Linux

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:51:43Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:51:45Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}

Sysdump

cilium-sysdump-20221213-140237.zip

Relevant log output

No response

Anything else?

This bug could affect IPv4 too, but IPv4 addresses are probably too expensive for anyone to run IPv4 clusters without masquerading 😄

Our hosts have CiliumClusterwideNetworkPolicies configured which allow them to respond to ICMPv4 and ICMPv6 echo requests.

cilium status:

KVStore:                 Ok   Disabled
Kubernetes:              Ok   1.26 (v1.26.0) [linux/amd64]
Kubernetes APIs:         ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:    Strict    [internet 192.0.2.1 2001:db8::1 (Direct Routing)]
Host firewall:           Enabled   [internet]
CNI Chaining:            none
Cilium:                  Ok   1.12.4 (v1.12.4-6eaecaf)
NodeMonitor:             Listening for events on 4 CPUs with 64x4096 of shared memory
Cilium health daemon:    Ok   
IPAM:                    IPv4: 4/254 allocated from 10.0.2.0/24, IPv6: 4/65534 allocated from 2001:db8:1f::2:0/112
BandwidthManager:        EDT with BPF [BBR] [internet]
Host Routing:            BPF
Masquerading:            BPF   [internet]   10.0.0.0/8 [IPv4: Enabled, IPv6: Disabled]
Controller Status:       30/30 healthy
Proxy Status:            OK, ip 10.0.2.231, 0 redirects active on ports 10000-20000
Global Identity Range:   min 256, max 65535
Hubble:                  Ok   Current/Max Flows: 4095/4095 (100.00%), Flows/s: 45.90   Metrics: Ok
Encryption:              Disabled
Cluster health:          3/3 reachable   (2022-12-08T00:45:43Z)

Keywords: hops exceeded, TTL, tracert, mtr

Code of Conduct

  • I agree to follow this project’s Code of Conduct

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 16 (6 by maintainers)

Most upvoted comments

https://github.com/cilium/cilium/pull/26674 might help here (it speaks about DSR-eligible traffic, but actually also skips over non-Service traffic such as ICMP).

Note that with BPF Masquerading enabled, we could still end up applying SNAT to the traffic (and hit the same issue), depending on which IPV6_MASQUERADE value is selected: https://github.com/cilium/cilium/blob/fbbca8494eaa2417fdfe201cb4dcc262bb514ad6/bpf/lib/nat.h#L1619