istio: Networking breaks when using cilium with strict kube-proxy replacement

Bug description: This issue is related to a thread I started on Istio's Slack channel.

It affects inter-service telemetry, but it might impact other features as well, because traffic is not treated as HTTP where it should be.

My setup is a kubernetes 1.18 cluster with cilium 1.8 configured with kubeProxyReplacement=strict. This means the kube-proxy component of Kubernetes is removed and Cilium takes over its duties. I'm not an expert in how Cilium works, but this mode is supposed to improve service-to-service communication (and other networking paths) by leveraging eBPF functionality.

I have noticed (using tcpdump) that when this mode is enabled, requests from one pod to another service (e.g. curl http://servicename.namespace) are "magically" made directly to the destination pod (pod-ip:target-port) rather than going through the ClusterIP of the destination service. I don't know the internals of how istio-proxy is configured or how the metadata filter works, but this behaviour seems to trick istio-proxy into thinking that requests go directly to pod-ip:container-port, so no route from the Istio config is matched and traffic falls through to some default TCP path.
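
A rough way to observe this yourself (a sketch; the pod and service names are placeholders, and it assumes an ephemeral debug container with tcpdump, e.g. the nicolaka/netshoot image, can be attached):

# Print the ClusterIP the request should be going to (placeholder names):
kubectl get svc servicename -n namespace -o jsonpath='{.spec.clusterIP}'

# Attach a debug container that shares the client pod's network namespace and
# capture outgoing SYNs; with Cilium's socket LB active they target the
# backend pod IP rather than the ClusterIP printed above:
kubectl debug -it client-pod --image=nicolaka/netshoot -- \
    tcpdump -nn 'tcp[tcpflags] & tcp-syn != 0'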

[ ] Docs
[ ] Installation
[X] Networking
[ ] Performance and Scalability
[X] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure

Expected behavior: Istio metadata headers should be added to requests even when using Cilium's kube-proxy replacement.

Steps to reproduce the bug: To verify this, I installed kube-proxy and changed Cilium's config to kubeProxyReplacement=partial, and inter-service telemetry started working with no changes to the Istio setup at all (see the sketch below). Moreover, when inspecting traffic between pods using tcpdump, I could see packets being sent to the ClusterIP of the destination service (which didn't happen before).
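
For reference, the switch back could look roughly like this (a sketch; the chart version and the remaining values must match the existing install):

helm upgrade cilium cilium/cilium \
    --namespace kube-system \
    --reuse-values \
    --set kubeProxyReplacement=partial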

Version (include the output of istioctl version --remote and kubectl version --short and helm version if you used Helm)

client version: 1.7.2
control plane version: 1.7.2
data plane version: 1.7.2 (1225 proxies)

How was Istio installed? istioctl and istio-operator

Environment where the bug was observed (cloud vendor, OS, etc.): AWS, kops cluster

I am not able to share a config dump of my proxies, but this issue can be reproduced on any test cluster with cilium configured as described.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 4
  • Comments: 31 (16 by maintainers)

Most upvoted comments

Just a heads-up - we released the fix (cilium/cilium#17154) in Cilium v1.10.5, which allows Cilium's KPR to cooperate with Istio's dataplane. The cilium-agent option is named --bpf-lb-sock-hostns-only, while in Helm it is hostServices.hostNamespaceOnly. You need to set it to true to prevent Cilium's socket LB from translating service connections; otherwise, the Istio sidecar proxy won't be able to determine the original destination.
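
For illustration, enabling it via Helm might look like this (a sketch; on the 1.10.x chart the value lives under hostServices, while newer charts use socketLB.hostNamespaceOnly as seen later in this thread):

helm upgrade cilium cilium/cilium \
    --namespace kube-system \
    --reuse-values \
    --set hostServices.hostNamespaceOnly=true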

Applied the above configuration on our clusters; everything works like a charm 🚀

Resolved: need kernel 5.7 at least.

k logs -f cilium-vd6mb | grep cookie
(arn:aws:eks:us-west-2:171997294028:cluster/ek8s-test/kube-system)
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init), install-cni-binaries (init)
level=warning msg="Without network namespace cookie lookup functionality, BPF datapath cannot distinguish root and non-root namespace, skipping socket-level loadbalancing will not work. Istio routing chains will be missed. Needs kernel version >= 5.7" subsys=daemon
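
A quick way to check whether every node meets the >= 5.7 kernel requirement:

kubectl get nodes -o custom-columns='NODE:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion'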

Still seeing this issue on AWS EKS 1.23:

Linux 5.4.238-148.347.amzn2.x86_64 #1 SMP Thu Apr 6 19:42:57 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Minimal install:

helm install cilium cilium/cilium \
    --version 1.13.2 \
    --namespace kube-system \
    --set kubeProxyReplacement=strict \
    --set socketLB.hostNamespaceOnly=true

On the left-hand side is the application pod and on the right-hand side is the istio-proxy sidecar of that pod (see the screenshot below). tcpdump on istio-proxy shows requests going to the endpoints of coredns, instead of to kube-dns.

Prior to replacing kube-proxy we were seeing correct calls:

20:20:22.043611 IP test-cf8977485-gfh4n.33835 > kube-dns.kube-system.svc.cluster.local.53: 10927+ PTR? 0.100.51.198.in-addr.arpa. (43)
20:20:22.046156 IP kube-dns.kube-system.svc.cluster.local.53 > test-cf8977485-gfh4n.33835: 10927 NXDomain 0/1/0 (125)

After replacing kube-proxy:

05:35:36.927550 IP test-cf8977485-xx76b.46530 > 10-0-0-15.kube-dns.kube-system.svc.cluster.local.domain: 24413+ A? address.internal.default.svc.cluster.local. (60)
05:35:36.927600 IP test-cf8977485-xx76b.46530 > 10-0-0-15.kube-dns.kube-system.svc.cluster.local.domain: 46683+ AAAA? address.internal.default.svc.cluster.local. (60)
05:35:36.928428 IP test-cf8977485-xx76b.48588 > ip-10-0-0-202.us-west-2.compute.internal.domain: 9899+ A? address.internal.svc.cluster.local. (52)

Any thoughts on what else to try? After installing cilium, nodes were terminated to ensure all configurations applied correctly.

[Screenshot, 2023-04-27: application pod (left) and istio-proxy sidecar (right)]
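
Note that the kernel shown above (5.4.x) is below the >= 5.7 requirement called out earlier in this thread, which would explain why socketLB.hostNamespaceOnly has no effect here. Independently of that, one can check whether the agent actually picked the setting up (a sketch; the status labels vary between Cilium versions):

kubectl -n kube-system exec ds/cilium -c cilium-agent -- \
    cilium status --verbose | grep -i 'socket lb'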

For all who are having trouble with cilium in strict mode without kube-proxy:

From my point of view, the root cause of the issues was our cilium version < 1.12.x and a Linux kernel < 5.7, due to the use of Ubuntu 20.04 as the Kubernetes node image.

We saw the same issues as described here with HTTP(S) communication across namespaces when using services whose port mappings target different destination ports. Using the same ports mitigated the problem for some pods, but not for all: e.g. Java-based services worked well, but Python services with async libs had trouble.

After updating to cilium 1.12.6 we still faced the issues, but when looking into the logs of the cilium pods I found warnings in the startup section about possible problems with Istio when running a kernel older than 5.7.

We updated the cluster nodes to a current Ubuntu 22.04 with kernel 5.15.x and voilà, all our issues were gone. We don't even need the port-mapping change to use the same destination ports anymore.

The command line that works for us to render the cilium chart is:

   helm template cilium cilium/cilium --version 1.12.6 --namespace kube-system --set cni.exclusive=false --set operator.replicas=1 \
     --set hubble.tls.auto.method=cronJob --set hubble.relay.enabled=true --set hubble.ui.enabled=true \
     --set socketLB.hostNamespaceOnly=true --set nodePort.enabled=true --set hostPort.enabled=true \
     --set externalIPs.enabled=true --set bpf.masquerade=false \
     --set kubeProxyReplacement=strict --set k8sServiceHost=CHANGEME --set k8sServicePort=CHANGEME

kubernetes node:

System Info:
  Kernel Version:             5.15.0-60-generic
  OS Image:                   Ubuntu 22.04.1 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.16
  Kubelet Version:            v1.22.16

I found this because I saw a warning during startup. @brb Is the feature you want already implemented?
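
For anyone wanting to spot the same startup warning on their own cluster, something along these lines should surface it (a sketch; the message text, quoted earlier in this thread, may differ between versions):

kubectl -n kube-system logs ds/cilium -c cilium-agent | grep -i 'network namespace cookie'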

Maybe we can add a check to istioctl precheck to ensure compatible Cilium settings are used (bpf-lb-sock-hostns-only); see the sketch below for how the setting surfaces today.
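
Until such a precheck exists, the setting can be inspected manually; assuming it was applied via Helm, it should surface under the matching key in the cilium-config ConfigMap (a sketch):

# Prints "true" when the socket LB is restricted to the host namespace:
kubectl -n kube-system get configmap cilium-config \
    -o jsonpath='{.data.bpf-lb-sock-hostns-only}'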

How would Cilium redirect all traffic to Istio? We do need some metadata, like the original DST:port, which we currently get using a system call. We have discussed different ways to get this metadata, such as a 'proxy protocol' prefix or tunneling it in H2 CONNECT, but it'll take some time.

We have already built this using TPROXY, so you will see the original network headers. This is also how Cilium redirects packets to Envoy when Istio is not in play. In the past we have also shared metadata with Envoy and built an eBPF metadata reader in Envoy to share arbitrary data between the Cilium datapath and Envoy.
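
For readers unfamiliar with TPROXY, the classic iptables form of such a redirect looks roughly like this (purely illustrative; Cilium implements the equivalent in eBPF rather than iptables, and 15001 is assumed here as Istio's conventional outbound port):

# Mark and redirect intercepted TCP traffic to a local transparent proxy
# listening on port 15001, without rewriting the original destination:
iptables -t mangle -A PREROUTING -p tcp \
    -j TPROXY --on-port 15001 --tproxy-mark 0x1/0x1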