cilium: Cannot reach hostNetwork pods

What happened?

I am trying to get Azure AAD Pod Identity working in NMI mode with Cilium running in kubeProxyReplacement=strict mode.

Azure AAD Pod Identity runs a DaemonSet with hostNetwork: true that listens on port 2579. All requests to the Azure IMDS endpoint 169.254.169.254:80 are DNATed to localhost:2579 by these iptables rules:

-t nat -A PREROUTING -j aad-metadata
-A aad-metadata ! -s 127.0.0.1/32 -d 169.254.169.254/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 127.0.0.1:2579
-A aad-metadata -j RETURN
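
To make the match conditions of the DNAT rule explicit, here is a minimal Python sketch that models them. It is an illustration only, not how iptables is implemented; `Packet` and `apply_aad_metadata` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, replace
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of a packet entering PREROUTING."""
    src: str
    dst: str
    proto: str
    dport: int


def apply_aad_metadata(pkt: Packet) -> Packet:
    """Model the aad-metadata chain: DNAT on a match, otherwise RETURN unchanged."""
    matches = (
        ip_address(pkt.src) not in ip_network("127.0.0.1/32")        # ! -s 127.0.0.1/32
        and ip_address(pkt.dst) in ip_network("169.254.169.254/32")  # -d 169.254.169.254/32
        and pkt.proto == "tcp"                                       # -p tcp
        and pkt.dport == 80                                          # --dport 80
    )
    if matches:
        # -j DNAT --to-destination 127.0.0.1:2579
        return replace(pkt, dst="127.0.0.1", dport=2579)
    # -j RETURN: the packet leaves the chain untouched
    return pkt


# The test pod's request from the capture in this report
pod_request = Packet(src="10.0.134.103", dst="169.254.169.254", proto="tcp", dport=80)
print(apply_aad_metadata(pod_request))
```

Under this model the pod's request to 169.254.169.254:80 is rewritten to 127.0.0.1:2579, while traffic that already originates from 127.0.0.1 is left alone.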

With Calico and kube-proxy this works normally. With Cilium it does not: the TCP handshake towards 169.254.169.254 never completes.

Here is a tcpdump capture from the node running a test pod that tries to reach 169.254.169.254:

# tcpdump -i any host 169.254.169.254
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
12:34:47.412934 IP 10.0.134.103.43426 > 169.254.169.254.http: Flags [S], seq 591260527, win 64860, options [mss 1410,sackOK,TS val 2982256119 ecr 0,nop,wscale 7], length 0
12:34:48.442267 IP 10.0.134.103.43426 > 169.254.169.254.http: Flags [S], seq 591260527, win 64860, options [mss 1410,sackOK,TS val 2982257148 ecr 0,nop,wscale 7], length 0
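
The two lines carry the same sequence number about one second apart, which is what a retransmitted SYN with no SYN-ACK looks like. A small Python sketch that checks this on tcpdump's default one-line output follows; the function names and the regular expression are assumptions made for this example and only cover lines shaped like the ones above.

```python
import re
from collections import Counter

# Matches lines such as:
# "12:34:47.412934 IP 10.0.134.103.43426 > 169.254.169.254.http: Flags [S], seq 591260527, ..."
LINE_RE = re.compile(
    r"^(?P<ts>\S+) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]+)\], seq (?P<seq>\d+)"
)


def syn_attempts(lines):
    """Count occurrences of each (src, dst, seq) SYN; a count above 1 means a retransmission."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("flags") == "S":
            counts[(m.group("src"), m.group("dst"), int(m.group("seq")))] += 1
    return counts


def has_synack(lines):
    """True if any captured line is a SYN-ACK (tcpdump prints it as Flags [S.])."""
    return any("Flags [S.]" in line for line in lines)


CAPTURE = [
    "12:34:47.412934 IP 10.0.134.103.43426 > 169.254.169.254.http: Flags [S], "
    "seq 591260527, win 64860, options [mss 1410,sackOK,TS val 2982256119 ecr 0,nop,wscale 7], length 0",
    "12:34:48.442267 IP 10.0.134.103.43426 > 169.254.169.254.http: Flags [S], "
    "seq 591260527, win 64860, options [mss 1410,sackOK,TS val 2982257148 ecr 0,nop,wscale 7], length 0",
]

for key, count in syn_attempts(CAPTURE).items():
    print(key, "sent", count, "times; SYN-ACK seen:", has_synack(CAPTURE))
```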

I also tried to capture traffic towards port 2579 and saw none.

I tried using pwru to capture cilium traffic and I got this:

               SKB          PROCESS                     FUNC        TIMESTAMP
0xffff99a6c6de5ce0           [curl]       0xffffffff8b665770   80886317502703 10.0.134.103:43438->169.254.169.254:2579(tcp)
0xffff99a6c6de5ce0           [curl]       0xffffffff8b665770   80886317522903 10.0.134.103:43438->169.254.169.254:2579(tcp)
0xffff99a6c6de5ce0           [curl]       0xffffffff8b6e7c90   80886317530803 10.0.134.103:43438->127.0.0.1:2579(tcp)
0xffff99a6c6de5ce0           [curl]       0xffffffff8b717dd0   80886317538503 10.0.134.103:43438->127.0.0.1:2579(tcp)
0xffff99a6c6de5ce0           [curl]       0xffffffff8b6e4dd0   80886317545703 10.0.134.103:43438->127.0.0.1:2579(tcp)
0xffff99a6c6de5ce0           [curl]       0xffffffff8b6e4b70   80886317558503 10.0.134.103:43438->127.0.0.1:2579(tcp)
0xffff99a6c6de5ce0           [curl]       0xffffffff8b6e3ec0   80886317564803 10.0.134.103:43438->127.0.0.1:2579(tcp)
0xffff99a6c6de5ce0           [curl]       0xffffffff8b632a00   80886317573204 10.0.134.103:43438->127.0.0.1:2579(tcp)
0xffff99a6c6de5ce0           [curl]       0xffffffff8b6329b0   80886317580104 10.0.134.103:43438->127.0.0.1:2579(tcp)
0xffff99a6c6de5ce0           [curl]       0xffffffff8b632930   80886317587504 10.0.134.103:43438->127.0.0.1:2579(tcp)
0xffff99a6c6de5ce0           [curl]       0xffffffff8b632ac0   80886317596704 10.0.134.103:43438->127.0.0.1:2579(tcp)
0xffff99a6c6de5ce0           [curl]       0xffffffff8b631990   80886317602804 10.0.134.103:43438->127.0.0.1:2579(tcp)

My guess is that “127.0.0.1” here ends up being the pod’s own localhost rather than the host’s, which would explain why the traffic never reaches the hostNetwork service.

I also tried setting a CiliumLocalRedirectPolicy to redirect traffic for 169.254.169.254:80 to the local pod on port 2579, but with no luck.
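
For reference, an address-matcher policy of the kind described above has roughly the following shape. This is a sketch, not necessarily the manifest that was applied: the policy name, the namespace, and the `component: nmi` label selector are assumptions and must be adjusted to the labels the NMI pods actually carry. The Python below only builds the object and prints it as JSON, which `kubectl apply -f -` accepts.

```python
import json


def build_lrp(name, namespace, frontend_ip, frontend_port, backend_labels, backend_port):
    """Build a CiliumLocalRedirectPolicy (address matcher variant) as a plain dict."""
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumLocalRedirectPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "redirectFrontend": {
                "addressMatcher": {
                    "ip": frontend_ip,
                    # Ports are strings in the CRD schema
                    "toPorts": [{"port": str(frontend_port), "protocol": "TCP"}],
                }
            },
            "redirectBackend": {
                # Selects node-local backend pods by label
                "localEndpointSelector": {"matchLabels": backend_labels},
                "toPorts": [{"port": str(backend_port), "protocol": "TCP"}],
            },
        },
    }


policy = build_lrp(
    name="aad-pod-identity-nmi",        # hypothetical name
    namespace="kube-system",            # assumed namespace of the NMI pods
    frontend_ip="169.254.169.254",
    frontend_port=80,
    backend_labels={"component": "nmi"},  # assumed label, verify on the cluster
    backend_port=2579,
)
print(json.dumps(policy, indent=2))
```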

Is this even possible? What am I missing?

Cilium Version

$ cilium version
cilium-cli: v0.11.2 compiled with go1.18.1 on linux/amd64
cilium image (default): v1.11.3
cilium image (stable): v1.11.5
cilium image (running): v1.11.2

Kernel Version

Linux worker-000021 5.15.32-flatcar #1 SMP Tue Apr 5 17:17:31 -00 2022 x86_64 Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz GenuineIntel GNU/Linux

Kubernetes Version

Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.11", GitCommit:"38d3c1f3d5306401bcf39a71bad3b5a5106033d7", GitTreeState:"clean", BuildDate:"2022-03-16T14:02:06Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

Sysdump

Can’t provide one, the dump is too big.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 18 (8 by maintainers)

Most upvoted comments

Hello, thanks for your answer. As I mentioned, I already tried an LRP (CiliumLocalRedirectPolicy), with no luck. The use case is indeed very similar to the KIAM example, with the key difference that the kiam agent does not use the host network while Azure AD Pod Identity does. I suspect that is the key problem here.

If you check the Cilium guide section linked above, the kiam agent also runs in host network mode. You can run cilium lrp list and cilium service list to see whether the correct service translation entries have been created.