cilium: DNS request from hostNetwork pods cannot be delivered to local backend with LRP
When running nslookup from within a hostNetwork pod with NodelocalDNS + Cilium (full kube-proxy replacement), DNS requests time out. I do not see any dropped packets with cilium monitor -t drop. Running tcpdump on the veth of the local node-cache backend pod doesn’t show anything, i.e. the packets never reach the local backend pod.
General Information
- Cilium version (run cilium version): Client: 1.9.5 caf84d780 2021-04-23T00:39:47+00:00 go version go1.15.8 linux/amd64
- Kernel version (run uname -a): Linux gke-nld-default-pool-5a5e9ec3-8llq 5.4.104+ SMP Tue Apr 6 09:49:56 PDT 2021 x86_64 x86_64 x86_64 GNU/Linux
- Orchestration system version in use (e.g. kubectl version, …): GKE cluster v1.20.6-gke.1400
How to reproduce the issue
- Deploy NodelocalDNS with Cilium following the getting started guide
- Deploy the following pod and run nslookup from within:
apiVersion: v1
kind: Pod
metadata:
  name: dnsclienthostnet1
spec:
  dnsPolicy: ClusterFirstWithHostNet
  hostNetwork: true
  containers:
  - image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.1
    name: dnsclient
    resources:
      limits:
        cpu: "0.1"
      requests:
        cpu: 100m
    command: ["sh", "-c"]
    args: ["sleep 36000"]
Note this issue only happens when the pod has dnsPolicy: ClusterFirstWithHostNet.
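For a reproduction that does not depend on nslookup being installed, the same check can be sketched with a raw UDP probe run from inside the hostNetwork pod. This is only an illustrative sketch: the helper names `build_dns_query` and `probe` are hypothetical, and the server address passed to `probe` has to be the kube-dns service IP of your own cluster.

```python
import socket
import struct

TXID = 0x1234  # arbitrary transaction ID used to match the reply


def build_dns_query(name: str, txid: int = TXID) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=A (1), QCLASS=IN (1)
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server: str, name: str, timeout: float = 2.0) -> bool:
    """Return True if a matching reply arrives in time, False on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_dns_query(name), (server, 53))
        try:
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return False
        # A reply echoes the transaction ID in its first two bytes
        return data[:2] == struct.pack("!H", TXID)
```

Calling `probe("<kube-dns service IP>", "kubernetes.default.svc.cluster.local")` from the hostNetwork pod would be expected to return False while the issue is present, matching the nslookup timeout described above.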
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (16 by maintainers)
Spent some more time on this today and finally figured out the root cause.
TL;DR: we need an additional ACCEPT rule in filter:CILIUM_OUTPUT to accept the outgoing packet.
RCA: When a hostns pod sends a DNS packet to the local node-cache pod, the packet hits this rule (https://github.com/cilium/cilium/blob/master/pkg/datapath/iptables/iptables.go#L813) and skips conntrack. However, there is no corresponding ACCEPT rule to allow it in filter:OUTPUT, so it is dropped there. This can be observed in the packet trace below:
Here’s filter:OUTPUT:
OSS NodelocalDNS does not have a specific ACCEPT rule either (that’s probably why we missed it in the first place), but it works because it DNATs the packet to a link-local dummy interface, so the packet is routed to the loopback device, hits rule no. 4 above, and is allowed through.
Solution: After adding the last 2 rules to filter:CILIUM_OUTPUT to explicitly allow this untracked packet (to the nodelocaldns IP), I can see the packet flow through normally, as expected:
And DNS requests work just fine in the hostns pod:
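The exact rules that were added are not reproduced here, so the following is only a hedged sketch of what such ACCEPT rules could look like. It assumes one rule each for UDP and TCP on port 53, matched with the iptables conntrack match in the UNTRACKED state; the helper name `accept_untracked_rules` and the IP used in the usage note are hypothetical. The function only builds argument lists and does not run iptables.

```python
from typing import List


def accept_untracked_rules(dns_ip: str, chain: str = "CILIUM_OUTPUT") -> List[List[str]]:
    """Build iptables argument lists that accept untracked DNS traffic to dns_ip.

    Assumption: one rule per protocol (UDP and TCP), destination port 53.
    """
    rules = []
    for proto in ("udp", "tcp"):
        rules.append([
            "iptables", "-t", "filter", "-A", chain,
            "-d", f"{dns_ip}/32", "-p", proto, "--dport", "53",
            # Match packets that bypassed conntrack via a NOTRACK rule
            "-m", "conntrack", "--ctstate", "UNTRACKED",
            "-j", "ACCEPT",
        ])
    return rules
```

For example, `accept_untracked_rules("10.52.0.25")` returns two argument lists that can be printed with `shlex.join` for review before being applied by hand on a test node.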
Added debug prints in bpf_sock.c and saw the following:
The first line is printed right upon entry to sock4_xlate_fwd; the second line is printed right before the final return (I also have a printk in the LRP skip case, which never triggers).
Converting the IP/port to human-readable format, I can see that the first line shows the packet going to the kube-dns svc VIP and the second line shows it going to the nodelocaldns pod on the same node, so I’m pretty sure the rewrite happens correctly in this case.
I can see the NOTRACK rule being hit when issuing nslookup in the hostNetwork client (the counter stays unchanged if no operation is performed):
Before:
After:
The above seems to align with the fact that “hacking the skip LRP translate doesn’t work”, because the traffic is indeed DNATed to nodelocaldns’s IP.
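The IP/port conversion mentioned above can be sketched as follows. It assumes the debug print emitted the raw network-byte-order fields as plain integers on a little-endian host (the node here is x86_64), with the port in the low 16 bits; the helper names and sample values are hypothetical and are not taken from the actual trace.

```python
import socket
import struct


def decode_ip4(raw: int) -> str:
    """Decode a big-endian IPv4 field that was printed as a little-endian integer."""
    # Re-serialize the integer in little-endian order to recover the wire bytes
    return socket.inet_ntoa(struct.pack("<I", raw))


def decode_port(raw: int) -> int:
    """Decode a big-endian 16-bit port that was printed as a little-endian integer."""
    wire_bytes = struct.pack("<H", raw & 0xFFFF)
    return int.from_bytes(wire_bytes, "big")
```

With these helpers, a printed address of 0x0a00340a decodes to 10.52.0.10 and a printed port of 0x3500 decodes to 53.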