cilium: DNS rules don't work anymore on kubernetes 1.18 with cilium 1.8
Bug report
General Information
- Cilium version: 1.8.3 54cf3810d
- Kernel version: Linux node1 5.7.0-0.bpo.2-amd64 #1 SMP Debian 5.7.10-1~bpo10+1 (2020-07-30) x86_64 GNU/Linux
- Orchestration system: Kubernetes v1.18.9
Description of the problem
After upgrading a Kubernetes cluster from v1.17.7 to v1.18.9 and Cilium from v1.7.5 to v1.8.3, policies containing a DNS rule no longer work.
This is a test manifest I use to reproduce the problem:
---
apiVersion: v1
kind: Namespace
metadata:
  name: test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: test
  name: test
  namespace: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: test
        image: debian
        command:
        - sleep
        - "20000"
---
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: egress-test
  namespace: test
spec:
  endpointSelector:
    matchLabels:
      "k8s:app": test
  egress:
  - toCIDR:
    - 169.254.25.10/32
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      rules:
        dns:
        - matchPattern: "*"
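For completeness, this is roughly how the test is driven; a minimal sketch, assuming the manifest above is saved as test.yaml (the file name is an assumption, and the stock debian image may need ping/dig installed first):

kubectl apply -f test.yaml
# open a shell in the test pod to run the checks below
kubectl -n test exec -it "$(kubectl -n test get pod -l app=test -o name)" -- bash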
If I try to ping an external DNS name from the container, I get the following error:
# ping www.google.com
ping: www.google.com: Temporary failure in name resolution
And the fqdn cache is empty.
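To confirm the failure happens at the DNS step itself, the node-local resolver can also be queried directly from inside the pod; a quick sketch (dnsutils is not in the stock debian image, so installing it is part of the sketch):

apt-get update && apt-get install -y dnsutils
dig www.google.com @169.254.25.10
# on the broken cluster this is the lookup that shows up as "Denied DNS Query" in the monitor output below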
The same manifests work just fine on a cluster with k8s v1.17.7 and Cilium 1.7.5:
# ping www.google.com
PING www.google.com (216.58.198.36) 56(84) bytes of data.
And the fqdn cache is populated:
$ k8s-cilium-exec.sh cilium fqdn cache list
Endpoint Source FQDN TTL ExpirationTime IPs
3018 lookup www.google.com. 3600 2020-09-28T11:12:20.121Z 2a00:1450:4002:802::2004
3018 lookup www.google.com. 3600 2020-09-28T11:12:20.121Z 216.58.198.36
If I monitor the policy of the non-working pod, here are the results:
$ k8s-cilium-exec.sh cilium monitor --related-to 2902
Policy verdict log: flow 0x6830825c local EP ID 2902, remote ID 16777217, dst port 53, proto 17, ingress false, action redirect, match L3-L4, 10.233.66.224:50827 -> 169.254.25.10:53 udp
-> proxy flow 0x6830825c identity 2950->0 state new ifindex 0 orig-ip 0.0.0.0: 10.233.66.224:50827 -> 169.254.25.10:53 udp
-> proxy flow 0x6830825c identity 2950->0 state established ifindex 0 orig-ip 0.0.0.0: 10.233.66.224:50827 -> 169.254.25.10:53 udp
-> endpoint 2902 flow 0x0 identity 16777217->2950 state reply ifindex lxcb73878e9d0fa orig-ip 169.254.25.10: 169.254.25.10:53 -> 10.233.66.224:50827 udp
level=info msg="Initializing dissection cache..." subsys=monitor
-> endpoint 2902 flow 0x0 identity 16777217->2950 state reply ifindex lxcb73878e9d0fa orig-ip 169.254.25.10: 169.254.25.10:53 -> 10.233.66.224:50827 udp
-> proxy flow 0x6830825c identity 2950->0 state established ifindex 0 orig-ip 0.0.0.0: 10.233.66.224:50827 -> 169.254.25.10:53 udp
-> proxy flow 0x6830825c identity 2950->0 state established ifindex 0 orig-ip 0.0.0.0: 10.233.66.224:50827 -> 169.254.25.10:53 udp
-> Request dns from 2902 ([k8s:io.kubernetes.pod.namespace=test k8s:io.cilium.k8s.policy.serviceaccount=default k8s:io.cilium.k8s.policy.cluster=default k8s:app=test]) to 0 ([reserved:world]), identity 2950->2, verdict Denied DNS Query: www.googlc.ome.test.svc.cluster.local. A
-> endpoint 2902 flow 0x0 identity 16777217->2950 state reply ifindex lxcb73878e9d0fa orig-ip 169.254.25.10: 169.254.25.10:53 -> 10.233.66.224:50827 udp
-> Request dns from 2902 ([k8s:io.cilium.k8s.policy.cluster=default k8s:app=test k8s:io.kubernetes.pod.namespace=test k8s:io.cilium.k8s.policy.serviceaccount=default]) to 0 ([reserved:world]), identity 2950->2, verdict Denied DNS Query: www.googlc.ome.test.svc.cluster.local. AAAA
-> endpoint 2902 flow 0x0 identity 16777217->2950 state reply ifindex lxcb73878e9d0fa orig-ip 169.254.25.10: 169.254.25.10:53 -> 10.233.66.224:50827 udp
-> Request dns from 2902 ([k8s:io.cilium.k8s.policy.cluster=default k8s:app=test k8s:io.kubernetes.pod.namespace=test k8s:io.cilium.k8s.policy.serviceaccount=default]) to 0 ([reserved:world]), identity 2950->2, verdict Denied DNS Query: www.googlc.ome.test.svc.cluster.local. A
-> Request dns from 2902 ([k8s:io.cilium.k8s.policy.serviceaccount=default k8s:io.cilium.k8s.policy.cluster=default k8s:app=test k8s:io.kubernetes.pod.namespace=test]) to 0 ([reserved:world]), identity 2950->2, verdict Denied DNS Query: www.googlc.ome.test.svc.cluster.local. AAAA
Policy verdict log: flow 0x32eed54d local EP ID 2902, remote ID 16777217, dst port 53, proto 17, ingress false, action redirect, match L3-L4, 10.233.66.224:55978 -> 169.254.25.10:53 udp
-> proxy flow 0x32eed54d identity 2950->0 state new ifindex 0 orig-ip 0.0.0.0: 10.233.66.224:55978 -> 169.254.25.10:53 udp
-> proxy flow 0x32eed54d identity 2950->0 state established ifindex 0 orig-ip 0.0.0.0: 10.233.66.224:55978 -> 169.254.25.10:53 udp
-> Request dns from 2902 ([k8s:io.cilium.k8s.policy.serviceaccount=default k8s:io.cilium.k8s.policy.cluster=default k8s:app=test k8s:io.kubernetes.pod.namespace=test]) to 0 ([reserved:world]), identity 2950->2, verdict Denied DNS Query: www.googlc.ome. AAAA
-> Request dns from 2902 ([k8s:app=test k8s:io.kubernetes.pod.namespace=test k8s:io.cilium.k8s.policy.serviceaccount=default k8s:io.cilium.k8s.policy.cluster=default]) to 0 ([reserved:world]), identity 2950->2, verdict Denied DNS Query: www.googlc.ome. A
-> proxy flow 0x32eed54d identity 2950->0 state established ifindex 0 orig-ip 0.0.0.0: 10.233.66.224:55978 -> 169.254.25.10:53 udp
-> proxy flow 0x32eed54d identity 2950->0 state established ifindex 0 orig-ip 0.0.0.0: 10.233.66.224:55978 -> 169.254.25.10:53 udp
-> endpoint 2902 flow 0x0 identity 16777217->2950 state reply ifindex lxcb73878e9d0fa orig-ip 169.254.25.10: 169.254.25.10:53 -> 10.233.66.224:55978 udp
-> endpoint 2902 flow 0x0 identity 16777217->2950 state reply ifindex lxcb73878e9d0fa orig-ip 169.254.25.10: 169.254.25.10:53 -> 10.233.66.224:55978 udp
-> endpoint 2902 flow 0x0 identity 16777217->2950 state reply ifindex lxcb73878e9d0fa orig-ip 169.254.25.10: 169.254.25.10:53 -> 10.233.66.224:55978 udp
-> endpoint 2902 flow 0x0 identity 16777217->2950 state reply ifindex lxcb73878e9d0fa orig-ip 169.254.25.10: 169.254.25.10:53 -> 10.233.66.224:55978 udp
-> Request dns from 2902 ([k8s:io.cilium.k8s.policy.serviceaccount=default k8s:io.cilium.k8s.policy.cluster=default k8s:app=test k8s:io.kubernetes.pod.namespace=test]) to 0 ([reserved:world]), identity 2950->2, verdict Denied DNS Query: www.googlc.ome. AAAA
-> Request dns from 2902 ([k8s:io.cilium.k8s.policy.serviceaccount=default k8s:io.cilium.k8s.policy.cluster=default k8s:app=test k8s:io.kubernetes.pod.namespace=test]) to 0 ([reserved:world]), identity 2950->2, verdict Denied DNS Query: www.googlc.ome. A
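The Policy verdict log lines show the L3/L4 policy allowing the flow and redirecting it to the proxy ("action redirect, match L3-L4"), so the Denied DNS Query verdicts above appear to come from the DNS proxy itself rather than from a datapath drop. To separate the two, the monitor can be filtered by event type; a sketch, run inside the Cilium pod on the node hosting endpoint 2902:

# DNS proxy (L7) verdicts only
cilium monitor --type l7 --related-to 2902
# datapath drops only
cilium monitor --type drop --related-to 2902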
How to reproduce the issue
Use kubespray to deploy a Kubernetes cluster on Debian 10 with the backported kernel 5.7.0-0.bpo.2-amd64
Deploy the cluster with:
- kube_proxy_mode: iptables
- enable_nodelocaldns: true
Here are the relevant configuration files for Kubespray: kubespray-files.zip
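For reference, a minimal sketch of the kubespray group_vars corresponding to the settings above (the file path and the nodelocaldns_ip variable/default are assumptions based on kubespray defaults; 169.254.25.10 matches the address used in the policy):

# inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yml
kube_proxy_mode: iptables
enable_nodelocaldns: true
# link-local address the node-local DNS cache listens on; must match the toCIDR entry in the policy
nodelocaldns_ip: 169.254.25.10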
About this issue
- State: closed
- Created 4 years ago
- Comments: 15 (7 by maintainers)
Nothing left to close. I also posted a PR to clarify the point that @roodkcab pointed out above (#14893).
@joestringer Is there anything left to be done before we can close this issue? Your last comment seems to indicate the two root causes were addressed in the documentation and v1.9.0.
@gandalfmagic I have some progress on this issue, thanks to the help from @joestringer.
Firstly, run cilium monitor -t drop inside the Cilium container on the same node as your pod, then run dig google.com and look at the monitor output. I find that 100.96.3.61:8053 is the endpoint of the kube-dns ep, which is strange. I try to edit the egress rule as below and run dig google.com in the pod, and it works this time. I think your cluster's CoreDNS port is not 8053, but you could still use cilium monitor -t drop to find out where it blocks. There are still some problems to figure out, but now it works for me; hope this will help.
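A rough sketch of an egress rule along those lines, allowing the kube-dns endpoints on both 53 and 8053 with DNS visibility; the selectors, ports and protocol here are assumptions, not the commenter's actual rule, and need to match your cluster:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: egress-test
  namespace: test
spec:
  endpointSelector:
    matchLabels:
      "k8s:app": test
  egress:
  - toEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": kube-system
        "k8s:k8s-app": kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      - port: "8053"
        protocol: UDP
      rules:
        dns:
        - matchPattern: "*"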
Something else I just noticed which may be quite relevant: the description says enable_nodelocaldns: true. I'm not exactly sure what node-local DNS will do in this scenario, but it's plausible that this is somehow related to the failure.
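To see what node-local DNS is doing in such a setup, it can help to look at its configuration and at the pod's resolver; a sketch assuming kubespray defaults (the configmap name is an assumption):

# forwarding/upstream configuration of the node-local DNS cache
kubectl -n kube-system get configmap nodelocaldns -o yaml
# on the node: confirm a listener on the link-local DNS address
ss -lunp | grep 169.254.25.10
# and check that the test pod's resolver actually points at it
kubectl -n test exec "$(kubectl -n test get pod -l app=test -o name)" -- cat /etc/resolv.conf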
UPDATE: my bad, the working setup was still Kubernetes v1.17.13 with Cilium 1.8.4; Kubernetes 1.18.10 with Cilium 1.8.4 is still not working.

I've tested my cluster with Cilium installed from kubespray with Ansible and from the official Helm charts. No luck.
Thank you