kubernetes: NodeLocal DNSCache breaks external DNS updates
What happened:
Set up NodeLocal DNSCache as documented. While this works and avoids DNS resolution errors when nodes or sites are lost, it also prevents RFC 2136 connections from external-dns to cluster-external DNS servers.
time="2021-03-03T14:06:35Z" level=info msg="Instantiating new Kubernetes client"
time="2021-03-03T14:06:35Z" level=info msg="Using inCluster-config based on serviceaccount-token"
time="2021-03-03T14:06:35Z" level=info msg="Created Kubernetes client https://172.31.0.1:443"
time="2021-03-03T14:06:37Z" level=info msg="Configured RFC2136 with zone 'xxx.company.com.' and nameserver 'n0211.xxx.company.com:53'"
time="2021-03-03T14:31:58Z" level=error msg="failed to fetch records via AXFR: dial tcp: i/o timeout"
time="2021-03-03T14:32:59Z" level=error msg="failed to fetch records via AXFR: dial tcp: i/o timeout"
time="2021-03-03T14:34:00Z" level=error msg="failed to fetch records via AXFR: dial tcp: i/o timeout"
time="2021-03-03T14:35:00Z" level=error msg="failed to fetch records via AXFR: dial tcp: i/o timeout"
time="2021-03-03T14:36:00Z" level=error msg="failed to fetch records via AXFR: dial tcp: i/o timeout"
time="2021-03-03T14:37:00Z" level=error msg="failed to fetch records via AXFR: dial tcp: i/o timeout"
time="2021-03-03T14:38:01Z" level=error msg="failed to fetch records via AXFR: dial tcp: i/o timeout"
It looks like the netfilter rules explicitly deny all other DNS connections:
Chain INPUT (policy ACCEPT 201 packets, 177K bytes)
pkts bytes target prot opt in out source destination
6917K 4056M cali-INPUT all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:Cz_u1IQiXIMmKD4c */
0 0 ACCEPT udp -- * * 0.0.0.0/0 172.31.0.10 udp dpt:53
0 0 ACCEPT tcp -- * * 0.0.0.0/0 172.31.0.10 tcp dpt:53
2 174 ACCEPT udp -- * * 0.0.0.0/0 169.254.20.10 udp dpt:53
0 0 ACCEPT tcp -- * * 0.0.0.0/0 169.254.20.10 tcp dpt:53
4002K 2803M KUBE-FIREWALL all -- * * 0.0.0.0/0 0.0.0.0/0
[…]
Chain OUTPUT (policy ACCEPT 391 packets, 229K bytes)
pkts bytes target prot opt in out source destination
7002K 1893M cali-OUTPUT all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:tVnHkvAo15HuiPy0 */
0 0 ACCEPT udp -- * * 172.31.0.10 0.0.0.0/0 udp spt:53
0 0 ACCEPT tcp -- * * 172.31.0.10 0.0.0.0/0 tcp spt:53
5852 1068K ACCEPT udp -- * * 169.254.20.10 0.0.0.0/0 udp spt:53
0 0 ACCEPT tcp -- * * 169.254.20.10 0.0.0.0/0 tcp spt:53
7007K 1894M KUBE-FIREWALL all -- * * 0.0.0.0/0 0.0.0.0/0
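To check what the listing above actually does with DNS traffic, the rule lines can be parsed and filtered for port 53. The following is only a minimal sketch: the helper names `parse_rule` and `dns_rules` are made up for illustration, the column layout is assumed to match the `iptables -L -n -v` style output shown in this issue, and `SAMPLE` is copied from the INPUT chain above. The excerpt is truncated (`[…]`) and the Calico chains are not expanded, so this says nothing about rules that are not shown.

```python
# Hypothetical helpers (not part of any tool mentioned in this issue).
SAMPLE = """\
Chain INPUT (policy ACCEPT 201 packets, 177K bytes)
pkts bytes target prot opt in out source destination
6917K 4056M cali-INPUT all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:Cz_u1IQiXIMmKD4c */
0 0 ACCEPT udp -- * * 0.0.0.0/0 172.31.0.10 udp dpt:53
0 0 ACCEPT tcp -- * * 0.0.0.0/0 172.31.0.10 tcp dpt:53
2 174 ACCEPT udp -- * * 0.0.0.0/0 169.254.20.10 udp dpt:53
0 0 ACCEPT tcp -- * * 0.0.0.0/0 169.254.20.10 tcp dpt:53
4002K 2803M KUBE-FIREWALL all -- * * 0.0.0.0/0 0.0.0.0/0
"""

KEYS = ["pkts", "bytes", "target", "prot", "opt", "in", "out", "source", "destination"]


def parse_rule(line):
    """Split one rule line into named fields; return None for non-rule lines."""
    fields = line.split()
    # Chain headers, the column header and elision markers are skipped:
    # rule lines have at least nine columns and start with a packet counter.
    if len(fields) < 9 or not fields[0][0].isdigit():
        return None
    rule = dict(zip(KEYS, fields[:9]))
    rule["extra"] = fields[9:]  # match options such as "udp", "dpt:53"
    return rule


def dns_rules(listing):
    """Return all parsed rules that match source or destination port 53."""
    parsed = (parse_rule(line) for line in listing.splitlines())
    return [r for r in parsed if r and ("dpt:53" in r["extra"] or "spt:53" in r["extra"])]


if __name__ == "__main__":
    for rule in dns_rules(SAMPLE):
        print(rule["target"], rule["prot"], rule["destination"], rule["pkts"])
```

For the excerpt above, every port 53 rule has the target ACCEPT and matches only the two DNS cache addresses, so the visible rules do not by themselves drop traffic to other DNS servers.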
What you expected to happen:
NodeLocal DNSCache should allow connections to (selected) external DNS servers so applications and services can advertise themselves.
How to reproduce it (as minimally and precisely as possible):
- Deploy NodeLocal DNSCache as described
- Connect to an external DNS server (e.g., `host 8.8.8.8 8.8.8.8`)
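The errors in the log are `dial tcp` timeouts during AXFR, which runs over TCP, so a TCP probe of port 53 may be closer to what external-dns does than a UDP lookup with `host`. Below is a minimal sketch using only the Python standard library; the function name `tcp_port_open` and the default timeout are assumptions made for this example.

```python
import socket


def tcp_port_open(host, port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Name resolution failures, refused connections and timeouts all raise
    OSError subclasses, so they are all reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from inside a Pod, `tcp_port_open("8.8.8.8")` tests plain reachability by IP address, while `tcp_port_open("n0211.xxx.company.com")` also depends on the name being resolved first. Comparing the two can help separate a blocked connection from a name resolution problem.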
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): v1.19.8
- Cloud provider or hardware configuration: On-premises, bare-metal
- OS (e.g: `cat /etc/os-release`): Ubuntu 18.04.5 LTS
- Kernel (e.g. `uname -a`): Linux n0214 5.4.0-60-generic #67~18.04.1-Ubuntu SMP Tue Jan 5 22:01:05 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: kubeadm
- Network plugin and version (if this is a network-related bug): Calico
- Others: External DNS server is BIND9
(Possibly) related issues:
- https://github.com/projectcalico/calico/issues/3795
- https://github.com/kubernetes/kubernetes/issues/98758
/sig network
About this issue
- State: closed
- Created 3 years ago
- Comments: 17 (13 by maintainers)
Some more testing showed that NodeLocal DNSCache does not actually block traffic to the external DNS server but somehow changes DNS resolution of external FQDNs: changing the external DNS server to an absolute FQDN (e.g., `--rfc2136-host=my-dns-server.company.com.` instead of `my-dns-server.company.com`, note the trailing dot) makes it work again.
For other Pods I noticed occasional delays in FQDN resolution even if the same name had been resolved quickly a second before. With some Alpine Linux based Pods, hostname resolution of cluster-external names sometimes fails completely and only absolute names work.
Weird. I cannot observe any of these effects without NodeLocal DNSCache so far.
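One possible reason the trailing dot matters, not confirmed anywhere in this issue, is the resolver search list. With an `ndots` option in `/etc/resolv.conf`, a name with fewer dots than `ndots` is first tried with each search domain appended, while an absolute name is queried exactly once. The sketch below models the glibc-style ordering only. The function name `candidate_queries` is made up, and the search domains and `ndots=5` are assumed values typical for a Pod with the default DNS policy, since the issue does not show the Pod's `/etc/resolv.conf`.

```python
def candidate_queries(name, search, ndots=1):
    """Return the FQDNs a glibc-style stub resolver would try, in order."""
    # An absolute name (trailing dot) bypasses the search list entirely.
    if name.endswith("."):
        return [name]
    suffixed = [f"{name}.{domain.rstrip('.')}." for domain in search]
    as_is = name + "."
    # Names with at least `ndots` dots are tried as-is first, others last.
    if name.count(".") >= ndots:
        return [as_is] + suffixed
    return suffixed + [as_is]


# Assumed values, not taken from the issue.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

if __name__ == "__main__":
    for host in ("my-dns-server.company.com", "my-dns-server.company.com."):
        print(host, "->", candidate_queries(host, SEARCH, ndots=5))
```

Under these assumptions the relative name produces three cluster-internal lookups before the real one, and each of them has to be answered (or time out) through the cache, whereas the absolute name produces a single query. Alpine's musl resolver handles the search list differently from glibc and is not modeled here.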