kubeadm: Support for more than one DNS Deployment/Service for faster failover
What keywords did you search in kubeadm issues before filing this one?
“dns”, also skimmed the kubeadm docs looking for a solution
Is this a BUG REPORT or FEATURE REQUEST?
FEATURE REQUEST
Problem description:
When a coredns pod goes down (for example, due to a node running out of disk space, or some other unpredictable event), it takes some time for kubelet to declare the pod “not ready”, and then for it to be removed from the Endpoints and finally from the dataplane. During that time, DNS clients can see their DNS requests and their retries all go to the failed pod (and thus get no response).
This is due to an interaction between flow-based load balancing for UDP (as used by kube-proxy) and the behaviour of (at least glibc’s) DNS resolver:
- When the glibc resolver only has one nameserver and it retries a DNS query, it re-uses the same source port.
- This results in all the retry packets being classified as part of one flow by conntrack.
- Once a backend coredns pod has been chosen for the flow, the retry packets all go to the same pod.
- If the backend pod has failed, the DNS resolution fails until the pod is marked as non-ready (plus extra time for kube-proxy to clean up the dataplane state).
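The flow pinning described above can be illustrated with a toy model. This is only a sketch, not kube-proxy or conntrack code: the `FlowLoadBalancer` class, the pod names and the client address are hypothetical, and the backend is picked at random the first time a flow key is seen.

```python
import random


class FlowLoadBalancer:
    """Toy model of flow-based UDP load balancing (hypothetical, for illustration)."""

    def __init__(self, backends, seed=0):
        self.backends = list(backends)
        self.flows = {}  # flow key -> backend chosen when the flow was first seen
        self.rng = random.Random(seed)

    def route(self, src_ip, src_port, dst_ip, dst_port):
        # All packets sharing this key are treated as one flow.
        key = (src_ip, src_port, dst_ip, dst_port, "udp")
        if key not in self.flows:
            self.flows[key] = self.rng.choice(self.backends)
        return self.flows[key]


lb = FlowLoadBalancer(["coredns-a", "coredns-b"])

# Five retries that reuse source port 40000 all land on the same backend.
same_port = {lb.route("10.0.0.5", 40000, "10.96.0.10", 53) for _ in range(5)}

# Queries from fresh source ports start new flows and may be spread out.
varied_ports = {lb.route("10.0.0.5", 40001 + i, "10.96.0.10", 53) for i in range(200)}
```

In this model `same_port` always contains exactly one backend, so if that backend is the failed pod, every retry is lost until the flow entry is removed.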
The impact is that, with default configuration, DNS resolution times out instead of the retry going to a good pod. This can be fatal for many long-lived applications.
We saw the same behaviour in both iptables mode and IPVS mode, and it’s likely that third-party eBPF dataplanes suffer from the same problem.
Suggested enhancement
Since changing DNS resolver behaviour is infeasible (and none of glibc’s standard configuration options seem to be of much use here), I think the best solution is to:
- Deploy two kube-dns deployments
- Deploy two kube-dns services, say with well-known IPs 10.96.0.10 and 10.96.0.11
- Configure both of those as nameservers for the pods.
Then, if any one coredns pod goes down, it will affect only one service and the resolver will naturally retry via the other service.
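A rough sketch of why a second service helps, assuming the resolver walks its nameserver list in order on each attempt round. The `resolve` function, the pod names and the `pinned` mapping are hypothetical; `pinned` stands in for the backend the dataplane has already chosen for the client's flow to each service IP.

```python
def resolve(nameservers, is_healthy, attempts=2):
    """Try each nameserver in order, for up to `attempts` rounds.

    Returns (answering nameserver or None, list of nameservers tried).
    """
    tried = []
    for _ in range(attempts):
        for ns in nameservers:
            tried.append(ns)
            if is_healthy(ns):
                return ns, tried
    return None, tried


# Hypothetical state: each service IP is pinned to one backend pod for this flow.
pinned = {"10.96.0.10": "coredns-a-0", "10.96.0.11": "coredns-b-0"}
failed_pods = {"coredns-a-0"}


def is_healthy(ns):
    return pinned[ns] not in failed_pods


# One service: every retry goes to the failed pod and resolution times out.
single = resolve(["10.96.0.10"], is_healthy)

# Two services: the retry moves to the second service and succeeds.
dual = resolve(["10.96.0.10", "10.96.0.11"], is_healthy)
```

With a single nameserver the lookup returns no answer, while with both service IPs configured it is answered via 10.96.0.11 after one failed attempt.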
About this issue
- State: closed
- Created a year ago
- Comments: 15 (9 by maintainers)
https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns/nodelocaldns helps during DNS upgrades or downtime, as there is a local cache.
/sig network /area dns