kubernetes: No DNS resolve for 5 min after node failure (CoreDNS)
What happened: we have a 3 master node setup with 2 CoreDNS pods (the default setup). After a single node failure (killing the node that hosted one of the CoreDNS pods), DNS resolution stops working for 5 minutes. After the failed node reports ‘NotReady’, the CoreDNS pod on it is still reported as healthy, and it is re-created only after 5 minutes. I fixed the pod re-creation delay by adding the following tolerations to the CoreDNS deployment:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 0
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 0
After the tolerations change, DNS resolution still does not work as expected: when the node reports ‘NotReady’, the CoreDNS pod is re-created immediately, but DNS resolution still fails for 5 minutes.
Other user pods report ‘bad address’ and ‘cannot assign requested address’ exceptions.
What you expected to happen: after a failure of the node that hosted one of the CoreDNS pods, a new pod should be created immediately (this is solved), and DNS resolution should work again within a few seconds instead of 5 minutes.
How to reproduce it (as minimally and precisely as possible): build a system with 3 master nodes. Pull the power cord from the machine that hosts the master CoreDNS pod.
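To put a number on the outage during the reproduction above, a small probe can be run from inside a pod while the node is powered off. The following is only a sketch and not part of the original report: the function names (`resolves`, `longest_outage`, `probe`) are hypothetical, and the suggested hostname `kubernetes.default.svc.cluster.local` assumes the default cluster domain.

```python
import socket
import time


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves, False on a resolution error."""
    try:
        resolver(hostname, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def longest_outage(samples):
    """Return the longest failing stretch, in seconds.

    samples is a list of (timestamp, ok) tuples in time order. An outage runs
    from the first failed probe to the next successful one, or to the last
    sample if resolution never recovers.
    """
    longest = 0.0
    started = None
    for ts, ok in samples:
        if not ok and started is None:
            started = ts  # outage begins
        elif ok and started is not None:
            longest = max(longest, ts - started)  # outage ends
            started = None
    if started is not None and samples:
        longest = max(longest, samples[-1][0] - started)
    return longest


def probe(hostname, duration=600.0, interval=1.0):
    """Resolve hostname repeatedly for `duration` seconds and log each result."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        now = time.monotonic()
        ok = resolves(hostname)
        samples.append((now, ok))
        print(f"{now:.1f} {'ok' if ok else 'FAIL'}")
        time.sleep(interval)
    return samples
```

Calling `longest_outage(probe("kubernetes.default.svc.cluster.local"))` just before pulling the power cord should report roughly how long resolution failed. A failed lookup can itself block for several seconds depending on the resolver timeout settings in the pod, so the measured value is approximate.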
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
- Cloud provider or hardware configuration: v1.13.5
- OS (e.g. cat /etc/os-release): Ubuntu 16.04.4 LTS
- Kernel (e.g. uname -a): Linux hi3 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: kubeadmin
- Network plugin and version (if this is a network-related bug): weave-net
- Others: CoreDNS-1.2.6
About this issue
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 15 (7 by maintainers)
Thanks for the update, I am closing this issue. Feel free to reopen if needed. /close