kubernetes: No DNS resolve for 5 min after node failure (CoreDNS)

What happened: We have a 3-master-node setup with 2 CoreDNS pods (the default setup). When a single node fails (killing the node that hosted one of the CoreDNS pods), DNS resolution stops working for 5 minutes. After the failed node reports ‘NotReady’, the CoreDNS pod on it is still reported as healthy, and it is re-created only after 5 minutes.

I fixed the pod re-creation delay by adding the following tolerations to the CoreDNS deployment:

    - key: "node.kubernetes.io/unreachable"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 0
    - key: "node.kubernetes.io/not-ready"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 0

After the tolerations change, DNS resolution still does not work as expected: when the node reports ‘NotReady’, the CoreDNS pod is re-created immediately, but DNS resolution still fails for 5 minutes. Other user pods report ‘bad address’ and ‘cannot assign requested address’ exceptions.

What you expected to happen: After a node failure (on the node that hosted one of the CoreDNS pods), a new pod should be created immediately (this is solved), and DNS resolution should work again within a few seconds instead of 5 minutes.

How to reproduce it (as minimally and precisely as possible): Build a 3-master-node cluster. Remove the power cord from the machine that hosts one of the CoreDNS pods.
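To put a number on the outage while reproducing this, a small probe can be run from inside any pod that has Python available. The following is only a sketch and not part of the original report: the function names (`resolves`, `probe`, `outage_windows`) are hypothetical, and the lookup name `kubernetes.default.svc.cluster.local` assumes the default `cluster.local` cluster domain. It resolves the name through the pod's normal resolver once per interval and collapses consecutive failures into outage windows.

```python
import socket
import time
from typing import Callable, List, Optional, Tuple

Sample = Tuple[float, bool]  # (timestamp in seconds, lookup succeeded)


def resolves(name: str) -> bool:
    """Return True if `name` resolves through the system resolver."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def probe(
    name: str,
    duration_s: float,
    interval_s: float = 1.0,
    resolver: Callable[[str], bool] = resolves,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Sample]:
    """Look up `name` every `interval_s` seconds for `duration_s` seconds."""
    samples: List[Sample] = []
    deadline = clock() + duration_s
    while clock() < deadline:
        samples.append((clock(), resolver(name)))
        sleep(interval_s)
    return samples


def outage_windows(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Collapse samples into (first_failure, recovery) windows.

    A window that never recovers ends at its last failed sample.
    """
    windows: List[Tuple[float, float]] = []
    start: Optional[float] = None
    last_fail: Optional[float] = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_fail = ts
        elif start is not None:
            windows.append((start, ts))
            start = None
    if start is not None and last_fail is not None:
        windows.append((start, last_fail))
    return windows
```

A run might look like `samples = probe("kubernetes.default.svc.cluster.local", duration_s=600)`, started shortly before the power cord is pulled, followed by printing `end - start` for each window from `outage_windows(samples)`. Because a failing lookup can itself block for several seconds on resolver timeouts, the measured windows are approximate.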

Anything else we need to know?:

Environment:

Kubernetes version (use kubectl version): v1.13.5
Cloud provider or hardware configuration:
OS (e.g: cat /etc/os-release): Ubuntu 16.04.4 LTS
Kernel (e.g. uname -a): Linux hi3 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Install tools: kubeadm
Network plugin and version (if this is a network-related bug): weave-net
Others: CoreDNS-1.2.6

About this issue

State: closed
Created 5 years ago
Reactions: 1
Comments: 15 (7 by maintainers)

Most upvoted comments

Thanks for the update. I am closing this issue. Feel free to reopen if needed. /close

prameshj on Jun 14, 2019