kubernetes: [1.14-beta2] CoreDNS immediately crashes if the Kubernetes API is unavailable
What happened:
In Kubernetes 1.14 beta2, if kube-apiserver is briefly unavailable (e.g. during a restart), all CoreDNS pods crash, which causes a short DNS outage.
What you expected to happen:
As in Kubernetes 1.13: if kube-apiserver is down, CoreDNS continues to run and provide DNS service.
How to reproduce it (as minimally and precisely as possible):
- Create cluster with kubeadm & calico (any CNI should work).
- Wait for CoreDNS to start
- Restart kube-apiserver
- CoreDNS has crashed (it didn't crash in Kubernetes 1.13 using the same steps)
```
# POD_CIDR=10.244.0.0/16
# kubeadm init --pod-network-cidr $POD_CIDR --kubernetes-version v1.14.0-beta.2
[...]
# curl https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml | sed -e "s?192.168.0.0/16?$POD_CIDR?g" | kubectl apply -f -
# kubectl get po -n kube-system
NAME                      READY   STATUS    RESTARTS   AGE
[...]
coredns-fb8b8dccf-k8hc2   1/1     Running   0          26m
coredns-fb8b8dccf-wfbwg   1/1     Running   0          26m
[...]
# docker rm -f k8s_kube-apiserver_kube-apiserver-node-3_kube-system_1e79449d50c9f3add3dd82d2706ed2f3_0
# kubectl get po -n kube-system
NAME                      READY   STATUS             RESTARTS   AGE
coredns-fb8b8dccf-k8hc2   0/1     Running            1          27m
coredns-fb8b8dccf-wfbwg   0/1     CrashLoopBackOff   1          27m
```
Anything else we need to know?:
Log of crashed CoreDNS:
```
# docker logs k8s_coredns_coredns-fb8b8dccf-wfbwg_kube-system_d872299d-4745-11e9-9462-080027c5f494_1
E0315 17:41:50.400737       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-fb8b8dccf-wfbwg.unknownuser.log.ERROR.20190315-174150.1: no such file or directory
```
Environment:
- Kubernetes version (use `kubectl version`):
```
Client Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0-beta.2.66+846a82fecc6959", GitCommit:"846a82fecc69594712040f715d5447bcd445b9c2", GitTreeState:"clean", BuildDate:"2019-03-15T09:35:50Z", GoVersion:"go1.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0-beta.2", GitCommit:"b1e389e6f7bd798a8dd162f82b918f509ac5291b", GitTreeState:"clean", BuildDate:"2019-03-12T18:01:33Z", GoVersion:"go1.12", Compiler:"gc", Platform:"linux/amd64"}
```
- Cloud provider or hardware configuration: Bare-metal (virtualbox)
- OS (e.g. `cat /etc/os-release`):
```
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
```
- Kernel (e.g. `uname -a`):
```
Linux node-3 4.15.0-46-generic #49-Ubuntu SMP Wed Feb 6 09:33:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
```
- Install tools: kubeadm. Deb files downloaded from https://console.cloud.google.com/storage/browser/kubernetes-release-dev/bazel/v1.14.0-beta.2.66+846a82fecc6959/
- Others:
```
# docker run --rm -ti k8s.gcr.io/coredns:1.3.1 --version
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
```
/sig network
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 54 (34 by maintainers)
I had the same problem too:

[error logs]

[the env]

I tried configuring an emptyDir, without effect:
https://www.reddit.com/r/kubernetes/comments/bbok8w/coredns_fails_on_node/

Updating CoreDNS to 1.5.0 resolved my problem:
https://github.com/coredns/deployment/blob/master/kubernetes/Upgrading_CoreDNS.md

Updating CoreDNS to 1.4.0 had no effect either; you may have to update it to 1.5.0.

[the result]

But the same error still appears in the log; I don't know why.
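For context on the upgrade path above: CoreDNS deprecated the `proxy` plugin in 1.4.0 and removed it in 1.5.0, so a Corefile carried over from 1.3.x must swap `proxy` for `forward`. A minimal sketch of the change (the surrounding plugin list mirrors the kubeadm default and is illustrative, not this cluster's exact Corefile):

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    # Pre-1.5.0 Corefiles used:  proxy . /etc/resolv.conf
    # 1.5.0 removed proxy; use forward instead:
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
```

The Corefile lives in the `coredns` ConfigMap in `kube-system`, so the edit can be made with `kubectl -n kube-system edit configmap coredns`.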
IMO, 1.5.0, barring unforeseen issues, whenever it arrives.

But until then, I think there are three equally stable options: [...] with the `reload` plugin removed from the Corefile.

Edit: Moving to 1.5.0 may require editing the Corefile, replacing `proxy` with `forward`.

I mitigate the klog issue by mounting an emptyDir volume to /tmp.
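The /tmp workaround mentioned above can be sketched as a strategic-merge patch against the CoreDNS Deployment. This is an illustration of the idea, not the commenter's exact commands: klog falls back to writing an error log under /tmp, and the crash log shows that open failing, so giving the container a writable /tmp avoids the fatal exit.

```yaml
# Hypothetical patch file (tmp-patch.yaml): add a writable emptyDir at /tmp
# so klog can create its fallback log file instead of exiting.
spec:
  template:
    spec:
      containers:
      - name: coredns          # merged by name with the existing container
        volumeMounts:
        - name: tmp
          mountPath: /tmp
      volumes:
      - name: tmp
        emptyDir: {}
```

Applied with something like `kubectl -n kube-system patch deployment coredns --patch "$(cat tmp-patch.yaml)"`.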
On Tue, Mar 19, 2019 at 9:22 PM, Richard Theis notifications@github.com wrote:
The reload bug fix will be included in CoreDNS 1.4.1 release. But I think the 1.4.1 release also removes an option that is used in the default CoreDNS configuration (which would render it invalid). So we need to be careful there. Probably not where we want to go in a patch release.
We’ll see where the discussion in coredns/coredns#2708 goes. Breaking new ground by releasing a 1.3.2 with the klog fix may be the best option.
I agree. Just letting you know the project's history. While the history cannot change, things could change going forward if there is demand. IMO, the best place to help create demand for this would be to open an issue in CoreDNS.
There is no restart with kube-dns; the DNS service continues to answer while kube-apiserver is down.

The DNS service continues to respond while kube-apiserver is down.