coredns: Kubernetes - SERVFAIL if we cycle our apiserver certs
We cycle our apiserver certs on a regular basis. During that time coredns is not able to answer DNS queries.
Hard facts:
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-10T23:35:51Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.5", GitCommit:"753b2dbc622f5cc417845f0ff8a77f539a4213ea", GitTreeState:"clean", BuildDate:"2018-11-26T14:31:35Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
.:53
2019-01-14T21:20:42.732Z [INFO] CoreDNS-1.3.1
2019-01-14T21:20:42.732Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
Config:
.:53 {
    errors
    health
    autopath @kubernetes
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods verified
        upstream
        fallthrough in-addr.arpa ip6.arpa
        ttl 5
    }
    prometheus :9153
    proxy . /etc/resolv.conf {
        policy sequential
    }
    cache 300
    reload
}
Below is a screenshot of a visualization of the logs (last 24 hours). The green bars are apiserver restarts; the other colors are various failing DNS queries. The two reloads on the right are reloads of the standby master 2. The reload of the active master is barely visible between the DNS errors.
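If you want to reproduce the correlation without a dashboard, a small probe along these lines works for us. This is only a sketch: the ClusterIP 10.96.0.10 and the probed name are assumptions, use the ClusterIP of your kube-dns service. It logs every failed lookup against the cluster DNS with a timestamp, which you can then line up with the apiserver restarts:

package main

import (
    "context"
    "fmt"
    "net"
    "time"
)

func main() {
    // ClusterIP of the kube-dns Service; 10.96.0.10 is only an assumption,
    // adjust it for your cluster.
    const dnsAddr = "10.96.0.10:53"

    r := &net.Resolver{
        PreferGo: true,
        Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
            d := net.Dialer{Timeout: 2 * time.Second}
            return d.DialContext(ctx, network, dnsAddr)
        },
    }

    // Resolve a name that should always exist in the cluster and log every
    // failure with a timestamp, so the failures can be lined up with the
    // apiserver restarts.
    for {
        ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
        _, err := r.LookupHost(ctx, "kubernetes.default.svc.cluster.local")
        cancel()
        if err != nil {
            fmt.Println(time.Now().UTC().Format(time.RFC3339), "lookup failed:", err)
        }
        time.Sleep(5 * time.Second)
    }
}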
There are two other guys in the slack channel with the same problem. Maybe they can post their findings here…
Greetings, Max
About this issue
- State: closed
- Created 5 years ago
- Reactions: 4
- Comments: 29 (14 by maintainers)
We are observing the same behaviour. We currently run kube 1.12.3 (we have clusters both on aws and gcp) and we are restarting kubelets daily to rotate certs (this includes restarting the kubelet systemd unit). Our coredns conf looks like:
The way to reproduce the problem is just to restart kubelets on master nodes sequentially. Also note that just deleting kube apiserver pods doesn’t seem to cause an issue.
The only interesting log that I could spot is:
During the time of failure we cannot resolve anything; dig gives:
It looks like some kind of error that cannot be handled by the stream watcher and results in coredns failing to respond.
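For background on the stream watcher part: the kubernetes plugin keeps long-lived watches against the apiserver through client-go. A rough sketch of such a watch (only an illustration with current client-go, not the actual CoreDNS code) looks like this; when the apiserver comes back with a new serving cert the result channel closes, and if re-establishing the watch fails the client is stuck with whatever state it last saw:

package main

import (
    "context"
    "fmt"
    "log"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func main() {
    // In-cluster config: service-account token plus the CA bundle mounted
    // into the pod, the same way the kubernetes plugin reaches the apiserver.
    cfg, err := rest.InClusterConfig()
    if err != nil {
        log.Fatal(err)
    }
    cs, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        log.Fatal(err)
    }

    // Open a long-lived watch on Services, similar in spirit to the watches
    // CoreDNS uses to build its DNS records.
    w, err := cs.CoreV1().Services(metav1.NamespaceAll).Watch(context.Background(), metav1.ListOptions{})
    if err != nil {
        log.Fatal(err)
    }
    defer w.Stop()

    // When the apiserver restarts (for example with new serving certs) the
    // result channel is closed; a consumer that fails to re-list and
    // re-watch at this point keeps running with stale state.
    for ev := range w.ResultChan() {
        fmt.Println("event:", ev.Type)
    }
    log.Println("watch closed; a real client would re-establish it here")
}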
@ffilippopoulos I think it's important to note that restarting our kubelet on a master also does an explicit docker restart of the api-server component: