kubernetes: kube-dns: dnsmasq intermittent connection refused
Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.):
What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
Kubernetes version (use kubectl version):
kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.7", GitCommit:"8eb75a5810cba92ccad845ca360cf924f2385881", GitTreeState:"clean", BuildDate:"2017-04-27T10:00:30Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.7", GitCommit:"8eb75a5810cba92ccad845ca360cf924f2385881", GitTreeState:"clean", BuildDate:"2017-04-27T09:42:05Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Cloud provider or hardware configuration: AWS
- OS (e.g. from /etc/os-release): PRETTY_NAME="Container Linux by CoreOS 1339.0.0 (Ladybug)"
- Kernel (e.g. uname -a): 4.10.1-coreos
- Install tools: custom ansible
- Others: kube dns related images. gcr.io/google_containers/kubedns-amd64:1.9 and gcr.io/google_containers/kube-dnsmasq-amd64:1.4.1
What happened: java.net.UnknownHostException: dynamodb.us-east-1.amazonaws.com
What you expected to happen: Receive a response to the name lookup request.
How to reproduce it (as minimally and precisely as possible): This is the kicker: we are not able to reproduce this issue on purpose. However, we experience it in our production cluster 1 - 500 times a week.
Anything else we need to know: Over the past two months or so we have experienced a handful of events where DNS was failing for most or all of our production pods, with each event lasting 5 - 10 minutes. During these events the kube-dns service was healthy, with 3 - 6 available endpoints at all times. We increased our kube-dns pod count to 20 in our 20-node production clusters. That level of provisioning alleviated the DNS issues that were taking down our production services. However, we still experience at least weekly smaller events, lasting from 1 second to 30 seconds, which affect a small subset of pods. During these events 1 - 5 pods on different nodes across the cluster experience a burst of DNS failures with a much smaller end-user impact.

We enabled query logging in dnsmasq because we were not sure whether the queries made it from the client pod to one of the kube-dns pods at all. What was interesting is that, during the DNS events where query logging was enabled, none of the name lookup requests that resulted in an exception were received by dnsmasq. At this point my colleague noticed these errors coming from dnsmasq-metrics:
ERROR: logging before flag.Parse: W0517 03:19:50.139060 1 server.go:53] Error getting metrics from dnsmasq: read udp 127.0.0.1:36181->127.0.0.1:53: i/o timeout
That error, as near as I can tell, is basically a name resolution error from dnsmasq-metrics as it tries to query the dnsmasq container in the same pod for dnsmasq's internal metrics, similar to running dig +short chaos txt cachesize.bind.
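For reference, you can run the equivalent queries by hand; the names below are the standard dnsmasq CHAOS-class TXT records, and the loopback address assumes you are execing into the kube-dns pod itself:
# Roughly what dnsmasq-metrics does on each scrape: CHAOS-class TXT queries
# against the dnsmasq instance listening on 127.0.0.1:53 inside the pod.
for metric in cachesize insertions evictions misses hits; do
  dig +short +time=2 chaos txt ${metric}.bind @127.0.0.1
done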
All of our DNS events happen at exactly the same time that one or more dnsmasq-metrics containers are throwing those errors. We thought we might be exceeding the default limit of 150 concurrent queries that dnsmasq has, but we do not see any logs indicating that. If we did, we would expect to see these log messages:
dnsmasq: Maximum number of concurrent DNS queries reached (max: 150)
Based on conversations with other cluster operators and users in Slack, I know that other users are experiencing these same problems. I'm hoping that this issue can be used to centralize our efforts and determine whether dnsmasq refusing connections is the problem or a symptom of something else.
About this issue
- State: closed
- Created 7 years ago
- Reactions: 21
- Comments: 103 (50 by maintainers)
This comment explains the root cause pretty well: https://github.com/weaveworks/weave/issues/3287#issuecomment-387178077
We have switched our resolvers to TCP and have not seen these issues since. This is probably better than the 4 ms artificial delay suggested in the weave issue to avoid the race, and it is much easier to implement.
The title of this issue should be updated, it doesn’t only affect kube-dns.
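A couple of quick checks for anyone trying the TCP route (the pod name is illustrative, and this assumes the image ships dig; use-vc is the glibc resolv.conf option that forces lookups over TCP):
# Inspect the resolver options a pod actually sees:
kubectl exec my-app-pod -- cat /etc/resolv.conf
# Confirm the lookup succeeds when forced over TCP:
kubectl exec my-app-pod -- dig +tcp dynamodb.us-east-1.amazonaws.com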
I’ve been debugging intermittent DNS errors in a 1.7.2 cluster with Ubuntu 16 nodes on AWS deployed by kops 1.7.x. I manually cut down kube-dns (1.14.5) to just a single running replica so I could watch that EC2 node and capture DNS traffic for analysis. Notes:
Regarding ndots: you can add a dot to the end of your domain name; that way it is treated as an FQDN and the local search list is never applied, e.g. arale-ng.cyw3ljy98zq7.eu-west-1.rds.amazonaws.com. (note the trailing dot).
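A quick way to see the difference from inside a pod (a sketch; it assumes a glibc-based image so getent goes through the resolv.conf search list, with the kubelet default of ndots:5):
# 3 dots < ndots:5, so every search domain is tried before the absolute name:
getent hosts arale-ng.cyw3ljy98zq7.eu-west-1.rds.amazonaws.com
# A trailing dot makes the name absolute, so the search list is skipped entirely:
getent hosts arale-ng.cyw3ljy98zq7.eu-west-1.rds.amazonaws.com.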
We see the same intermittent DNS resolution issues in all clusters. In our case it's a Python application and we are failing to resolve external domains. It isn't related to kube-dns autoscaling events because we are running a ridiculously high but fixed number of kube-dns pods. We are also not hitting conntrack limits.
Kubernetes 1.9 on AWS, Networking is kubenet, same results with kube-dns:1.14.9 and kube-dns:1.14.5.
I observed similar symptoms today.
This is running with flannel, k8s 1.7.6
I think what’s odd is that this continued for days. I have three working theories:
One more data point - I was able to simulate an exit by kubectl exec-ing into the dnsmasq pod and doing a kill 1. This did cause the pod to restart, and kube-proxy logged "deleting connection tracking state...", but no application-level errors were visible. To me that points away from the connection flush and towards a bad node or pod, but I'm just guessing at this point.

Next time this happens my plan is:
@YoniTapingo I run this little script from my container entrypoint:
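(The script itself isn't preserved in this excerpt; the sketch below only shows the usual shape of this workaround, on the assumption that it appends resolver options such as use-vc to the pod's /etc/resolv.conf before starting the real process.)
#!/bin/sh
# Hedged sketch, not the original script: append resolver options once,
# then hand off to the container's real command. Needs root, since
# /etc/resolv.conf is managed by the kubelet.
grep -q "use-vc" /etc/resolv.conf || echo "options use-vc single-request-reopen" >> /etc/resolv.conf
exec "$@"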
You could also do it in a postStart lifecycle hook if you have root or sudo.
@joekohlsdorf that’s a quick win!! thank you for the tip.
Sorry for the noise, I sent the comment a few times by accident while editing it.
Hello @ApsOps, I'm a coworker of @joanfont. What you mention is true, but it should not be necessary to change it; there is indeed a problem with how Kubernetes handles DNS resolution. In the attached pcap file you can see how, for the same DNS name, Kubernetes sometimes makes the correct query and other times decides to use the search domains.
In the pcap you can see this in more detail, but here is the short version:
The pod asks for arale-ng.cyw3ljy98zq7.eu-west-1.rds.amazonaws.com. At the host level we see these queries going to the AWS DNS servers (10.0.0.2).
The pod always performs the same query, and dnsmasq sometimes does the right thing (forwarding the query as is) and other times decides to apply search domains to it. I think the behavior of ndots is consistent and does not explain this problem.
I have a similar problem. My application throws an exception because it cannot resolve the database (Amazon RDS) host. Analyzing the DNS traffic produced at the time the exception is thrown, I can see that the previous DNS query resolves correctly to the RDS host (internal IP), but the following queries do not.
First it should try to resolve the bare hostname, and then, if that query fails, the search domains should be used. What happens here is that the first query, using only the hostname, is never performed; the first query attempted already uses the search domains.
I’ve attached the filtered pcap where you can see the first query that is performed correctly and then the failed queries.
dns_filtered.pcap.zip
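For anyone who wants to collect a comparable capture, something along these lines works on the node hosting the kube-dns pod (the interface and output path are assumptions; adjust for your environment):
# Capture DNS traffic only, then filter and inspect offline with Wireshark or tshark:
sudo tcpdump -i any -n -s 0 port 53 -w /tmp/dns_filtered.pcap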
Regarding scalability, pod updates include all of the pod spec + status. Endpoints change less frequently, since they only change when the IPs of the Pods selected by services change.
TCP has the advantage of knowing when a conntrack entry can be deleted (FIN, RST etc)
If you look at the commit that added the conntrack removal, it was added to solve a bug in the opposite direction: a client sending constant UDP traffic from the same socket would never switch over to a live endpoint because the conntrack entry kept being refreshed (packets would go to a black hole). We could delay the conntrack entry removal by looking at pod state (i.e. the graceful termination period), but kube-proxy does not have information about pods (it would need to watch all pods, which is a major scalability issue).
One hack would be to introduce a small delay (say ~1 - 5 seconds) between the iptables update and the removal of conntrack entries. Most UDP protocols would respond in the allotted time, so existing responses could still come back. Most DNS client libraries I have looked at use a new socket for each request, so new requests after the iptables update would not go to the removed backend.
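For context, the effect of kube-proxy's flush is roughly what the conntrack CLI does below (10.32.0.53 stands in for a removed kube-dns pod IP and is purely illustrative); the hack described above would amount to sleeping a few seconds between the iptables update and this step:
# Delete UDP conntrack entries whose original destination was the removed endpoint:
conntrack -D -p udp --orig-dst 10.32.0.53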
I'm back to my original thinking on this: kube-proxy should not be deleting UDP connections immediately on endpoint removal. Imagine if it did this for TCP connections - the shutdown grace period would become useless. It needs to wait some period of time to give terminating pods a chance to gracefully stop things. I see at least a few scenarios:
So an idea would be to detect (1) somehow, and then delay the conntrack entry deletion in that case (e.g. check whether the associated pod is in Terminating status before removing it).
Replacing kube-dns with CoreDNS resulted in the same behaviour... It looks like the issue isn't with the DNS servers; it must be higher up in the Kubernetes DNS middleware.
@cmluciano we use OpenJDK, and the default for networkaddress.cache.ttl is 30 seconds according to https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/sun/net/InetAddressCachePolicy.java#L48. I verified this by capturing traffic from a Java app that just does a DNS lookup in a loop for kinesis.us-east-1.amazonaws.com; I see requests hit the wire about every 30 seconds even though the loop runs at 10-second intervals. Increasing this to 60 seconds may lighten the load on the name servers, but dnsmasq is still refusing queries occasionally.
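For anyone wanting to try the same TTL bump, a sketch (the jar name is illustrative; networkaddress.cache.ttl is a java.security property, and sun.net.inetaddr.ttl is the system-property fallback that InetAddressCachePolicy reads when the security property is unset):
# Per-JVM, via the system-property fallback:
java -Dsun.net.inetaddr.ttl=60 -jar app.jar
# Or persistently, in the JRE's java.security file:
#   networkaddress.cache.ttl=60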