dns: kube-dns never resolves if a domain returns NOERROR with 0 answer records once
tl;dr If a nameserver replies status=NOERROR with no answer section to a DNS A question, kube-dns always caches this result. If the domain name actually gets an A record after it’s queried through kube-dns, it never (I waited a few days) resolves from the pods, but does resolve outside the container (e.g. on my laptop) just fine.
Repro steps
Prerequisites
- Have a domain name alp.im and the nameservers are pointed to CloudFlare.
- Have nslookup/dig installed on your workstation.
- Have a minikube cluster ready on your workstation
  - running kubernetes v1.6.0
  - kube-dns comes by default, running gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1
Step 1: Domain does not exist, query from your laptop
Note ANSWER: 0, and status: NOERROR
$ dig A z.alp.im
; <<>> DiG 9.8.3-P1 <<>> A z.alp.im
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64978
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;z.alp.im. IN A
;; AUTHORITY SECTION:
alp.im. 1799 IN SOA ivan.ns.cloudflare.com. dns.cloudflare.com. 2025042470 10000 2400 604800 3600
;; Query time: 196 msec
;; SERVER: 2401:fa00:fa::1#53(2401:fa00:fa::1)
;; WHEN: Thu Jun 29 10:51:35 2017
;; MSG SIZE rcvd: 99
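To make the "ANSWER: 0 with status: NOERROR" case easier to spot in the outputs of this report, here is a minimal Python sketch that classifies a dig header. The function name classify_dig_output and the labels it returns are hypothetical, chosen only for illustration. It assumes the reply comes from a recursive resolver, so an empty NOERROR reply is treated as "no data" rather than a referral.

```python
import re


def classify_dig_output(text: str) -> str:
    """Classify a dig reply as ANSWER, NODATA, NXDOMAIN, or OTHER.

    Hypothetical helper: it only reads the status and ANSWER count
    from the ->>HEADER<<- lines that dig prints.
    """
    status = re.search(r"status: (\w+)", text)
    answers = re.search(r"ANSWER: (\d+)", text)
    if status is None or answers is None:
        raise ValueError("no dig header found in input")
    rcode = status.group(1)
    count = int(answers.group(1))
    if rcode == "NXDOMAIN":
        return "NXDOMAIN"
    if rcode == "NOERROR":
        # NOERROR with zero answer records is the case this issue is about.
        return "ANSWER" if count > 0 else "NODATA"
    return "OTHER"


# Header lines copied from the Step 1 output above.
STEP1_HEADER = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64978
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
"""

print(classify_dig_output(STEP1_HEADER))
```

For the Step 1 header this prints NODATA, meaning the name is known to the zone's nameserver but has no A record yet.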
Step 2: Domain does not exist, query from Pod on Kubernetes
Start a toolbelt/dig container with shell and run the same query:
⚠️ Do not exit this container as you will reuse it later.
Note the response is the same, ANSWER: 0 and NOERROR.
$ kubectl run -i -t --rm --image=toolbelt/dig dig --command -- sh
If you don't see a command prompt, try pressing enter.
/ # dig A z.alp.im
; <<>> DiG 9.11.1-P1 <<>> A z.alp.im
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11209
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;z.alp.im. IN A
;; AUTHORITY SECTION:
alp.im. 1724 IN SOA ivan.ns.cloudflare.com. dns.cloudflare.com. 2025042470 10000 2400 604800 3600
;; Query time: 74 msec
;; SERVER: 10.0.0.10#53(10.0.0.10)
;; WHEN: Thu Jun 29 17:55:46 UTC 2017
;; MSG SIZE rcvd: 99
(Also note SERVER: 10.0.0.10#53, which is kube-dns.)
Step 3: Create an A record for the domain
Here I use CloudFlare as it manages my DNS.

Step 4: Test DNS record from your laptop
Run dig on your laptop (note ;; ANSWER SECTION: and 8.8.8.8 answer):
$ dig A z.alp.im
; <<>> DiG 9.8.3-P1 <<>> A z.alp.im
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37570
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;z.alp.im. IN A
;; ANSWER SECTION:
z.alp.im. 299 IN A 8.8.8.8
;; Query time: 196 msec
;; SERVER: 2401:fa00:fa::1#53(2401:fa00:fa::1)
;; WHEN: Thu Jun 29 10:54:44 2017
;; MSG SIZE rcvd: 53
Step 5: Test DNS record from Pod on Kubernetes
Run the same command again:
/ # dig A z.alp.im
; <<>> DiG 9.11.1-P1 <<>> A z.alp.im
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45420
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;z.alp.im. IN A
;; Query time: 0 msec
;; SERVER: 10.0.0.10#53(10.0.0.10)
;; WHEN: Thu Jun 29 18:00:24 UTC 2017
;; MSG SIZE rcvd: 37
Note the diff:
- still ANSWER: 0 and status: NOERROR (but it resolves just fine outside the cluster)
- ;; AUTHORITY SECTION: disappeared and AUTHORITY: changed to 0 from the previous time we ran this.
- ;; Query time: 0 msec (was 74 ms); I assume it means it's just a cached response.
- Query time stays as 0 ms no matter how many times I run the same command.
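The comparison between the two in-cluster runs can also be done mechanically. The sketch below is a hypothetical helper (summarize and changed_fields are names made up for this example) that pulls a few fields out of two dig outputs and reports which ones differ. The excerpts are trimmed copies of the Step 2 and Step 5 outputs.

```python
import re

# Fields of interest and the patterns that extract them from dig output.
FIELDS = {
    "status": r"status: (\w+)",
    "answer": r"ANSWER: (\d+)",
    "authority": r"AUTHORITY: (\d+)",
    "query_time_msec": r"Query time: (\d+) msec",
}


def summarize(text: str) -> dict:
    """Extract the fields above from one dig output (None if missing)."""
    summary = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, text)
        summary[name] = match.group(1) if match else None
    return summary


def changed_fields(before: str, after: str) -> dict:
    """Return {field: (old, new)} for every field whose value differs."""
    old, new = summarize(before), summarize(after)
    return {k: (old[k], new[k]) for k in FIELDS if old[k] != new[k]}


STEP2 = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11209
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; Query time: 74 msec
"""

STEP5 = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45420
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; Query time: 0 msec
"""

print(changed_fields(STEP2, STEP5))
```

Only the authority count and the query time change between the two runs; the status and answer count stay the same, which is consistent with the reply being served from a cache.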
What else I tried
- Try it on GKE: I tried with k8s v1.5.x and v1.6.4. → Same issue. (cc: @bowei)
- Query from a different pod on minikube: I started a new Pod and queried from there → Same issue.
- Restart kube-dns Pod → This worked on GKE, but not on minikube.
$ kubectl delete pods -n kube-system -l k8s-app=kube-dns
pod "kube-dns-268032401-69xk5" deleted
Impact
I am not sure why this has not been discovered before. I noticed this behavior while using kube-lego on GKE. Once kube-lego applies for a TLS certificate, it polls the domain name of the service (e.g. example.com/.well-known/<token>) before asking Let’s Encrypt to validate it. Until I create an Ingress with the kube-lego annotation, I don’t have the external IP, so I can’t configure the domain yet. But as soon as the Ingress exists, the kube-lego Pod picks it up and starts querying my domain in an infinite loop. It never succeeds because the first time it looked up the hostname, the A record didn’t exist, and that result is cached forever. After I add the A record, it still can’t resolve. The moment I delete the kube-dns Pods and they get recreated, it immediately starts working: it resolves the hostname and completes the kube-lego challenge.
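The failure mode described here can be illustrated with a toy model. The sketch below is not kube-dns or dnsmasq code; ToyResolver, its neg_ttl parameter, and the fake clock are all hypothetical names invented for this example. It contrasts a resolver that keeps an empty answer forever with one that forgets it after a bounded time (600 seconds is an arbitrary value chosen for illustration).

```python
class ToyResolver:
    """Toy caching resolver that only models negative caching.

    neg_ttl=None models the reported behavior (empty answers are kept
    forever); a number models a bounded negative cache lifetime.
    """

    def __init__(self, upstream, neg_ttl=None):
        self.upstream = upstream  # dict: name -> list of addresses
        self.neg_ttl = neg_ttl
        self.now = 0              # fake clock, in seconds
        self._negative = {}       # name -> time the empty answer was cached

    def resolve(self, name):
        cached_at = self._negative.get(name)
        if cached_at is not None:
            if self.neg_ttl is None or self.now - cached_at < self.neg_ttl:
                return []  # served from the negative cache
            del self._negative[name]  # entry expired, ask upstream again
        answers = self.upstream.get(name, [])
        if not answers:
            self._negative[name] = self.now
        return answers


zone = {}  # stands in for the authoritative nameserver
forever = ToyResolver(zone)
bounded = ToyResolver(zone, neg_ttl=600)

# First lookup happens before the A record exists.
forever.resolve("z.alp.im")
bounded.resolve("z.alp.im")

# The record is created, and some time passes.
zone["z.alp.im"] = ["8.8.8.8"]
forever.now = bounded.now = 601

print(forever.resolve("z.alp.im"))  # still empty
print(bounded.resolve("z.alp.im"))  # resolves after the entry expired
```

In this model a client that polls in a loop, as kube-lego does, only ever recovers with the bounded resolver, or when the unbounded one is restarted and loses its cache.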
About this issue
- State: closed
- Created 7 years ago
- Comments: 22 (14 by maintainers)
Looks like - --no-negcache added to the dnsmasq args ought to do it. Credit to https://rsmitty.github.io/KubeDNS-Tweaks/
Why did we disable neg-caching as default instead of setting a reasonable TTL value with --neg-ttl=600? With the huge amount of queries in kubernetes related to ndots settings, this would have a negative impact.
I will try playing with dnsmasq flags and see if we can change its negative caching behavior.
@miekg I think we don’t know what this change will break. However, unless it is changed, a lot of software that relies on domains eventually resolving stays broken. I’m not sure if we have enough tools to answer this question properly.
RCODE=0 with no answer records is the NODATA pseudo-rcode. For the purpose of caching, it shouldn’t be treated differently from NXDOMAIN, with one exception: it doesn’t say anything about the non-existence of names below the requested name. See https://tools.ietf.org/html/rfc2308#section-2.2 for guidelines. It’s possibly related to https://github.com/miekg/dns/issues/428
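For reference, RFC 2308 bounds how long a negative answer may be cached: the lifetime is the smaller of the TTL of the SOA record returned in the authority section and that record's MINIMUM field. The sketch below applies that rule to the SOA line from the dig output in this issue. The function name negative_ttl_from_soa is hypothetical, and the parser assumes the single-line presentation format that dig prints.

```python
def negative_ttl_from_soa(soa_line: str) -> int:
    """Return the negative-caching TTL implied by an SOA line from dig.

    Expected fields, in order:
    owner ttl class type mname rname serial refresh retry expire minimum
    """
    fields = soa_line.split()
    if len(fields) != 11 or fields[3] != "SOA":
        raise ValueError("not a single-line SOA record")
    record_ttl = int(fields[1])   # TTL of the SOA record itself
    minimum = int(fields[10])     # SOA MINIMUM field
    return min(record_ttl, minimum)


# SOA line copied from the authority section of the Step 1 output.
SOA = (
    "alp.im. 1799 IN SOA ivan.ns.cloudflare.com. dns.cloudflare.com. "
    "2025042470 10000 2400 604800 3600"
)

print(negative_ttl_from_soa(SOA))  # 1799
```

Under that rule the empty answer for z.alp.im should have expired from a compliant cache after roughly half an hour, not persisted for days.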