coredns: "serve_stale" option in the "cache" plugin behaves incorrectly

What happened: with serve_stale enabled, a cached NXDOMAIN answer is never updated while the name is being queried continuously.

What you expected to happen: CoreDNS updates the cached answer once the upstream DNS server starts returning an A record for a name it previously answered with NXDOMAIN.

How to reproduce it (as minimally and precisely as possible): start by repeatedly querying a CoreDNS instance for a non-existent name. It correctly returns NXDOMAIN:

while sleep 0.5; do dig test.default.svc.cluster.local @169.254.20.10; done

Create a matching A record on the upstream DNS server (I just created a Service in Kubernetes) and verify that the upstream resolves it:

$ dig test.default.svc.cluster.local. @192.168.0.10 +short
10.10.50.53

The while loop above keeps returning NXDOMAIN indefinitely.

Removing the serve_stale option alleviates the issue.
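To pin down exactly when (or whether) the answer flips, the while loop above can be replaced by a small watcher that timestamps each response status and stops at the first NOERROR. This is a sketch assuming a POSIX shell and dig; `dns_status` and `watch_name` are hypothetical helper names, not part of CoreDNS or dig.

```shell
#!/bin/sh
# Sketch: watch a name and report when its status stops being NXDOMAIN.
# Assumes dig is installed; dns_status/watch_name are hypothetical helpers.

# dns_status: read dig's full output on stdin and print the RCODE taken from
# the ";; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: ..." line.
dns_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}

# watch_name NAME SERVER: query every 0.5 s, print a timestamped status,
# and return as soon as the server answers NOERROR.
watch_name() {
  name=$1
  server=$2
  while :; do
    # Keep only the RCODE; it is empty if dig received no reply at all.
    status=$(dig "$name" "@$server" | dns_status)
    status=${status:-NO-REPLY}
    echo "$(date '+%H:%M:%S') $status"
    # Stop at the first successful answer.
    if [ "$status" = "NOERROR" ]; then
      return 0
    fi
    sleep 0.5
  done
}
```

After sourcing the file, run for example `watch_name test.default.svc.cluster.local 169.254.20.10`. Comparing the time of the first NOERROR line with the time the Service was created shows how long the stale NXDOMAIN was served.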

Anything else we need to know?:

Note that these tests were run not against the primary CoreDNS of a Kubernetes cluster but against a secondary instance that forwards requests to the primary (a node-level caching mechanism).

Environment:

  • the version of CoreDNS: 1.6.6
  • Corefile:
.:53 {
  errors {
    consolidate 10s ".* i/o timeout$"
    consolidate 10s ".* write: operation not permitted$"
  }
  cache {
    success 39936
    denial 9984
    prefetch 10 1m 25%
    serve_stale
  }
  reload 2s
  loop
  bind 192.168.0.10 169.254.20.10
  forward . 192.168.0.10 192.168.0.10 192.168.0.10 {
    max_fails 0
  }
  prometheus 127.0.0.1:9254
  health 127.0.0.1:9225
}
  • OS (e.g: cat /etc/os-release):
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (10 by maintainers)

Most upvoted comments

According to the docs, the default TTL for denial-of-existence responses is 1800 seconds, and the example config doesn’t override the TTLs.

I haven’t verified that this is what happens here, but it looks like expected behavior, whether or not serve_stale is used.
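One way to check this hypothesis is to look at the SOA record that accompanies the cached NXDOMAIN answer in the authority section. Assuming the cache plugin decrements the TTLs of the entries it serves, that TTL shows how long the negative entry has left. The sketch below polls it; `soa_ttl` and `watch_negative_ttl` are hypothetical helper names, and dig is assumed to be installed.

```shell
#!/bin/sh
# Sketch: poll the SOA TTL in the authority section of a negative answer.
# Assumes dig is installed; soa_ttl/watch_negative_ttl are hypothetical helpers.

# soa_ttl: read "dig +noall +authority" output on stdin and print the TTL
# (second whitespace-separated field) of the first SOA record, if any.
soa_ttl() {
  awk '$4 == "SOA" { print $2; exit }'
}

# watch_negative_ttl NAME SERVER: print the remaining SOA TTL once per second.
watch_negative_ttl() {
  while :; do
    ttl=$(dig "$1" "@$2" +noall +authority | soa_ttl)
    echo "$(date '+%H:%M:%S') SOA TTL: ${ttl:-none}"
    sleep 1
  done
}
```

If the value counts down from roughly 1800, the behavior fits the TTL explanation, and the optional TTL argument of the cache plugin's `denial` directive could be used to shorten it. If the TTL is small and NXDOMAIN still persists well after it reaches zero, the TTL alone does not seem to explain the report.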