dns: High level of i/o timeout errors in nodelocaldns pod when coredns pod on same node

Environment

  • coredns version: 1.8.0
  • nodelocaldns version: v1.17.0
  • kubernetes version: 1.20.8
  • kube-proxy in ipvs mode

Config

configmap:

data:
  Corefile: |
    .:53 {
        errors
        cache {
                success 9984 30
                denial 9984 5
        }
        reload
        loop
        bind 169.254.20.10
        forward . 192.168.128.10 {
                force_tcp
        }
        prometheus :9253
        health 169.254.20.10:8080
    }

Issue Description

We are seeing a high level of i/o timeout errors in the node-local-dns daemonset pods on those nodes where a coredns pod is also running.

Running the following to count error log lines per node-local-dns pod:

for pod in `kubectl -n kube-system get po -l k8s-app=node-local-dns -o name|cut -f2 -d '/'`;do echo "$(kubectl -n kube-system get po $pod -o wide --no-headers| awk '{ print $1,$7,$5 }') $(kubectl -n kube-system logs $pod|grep ERRO|wc -l)";done

When sorting the above output by error count, we see a massive increase in errors on the nodes where coredns is running.
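
For reference, a minimal sketch of the same loop with its output ranked numerically by that error count (the count is the fourth field the loop prints; same label selector as above):

  # print "pod node age error-count" for each node-local-dns pod, sorted by the count
  for pod in $(kubectl -n kube-system get po -l k8s-app=node-local-dns -o name | cut -f2 -d'/'); do
    echo "$(kubectl -n kube-system get po $pod -o wide --no-headers | awk '{ print $1,$7,$5 }') $(kubectl -n kube-system logs $pod | grep ERRO | wc -l)"
  done | sort -k4 -n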

Reproduce

Running a k8s Job with a dnsperf pod in it, with 3 FQDNs to look up: one in-cluster service, one service in AWS, and one external service.
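
For illustration, a minimal sketch of the kind of dnsperf run the Job performs, matching the reported 300 s run time and ~5000 qps cap; the query file contents below are placeholders rather than the actual FQDNs used, and it assumes the pod queries the node-local cache address from the Corefile above:

  # placeholder query file: one in-cluster service, one AWS service, one external name
  printf '%s\n' \
    'my-service.default.svc.cluster.local A' \
    'my-db.example.eu-west-1.rds.amazonaws.com A' \
    'example.com A' > queries.txt

  # run for 300 seconds, capped at 5000 queries per second, against node-local-dns
  dnsperf -s 169.254.20.10 -d queries.txt -l 300 -Q 5000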

On a node where coredns is also running

Statistics:

  Queries sent:         1500000
  Queries completed:    1500000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 1499884 (99.99%), SERVFAIL 116 (0.01%)
  Average packet size:  request 52, response 224
  Run time (s):         300.000132
  Queries per second:   4999.997800

  Average Latency (s):  0.000260 (min 0.000044, max 1.067598)
  Latency StdDev (s):   0.012459

On a node where coredns is not also running

Statistics:

  Queries sent:         1500000
  Queries completed:    1500000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 1500000 (100.00%)
  Average packet size:  request 52, response 224
  Run time (s):         300.000153
  Queries per second:   4999.997450

  Average Latency (s):  0.000208 (min 0.000045, max 1.082059)
  Latency StdDev (s):   0.010506

Cluster service resolution vs external resolution

When the test set is all k8s service FQDNs, there are far fewer SERVFAIL errors than when the test set is all external FQDNs.

  • On a node where no coredns is running: 0 SERVFAIL
  • On a node where coredns is running, all on-cluster FQDNs: 8 SERVFAIL out of 1500000 queries
  • On a node where coredns is running, all external FQDNs: 164 SERVFAIL out of 1500000 queries

When we scale coredns down to one replica and run dnsperf on the same node, we get no SERVFAILs in the test.
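
For reference, a minimal sketch of that scale-down step (assuming the usual coredns Deployment name in kube-system):

  # reduce the cluster DNS to a single replica before re-running dnsperf
  kubectl -n kube-system scale deployment coredns --replicas=1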

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 31 (18 by maintainers)

Most upvoted comments

[Screenshot attached in the original comment: Screen Shot 2021-09-29 at 12 02 25 AM]

The i/o timeout most likely means the coreDNS pod did not respond within 2 seconds. NodeLocalDNS (which uses coreDNS) has a 2s read timeout for forwarded requests, and it sends the request to coreDNS over TCP.
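
One way to exercise that same path by hand is a single TCP query from the node straight to the upstream, a minimal sketch (assuming dig is available on the node; 192.168.128.10 is the upstream from the Corefile above, substitute your cluster's DNS service IP):

  # mimic node-local-dns forwarding: one TCP query with a 2-second timeout
  dig +tcp +tries=1 +time=2 kubernetes.default.svc.cluster.local @192.168.128.10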

In my case, I think coreDNS itself is working fine (it does not time out). However, nodelocaldns rarely manages to establish a connection with the coreDNS service (192.168.0.3).

# conntrack -L | grep 192.168.0.3
tcp      6 108 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51422 dport=53 [UNREPLIED] src=192.168.89.1 dst=192.168.126.0 sport=53 dport=19944 mark=0 use=1
tcp      6 107 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51414 dport=53 [UNREPLIED] src=192.168.76.1 dst=192.168.126.0 sport=53 dport=45629 mark=0 use=1
tcp      6 110 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51438 dport=53 [UNREPLIED] src=192.168.87.1 dst=192.168.126.0 sport=53 dport=22125 mark=0 use=1
tcp      6 105 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51394 dport=53 [UNREPLIED] src=192.168.66.2 dst=192.168.126.0 sport=53 dport=42584 mark=0 use=1
tcp      6 111 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51444 dport=53 [UNREPLIED] src=192.168.117.1 dst=192.168.126.0 sport=53 dport=43203 mark=0 use=1
tcp      6 109 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51430 dport=53 [UNREPLIED] src=192.168.72.1 dst=192.168.126.0 sport=53 dport=48859 mark=0 use=1
tcp      6 7 TIME_WAIT src=192.168.0.3 dst=192.168.0.3 sport=50610 dport=53 src=192.168.126.1 dst=10.168.30.38 sport=53 dport=11486 [ASSURED] mark=0 use=1
tcp      6 103 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51372 dport=53 [UNREPLIED] src=192.168.84.1 dst=192.168.126.0 sport=53 dport=35905 mark=0 use=1
tcp      6 86392 ESTABLISHED src=192.168.0.3 dst=192.168.0.3 sport=51454 dport=53 src=192.168.126.1 dst=10.168.30.38 sport=53 dport=42359 [ASSURED] mark=0 use=1
udp      17 20 src=192.168.0.3 dst=192.168.0.3 sport=51515 dport=53 [UNREPLIED] src=192.168.84.1 dst=192.168.126.0 sport=53 dport=53046 mark=0 use=1
tcp      6 102 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51366 dport=53 [UNREPLIED] src=192.168.111.4 dst=192.168.126.0 sport=53 dport=39537 mark=0 use=1
tcp      6 106 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51404 dport=53 [UNREPLIED] src=192.168.93.1 dst=192.168.126.0 sport=53 dport=33050 mark=0 use=1
udp      17 16 src=192.168.0.3 dst=192.168.0.3 sport=49564 dport=53 [UNREPLIED] src=192.168.111.4 dst=192.168.126.0 sport=53 dport=18936 mark=0 use=1
udp      17 23 src=192.168.0.3 dst=192.168.0.3 sport=37426 dport=53 [UNREPLIED] src=192.168.124.1 dst=192.168.126.0 sport=53 dport=65084 mark=0 use=1
tcp      6 67 TIME_WAIT src=192.168.0.3 dst=192.168.0.3 sport=51056 dport=53 src=192.168.126.1 dst=10.168.30.38 sport=53 dport=55891 [ASSURED] mark=0 use=1
tcp      6 104 SYN_SENT src=192.168.0.3 dst=192.168.0.3 sport=51384 dport=53 [UNREPLIED] src=192.168.124.1 dst=192.168.126.0 sport=53 dport=58595 mark=0 use=1
udp      17 27 src=192.168.0.3 dst=192.168.0.3 sport=59777 dport=53 [UNREPLIED] src=192.168.66.2 dst=192.168.126.0 sport=53 dport=55549 mark=0 use=1
udp      17 13 src=192.168.0.3 dst=192.168.0.3 sport=32790 dport=53 [UNREPLIED] src=192.168.78.1 dst=192.168.126.0 sport=53 dport=41575 mark=0 use=1
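
A quick way to tally how many of those TCP connections never received a SYN-ACK, sketched against the service IP in the output above (192.168.0.3; substitute your own kube-dns clusterIP):

  # count DNS connections to the coredns service stuck in SYN_SENT
  conntrack -L 2>/dev/null | grep 'dst=192.168.0.3 ' | grep 'dport=53 ' | grep -c SYN_SENT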

@rahul-paigavan In most modern K8s systems the DNS and TLS layers are abused by not properly using keep-alive (HTTP/TCP) on connections, which causes DNS to be queried and TLS to be handshaked on every request. Here is a great blog post about that in the context of NodeJS (others are similar): https://www.lob.com/blog/use-http-keep-alive

Other than that, you should check and optimize the DNS zones inside the nodelocal config; look for:

Overall, the DNS and TLS layer is the most abused and most neglected layer by engineers, yet it is the most important thing in a distributed, clustered system, so put a decent amount of your time and effort into it or you will regret it!

I think I have the same issue. There are quite a lot of timed-out requests from nodelocaldns to coredns (about 70% of DNS requests time out on the worst-case node). And I just realized that only nodes running a coredns pod have this issue, as @rtmie said.