coredns: Frequent DNS failures on ARM64 - DialWithDialer Timed Out
We’re running CoreDNS on ARM64, built with Go v1.10. Queries are forwarded to Cloudflare’s encrypted DNS (1.1.1.1), which works fine most of the time. However, queries frequently time out. The timeouts always occur in batches, and after 5-10 seconds the problem resolves itself. It can be several minutes before it occurs again. Possibly related: we’re seeing extended CPU spikes to 40% or more.
```
[/usr/local/go/src/net/dial.go:net.(*Resolver).resolveAddrList 208] err:%!(EXTRA <nil>)
[/usr/local/go/src/net/dial.go:net.(*Dialer).DialContext 390] [DEBUG] err:%!(EXTRA <nil>)
2018/11/02 19:45:38 [ERROR] 2 init.itunes.apple.com. A: EOF
[/usr/local/go/src/net/dial.go:net.(*Resolver).resolveAddrList 208] err:%!(EXTRA <nil>)
[/usr/local/go/src/net/dial.go:net.(*Dialer).DialContext 390] [DEBUG] err:%!(EXTRA <nil>)
[/usr/local/go/src/net/dial.go:net.(*Resolver).resolveAddrList 208] err:%!(EXTRA <nil>)
[/usr/local/go/src/net/dial.go:net.(*Dialer).DialContext 390] [DEBUG] err:%!(EXTRA <nil>)
2018/11/02 19:45:38 [ERROR] 2 cl4.apple.com. AAAA: tls: DialWithDialer timed out
2018/11/02 19:45:38 [ERROR] 2 gs-loc.apple.com. A: tls: DialWithDialer timed out
2018/11/02 19:45:38 [ERROR] 2 mesu.apple.com. A: tls: DialWithDialer timed out
2018/11/02 19:45:38 [ERROR] 2 gateway.icloud.com. A: tls: DialWithDialer timed out
2018/11/02 19:45:38 [ERROR] 2 wu-calculator.apple.com. A: tls: DialWithDialer timed out
2018/11/02 19:45:38 [ERROR] 2 cl5.apple.com. A: tls: DialWithDialer timed out
2018/11/02 19:45:38 [ERROR] 2 cl2.apple.com. A: tls: DialWithDialer timed out
2018/11/02 19:45:38 [ERROR] 2 init-p01md.apple.com. A: tls: DialWithDialer timed out
2018/11/02 19:45:38 [ERROR] 2 smp-device-content.apple.com. A: tls: DialWithDialer timed out
2018/11/02 19:45:38 [ERROR] 2 bag.itunes.apple.com. A: tls: DialWithDialer timed out
2018/11/02 19:45:38 [ERROR] 2 smp-device-content.apple.com. A: tls: DialWithDialer timed out
2018/11/02 19:45:38 [ERROR] 2 gspe35-ssl.ls.apple.com. A: tls: DialWithDialer timed out
2018/11/02 19:45:38 [ERROR] 2 gs-loc.apple.com. A: tls: DialWithDialer timed out
2018/11/02 19:45:38 [ERROR] 2 cl2.apple.com. A: tls: DialWithDialer timed out
```
Our Corefile:
```
.:53 {
    log stdout
    errors
    cache 300
    hosts /etc/winston/hosts {
        fallthrough
    }
    forward . tls://1.1.1.1 {
        tls_servername cloudflare-dns.com
        health_check 5s
    }
    fallback SERVFAIL . 8.8.8.8:53
}
```
We’re not sure how to go about diagnosing this. Any suggestions welcome.
About this issue
- State: closed
- Created 6 years ago
- Comments: 57 (28 by maintainers)
[ Quoting notifications@github.com in “Re: [coredns/coredns] Frequent DNS …” ]
So this might be because the hosts plugin was never designed for a hosts file with 80k lines. Regardless of that, I think this may happen when we re-read the file and take a write lock.
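
To illustrate the suspected mechanism, here is a minimal Go sketch. It is not the hosts plugin's actual code: `hostsTable`, `parse`, `reloadLocked`, and `reloadSwap` are hypothetical names made up for this example. It assumes lookups take a read lock on a shared map. If a reload parses the file while holding the write lock, every lookup waits for the whole parse, which with a very large file could plausibly show up as a batch of stalled queries. Parsing first and taking the write lock only to swap the map keeps that window short.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// hostsTable is a hypothetical stand-in for an in-memory hosts map
// guarded by a read/write lock. It is not CoreDNS's real type.
type hostsTable struct {
	mu     sync.RWMutex
	byName map[string]string // lowercased host name -> IP address
}

// parse turns hosts-file text into a name -> IP map.
// Comment lines and trailing "#" comments are skipped.
func parse(data string) map[string]string {
	m := make(map[string]string)
	for _, line := range strings.Split(data, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 || strings.HasPrefix(fields[0], "#") {
			continue
		}
		for _, name := range fields[1:] {
			if strings.HasPrefix(name, "#") {
				break // rest of the line is a comment
			}
			m[strings.ToLower(name)] = fields[0]
		}
	}
	return m
}

// reloadLocked parses while holding the write lock, so every
// concurrent Lookup blocks for the full duration of the parse.
func (h *hostsTable) reloadLocked(data string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.byName = parse(data)
}

// reloadSwap parses without any lock held and takes the write lock
// only for the pointer swap, so lookups are blocked very briefly.
func (h *hostsTable) reloadSwap(data string) {
	m := parse(data)
	h.mu.Lock()
	h.byName = m
	h.mu.Unlock()
}

// Lookup returns the IP for name under a read lock.
func (h *hostsTable) Lookup(name string) (string, bool) {
	h.mu.RLock()
	defer h.mu.RUnlock()
	ip, ok := h.byName[strings.ToLower(name)]
	return ip, ok
}

func main() {
	h := &hostsTable{}
	h.reloadSwap("0.0.0.0 ads.example.com tracker.example.com # blocklist\n# comment\n")
	fmt.Println(h.Lookup("ADS.example.com"))
}
```

Both reload variants leave the table in the same state; they differ only in how long readers are locked out. Whether this is what actually causes the timeouts here would still need to be confirmed, for example by checking whether the error batches line up with hosts file reloads.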