coredns: Problems with latest 1.8.0
Hello folks, last week we updated our prod environments from 1.6.9 to 1.8.0 and unfortunately had to roll back. I tried to debug the issue, but since it was prod I didn't have enough time to get to the bottom of it. Here are the symptoms:
- DNS queries (type A) for entries with a high number of endpoints (>200) take ~35ms and very often fail, with CoreDNS logging the following errors:
[ERROR] plugin/errors: 2 <MyDNS>. A: dns: overflow unpacking uint16
[ERROR] plugin/errors: 2 <MyOtherDNS>. A: dns: overflow unpacking uint32
- 7x the number of DNS requests to the upstream DNS server
- DNS over TCP requests
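For anyone reading the two errors above: my understanding (an assumption, not something confirmed in this issue) is that messages like "overflow unpacking uint16" come from a bounds check in the DNS message decoder, raised when a fixed-width field would run past the end of the buffer. The Python sketch below only illustrates that kind of check on the fixed fields of a resource record; the function and class names are hypothetical and this is not the code CoreDNS runs.

```python
import struct


class UnpackError(Exception):
    """Raised when a field would extend past the end of the message."""


def unpack_uint16(msg: bytes, off: int):
    # A 16-bit field needs 2 bytes starting at `off`.
    if off + 2 > len(msg):
        raise UnpackError("overflow unpacking uint16")
    return struct.unpack_from("!H", msg, off)[0], off + 2


def unpack_uint32(msg: bytes, off: int):
    # A 32-bit field needs 4 bytes starting at `off`.
    if off + 4 > len(msg):
        raise UnpackError("overflow unpacking uint32")
    return struct.unpack_from("!I", msg, off)[0], off + 4


def unpack_rr_fixed_fields(msg: bytes, off: int = 0):
    """Read TYPE, CLASS, TTL and RDLENGTH, i.e. the fields that follow
    the owner name of a resource record (RFC 1035, section 4.1.3)."""
    rtype, off = unpack_uint16(msg, off)
    rclass, off = unpack_uint16(msg, off)
    ttl, off = unpack_uint32(msg, off)
    rdlen, off = unpack_uint16(msg, off)
    return (rtype, rclass, ttl, rdlen), off
```

Depending on where a message is cut short, the failure shows up on a 16-bit field or on the 32-bit TTL, which would be consistent with seeing both error variants in the log.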
I did a tcpdump, but it looks inconclusive; it is clear, though, that each request is retransmitted.
This is the CoreDNS config:
Corefile: |-
  .:53 {
      bind 192.168.0.1
      errors
      health :8081 {
          lameduck 5s
      }
      kubernetes cluster.local. in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
      }
      prometheus :9153
      forward . /etc/resolv.conf
      loop
      cache 30
      loadbalance
      reload
  }
  my-internal.domain {
      bind 192.168.0.1
      errors
      cache 30 {
          prefetch 2 1m 20%
      }
      forward . 127.0.0.1:8600 172.30.234.20:8600 {
          policy sequential
      }
  }
I had a look at the changes between 1.6.9 and 1.8.0 and noticed some changes/fixes to EDNS0; that's my main suspect at the moment. I tried to reproduce the issue in staging without luck, so QPS seems to be another factor.
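To check the EDNS0 suspicion outside prod, a small probe along these lines might help. It is a minimal sketch using only the Python standard library: it builds an A query by hand (RFC 1035), optionally appends an EDNS0 OPT record advertising a UDP buffer size (RFC 6891), and reports whether the reply has the TC (truncated) bit set. All function names here are hypothetical helpers, and the sketch does not by itself show that EDNS0 is the cause.

```python
import socket
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qid: int = 0x1234, edns_bufsize=None) -> bytes:
    """Build an A/IN query; append an EDNS0 OPT record if a size is given."""
    arcount = 1 if edns_bufsize is not None else 0
    # Header: ID, flags (RD=1), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    if edns_bufsize is None:
        return header + question
    # OPT RR: root name, TYPE=41, CLASS=UDP payload size, TTL=0, RDLENGTH=0.
    opt = b"\x00" + struct.pack("!HHIH", 41, edns_bufsize, 0, 0)
    return header + question + opt


def is_truncated(response: bytes) -> bool:
    """Return True if the TC bit (0x0200) is set in the header flags."""
    flags = struct.unpack("!H", response[2:4])[0]
    return bool(flags & 0x0200)


def probe(server: str, name: str, bufsize=None, port: int = 53,
          timeout: float = 2.0):
    """Send one UDP query and return (response_size_in_bytes, truncated)."""
    query = build_query(name, edns_bufsize=bufsize)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, port))
        response, _ = sock.recvfrom(65535)
    return len(response), is_truncated(response)
```

Calling `probe` against the `bind` address from the Corefile for one of the names with many endpoints, once with `bufsize=None` and once with something like `bufsize=4096`, would show whether the reply is truncated in either case; a truncated UDP reply is what normally makes a client retry over TCP.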
About this issue
- State: closed
- Created 4 years ago
- Comments: 43 (21 by maintainers)
[ Quoting notifications@github.com in “Re: [coredns/coredns] Problems with…” ]
that is very likely not the issue