coredns: coredns doesn't perform better despite having more cores
We are running CoreDNS 1.9.3 (retrieved from the official releases on GitHub), and have been having difficulty increasing the performance of a single coredns instance.
With GOMAXPROCS set to 1, we observe ~60k qps and full utilization of one core.
With GOMAXPROCS set to 2, we seem to hit a performance ceiling of ~90-100k qps, while consuming almost two full cores.
With GOMAXPROCS set to 4, we observe that coredns uses all 4 cores, but throughput does not increase and latency stays the same.
With GOMAXPROCS set to 8-64, we observe the same CPU usage and throughput.
We have the following Corefile:
.:55 {
    file db.example.org example.org
    cache 100
    whoami
}
db.example.org:
$ORIGIN example.org.
@ 3600 IN SOA sns.dns.icann.org. noc.dns.icann.org. 2017042745 7200 3600 1209600 3600
3600 IN NS a.iana-servers.net.
3600 IN NS b.iana-servers.net.
www IN A 127.0.0.1
IN AAAA ::1
We are using dnsperf (https://github.com/DNS-OARC/dnsperf) with the following command:
dnsperf -d test.txt -s 127.0.0.1 -p 55 -Q 10000000 -c 1 -l 10000000 -S .1 -t 8
test.txt:
www.example.com AAAA
Is there anything we could be missing?
Thanks!
yes @lobshunter that is correct. I think the LWN article explains the improvements and a few caveats (especially with TCP) of using the SO_REUSEPORT option. Last week, I validated the improvements by simply starting multiple servers on the same port (as we've already set the above option at ListenPacket, as seen here) after making the following code changes: essentially, I've just exposed an env var NUM_SOCK representing the number of sockets (and thereby servers) one wants to use for serving requests. For validating the improvements, I used a Corefile similar to the one in the issue description above:
1. With a single listen socket, I'm able to achieve ~130K qps throughput from dnsperf on a private cloud instance.
2. With two listen sockets, I'm able to achieve ~235K qps throughput from dnsperf.
3. With 4 listen sockets, I'm able to achieve ~400K qps throughput from dnsperf.
So, I think the bottleneck was indeed a throughput limitation of a single socket, and we are able to scale throughput almost linearly as we increase the number of listen sockets. I'll create a pull request after validating TCP traffic (non-TLS based) when I get some more time. Thanks.
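For illustration, here is a minimal standalone sketch of that idea, not the actual CoreDNS patch: it uses a net.ListenConfig Control hook to set SO_REUSEPORT before bind, then opens NUM_SOCK sockets on the same port. The port, the NUM_SOCK handling, and the echo-style handler are assumptions made up for the example:

package main

import (
    "context"
    "log"
    "net"
    "os"
    "strconv"
    "syscall"

    "golang.org/x/sys/unix"
)

func main() {
    // NUM_SOCK: how many listen sockets (and read loops) to open.
    n, err := strconv.Atoi(os.Getenv("NUM_SOCK"))
    if err != nil || n < 1 {
        n = 1
    }

    // The Control hook runs before bind(2); setting SO_REUSEPORT here is
    // what lets several sockets share the same address and port, with the
    // kernel load-balancing incoming packets across them.
    lc := net.ListenConfig{
        Control: func(network, address string, c syscall.RawConn) error {
            var serr error
            if err := c.Control(func(fd uintptr) {
                serr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
            }); err != nil {
                return err
            }
            return serr
        },
    }

    for i := 0; i < n; i++ {
        conn, err := lc.ListenPacket(context.Background(), "udp", ":55")
        if err != nil {
            log.Fatal(err)
        }
        go serve(conn) // one independent read loop per socket
    }
    select {} // block forever
}

func serve(conn net.PacketConn) {
    buf := make([]byte, 65535)
    for {
        n, addr, err := conn.ReadFrom(buf)
        if err != nil {
            continue
        }
        // Placeholder: a real server would parse the DNS query and write
        // a proper response instead of echoing the packet back.
        conn.WriteTo(buf[:n], addr)
    }
}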
A memo: I found an interesting approach that uses SO_REUSEPORT and multiple net.ListenUDP calls. According to the author's benchmark, it outperforms the solution of a single listen with multiple ReadFromUDP calls. I shall give it a try when I get time.
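For contrast, the "single listen, multiple ReadFromUDP" alternative mentioned above would look roughly like this sketch; the reader count and the echo handler are again assumptions, not CoreDNS code:

package main

import (
    "log"
    "net"
)

func main() {
    // One socket shared by several reader goroutines.
    conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: 55})
    if err != nil {
        log.Fatal(err)
    }

    const numReaders = 4
    for i := 0; i < numReaders; i++ {
        go func() {
            buf := make([]byte, 65535)
            for {
                // Every goroutine contends on the same socket; this
                // per-socket contention is what the SO_REUSEPORT
                // approach above avoids.
                n, addr, err := conn.ReadFromUDP(buf)
                if err != nil {
                    continue
                }
                conn.WriteToUDP(buf[:n], addr) // echo placeholder
            }
        }()
    }
    select {} // block forever
}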
I could try to find a way. But I do agree with the idea of the Redis team: scaling horizontally is paramount, and CoreDNS can scale horizontally pretty well. So it's not a critical issue that it doesn't scale vertically.
PS: @Lobshunter86 is me, too.
Could be something like that.
Generally, if giving it more CPU doesn't fix it, you are hitting other bottlenecks. The question is whether those are in the CoreDNS code (for example, some mutex contention or something) or in the underlying OS or hardware. In this case it looks like writing to the UDP socket. Look into tuning UDP performance on your kernel. You may want to look at your UDP write buffer sizes, for example.
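As a concrete starting point, here is a sketch of what raising the per-socket buffers looks like from Go; the 4 MiB size is an arbitrary example, and note the kernel clamps these requests to the net.core.rmem_max / net.core.wmem_max sysctls, which may need to be raised first:

package main

import (
    "log"
    "net"
)

func main() {
    conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: 55})
    if err != nil {
        log.Fatal(err)
    }
    // Request 4 MiB socket buffers so bursts of queries/responses are
    // queued rather than dropped. Effective sizes are capped by the
    // kernel's net.core.rmem_max and net.core.wmem_max settings.
    if err := conn.SetReadBuffer(4 << 20); err != nil {
        log.Printf("SetReadBuffer: %v", err)
    }
    if err := conn.SetWriteBuffer(4 << 20); err != nil {
        log.Printf("SetWriteBuffer: %v", err)
    }
    // ... hand conn to the server's read/write loops here.
}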