dnsperf: -Q max_qps limit does not work as advertised
Version affected: 2.10.0
First, definition of -Q in the manual page:
-Q max_qps Limits the number of requests per second. There is no default limit.
(Emphasis added by me.)
This does not match the real behavior of that feature. Right now the code looks like this:
/* Rate limiting */
if (tinfo->max_qps > 0) {
run_time = now - times->start_time;
req_time = (MILLION * stats->num_sent) / tinfo->max_qps;
if (req_time > run_time) {
usleep(req_time - run_time);
now = perf_get_time();
continue;
}
}
It calculates when the next packet should be sent based on total number of packets, not number of packets in given second.
Consequently, if the server is slower than specified -Q for a period of time, unsent packets “accumulate” and then the -Q value is overshot, possibly very significantly.
Here is example of such test run in table. Columns 2, 3, 4 are number of packets as reported by dnsperf. See time 25 s and following.
| time [s] | sent | recvd | timeouts |
|---|---|---|---|
| 1 | 100021 | 99673 | 0 |
| 2 | 100031 | 100024 | 0 |
| 3 | 100028 | 100027 | 0 |
| 4 | 100027 | 100031 | 0 |
| 5 | 100028 | 100030 | 0 |
| 6 | 100023 | 100023 | 323 |
| 7 | 100022 | 100023 | 0 |
| 8 | 100018 | 100015 | 0 |
| 9 | 100025 | 100020 | 0 |
| 10 | 100019 | 100022 | 0 |
| 11 | 100130 | 100134 | 0 |
| 12 | 100026 | 100026 | 0 |
| 13 | 100028 | 100033 | 0 |
| 14 | 100018 | 100018 | 0 |
| 15 | 100032 | 100025 | 0 |
| 16 | 100023 | 100027 | 0 |
| 17 | 100026 | 100017 | 0 |
| 18 | 100022 | 100028 | 0 |
| 19 | 100025 | 100020 | 0 |
| 20 | 100026 | 100025 | 0 |
| 21 | 100034 | 100039 | 0 |
| 22 | 100026 | 100029 | 0 |
| 23 | 100028 | 100014 | 0 |
| 24 | 100023 | 100040 | 0 |
| 25 | 100028 | 100023 | 0 |
| 26 | 51631 | 50659 | 0 |
| 27 | 0 | 0 | 0 |
| 28 | 0 | 0 | 0 |
| 29 | 0 | 0 | 0 |
| 30 | 0 | 0 | 0 |
| 31 | 9383 | 8383 | 1000 |
| 32 | 0 | 0 | 0 |
| 33 | 0 | 0 | 0 |
| 34 | 0 | 0 | 0 |
| 35 | 0 | 0 | 0 |
| 36 | 9388 | 8388 | 1000 |
| 37 | 0 | 0 | 0 |
| 38 | 0 | 0 | 0 |
| 39 | 0 | 0 | 0 |
| 40 | 0 | 0 | 0 |
| 41 | 9349 | 8350 | 1000 |
| 42 | 9 | 8 | 0 |
| 43 | 0 | 0 | 0 |
| 44 | 0 | 0 | 0 |
| 45 | 0 | 0 | 0 |
| 46 | 98089 | 98080 | 999 |
| 47 | 215355 | 215341 | 1 |
| 48 | 214683 | 214645 | 0 |
| 49 | 214870 | 214909 | 0 |
| 50 | 214906 | 214837 | 0 |
| 51 | 215887 | 215965 | 0 |
| 52 | 219043 | 218387 | 0 |
| 53 | 209458 | 209342 | 0 |
| 54 | 213390 | 213553 | 0 |
| 55 | 215100 | 215093 | 0 |
| 56 | 214460 | 214249 | 0 |
| 57 | 214883 | 215026 | 609 |
| 58 | 213354 | 213398 | 0 |
| 59 | 212175 | 211942 | 0 |
Simplest reproducer is to run dnsperf with -Q against an echo server, kill the server in the middle of test, and then restart it. dnsperf will send more queries as it tries to “catch up”.
$ yes '. A' | dnsperf -O suppress=timeout -t 1 -Q 100000 -l 10 -s 127.0.0.1 -S1 -p 5353
DNS Performance Testing Tool
Version 2.10.0
[Status] Command line: dnsperf -O suppress=timeout -t 1 -Q 100000 -l 10 -s 127.0.0.1 -S1 -p 5353
[Status] Sending queries (to 127.0.0.1:5353)
[Status] Started at: Thu Feb 2 16:12:30 2023
[Status] Stopping after 10.000000 seconds
1675350751.204563: 99992.803613
1675350752.205596: 100005.694118
1675350753.206756: 34156.378601
1675350754.207943: 0.000000
1675350755.209119: 0.000000
1675350756.210197: 0.000000
1675350757.211227: 226568.634307
1675350758.211343: 324562.350767
1675350759.211482: 114571.074621
[Status] Testing complete (time limit)
Statistics:
Queries sent: 999995
Queries completed: 999595 (99.96%)
Queries lost: 400 (0.04%)
Response codes: NOERROR 999595 (100.00%)
Average packet size: request 17, response 17
Run time (s): 10.000020
Queries per second: 99959.300081
Average Latency (s): 0.000160 (min 0.000005, max 0.000604)
Latency StdDev (s): 0.000142
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 29 (29 by maintainers)
Commits related to this issue
- Fix dnsperf -Q to not overshot target value - `dnsperf`: Fix #224 to not overshot max QPS when server sometimes timeouts — committed to pspacek/dnsperf by pspacek a year ago
- Fix dnsperf -Q to not overshot target value - `dnsperf`: Fix #224 to not overshot max QPS when server sometimes timeouts For mysterious reasons usleep() was too slow when called in tight loop. I'v... — committed to pspacek/dnsperf by pspacek a year ago
- Fix dnsperf -Q to not overshot target value - `dnsperf`: Fix #224 to not overshot max QPS when server sometimes timeouts For mysterious reasons usleep() was too slow when called in tight loop. I'v... — committed to pspacek/dnsperf by pspacek a year ago
- Fix dnsperf -Q to not overshot target value - `dnsperf`: Fix #224 to not overshot max QPS when server sometimes timeouts For mysterious reasons usleep() was too slow when called in tight loop. I'v... — committed to pspacek/dnsperf by pspacek a year ago
- qps_threshold_wait - `dnsperf`: Fix #224: - Add man-page about `qps_threshold_wait` option - Refactor QPS overshot fix to have less change to upstream - Add `q_sent` check over a 1 second time ... — committed to pspacek/dnsperf by jelu a year ago
sorry @pspacek, yet another update to the branch 😃