performance: Performance regression node 16.20.0 -> 18.16.0
Running a simple HTTP GraphQL server and following the guide https://nodejs.org/en/docs/guides/simple-profiling
- What factors or changes could be taken into consideration to account for this performance regression?
- Internal profiling shows no significant difference; the regression is mostly noticeable in request/response times
------------- NODE 16 -------------
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Finished 250 requests
Server Software:
Server Hostname: localhost
Server Port: 3100
Document Path: /graphql
Document Length: 34 bytes
Concurrency Level: 20
Time taken for tests: 3.203 seconds
Complete requests: 250
Failed requests: 0
Keep-Alive requests: 250
Total transferred: 75000 bytes
Total body sent: 367750
HTML transferred: 8500 bytes
Requests per second: 78.06 [#/sec] (mean)
Time per request: 256.227 [ms] (mean)
Time per request: 12.811 [ms] (mean, across all concurrent requests)
Transfer rate: 22.87 [Kbytes/sec] received
112.13 kb/s sent
135.00 kb/s total
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.2 0 1
Processing: 186 224 60.2 208 451
Waiting: 186 224 60.2 208 451
Total: 186 224 60.4 208 452
Percentage of the requests served within a certain time (ms)
50% 208
66% 213
75% 215
80% 217
90% 224
95% 424
98% 440
99% 447
100% 452 (longest request)
------------- NODE 18 -------------
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Finished 250 requests
Server Software:
Server Hostname: localhost
Server Port: 3100
Document Path: /graphql
Document Length: 34 bytes
Concurrency Level: 20
Time taken for tests: 29.894 seconds
Complete requests: 250
Failed requests: 0
Keep-Alive requests: 250
Total transferred: 75000 bytes
Total body sent: 367750
HTML transferred: 8500 bytes
Requests per second: 8.36 [#/sec] (mean)
Time per request: 2391.480 [ms] (mean)
Time per request: 119.574 [ms] (mean, across all concurrent requests)
Transfer rate: 2.45 [Kbytes/sec] received
12.01 kb/s sent
14.46 kb/s total
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.2 0 1
Processing: 191 345 1682.1 219 26814
Waiting: 191 345 1682.0 219 26813
Total: 191 345 1682.1 219 26814
Percentage of the requests served within a certain time (ms)
50% 219
66% 226
75% 229
80% 235
90% 257
95% 470
98% 473
99% 473
100% 26814 (longest request)
About this issue
- Original URL
- State: open
- Created a year ago
- Reactions: 1
- Comments: 16 (7 by maintainers)
I created a sample case for the issue of a simple `http` server and testing different response times and Node versions. Code:

The tests were run using

The label on the axis is `{$concurrency}-{$requests}` and the y-axis is requests/sec from the `ab` output.

It is clear that at a high number of concurrent requests there is a significant performance regression of a simple `http` server. The difference between `16.20.0` and `18.16.0` is around ~5% when loading the requests with 150 concurrency and over 200K requests.

I’ve been looking into the v20 regression.
The issue seems to be caused by this libuv commit, which makes libuv stop accepting concurrent connections in a loop and instead defers handling of each next incoming connection to the next loop iteration. Reverting it makes v20 on par with v18.
As explained in the commit, the main reason for this change is that in most scenarios we save an extra `accept()` call. Are we sure the scenario this benchmark shows is representative of a real-life workload, or is it more of an academic case? Just asking so we might consider reverting that commit.
/cc @bnoordhuis, what are your thoughts?
UPDATE: Ran the tests again with the latest versions: Node 20 `20.6.0`, Node 18 `18.17.1`, and Node 16 `16.20.2`, and more combinations of `{$concurrency}-{$requests}`.
@RafaelGSS Yeah it was a dedicated machine:
@LuisFros The regression between 18 and 20 is quite different from the ones I measured some time ago (I haven’t published them anywhere yet). Would you mind sharing the hardware information? Was it a dedicated machine?