node-mysql2: Node process does not end, but CPU utilization stays at 100%
I have been investigating a problem that is causing trouble in our test suite, and I managed to trace it to mysql2. The following script reproduces it:
Reproduction instructions
index.js:
// create database test;
// create table test(id int(10) unsigned not null auto_increment, primary key (`id`));
// insert into test(id) values(1);
const mysql = require('mysql2/promise');

async function main() {
  console.log("===========================>", "foo");
  const connection = await mysql.createConnection({
    host: 'localhost',
    user: 'root',
    database: 'test',
    password: 'rootpass',
    connectionLimit: 1234,
  });
  // run the wide query enough times to trigger the hang on exit
  for (let ix = 0; ix < 157; ix++) {
    await basicHanging3(connection);
  }
  connection.destroy();
  console.log('=========================>', 'Done');
}

async function basicHanging3(connection) {
  // build a SELECT that returns ~1000 aliased copies of the same column
  let query = 'SELECT ';
  let limit = 1000;
  for (let i = 0; i < limit; i++) {
    query += `t.id AS t_id_${i},`;
  }
  query += `t.id AS t_id_${limit}`;
  query += ' FROM test as t';
  const [rows] = await connection.query(query);
  return rows;
}

main();
package.json:
{
  "name": "reproducer",
  "version": "1.0.0",
  "main": "index.js",
  "license": "MIT",
  "dependencies": {
    "mysql2": "^3.4.2",
    "yarn": "^1.22.19"
  }
}
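To run the reproducer (assuming a local MySQL instance with the database and table created by the commented statements at the top of index.js), something like the following should be enough:
npm install   # or yarn install
node index.js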
It happens with multiple Node versions (v16.14.0 and v18.16.0 are among those I tested). It also happens with Node 20.3.1, but you need to raise the iteration count above 157.
In our tests we run thousands of DB queries, which seems to reach this threshold and cause the process to “hang” (it eventually exits, but it takes a few minutes). As you can imagine, this is a huge pain on CI because the “exit” takes as long as the test suite itself.
A few other interesting things to note (the commands used for the GC check are shown after this list):
- the CPU for the node process stays at 100% even though nothing is supposedly being done (we checked whether it is GC using --trace-gc, but no traces are output after “Done”)
- if no data is returned (you can delete the insert statement in index.js) it seems to be fine (at least I raised the number of iterations to 10k and it finished just fine)
- if you have fewer columns, you need more iterations to achieve the effect
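For reference, the GC check was done with the standard V8 tracing flag, roughly like this (index.js being the reproducer above); no GC lines are printed after “Done” even though the CPU stays at 100%:
node --trace-gc index.js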
Netstat confirms the DB connection is already closed while the process is waiting:
$ sudo netstat -alpW | grep -i <my_pid>
shows no output.
Also, @sidorares, I tested the first file as you suggested (repro1) and it reproduces the issue.
Just tested with --noopt and it finished successfully almost immediately.
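(For clarity, --noopt is a V8 flag that disables the optimizing compiler; the test above was presumably just running the reproducer as:)
node --noopt index.js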
Not sure if this is related. We ran into the same kind of problem a while ago when we migrated from another MySQL driver. We noticed a significant increase in CPU usage on the Node.js server and more slow queries. What was strange is that when scaling the server horizontally or vertically (not the DB), the slow queries decreased.
We thought the problem came from our side, since few issues were open on the repository and one of the purposes of this lib is a performance gain. But we eventually reverted the driver change and the problem went away.
Indeed, maybe it happens when we manage many objects with many relations.
We use RDS MySQL with ECS on Fargate, Node v16.20, mysql2@2.3.0. I can provide more information if needed.
Since there is a reliable repro without a dependency on mysql2, I’ll try to “fix” the issue by converting the parser to add fields in the constructor. Actually, I think the better solution would be to cap the number of fields in the dynamically compiled parser; if the number is over the limit, use an “interpreting” parser instead (e.g. in each row, iterate over every column and read the value based on the column type). We used to have an “interpreting” parser a while ago - https://github.com/sidorares/node-mysql2/blob/04f07f82714587049eb7cf1c92f983da34de87c5/lib/commands/query.js#L59-L69
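As a rough illustration of that idea (only a sketch; MAX_COMPILED_FIELDS, compileRowParser, and readValueForColumn are hypothetical names, not mysql2 internals):
const MAX_COMPILED_FIELDS = 256; // hypothetical cap on columns for the compiled fast path

function getRowParser(fields, compileRowParser, readValueForColumn) {
  if (fields.length <= MAX_COMPILED_FIELDS) {
    // fast path: generate and compile a parser specialized for this exact column list
    return compileRowParser(fields);
  }
  // fallback "interpreting" parser: loop over every column of every row
  // and decode the value based on the column type
  return function interpretRow(packet) {
    const row = {};
    for (const field of fields) {
      row[field.name] = readValueForColumn(packet, field);
    }
    return row;
  };
}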
What is strange is that the process hangs with 100% CPU just before exit. I suspect some combination of GC and a large number of objects with a large number of fields. The profiler does not pick that up: it shows a very small number of ticks in the main code and in the GC, and the ticks during the “frozen” part are in the “unaccounted” section.
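(The profiler runs mentioned here were presumably along these lines, using the standard V8/Node profiling flags; the isolate log file name varies per run:)
node --prof index.js
node --prof-process isolate-*.log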
The difference from 2.3.0 is highlighted in #2055; I’ll try to see whether changing just that part helps.