central: KnexTimeoutError: The pool is full
Problem
Users can’t log into ODK Central 1.0.1. Login times out with a diplomatic “the server received an invalid error” message.
docker-compose logs --tail=100 -f shows on login attempts:
}KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
service | at Client_PG.acquireConnection (/usr/odk/node_modules/knex/lib/client.js:349:26) {
service | name: 'KnexTimeoutError',
service | sql: undefined,
service | bindings: undefined
That looks like too many open db connections timing out.
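To illustrate the suspected mechanism, here is a toy model of a fixed-size pool with an acquire timeout. It is only a sketch and not knex's actual implementation: the class `ToyPool`, its methods, and the virtual clock are all hypothetical, and the numbers (10 slots, 60 s timeout) are chosen to resemble knex's documented defaults. It shows how a login request can time out when every slot is held by something that never releases it.

```javascript
// Toy model of a fixed-size connection pool with an acquire timeout.
// NOT knex's implementation; all names and numbers are illustrative.
class ToyPool {
  constructor(max, acquireTimeoutMs) {
    this.max = max;
    this.acquireTimeoutMs = acquireTimeoutMs;
    this.inUse = 0;      // connections currently handed out
    this.waiting = [];   // queued acquire requests: { id, since }
    this.granted = [];   // ids that received a connection
    this.timedOut = [];  // ids that gave up waiting
  }

  // Ask for a connection at virtual time `now` (ms).
  // Returns true if granted immediately, false if the request was queued.
  acquire(id, now) {
    if (this.inUse < this.max) {
      this.inUse += 1;
      this.granted.push(id);
      return true;
    }
    this.waiting.push({ id, since: now });
    return false;
  }

  // Hand a connection back; the longest-waiting request (if any) takes it over.
  release() {
    if (this.inUse === 0) return;
    const next = this.waiting.shift();
    if (next) {
      this.granted.push(next.id); // slot moves to the waiter, inUse is unchanged
    } else {
      this.inUse -= 1;
    }
  }

  // Advance the virtual clock: requests that waited too long fail.
  tick(now) {
    const stillWaiting = [];
    for (const req of this.waiting) {
      if (now - req.since >= this.acquireTimeoutMs) {
        this.timedOut.push(req.id);
      } else {
        stillWaiting.push(req);
      }
    }
    this.waiting = stillWaiting;
  }
}

// Scenario: all 10 slots are taken by requests that never call release()
// (a leak), then a login request arrives and has to queue.
const pool = new ToyPool(10, 60000);
for (let i = 0; i < 10; i++) pool.acquire(`leaky-${i}`, 0);
pool.acquire('login', 1000);   // pool is full, so this request is queued
pool.tick(30000);              // after 29 s of waiting it is still queued
const waitingAt30s = pool.waiting.length;
pool.tick(61000);              // after 60 s of waiting it times out
console.log(waitingAt30s, pool.timedOut);
```

In this model a restart "fixes" things simply because it empties the pool, which would be consistent with a restart helping without the underlying leak being gone.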
Environment
ODK Central 1.0.1 running via docker-compose using a custom db and mail server. The server is used for ruODK unit tests from GH Actions and Appveyor. This means that 15 ruODK instances run the same 461 unit tests against it at 12:00 daily. Requests from the instances are staggered because their build times differ. The server is also used for production campaigns, receiving a few hundred records daily within a few hours in the late morning.
Solution
I didn’t have much time to debug this, so I’ve upgraded and restarted ODK Central, which fixed the issue by resetting the db connection pool. I am not sure whether the root cause of the error is addressed, though.
Working versions:
versions:
70c200232acfca99da484bb5f15f67e1f5857c90
4732f7112a286165241aaf7f971f2c2e38d6bb8a client (v1.0.0)
e9ffd2c0c3aa1a9475852e1397b8259e2b03165a server (v1.0.3)
Error search
I’m using an external postgres instance (internal policy, it’s backed up, got plenty of storage and grunt).
The config has one extra parameter: "ssl": {"rejectUnauthorized": false}.
"database": {
"host": "${DBHOST}",
"user": "${DBUSER}",
"password": "${DBPASS}",
"database": "${DBNAME}",
"ssl": {"rejectUnauthorized": false}
},
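For readers wondering where the pool size would come into play, the following is a hypothetical sketch of how a "database" block like the one above could be turned into knex options. The helper `toKnexOptions` and the example host and credentials are made up for illustration, and whether Central builds its knex options this way is an assumption. The option names `client`, `connection`, `pool.min`, `pool.max` and `acquireConnectionTimeout` are knex's own; the values shown are meant to resemble its defaults, not Central's settings.

```javascript
// Hypothetical helper: map a Central-style "database" block to knex options.
// The pool values are illustrative assumptions, not Central's configuration.
function toKnexOptions(db, pool = { min: 2, max: 10 }) {
  const { host, user, password, database, ssl } = db;
  return {
    client: 'pg',
    // The ssl object is handed through to the pg driver unchanged.
    connection: { host, user, password, database, ssl },
    // Upper bound on simultaneously open connections.
    pool,
    // How long (ms) a query may wait for a free connection before
    // knex raises a KnexTimeoutError.
    acquireConnectionTimeout: 60000,
  };
}

// Example values only; substitute the real DBHOST, DBUSER, DBPASS, DBNAME.
const options = toKnexOptions({
  host: 'db.example.org',
  user: 'odk',
  password: 'secret',
  database: 'odk',
  ssl: { rejectUnauthorized: false },
});

console.log(JSON.stringify(options, null, 2));
// With knex installed, the object could be passed on as: require('knex')(options)
```

Raising `pool.max` would only postpone the timeout if connections are being leaked, so this is a place to look rather than a fix.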
ODK Central backend uses knex 0.21.
The same issue has been reported by others, and one report says that upgrading knex to 0.21.1 and pg to 8.0.3 fixed it.
The only other mention of the knex connection pool is at https://github.com/getodk/central-backend/issues/255#issuecomment-606228073.
Is this info enough to triage?
About this issue
- State: closed
- Created 4 years ago
- Comments: 15 (6 by maintainers)
No one reported any problems, so far so good! The server is one of two production servers, currently running:
Our ETL runs near daily and scrapes
with all attachments (downloads only if new). There are a handful of other projects with far fewer submissions and access traffic as well. ruODK unit tests run daily. I would expect fewer users to log in day to day now, as all data pipelines are automated.
I haven’t run into the problem since and will close this issue as resolved. The context and explanations from @ln will be very valuable for others possibly getting stuck with the same problem.