central: KnexTimeoutError: The pool is full

Problem

Users can’t log into ODK Central 1.0.1. Login times out with a diplomatic “the server received an invalid error” message. On login attempts, docker-compose logs --tail=100 -f shows:

}KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
service               |     at Client_PG.acquireConnection (/usr/odk/node_modules/knex/lib/client.js:349:26) {
service               |   name: 'KnexTimeoutError',
service               |   sql: undefined,
service               |   bindings: undefined

That looks like too many open db connections timing out.
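
To illustrate that hypothesis, here is a minimal toy model of a bounded connection pool. It is not knex's implementation, and the names ToyPool and simulate are made up for this sketch. It assumes that some requests never hand their connection back, which is only one possible way to end up with a full pool.

```javascript
// Toy model of a bounded connection pool (NOT knex's implementation).
class ToyPool {
  constructor(max) {
    this.max = max;   // upper bound on simultaneously held connections
    this.inUse = 0;   // connections currently checked out
  }

  acquire() {
    if (this.inUse >= this.max) {
      // A real pool would wait for an acquire timeout before failing;
      // this sketch fails immediately to stay synchronous.
      throw new Error("Timeout acquiring a connection. The pool is probably full.");
    }
    this.inUse += 1;
  }

  release() {
    if (this.inUse > 0) this.inUse -= 1;
  }
}

// Run `requests` sequential requests against a pool of size `max`.
// Every `leakEvery`-th request "forgets" to release (0 means no leaks).
function simulate(max, requests, leakEvery) {
  const pool = new ToyPool(max);
  let failed = 0;
  for (let i = 1; i <= requests; i++) {
    try {
      pool.acquire();
      const leaks = leakEvery > 0 && i % leakEvery === 0;
      if (!leaks) pool.release();
    } catch (err) {
      failed += 1;
    }
  }
  return { failed, inUse: pool.inUse };
}
```

With simulate(10, 100, 5), the tenth leak fills the pool at request 50 and every later request fails, giving { failed: 50, inUse: 10 }. A restart resets inUse to zero, which would match the observation that restarting the service makes the error go away for a while.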

Environment

ODK Central 1.0.1 running via docker-compose with a custom db and mail server. The server is used for ruODK unit tests from GH Actions and Appveyor, which means that 15 ruODK instances run the same 461 unit tests at 1200 daily. Requests from the instances are staggered by their differing build times. The server is also used for production campaigns, receiving a few hundred records daily within a few hours in the late morning.

Solution

I didn’t have much time to debug this, so I upgraded and restarted ODK Central, which fixed the issue by resetting the db connection pool. I am not sure whether the root cause of the error is addressed, though.

Working versions:

versions:
70c200232acfca99da484bb5f15f67e1f5857c90
 4732f7112a286165241aaf7f971f2c2e38d6bb8a client (v1.0.0)
 e9ffd2c0c3aa1a9475852e1397b8259e2b03165a server (v1.0.3)

Error search

I’m using an external postgres instance (internal policy; it’s backed up and has plenty of storage and grunt). The config has one extra parameter: "ssl": {"rejectUnauthorized": false}.

"database": {
      "host": "${DBHOST}",
      "user": "${DBUSER}",
      "password": "${DBPASS}",
      "database": "${DBNAME}",
      "ssl": {"rejectUnauthorized": false}
    },
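
For readers who want to see where the pool limits would sit next to these settings, here is a rough sketch of a knex-style config built from the same environment variables. The helper buildKnexConfig is hypothetical, and how ODK Central actually passes its config to knex is an assumption here. The pool and acquireConnectionTimeout options are knex settings; the values shown are illustrative only.

```javascript
// Hypothetical helper: builds a knex-style config object from environment
// variables. This is a sketch, not how ODK Central wires its config.
function buildKnexConfig(env) {
  for (const key of ["DBHOST", "DBUSER", "DBPASS", "DBNAME"]) {
    if (!env[key]) throw new Error(`Missing environment variable: ${key}`);
  }
  return {
    client: "pg",
    connection: {
      host: env.DBHOST,
      user: env.DBUSER,
      password: env.DBPASS,
      database: env.DBNAME,
      // Same extra parameter as in the config above: use TLS but do not
      // verify the server certificate.
      ssl: { rejectUnauthorized: false },
    },
    // Pool bounds (illustrative values, assumed for this sketch).
    pool: { min: 0, max: 10 },
    // Milliseconds to wait for a free connection before knex gives up
    // with a KnexTimeoutError.
    acquireConnectionTimeout: 60000,
  };
}
```

Calling buildKnexConfig(process.env) would fail fast if one of the four variables is unset, which rules out an incomplete config before looking at the pool itself.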

The ODK Central backend uses knex 0.21.

The same issue has been reported by others. Upgrading knex to 0.21.1 and pg to 8.0.3 has been reported as a fix here.
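
As a quick way to compare an installation against those reported versions, here is a small sketch. The function names are made up for illustration, and it only handles plain dotted numeric versions (no pre-release tags). The installed versions would have to be read from the backend's dependency tree by other means.

```javascript
// Compare two dotted numeric versions; returns -1, 0, or 1.
// Missing components count as 0, so "0.21" is treated as "0.21.0".
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const x = pa[i] || 0;
    const y = pb[i] || 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

// Minimum versions taken from the reported fix.
const REPORTED_FIX = { knex: "0.21.1", pg: "8.0.3" };

// Returns the package names that are missing or older than the reported fix.
function packagesBelowReportedFix(installed) {
  return Object.keys(REPORTED_FIX).filter(
    (name) =>
      !installed[name] ||
      compareVersions(installed[name], REPORTED_FIX[name]) < 0
  );
}
```

For example, packagesBelowReportedFix({ knex: "0.21", pg: "8.0.3" }) returns ["knex"], since 0.21 sorts below 0.21.1.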

The only other mention of the knex connection pool is at https://github.com/getodk/central-backend/issues/255#issuecomment-606228073.

Is this info enough to triage?

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

No one reported any problems, so far so good! The server is one of two production servers, currently running:

versions:
647569c54f6bbf26ea356eca0d14f7e5d1a89c6b
 cddb691e40e84aabff87b9d427e22a50282d6f99 client (v1.1.2)
 a33bc6fb3c34fe38894b0e9d0bb404f81da325e6 server (v1.1.1)

Our ETL runs near daily and scrapes

<ODKC Turtle Data> accessed on 2021-03-08 20:12:20
Areas: 15
Sites: 121
Survey start points: 1315
Survey end points: 1275
Marine Wildlife Incidents (rescues, strandings): 144
Live sightings: 2
Turtle Tracks or Nests: 37556
Turtle Track Tallies: 2

with all attachments (downloads only if new). There are a handful of other projects with far fewer submissions and access traffic as well. ruODK unit tests run daily. I would expect fewer users to log in day to day now, as all data pipelines are automated.

I haven’t run into the problem since and will close this issue as resolved. The context and explanations from @ln will be very valuable for others possibly getting stuck with the same problem.