uptime-kuma: Shrinking database/blocking database operations give false downtime

⚠️ Please verify that this bug has NOT been raised before.

  • I checked and didn’t find a similar issue

🛡️ Security Policy

Description

One of my monitors said it was down because: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?

I was deleting a monitor that probably had a lot of data (I had history set to 365 days previously, insane, I know), so the deletion took a long time, which then caused the monitor to be reported as down.

The issue can also be caused by a manually triggered shrink database operation.
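For context on where the error message comes from: Knex with the sqlite3 client is typically configured with a single connection, so a long-running VACUUM or bulk DELETE holds that connection and every other query waits until the pool acquisition timeout expires. A minimal sketch of the settings involved (the file path and values here are illustrative, not uptime-kuma's actual configuration):

```js
// Illustrative only, not uptime-kuma's actual configuration.
const knex = require("knex")({
    client: "sqlite3",
    connection: { filename: "./data/kuma.db" },  // path is an assumption
    useNullAsDefault: true,
    pool: { min: 1, max: 1 },                    // SQLite effectively allows one writer
    acquireConnectionTimeout: 60000,             // after 60 s of waiting, a KnexTimeoutError is thrown
});

// While a blocking statement such as VACUUM or a huge DELETE runs on that one
// connection, concurrent heartbeat INSERTs wait in the pool queue and finally
// fail with "Timeout acquiring a connection. The pool is probably full."
```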

Related

👟 Reproduction steps

  1. Have some monitors (any type should be fine, just with a reasonably short interval, e.g. at most 20 seconds)
  2. Have a large database (say >512MB)
  3. Shrink database (Settings > Monitor History > Shrink database)
  4. Experience behavior

👀 Expected behavior

The monitors continue to be reported as “up”, and the correct data is saved later (if needed).

😓 Actual Behavior

The monitors are reported as “down” because a blocking database operation is in progress.

🐻 Uptime-Kuma Version

1.19.0

💻 Operating System and Arch

macOS 13.1

🌐 Browser

LibreWolf 108.0.1-1

🐋 Docker Version

No response

🟩 NodeJS Version

v16.18.1

📝 Relevant log output

Dec 25 12:36:43 laptop-server npm[1618271]: 2022-12-25T11:36:43Z [RATE-LIMIT] INFO: remaining requests: 20
Dec 25 12:37:06 laptop-server npm[1618271]: 2022-12-25T11:37:06Z [MONITOR] WARN: Monitor #6 'mastodon/mcrblgng (micro.)': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 12 | Retry: 1 | Retry Interval: 60 seconds | Type: keyword
Dec 25 12:37:10 laptop-server npm[1618271]: 2022-12-25T11:37:10Z [MONITOR] WARN: Monitor #7 'peertube (videos.)': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 12 | Retry: 1 | Retry Interval: 30 seconds | Type: keyword
Dec 25 12:37:10 laptop-server npm[1618271]: 2022-12-25T11:37:10Z [MONITOR] WARN: Monitor #40 'conduit (conduit.hazmat.)': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 15 | Retry: 1 | Retry Interval: 60 seconds | Type: keyword
Dec 25 12:37:11 laptop-server npm[1618271]: 2022-12-25T11:37:11Z [MONITOR] WARN: Monitor #36 'unbound DNS server (telemetry)': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 2 | Retry: 1 | Retry Interval: 30 seconds | Type: keyword
Dec 25 12:37:12 laptop-server npm[1618271]: Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
Dec 25 12:37:12 laptop-server npm[1618271]:     at Client_SQLite3.acquireConnection (/home/uptime/uptime-kuma/node_modules/knex/lib/client.js:305:26)
Dec 25 12:37:12 laptop-server npm[1618271]:     at async Runner.ensureConnection (/home/uptime/uptime-kuma/node_modules/knex/lib/execution/runner.js:259:28)
Dec 25 12:37:12 laptop-server npm[1618271]:     at async Runner.run (/home/uptime/uptime-kuma/node_modules/knex/lib/execution/runner.js:30:19)
Dec 25 12:37:12 laptop-server npm[1618271]:     at async RedBeanNode.findOne (/home/uptime/uptime-kuma/node_modules/redbean-node/dist/redbean-node.js:515:19)
Dec 25 12:37:12 laptop-server npm[1618271]:     at async Function.handleStatusPageResponse (/home/uptime/uptime-kuma/server/model/status_page.js:23:26)
Dec 25 12:37:12 laptop-server npm[1618271]:     at async /home/uptime/uptime-kuma/server/routers/status-page-router.js:16:5 {
Dec 25 12:37:12 laptop-server npm[1618271]:   sql: undefined,
Dec 25 12:37:12 laptop-server npm[1618271]:   bindings: undefined
Dec 25 12:37:12 laptop-server npm[1618271]: }
Dec 25 12:37:12 laptop-server npm[1618271]:     at process.<anonymous> (/home/uptime/uptime-kuma/server/server.js:1779:13)
Dec 25 12:37:12 laptop-server npm[1618271]:     at process.emit (node:events:513:28)
Dec 25 12:37:12 laptop-server npm[1618271]:     at emit (node:internal/process/promises:140:20)
Dec 25 12:37:12 laptop-server npm[1618271]:     at processPromiseRejections (node:internal/process/promises:274:27)
Dec 25 12:37:12 laptop-server npm[1618271]:     at processTicksAndRejections (node:internal/process/task_queues:97:32)
Dec 25 12:37:13 laptop-server npm[1618271]: If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues
Dec 25 12:37:13 laptop-server npm[1618271]: 2022-12-25T11:37:13Z [MONITOR] WARN: Monitor #44 'prometheus (prometheus.)': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 12 | Retry: 1 | Retry Interval: 60 seconds | Type: keyword
Dec 25 12:37:14 laptop-server npm[1618271]: 2022-12-25T11:37:14Z [AUTH] INFO: Successfully logged in user jackson. IP=176.241.52.131
Dec 25 12:37:15 laptop-server npm[1618271]: 2022-12-25T11:37:15Z [RATE-LIMIT] INFO: remaining requests: 20
Dec 25 12:37:19 laptop-server npm[1618271]: 2022-12-25T11:37:19Z [MONITOR] WARN: Monitor #34 'ntfy localhost': Failing: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Interval: 20 seconds | Type: http | Down Count: 0 | Resend Interval: 15
Dec 25 12:37:43 laptop-server npm[1618271]: 2022-12-25T11:37:43Z [RATE-LIMIT] INFO: remaining requests: 20

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Comments: 19 (9 by maintainers)

Most upvoted comments

@louislam if you’re considering supporting other databases, I would personally suggest considering PostgreSQL and MySQL and weighing their pros/cons.

I don’t have MySQL on my server, because nothing uses it. Pretty much everything I run (PeerTube, Mastodon, Synapse (Matrix homeserver)) uses Postgres.

Anyway, here are some already existing issues/comments:

MariaDB support is merged already. And I’m submitting Postgres support in #3748

But I also don’t know how to write a better description, so it is what it is.

Maybe just this for now:

Trigger database VACUUM for SQLite. AUTO_VACUUM is already enabled and this action is not needed in most cases.


OK, in the documentation (https://www.sqlite.org/pragma.html#pragma_auto_vacuum) we have this:

Auto-vacuum does not defragment the database nor repack individual database pages the way that the VACUUM command does. In fact, because it moves pages around within the file, auto-vacuum can actually make fragmentation worse.

IMO, we can write:

Trigger VACUUM for SQLite to defragment and repack database. Remember, AUTO_VACUUM is already enabled, but this does not defragment the database nor repack individual database pages.

or just:

Trigger database VACUUM for SQLite. AUTO_VACUUM is already enabled but this does not defragment the database nor repack individual database pages the way that the VACUUM command does.
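For reference, a rough sketch (not the actual shrink implementation) of what the two operations being described look like at the SQL level, issued through Knex:

```js
// Sketch only: show the current auto_vacuum mode and run a manual VACUUM.
// auto_vacuum (0 = NONE, 1 = FULL, 2 = INCREMENTAL) reclaims free pages but,
// per the SQLite docs quoted above, does not defragment or repack pages.
async function shrinkDatabase(knex) {
    const mode = await knex.raw("PRAGMA auto_vacuum;");
    console.log("auto_vacuum:", mode);   // e.g. [ { auto_vacuum: 1 } ] on the sqlite3 client

    // VACUUM rebuilds the whole database file, defragmenting and repacking it,
    // but it blocks other writes (and, with a single-connection pool, reads) while it runs.
    await knex.raw("VACUUM;");
}
```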

Yeah, indeed, shrinking is not the same as deleting monitors. Should have read more carefully, sorry about that.

@cypa Please see the performance changes we are doing in v2.0 => see https://github.com/louislam/uptime-kuma/issues/4500. While optimising the indexes might also be a way, we have chosen to do aggregation instead. The relevant PR here is https://github.com/louislam/uptime-kuma/pull/2750

=> Deleting in smaller batches (i.e. allowing other operations to sneak in) would be the way to resolve this issue.
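A rough sketch of what such batched deletion could look like with Knex (the heartbeat table, monitor_id column, and batch size are assumptions for illustration, not the actual uptime-kuma implementation):

```js
// Sketch only: delete a monitor's heartbeats in small batches so the single
// SQLite connection is released between batches and queued queries
// (monitor checks, status pages) can run in the gaps.
async function deleteHeartbeatsInBatches(knex, monitorID, batchSize = 1000) {
    for (;;) {
        // Fetch one batch of ids first; stopping on an empty batch avoids
        // depending on how the driver reports affected-row counts.
        const rows = await knex("heartbeat")
            .select("id")
            .where("monitor_id", monitorID)
            .limit(batchSize);

        if (rows.length === 0) {
            break;
        }

        await knex("heartbeat")
            .whereIn("id", rows.map((row) => row.id))
            .del();

        // Yield briefly so other pending queries can acquire the connection
        // before the next batch starts.
        await new Promise((resolve) => setTimeout(resolve, 50));
    }
}
```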