backstage: 🐛 Bug Report: switching to node 18 causes DB timeout on startup
📜 Description
Hi there,
I just switched the base image in the Dockerfile to node 18 (`node:18-bookworm-slim`), and Backstage now fails to start with the following error:
```
Loaded config from app-config.yaml, app-config.production.yaml
Backend failed to start up Error: Failed to connect to the database to make sure that 'backstage_plugin_catalog' exists, KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at /app/node_modules/@backstage/backend-common/dist/index.cjs.js:1035:17
    at async CatalogBuilder.build (/app/node_modules/@backstage/plugin-catalog-backend/dist/cjs/CatalogBuilder-60d67596.cjs.js:6513:22)
    at async createPlugin$8 (/app/packages/backend/dist/index.cjs.js:127:40)
    at async main (/app/packages/backend/dist/index.cjs.js:383:29)
```
When reverting back to `node:16-bullseye-slim`, everything works fine.
👍 Expected behavior
Backstage should start up normally and successfully connect to Postgres.
👎 Actual Behavior with Screenshots
Backstage fails to start with the stack trace shown in the Description section.
👟 Reproduction steps
This is what the Dockerfile looks like:
```Dockerfile
FROM node:18-bookworm-slim

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && \
    apt-get install -y --no-install-recommends python3 g++ build-essential && \
    yarn config set python /usr/bin/python3 && \
    npm install -g node-gyp

# From here on we use the least-privileged `node` user to run the backend.
USER node
WORKDIR /app

# This switches many Node.js dependencies to production mode.
ENV NODE_ENV production

# Copy repo skeleton first, to avoid unnecessary docker cache invalidation.
# The skeleton contains the package.json of each package in the monorepo,
# and along with yarn.lock and the root package.json, that's enough to run yarn install.
COPY --chown=node:node yarn.lock package.json packages/backend/dist/skeleton.tar.gz backstage.json ./
RUN tar xzf skeleton.tar.gz && rm skeleton.tar.gz

# TODO: Build bundle.tar.gz (backend) also in Container?
RUN --mount=type=cache,target=/home/node/.cache/yarn,sharing=locked,uid=1000,gid=1000 \
    yarn install --frozen-lockfile --production --network-timeout 300000

# Then copy the rest of the backend bundle, along with any other files we might want.
COPY --chown=node:node packages/backend/dist/bundle.tar.gz app-config.yaml ./
RUN tar xzf bundle.tar.gz && rm bundle.tar.gz

CMD ["node", "packages/backend", "--config", "app-config.yaml", "--config", "/tmp/backstage/app-config.production.yaml"]
# TODO: Mount the env-specific file in k8s
```
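For context on the final TODO: the `CMD` expects `app-config.production.yaml` under `/tmp/backstage`, so the environment-specific file would be mounted there at deploy time. A minimal sketch of that in Kubernetes, assuming the file is shipped as a ConfigMap (all names below are hypothetical, not taken from the actual deployment):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backstage
spec:
  selector:
    matchLabels:
      app: backstage
  template:
    metadata:
      labels:
        app: backstage
    spec:
      containers:
        - name: backstage
          image: backstage:latest # hypothetical image reference
          volumeMounts:
            - name: app-config
              mountPath: /tmp/backstage # where the CMD above looks for the production config
              readOnly: true
      volumes:
        - name: app-config
          configMap:
            name: backstage-app-config # hypothetical ConfigMap holding app-config.production.yaml
```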
📃 Provide the context for the Bug.
I also tried the following, but unfortunately it did not help:
- increasing the timeout by extending the `knexConfig` in `app-config.yaml` (see the sketch below)
- adding `npm install -g node-gyp` to the Dockerfile (see above)
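For reference, a timeout change along those lines in `app-config.yaml` could look roughly like this. The shape of `backend.database.knexConfig` is standard Backstage config, but the pool value here is an illustrative assumption, not the setting actually tried:

```yaml
backend:
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}
    # Passed through to Knex; raises the connection-acquisition timeout
    # that the KnexTimeoutError in the stack trace above is hitting.
    knexConfig:
      pool:
        acquireTimeoutMillis: 60000
```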
🖥️ Your Environment
Running Backstage v1.18.4, deployed on AWS EKS with an RDS Aurora PostgreSQL database.
OS: Darwin 22.4.0 - darwin/arm64
node: v18.18.0
yarn: 1.22.19
cli: 0.22.13 (installed)
backstage: 1.18.4
Dependencies:
@backstage/app-defaults 1.4.3
@backstage/backend-app-api 0.5.5
@backstage/backend-common 0.19.7
@backstage/backend-dev-utils 0.1.1
@backstage/backend-openapi-utils 0.0.4
@backstage/backend-plugin-api 0.6.5
@backstage/backend-tasks 0.5.10
@backstage/catalog-client 1.4.4
@backstage/catalog-model 1.4.2
@backstage/cli-common 0.1.12
@backstage/cli-node 0.1.4
@backstage/cli 0.22.13
@backstage/config-loader 1.5.0
@backstage/config 1.1.0
@backstage/core-app-api 1.10.0
@backstage/core-components 0.12.5, 0.13.5
@backstage/core-plugin-api 1.6.0
@backstage/dev-utils 1.0.21
@backstage/errors 1.2.2
@backstage/eslint-plugin 0.1.3
@backstage/integration-aws-node 0.1.6
@backstage/integration-react 1.1.19
@backstage/integration 1.7.0
@backstage/plugin-adr-backend 0.4.2
@backstage/plugin-adr-common 0.2.15
@backstage/plugin-api-docs 0.9.11
@backstage/plugin-app-backend 0.3.53
@backstage/plugin-app-node 0.1.5
@backstage/plugin-auth-backend-module-gcp-iap-provider 0.1.2
@backstage/plugin-auth-backend-module-github-provider 0.1.2
@backstage/plugin-auth-backend-module-gitlab-provider 0.1.2
@backstage/plugin-auth-backend-module-google-provider 0.1.2
@backstage/plugin-auth-backend-module-oauth2-provider 0.1.2
@backstage/plugin-auth-backend 0.19.2
@backstage/plugin-auth-node 0.2.19, 0.3.2
@backstage/plugin-catalog-backend-module-github 0.4.3
@backstage/plugin-catalog-backend-module-scaffolder-entity-model 0.1.2
@backstage/plugin-catalog-backend 1.13.3
@backstage/plugin-catalog-common 1.0.16
@backstage/plugin-catalog-graph 0.2.36
@backstage/plugin-catalog-import 0.10.0
@backstage/plugin-catalog-node 1.4.6
@backstage/plugin-catalog-react 1.8.4
@backstage/plugin-catalog 1.13.0
@backstage/plugin-devtools-backend 0.2.2
@backstage/plugin-devtools-common 0.1.4
@backstage/plugin-devtools 0.1.4
@backstage/plugin-events-backend-module-github 0.1.15
@backstage/plugin-events-backend 0.2.14
@backstage/plugin-events-node 0.2.14
@backstage/plugin-explore-backend 0.0.15
@backstage/plugin-explore-common 0.0.2
@backstage/plugin-explore-react 0.0.31
@backstage/plugin-explore 0.4.10
@backstage/plugin-github-actions 0.6.5
@backstage/plugin-github-pull-requests-board 0.1.18
@backstage/plugin-home-react 0.1.3
@backstage/plugin-home 0.5.8
@backstage/plugin-org 0.6.14
@backstage/plugin-permission-common 0.7.8
@backstage/plugin-permission-node 0.7.16
@backstage/plugin-permission-react 0.4.15
@backstage/plugin-proxy-backend 0.4.2
@backstage/plugin-scaffolder-backend 1.17.3
@backstage/plugin-scaffolder-common 1.4.1
@backstage/plugin-scaffolder-node 0.2.5
@backstage/plugin-scaffolder-react 1.5.5
@backstage/plugin-scaffolder 1.15.0
@backstage/plugin-search-backend-module-catalog 0.1.9
@backstage/plugin-search-backend-module-explore 0.1.9
@backstage/plugin-search-backend-module-pg 0.5.14
@backstage/plugin-search-backend-module-techdocs 0.1.9
@backstage/plugin-search-backend-node 1.2.9
@backstage/plugin-search-backend 1.4.5
@backstage/plugin-search-common 1.2.6
@backstage/plugin-search-react 1.7.0
@backstage/plugin-search 1.4.0
@backstage/plugin-shortcuts 0.3.14
@backstage/plugin-tech-radar 0.6.8
@backstage/plugin-techdocs-backend 1.7.2
@backstage/plugin-techdocs-module-addons-contrib 1.1.0
@backstage/plugin-techdocs-node 1.8.2
@backstage/plugin-techdocs-react 1.1.11
@backstage/plugin-techdocs 1.7.0
@backstage/plugin-user-settings 0.7.10
@backstage/release-manifests 0.0.10
@backstage/test-utils 1.4.3
@backstage/theme 0.2.19, 0.4.2
@backstage/types 1.1.1
@backstage/version-bridge 1.0.5
👀 Have you spent some time to check if this bug has been raised before?
- I checked and didn't find a similar issue
🏢 Have you read the Code of Conduct?
- I have read the Code of Conduct
Are you willing to submit a PR?
No, but I'm happy to collaborate on a PR with someone else
About this issue
- Original URL
- State: closed
- Created 9 months ago
- Reactions: 2
- Comments: 26 (21 by maintainers)
Just did an experiment that showed an interesting result.

I created a separate small Node.js app that only connects to an RDS Postgres instance using `pg`. When running it with node 16 it worked fine, but when running it with node 18 it returned a connection refused (same error as Backstage). I suspect the problem is the Postgres engine version (currently 14.6); I will try upgrading the DB version, rerun the app, and report back here.
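A minimal sketch of such a probe (connection details come from the standard `PG*` environment variables; the TLS and timeout settings are assumptions, not details from the original app):

```js
// probe.js - minimal Postgres connectivity check using `pg`.
// Run with `node probe.js` under node 16 vs node 18 to compare behavior.
const { Client } = require('pg');

async function main() {
  const client = new Client({
    host: process.env.PGHOST, // e.g. the RDS endpoint
    port: Number(process.env.PGPORT || 5432),
    user: process.env.PGUSER,
    password: process.env.PGPASSWORD,
    database: process.env.PGDATABASE,
    ssl: { rejectUnauthorized: false }, // assumption: TLS-enabled RDS endpoint
    connectionTimeoutMillis: 10000, // fail fast instead of hanging on a dead connection
  });
  await client.connect();
  const { rows } = await client.query('SELECT version()');
  console.log('connected:', rows[0].version);
  await client.end();
}

main().catch(err => {
  console.error('connection failed:', err);
  process.exit(1);
});
```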
Thanks! We're off this week, I'll try to take a peek next week!

We had been using node 18 in our POC Backstage for a while and had seen no issues. Then I created a new cluster for prod-backstage and encountered this issue. So no, for me personally, this did not arise due to node 16 -> 18.
I was able to fix my issue by attaching the proper security group to my RDS instance. This is my config for the security group:
I think a good step for future debuggers would be to try to connect via psql (I did this by creating an Ubuntu debug image in the EKS cluster and installing the Postgres client). If you are able to connect, Backstage is at fault; otherwise, your connection is the issue. Once I fixed my security groups, psql worked fine.
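For anyone following that advice, the debug-pod approach looks roughly like this (endpoint and credentials are placeholders):

```sh
# Start a throwaway Ubuntu pod in the cluster and open a shell in it.
kubectl run -it --rm pg-debug --image=ubuntu -- bash

# Inside the pod: install the Postgres client and try to connect directly.
apt-get update && apt-get install -y postgresql-client
psql "host=<rds-endpoint> port=5432 user=<user> dbname=postgres sslmode=require"
```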
It's annoying that Knex manifests connection issues in this way.
Oh, arrgh. Hm, then it gets a bit trickier.
@wanisfahmyDE I'm so sorry, but the invite has expired. Is there any chance of a renewal?
Hi @freben, sorry for the late reply. Today I updated our Backstage instance to the latest upstream (v1.19.3) with `node:18-bookworm-slim` and the problem persists, but with a slightly different error (it doesn't come from Knex anymore). The error is:

Not sure if it's helpful but thought of sharing it anyway. And again, reverting back to node 16 with Backstage v1.19.3 worked fine.
I will create a fork of our instance and share it with you in the next few days.
@wanisfahmyDE if it's not too much to ask, could you possibly set up a reproduction repository, branch, or docker-compose setup that exhibits the problem 100% consistently, or at least most of the time? It would really help us hunt this down if we could run it and see it happening more consistently.
Hey @freben, thanks for the info. Yeah, I am also hesitant to try the above configs for the same reason.
I tried some more experiments: locally on my arm64 M1 Mac, running Backstage on node 18 against a Postgres Docker image actually worked fine.
I also tried switching to `node:lts-bookworm`, which unlike `node:18-bookworm-slim` has OpenSSL installed by default (I thought that might be the reason), but this also resulted in the same error when deployed to EKS.

Thank you for the detailed report! We have seen some of this cropping up in CI builds too, but only sometimes, making tests flaky. At this point I'm not entirely sure what's causing it, so I appreciate your finding that it happens only when upgrading to node 18.
We've added a help wanted label on this for now. If you find anything else or have more input on the issue at hand, please let us know.