backstage: 🐛 Bug Report: Collating search indexing is slow and fails when inserting data from the documents_to_insert table into the documents table

📜 Description

Collating documents for techdocs and software-catalog keeps failing with a timeout error like the following:

```
{"level":"error","message":"Collating documents for techdocs failed: error: insert into documents (type, document, hash) select \"type\", \"document\", \"hash\" from \"documents_to_insert\" on conflict (\"hash\") do nothing - canceling statement due to statement timeout","plugin":"search","service":"backstage","type":"plugin"}
{"level":"error","message":"Collating documents for software-catalog failed: error: insert into documents (type, document, hash) select \"type\", \"document\", \"hash\" from \"documents_to_insert\" on conflict (\"hash\") do nothing - canceling statement due to statement timeout","plugin":"search","service":"backstage","type":"plugin"}
```

The current statement_timeout is 5s. It works fine for staging, which has less data (the documents table is 250 MB). Production has more data (the documents table is 360 MB), but 5s should still be sufficient.
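For reference, a minimal sketch of how the timeout can be inspected and adjusted in PostgreSQL; the role name below is hypothetical, and where Backstage's connection actually gets its timeout depends on your database configuration:

```sql
-- Check the timeout in effect for the current session.
SHOW statement_timeout;

-- Raise it for the current session only (value is illustrative).
SET statement_timeout = '30s';

-- Or persist it for the role that runs the search indexer
-- (role name "backstage_search" is hypothetical).
ALTER ROLE backstage_search SET statement_timeout = '30s';
```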

👍 Expected behavior

Collating documents for techdocs and software-catalog works fine.

👎 Actual Behavior with Screenshots

Collating documents for software-catalog and techdocs fails at `insert into documents (type, document, hash) select "type", "document", "hash" from "documents_to_insert" on conflict ("hash") do nothing - canceling statement due to statement timeout` when there is more data in production.

For staging, this query eventually succeeds, but only after a long time. I checked pg_stat_statements in staging using `select * from pg_stat_statements where query = 'insert into documents (type, document, hash) select "type", "document", "hash" from "documents_to_insert" on conflict ("hash") do nothing';`

```
calls total_time min_time max_time mean_time stddev_time rows shared_blks_hit shared_blks_read shared_blks_dirtied shared_blks_written local_blks_hit local_blks_read local_blks_dirtied local_blks_written temp_blks_read temp_blks_written blk_read_time blk_write_time
1 4679.825284 4679.825284 4679.825284 4679.825284 0 0 139071 7453 0 0 3 1896 0 1023 0 0 3286.667592 0
1 11371.131656 11371.131656 11371.131656 11371.131656 0 0 221909 8543 0 0 23695 7342 0 1021 0 0 423.288968 0
1 4197.742147 4197.742147 4197.742147 4197.742147 0 1 139107 7488 8 0 1 1899 0 1023 0 0 2875.815587 0
1 10825.297827 10825.297827 10825.297827 10825.297827 0 0 222233 8219 0 0 23695 7342 0 1021 0 0 78.340588 0
1 1286.8494780000000 1286.8494780000000 1286.8494780000000 1286.8494780000000 0 0 142457 4067 0 0 3 1896 0 1023 0 0 30.875488 0
```

👟 Reproduction steps

It happens when there is a lot of data; in our case the documents table is around 360 MB.
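To compare against a reproduction environment, the table size can be checked with standard PostgreSQL functions, e.g.:

```sql
-- Total on-disk size of the documents table, including indexes and TOAST data.
SELECT pg_size_pretty(pg_total_relation_size('documents'));
```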

📃 Provide the context for the Bug.

Now the search indexing keeps failing.

I am wondering whether this issue is related to the query `insert into documents (type, document, hash) select "type", "document", "hash" from "documents_to_insert" on conflict ("hash") do nothing`.

Could it be caused by the lack of an index on documents_to_insert?
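If someone wants to test that hypothesis, a sketch of the experiment (untested; whether the planner actually uses the index depends on the plan it chooses):

```sql
-- Hypothetical experiment: index the temp table's hash column so lookups
-- against documents_to_insert (e.g. an anti-join or NOT IN subquery) can
-- use an index scan instead of a sequential scan. Since documents_to_insert
-- is a temp table, this must run in the same session that created it.
CREATE INDEX ON documents_to_insert (hash);

-- Refresh planner statistics so the new index is considered.
ANALYZE documents_to_insert;
```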

Also, since documents_to_insert is a temp table, I am not sure how to analyze this query's performance.
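One option that should work here is PostgreSQL's auto_explain module, which logs the plan of slow statements from inside the session that runs them, so it also covers temp tables that exist only in the plugin's connection. A sketch (thresholds are illustrative):

```sql
-- Load auto_explain for the current session. To capture the Backstage
-- plugin's own sessions instead, set
-- session_preload_libraries = 'auto_explain' in postgresql.conf.
LOAD 'auto_explain';

-- Log the plan of any statement running longer than 1 second,
-- including actual row counts and timings (EXPLAIN ANALYZE style).
SET auto_explain.log_min_duration = '1s';
SET auto_explain.log_analyze = true;
```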

🖥️ Your Environment

```
$ yarn backstage-cli info
yarn run v1.22.18

OS:   Darwin 21.6.0 - darwin/x64
node: v16.14.2
yarn: 1.22.18
cli:  0.18.1 (installed)
backstage: 1.5.1

Dependencies:
  @backstage/app-defaults 1.0.7
  @backstage/backend-common 0.15.1
  @backstage/backend-plugin-api 0.1.1
  @backstage/backend-tasks 0.3.4
  @backstage/catalog-client 1.1.2
  @backstage/catalog-model 1.1.3, 1.1.4
  @backstage/cli-common 0.1.10
  @backstage/cli 0.18.1
  @backstage/config-loader 1.1.4
  @backstage/config 1.0.4, 1.0.5
  @backstage/core-app-api 1.2.0
  @backstage/core-components 0.11.2, 0.12.0
  @backstage/core-plugin-api 1.1.0
  @backstage/errors 1.1.0, 1.1.3, 1.1.4
  @backstage/integration-react 1.1.5
  @backstage/integration 1.4.0
  @backstage/plugin-analytics-module-ga 0.1.19
  @backstage/plugin-api-docs 0.8.8
  @backstage/plugin-app-backend 0.3.35
  @backstage/plugin-auth-backend 0.15.1
  @backstage/plugin-auth-node 0.2.5
  @backstage/plugin-catalog-backend-module-gitlab 0.1.6
  @backstage/plugin-catalog-backend-module-ldap 0.5.2
  @backstage/plugin-catalog-backend 1.3.1
  @backstage/plugin-catalog-common 1.0.8, 1.0.9
  @backstage/plugin-catalog-graph 0.2.20
  @backstage/plugin-catalog-import 0.8.11
  @backstage/plugin-catalog-node 1.0.1
  @backstage/plugin-catalog-react 1.2.1
  @backstage/plugin-catalog 1.5.0
  @backstage/plugin-explore-react 0.0.20
  @backstage/plugin-explore 0.3.39
  @backstage/plugin-home 0.4.24
  @backstage/plugin-kubernetes-backend 0.7.1
  @backstage/plugin-kubernetes-common 0.4.1
  @backstage/plugin-kubernetes 0.7.1
  @backstage/plugin-org 0.5.8
  @backstage/plugin-pagerduty 0.5.1
  @backstage/plugin-permission-backend 0.5.10
  @backstage/plugin-permission-common 0.6.4, 0.7.1, 0.7.2
  @backstage/plugin-permission-node 0.6.4
  @backstage/plugin-permission-react 0.4.7
  @backstage/plugin-proxy-backend 0.2.29
  @backstage/plugin-scaffolder-backend 1.5.1
  @backstage/plugin-scaffolder-common 1.1.2
  @backstage/plugin-scaffolder 1.5.0
  @backstage/plugin-search-backend-module-pg 0.3.6
  @backstage/plugin-search-backend-node 1.0.1
  @backstage/plugin-search-backend 1.0.1
  @backstage/plugin-search-common 1.1.1, 1.2.0
  @backstage/plugin-search-react 1.0.1
  @backstage/plugin-search 1.0.1
  @backstage/plugin-shortcuts 0.3.0
  @backstage/plugin-stack-overflow 0.1.4
  @backstage/plugin-tech-radar 0.5.15
  @backstage/plugin-techdocs-backend 1.2.1
  @backstage/plugin-techdocs-module-addons-contrib 1.0.3
  @backstage/plugin-techdocs-node 1.3.0
  @backstage/plugin-techdocs-react 1.0.3
  @backstage/plugin-techdocs 1.3.1
  @backstage/plugin-user-settings 0.4.7
  @backstage/release-manifests 0.0.5
  @backstage/test-utils 1.2.2
  @backstage/theme 0.2.16
  @backstage/types 1.0.1, 1.0.2
  @backstage/version-bridge 1.0.2

✨ Done in 2.13s.
```

👀 Have you spent some time to check if this bug has been raised before?

  • I checked and didn't find a similar issue

🏢 Have you read the Code of Conduct?

Are you willing to submit a PR?

Yes I am willing to submit a PR!

About this issue

  • State: closed
  • Created a year ago
  • Comments: 19 (18 by maintainers)

Most upvoted comments

Looks like https://github.com/backstage/backstage/pull/20936, released with v1.20, fixed the immediate performance issue. Thanks @mariia-maksimova!

+1 We've experienced a similar issue, only for us the collating was taking roughly 20 min every hour. After larger refreshes of various entities, it took up to 7 hours to finish (with subsequent runs piling up on each other 😦).

Maybe this will have no impact, but I wonder if the NOT IN subquery is problematic? It could be simplified to a LEFT JOIN with a NULL check on the joined table: https://github.com/backstage/backstage/blob/4b2a9b5/plugins/search-backend-module-pg/src/database/DatabaseDocumentStore.ts#L120-L126
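Roughly what I have in mind, sketched with table and column names taken from the logs above (the actual statement lives in DatabaseDocumentStore.ts, so treat this as illustrative):

```sql
-- NOT IN has to account for NULL semantics and can get a poor plan on
-- large inputs:
SELECT d.hash
FROM documents d
WHERE d.hash NOT IN (SELECT hash FROM documents_to_insert);

-- Equivalent anti-join (assuming hash is NOT NULL on both sides), which
-- PostgreSQL typically executes as an efficient hash anti-join:
SELECT d.hash
FROM documents d
LEFT JOIN documents_to_insert t ON t.hash = d.hash
WHERE t.hash IS NULL;
```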

This option did help a lot, bringing us from 20 min down to 3 min. We will probably look further into indexes as well, but replacing NOT IN definitely helped. It would be great to integrate it into the codebase 😃