backstage: Invalid Catalog Entries get stuck in refresh loop

Situation

Backstage is configured with GitHub Org catalog discovery, which scans every repository for a catalog-info.yaml in the root of the default branch. An invalid catalog-info.yaml is checked into the repo myOrg/FooService - say, an item in metadata.tags contains whitespace. Backstage attempts to load the invalid catalog-info.yaml, fails, and emits a log entry. An engineer sees the log, goes to fix the invalid catalog-info.yaml, and makes it valid by updating the file checked into myOrg/FooService.
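
For illustration, a minimal catalog-info.yaml that triggers this failure might look like the following sketch; the component name, owner, and tag values are hypothetical, and the offending part is the whitespace inside the second tag:

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: fooservice
  tags:
    - java
    - "team alpha" # whitespace in a tag fails the entity policy check
spec:
  type: service
  owner: team-alpha
  lifecycle: production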

Expected Behavior

Backstage will discard the invalid catalog-info and will not try to load it again; instead, it will load the new, valid catalog-info, and the entity will appear in the catalog.

Actual Behavior

Backstage will continue trying to load the invalid catalog-info and will not load the new, valid catalog-info that is checked into myOrg/FooService. The entry will not appear in the catalog UI. The invalid catalog-info appears to be stuck in a loop in the refresh_state table, repeatedly failing to refresh.

If the error was in one of an entity's unique attributes (namespace, name, or kind), then the new, valid catalog entry will load successfully once it is fixed in the source; however, the original failing entity remains stuck in a failing loop in the refresh_state table.

Steps to Reproduce

  1. Set up Backstage with org discovery
  2. Create a catalog-info.yaml whose metadata.tags list includes an item containing whitespace.
  3. See a "policy check failed" log from Backstage.
  4. Fix the catalog-info.yaml by removing the whitespace from the tags (as in the sketch after this list).
  5. Observe that the catalog entry does not appear in the Backstage UI and that the "policy check failed" log continues to be emitted intermittently.
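
A minimal sketch of the fix in step 4, reusing the hypothetical tag from the example above: replace the offending value with one that contains no whitespace.

metadata:
  tags:
    - java
    - team-alpha # was "team alpha"; tags may not contain whitespace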

Context

This issue makes it challenging for us to get valid catalog entries to display in the Backstage UI when they were initially created with a mistake in them. We are forced to manually delete rows from the refresh_state table before Backstage will successfully discover the valid catalog entry and display it.
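
For reference, the manual cleanup amounts to something like the following hedged sketch. It assumes direct Knex access to the catalog database, that the errors column holds a serialized JSON array ('[]' when empty), and a hypothetical entity ref; adjust for your database.

import { Knex } from 'knex';

// Sketch: inspect and clean up a stuck refresh_state row directly.
async function deleteStuckEntity(knex: Knex, entityRef: string) {
  // List rows that are stuck with processing errors.
  const stuck = await knex('refresh_state')
    .whereNot('errors', '[]')
    .select('entity_ref', 'errors');
  console.log('rows with errors:', stuck);

  // Remove the failing row so discovery can re-create the entity cleanly.
  await knex('refresh_state').where('entity_ref', entityRef).delete();
}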

Your Environment

  • Output of yarn backstage-cli info:
OS:   Linux 5.4.0-110-generic - linux/x64
node: v16.14.2
yarn: 1.22.10
cli:  0.17.2 (installed)

Dependencies:
  @backstage/app-defaults                          1.0.3
  @backstage/backend-common                        0.14.0
  @backstage/backend-tasks                         0.3.2
  @backstage/catalog-client                        1.0.3
  @backstage/catalog-model                         1.0.3
  @backstage/cli-common                            0.1.9
  @backstage/cli                                   0.17.2
  @backstage/config-loader                         1.1.2
  @backstage/config                                1.0.1
  @backstage/core-app-api                          1.0.3
  @backstage/core-components                       0.9.5
  @backstage/core-plugin-api                       1.0.3
  @backstage/dev-utils                             1.0.3
  @backstage/errors                                1.0.0
  @backstage/integration-react                     1.1.1
  @backstage/integration                           1.2.1
  @backstage/plugin-analytics-module-ga            0.1.17
  @backstage/plugin-api-docs                       0.8.6
  @backstage/plugin-app-backend                    0.3.33
  @backstage/plugin-auth-backend                   0.14.1
  @backstage/plugin-auth-node                      0.2.2
  @backstage/plugin-catalog-backend-module-github  0.1.4
  @backstage/plugin-catalog-backend                1.2.0
  @backstage/plugin-catalog-common                 1.0.3
  @backstage/plugin-catalog-graph                  0.2.18
  @backstage/plugin-catalog-import                 0.8.9
  @backstage/plugin-catalog-react                  1.1.1
  @backstage/plugin-catalog                        1.3.0
  @backstage/plugin-circleci                       0.3.6
  @backstage/plugin-github-actions                 0.5.6
  @backstage/plugin-github-pull-requests-board     0.1.0
  @backstage/plugin-home                           0.4.22
  @backstage/plugin-kubernetes-backend             0.6.0
  @backstage/plugin-kubernetes-common              0.3.0
  @backstage/plugin-kubernetes                     0.6.6
  @backstage/plugin-org                            0.5.6
  @backstage/plugin-permission-common              0.6.2
  @backstage/plugin-permission-node                0.6.2
  @backstage/plugin-permission-react               0.4.2
  @backstage/plugin-proxy-backend                  0.2.27
  @backstage/plugin-scaffolder-backend             1.3.0
  @backstage/plugin-scaffolder-common              1.1.1
  @backstage/plugin-scaffolder                     1.3.0
  @backstage/plugin-search-backend-module-pg       0.3.4
  @backstage/plugin-search-backend-node            0.6.2
  @backstage/plugin-search-backend                 0.5.3
  @backstage/plugin-search-common                  0.3.5
  @backstage/plugin-search-react                   0.2.1
  @backstage/plugin-search                         0.9.0
  @backstage/plugin-shortcuts                      0.2.7
  @backstage/plugin-stack-overflow                 0.1.2
  @backstage/plugin-techdocs-backend               1.1.2
  @backstage/plugin-techdocs-module-addons-contrib 1.0.1
  @backstage/plugin-techdocs-node                  1.1.2
  @backstage/plugin-techdocs-react                 1.0.1
  @backstage/plugin-techdocs                       1.2.0
  @backstage/plugin-user-settings                  0.4.5
  @backstage/release-manifests                     0.0.4
  @backstage/test-utils                            1.1.1
  @backstage/theme                                 0.2.15
  @backstage/types                                 1.0.0
  @backstage/version-bridge                        1.0.1

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 4
  • Comments: 20 (7 by maintainers)

Most upvoted comments

Hey, so I figured I’d share what we have built as a “work-around” (adjacent feature?). We ended up building a plugin that provides an API surfacing entities with errors and their uid. The meat of the logic is basically getting all entities that have some error via:

// Inside a catalog database transaction (tx): select every refresh_state
// row whose errors JSON column is not an empty array.
const rows = await tx<DbRefreshStateRow>('refresh_state')
  .whereNotJsonObject('errors', [])
  .select();

With this API we can take a look at why entities are erroring, and it allows us to either A) rectify the issue by updating the catalog file, in which case the errored entity is sometimes fixed and no longer has an error, or B) delete the entity via the DELETE /api/catalog/entities/by-uid/:uid route if the error is one that will never resolve.
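
For anyone wanting to script option B, here is a minimal sketch; the base URL and token handling are assumptions for illustration, and the uid would come from the query above.

// Sketch: delete a stuck entity by uid via the catalog REST API.
async function deleteEntityByUid(uid: string): Promise<void> {
  const baseUrl = 'https://backstage.example.com'; // hypothetical host
  const token = process.env.BACKSTAGE_TOKEN; // hypothetical auth setup
  const res = await fetch(`${baseUrl}/api/catalog/entities/by-uid/${uid}`, {
    method: 'DELETE',
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) {
    throw new Error(`Delete failed: ${res.status} ${res.statusText}`);
  }
}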

Hope this helps folks in the meantime.

Can this issue be re-opened? It still seems like this is a problem.

If it helps re-opening, I can offer some simple steps to reproduce.

  1. Create a component with the metadata.name example service (note the whitespace, which makes the name invalid).
  2. See that the refresh_state table holds an entry with an error.
  3. Update the component to a valid metadata.name, e.g. example-service
  4. After the next processing loop, see that example-service now appears in the catalog, but the refresh_state table still holds an entry with an error, with the entity_ref of component:default/example service.

We are seeing this issue with Bitbucket Server. Essentially the data never processes, and you have no direct way of finding the bad entities because they aren’t tied to any real location. We wiped all the data and it was still processing them and reporting the warning. We only discovered it by searching for the records directly in the refresh_state table in the database.

Ah yeah, IDs of providers are very strictly meant to be stable. They behave roughly like S3 bucket names in which entities are placed, so to speak. So when the ID changes, the old bucket stays around, and the provider tries to put things into a new bucket, which of course collides with the old remnants.
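
To make the bucket analogy concrete, here is a hedged sketch of where a provider ID lives in app-config.yaml; the key under catalog.providers.github (myOrgProvider here is a made-up name) is the part that must stay stable:

catalog:
  providers:
    github:
      myOrgProvider: # this key forms the provider ID; renaming it leaves
        organization: myOrg # old entities stranded under the previous ID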