backstage: Invalid Catalog Entries get stuck in refresh loop
Situation
Backstage is configured with GitHub org catalog discovery that scans all repositories for a catalog-info.yaml in the root of the default branch. An invalid catalog-info.yaml is checked into the repo myOrg/FooService; say, an item in metadata.tags includes whitespace. Backstage attempts to load this invalid catalog-info.yaml, fails, and emits a log. An engineer sees the log and fixes the invalid catalog-info.yaml, making it valid by updating the file that is checked into myOrg/FooService.
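For concreteness, the invalid file might look something like the following sketch; the service name, owner, and tag value are made up, and the commented-out line shows the fixed tag:

```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: foo-service          # hypothetical name
  tags:
    - "payments team"        # invalid: tags must not contain whitespace
    # - payments-team        # valid replacement after the fix
spec:
  type: service
  lifecycle: production
  owner: team-a
```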
Expected Behavior
Backstage discards the invalid catalog-info and does not try to load it again; instead, it loads the new, valid catalog-info, which then appears in the catalog.
Actual Behavior
Backstage keeps trying to load the invalid catalog-info and never re-loads the new, valid catalog-info that is checked into myOrg/FooService. The entry does not appear in the catalog's UI. The invalid catalog-info appears to be stuck in a loop in the refresh_state table, failing to refresh over and over.
If the error was in one of an entity's unique attributes (namespace, name, or kind), Backstage does successfully load the new, valid catalog entry once it is fixed in the source; however, the original failing entity is still stuck in a failing loop in the refresh_state table.
Steps to Reproduce
- Set up Backstage with org discovery.
- Create a catalog-info.yaml with a metadata.tags list that includes items with whitespace.
- See a "policy check failed" log from Backstage.
- Fix the catalog-info.yaml by removing the whitespace from the tags.
- Observe that the catalog entry does not appear in the Backstage UI and that the "policy check failed" log continues to be intermittently emitted.
Context
This issue makes it challenging for us to display valid catalog entries in the Backstage UI after they were initially created with a mistake in them. We are forced to manually delete rows from the refresh_state table in order for Backstage to successfully discover the valid catalog entry and display it; a sketch of that clean-up is shown below.
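The manual clean-up amounts to deleting the stuck row by its entity ref. A minimal sketch of that work-around, assuming a Postgres-backed catalog database and a made-up connection-string environment variable (adapt both to your deployment):

```typescript
import knexFactory from 'knex';

// Hypothetical clean-up script (our manual work-around, not an official API):
// remove the stuck row from the catalog's refresh_state table so the provider
// can re-discover the fixed entity on its next run.
async function deleteStuckRefreshState(entityRef: string) {
  // Assumes a Postgres-backed catalog; the env var name is made up.
  const db = knexFactory({
    client: 'pg',
    connection: process.env.CATALOG_DB_CONNECTION_STRING,
  });
  try {
    const deleted = await db('refresh_state')
      .where({ entity_ref: entityRef })
      .delete();
    console.log(`Deleted ${deleted} refresh_state row(s) for ${entityRef}`);
  } finally {
    await db.destroy();
  }
}

deleteStuckRefreshState('component:default/example service').catch(err => {
  console.error(err);
  process.exit(1);
});
```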
Your Environment
- Output of yarn backstage-cli info:
OS: Linux 5.4.0-110-generic - linux/x64
node: v16.14.2
yarn: 1.22.10
cli: 0.17.2 (installed)
Dependencies:
@backstage/app-defaults 1.0.3
@backstage/backend-common 0.14.0
@backstage/backend-tasks 0.3.2
@backstage/catalog-client 1.0.3
@backstage/catalog-model 1.0.3
@backstage/cli-common 0.1.9
@backstage/cli 0.17.2
@backstage/config-loader 1.1.2
@backstage/config 1.0.1
@backstage/core-app-api 1.0.3
@backstage/core-components 0.9.5
@backstage/core-plugin-api 1.0.3
@backstage/dev-utils 1.0.3
@backstage/errors 1.0.0
@backstage/integration-react 1.1.1
@backstage/integration 1.2.1
@backstage/plugin-analytics-module-ga 0.1.17
@backstage/plugin-api-docs 0.8.6
@backstage/plugin-app-backend 0.3.33
@backstage/plugin-auth-backend 0.14.1
@backstage/plugin-auth-node 0.2.2
@backstage/plugin-catalog-backend-module-github 0.1.4
@backstage/plugin-catalog-backend 1.2.0
@backstage/plugin-catalog-common 1.0.3
@backstage/plugin-catalog-graph 0.2.18
@backstage/plugin-catalog-import 0.8.9
@backstage/plugin-catalog-react 1.1.1
@backstage/plugin-catalog 1.3.0
@backstage/plugin-circleci 0.3.6
@backstage/plugin-github-actions 0.5.6
@backstage/plugin-github-pull-requests-board 0.1.0
@backstage/plugin-home 0.4.22
@backstage/plugin-kubernetes-backend 0.6.0
@backstage/plugin-kubernetes-common 0.3.0
@backstage/plugin-kubernetes 0.6.6
@backstage/plugin-org 0.5.6
@backstage/plugin-permission-common 0.6.2
@backstage/plugin-permission-node 0.6.2
@backstage/plugin-permission-react 0.4.2
@backstage/plugin-proxy-backend 0.2.27
@backstage/plugin-scaffolder-backend 1.3.0
@backstage/plugin-scaffolder-common 1.1.1
@backstage/plugin-scaffolder 1.3.0
@backstage/plugin-search-backend-module-pg 0.3.4
@backstage/plugin-search-backend-node 0.6.2
@backstage/plugin-search-backend 0.5.3
@backstage/plugin-search-common 0.3.5
@backstage/plugin-search-react 0.2.1
@backstage/plugin-search 0.9.0
@backstage/plugin-shortcuts 0.2.7
@backstage/plugin-stack-overflow 0.1.2
@backstage/plugin-techdocs-backend 1.1.2
@backstage/plugin-techdocs-module-addons-contrib 1.0.1
@backstage/plugin-techdocs-node 1.1.2
@backstage/plugin-techdocs-react 1.0.1
@backstage/plugin-techdocs 1.2.0
@backstage/plugin-user-settings 0.4.5
@backstage/release-manifests 0.0.4
@backstage/test-utils 1.1.1
@backstage/theme 0.2.15
@backstage/types 1.0.0
@backstage/version-bridge 1.0.1
About this issue
- State: closed
- Created 2 years ago
- Reactions: 4
- Comments: 20 (7 by maintainers)
Hey, so I figured I’d share what we have built as a “work-around” (adjacent feature?). We ended up building a plugin that provides an API surfacing entities with errors and their uid. The meat of the logic is basically getting all entities that have some error, roughly as sketched below.
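A minimal sketch of that idea (not the plugin's actual code) using @backstage/catalog-client; the hard-coded base URL is a placeholder, and reading errors off the entities' alpha status field is an assumption about how the errors are surfaced:

```typescript
import { CatalogClient } from '@backstage/catalog-client';

// Placeholder base URL; in a real backend plugin you would use the discovery
// service instead of hard-coding the catalog backend address.
const catalog = new CatalogClient({
  discoveryApi: {
    getBaseUrl: async () => 'http://localhost:7007/api/catalog',
  },
});

// List entities whose processing produced errors.
async function findEntitiesWithErrors() {
  const { items } = await catalog.getEntities({
    fields: ['kind', 'metadata.namespace', 'metadata.name', 'metadata.uid', 'status'],
  });
  // `status` is an alpha field on entities, so read it loosely.
  return items.filter(entity =>
    (entity as any).status?.items?.some(
      (item: { level?: string }) => item.level === 'error',
    ),
  );
}

// Option B below: permanently remove an entity whose error will never
// resolve, using the same route as DELETE /api/catalog/entities/by-uid/:uid.
async function deleteBrokenEntity(uid: string) {
  await catalog.removeEntityByUid(uid);
}
```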
With this API we can take a look at why entities are erroring, and it allows us to either A) rectify the issue by updating the catalog file, in which case the errored entity will sometimes be fixed and no longer have an error, or B) delete the entity via the DELETE /api/catalog/entities/by-uid/:uid route if the error is one that will never be resolved. Hope this helps folks in the meantime.
Can this issue be re-opened? It still seems like this is a problem.
If it helps with re-opening, here are some simple steps to reproduce:
- Create a catalog-info.yaml whose metadata.name contains whitespace, e.g. example service.
- Observe that the refresh_state table holds an entry with an error.
- Fix the metadata.name, e.g. example-service.
- example-service now appears in the catalog, but the refresh_state table still holds an entry with an error, with the entity_ref of component:default/example service.
We are seeing this issue with Bitbucket Server. Essentially the data never processes, and you have no direct way of finding the bad entities because they aren’t tied to any real location. We wiped all the data and it was still processing them and reporting the warning. We only discovered it by searching for the records directly in the refresh_state table in the database.
Ah yeah, IDs of providers are very strictly meant to be stable. They behave roughly like S3 bucket names in which entities are placed, so to speak. So when the ID changes, the old bucket stays around, and the provider tries to put things into a new bucket - which will collide with the old remnants of course.
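To make that concrete, here is a minimal sketch of a hypothetical custom entity provider (unrelated to the GitHub discovery module); the class name, provider name, and refresh trigger are all made up, but the stable provider name is the "bucket" described above:

```typescript
import { Entity } from '@backstage/catalog-model';
import {
  EntityProvider,
  EntityProviderConnection,
} from '@backstage/plugin-catalog-backend';

// A hypothetical provider; the class name and entity source are made up.
export class MyOrgEntityProvider implements EntityProvider {
  private connection?: EntityProviderConnection;

  // This name is the "bucket": every entity this provider emits is keyed
  // under it. Keep it stable across deployments, otherwise previously
  // emitted entities are stranded under the old name.
  getProviderName(): string {
    return 'my-org-provider';
  }

  async connect(connection: EntityProviderConnection): Promise<void> {
    this.connection = connection;
  }

  // Called by whatever schedule or trigger the provider uses; a 'full'
  // mutation replaces everything previously stored under this provider name.
  async refresh(entities: Entity[]): Promise<void> {
    if (!this.connection) {
      throw new Error('Provider not connected yet');
    }
    await this.connection.applyMutation({
      type: 'full',
      entities: entities.map(entity => ({
        entity,
        locationKey: `my-org-provider:${entity.metadata.name}`,
      })),
    });
  }
}
```

Such a provider is typically registered with the catalog builder via builder.addEntityProvider(...); if its name changes between deployments, the rows emitted under the old name linger in the database, which is exactly the stale-bucket situation described above.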