registry: Registry server: slow listing of artifacts under revisioned resources in large collections.
Quoting a benchmark result from #1068, listing spec artifacts can be significantly slower when there are large numbers of specs:
```
BenchmarkListArtifacts/ListApiArtifacts-16          3     435709811 ns/op
BenchmarkListArtifacts/ListVersionArtifacts-16      3     455032674 ns/op
BenchmarkListArtifacts/ListSpecArtifacts-16         1   72584223258 ns/op
BenchmarkListArtifacts/ListDeploymentArtifacts-16   1   17589723714 ns/op
```
This is with the N=1000 test set running with a server that already has a large collection of API specs (~13000).
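For context, the quoted numbers are produced by Go's benchmark framework (`go test -bench`). A minimal, self-contained sketch of how such a listing benchmark is driven — with `listSpecArtifacts` as a hypothetical stand-in for the real registry client call, not the actual benchmark code — might look like:

```go
package main

import (
	"fmt"
	"testing"
)

// listSpecArtifacts is a hypothetical stand-in for a registry call that
// lists artifacts under every spec; here it only simulates the work.
func listSpecArtifacts(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total++ // a real client would page through ListArtifacts responses
	}
	return total
}

func main() {
	// testing.Benchmark runs the function with increasing b.N until timing
	// stabilizes, yielding the same "iterations  ns/op" format quoted above.
	result := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			listSpecArtifacts(1000)
		}
	})
	fmt.Println("ListSpecArtifacts", result)
}
```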
The impact of this is that it is difficult to get or delete spec artifacts across a collection, e.g. `registry get apis/-/versions/-/specs/-/artifacts/summary -o yaml` or `registry delete apis/-/versions/-/specs/-/artifacts/vocabulary`. Both of these are uncomfortably slow for large registries.
About this issue
- State: closed
- Created a year ago
- Comments: 15 (15 by maintainers)
You’ve added `api_id` to `deployments` and `specs`… I wonder if adding `api_id` to `artifacts` as well could be useful as a pretty common join? Also, since we often sort on `api_id`, I wonder if adding `sort` to our index definitions might be useful.

I think we’re good. I wasn’t thinking about the name being used as the primary key.
I believe you can just set our logging to trace level to get the SQL. Then run the EXPLAIN against Postgres using pgAdmin or whatever tool.
This is with just api_id indexes on specs and deployments.
This first run was after I had created the indexes and run for a while; I then deleted the api_id index on specs and benchmarked again. Without the api_id index on specs, the spec listing was clearly slow:
Then I recreated the index with `registry rpc admin migrate-database --follow`. This returned immediately, but I think the index might have still been building: the immediately following benchmark was still slow, but the next two were good. This was with 8800 specs in the database.