harbor: harbor-core sporadically throws HTTP.5xx - Database performance related
Running Harbor v2.2.1, with AWS RDS (PostgreSQL) as the database.
Some specs for our Harbor deployment:
- 300 Repositories
- 800 artifacts per Repository (approximately)
We are observing high database usage generated by Harbor. We've enabled PostgreSQL slow-query logging to gather insight into the database usage (a sketch of how such logging is typically enabled follows the excerpt below). The following queries are reported with durations over 10 seconds:
| 1622621735000 | 2021-06-02 08:15:35 UTC:10.1.15.1(1331):harbor@core_db:[21602]:LOG: duration: 13985.656 ms execute <unnamed>: SELECT "id", "reference", "reference_id", "hard", "creation_time", "update_time" FROM "quota" WHERE "reference" = $1 AND "reference_id" = $2 FOR UPDATE | mgmt-harbor-serverless-prod.1 |
| 1622621735000 | 2021-06-02 08:15:35 UTC:10.1.17.254(33891):harbor@core_db:[21663]:LOG: duration: 14040.258 ms execute <unnamed>: SELECT "id", "reference", "reference_id", "hard", "creation_time", "update_time" FROM "quota" WHERE "reference" = $1 AND "reference_id" = $2 FOR UPDATE | mgmt-harbor-serverless-prod.1 |
| 1622621735000 | 2021-06-02 08:15:35 UTC:10.1.11.177(40627):harbor@core_db:[21719]:LOG: duration: 13736.308 ms execute <unnamed>: SELECT "id", "reference", "reference_id", "hard", "creation_time", "update_time" FROM "quota" WHERE "reference" = $1 AND "reference_id" = $2 FOR UPDATE | mgmt-harbor-serverless-prod.1 |
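For reference, statement-duration logging of this kind is typically driven by the log_min_duration_statement parameter; the snippet below is a minimal sketch for self-managed PostgreSQL (on RDS the same parameter is set through the DB parameter group rather than ALTER SYSTEM):

```sql
-- Log any statement that runs longer than 10 seconds (value is in milliseconds).
ALTER SYSTEM SET log_min_duration_statement = 10000;
SELECT pg_reload_conf();  -- apply without a restart
-- On AWS RDS, set log_min_duration_statement in the DB parameter group instead.
```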
These are FOR UPDATE queries, which cause the rows retrieved by the SELECT statement to be locked as though for update.
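For illustration, a minimal sketch of how this serializes concurrent writers on the same quota row; the table and column names are taken from the query above, while the literal values 'project' and '37' are placeholders:

```sql
-- Session A: acquires a row lock on the project's quota row and holds it
-- until the transaction ends.
BEGIN;
SELECT "id", "reference", "reference_id", "hard", "creation_time", "update_time"
FROM "quota"
WHERE "reference" = 'project' AND "reference_id" = '37'
FOR UPDATE;
-- ... quota usage is recalculated here ...
COMMIT;  -- releases the lock

-- Session B: running the same statement for the same project while session A
-- is still open blocks until A commits or rolls back, which matches the
-- 13-14 second durations seen in the slow log.
```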
Database Locking graph confirms the DB locking issue, showing 400+ locked tables.
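Not from the original report, but a query along the following lines can be run against the same database to see which sessions are blocked and which session holds the lock they are waiting on (a sketch, assuming PostgreSQL 9.6+ for pg_blocking_pids; it works the same on RDS):

```sql
-- List blocked sessions, how long they have been waiting, and their blockers.
SELECT
    blocked.pid                 AS blocked_pid,
    blocked.query               AS blocked_query,
    now() - blocked.query_start AS blocked_for,
    blocking.pid                AS blocking_pid,
    blocking.query              AS blocking_query
FROM pg_stat_activity AS blocked
JOIN LATERAL unnest(pg_blocking_pids(blocked.pid)) AS b(pid) ON true
JOIN pg_stat_activity AS blocking ON blocking.pid = b.pid;
```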

When this happens, we sporadically get HTTP 5xx errors from Harbor on PUT requests, so image uploads fail for clients:
"PUT /v2/beat/[...] HTTP/1.1" 500 66 "-" " 8516 13.609 [harbor-harbor-core-http] [] 100.97.80.52:8080 66 13.612 500 90bf9ce9edddd7d33ccd7014dae4a47e
"PUT /v2/beat/[...] HTTP/1.1" 500 66 "-" " 2269 6.295 [harbor-harbor-core-http] [] 100.97.81.139:8080 66 6.296 500 9fa9d68459840391c48486e08061fbed
"PUT /v2/beat/[...] HTTP/1.1" 500 66 "-" "[...] 561 30.762 [harbor-harbor-core-http] [] 100.97.81.139:8080 66 30.760 500 2cc59940a45cec7fd5b372fe74c88f5c
"PUT /v2/beat/[...] /manifests/[...] HTTP/1.1" 500 66 "-" [...] " 3309 6.366 [harbor-harbor-core-http] [] 100.97.81.40:8080 66 6.364 500
"PUT /v2/beat/[...] [...]blobs/uploads/cb4b12f8-98f8-43a1-9e27-04bfe1d46f80? HTTP/1.1" 500 66 "-" "[...] 1584 23.598 [harbor-harbor-core-http] [] 100.97.80.52:8080 66 23.600 500
At the same time, Harbor throws errors that suggest issues with the database:
[ERROR] [/lib/http/error.go:54]: {"errors":[{"code":"UNKNOWN","message":"unknown: deal with /api/v2.0/projects/beat/repositories/[...]/artifacts request in transaction failed: driver: bad connection"}]}
[ERROR] [/lib/orm/orm.go:86]: commit transaction failed: driver: bad connection
[ERROR] [/lib/http/error.go:54]: {"errors":[{"code":"UNKNOWN","message":"unknown: deal with /api/v2.0/projects/beat/repositories/[...]/artifacts request in transaction failed: driver: bad connection"}]}
[ERROR] [/lib/orm/orm.go:86]: commit transaction failed: driver: bad connection
[ERROR] [/lib/http/error.go:54]: {"errors":[{"code":"UNKNOWN","message":"unknown: deal with /v2/beat/[....]/manifests/[...] request in transaction failed: driver: bad connection"}]}
[ERROR] [/lib/orm/orm.go:78]: rollback transaction failed: driver: bad connection
[ERROR] [/lib/http/error.go:54]: {"errors":[{"code":"UNKNOWN","message":"unknown: driver: bad connection"}]}
[ERROR] [/server/middleware/blob/put_manifest.go:63][middleware="blob" requestID="07fa6cc169437ec9272eebdab971e216"]: get project failed, error: driver: bad connection
[ERROR] [/lib/http/error.go:54]: {"errors":[{"code":"UNKNOWN","message":"unknown: deal with /api/v2.0/projects/beat/repositories/[...]/artifacts request in transaction failed: driver: bad connection"}]}
[ERROR] [/lib/orm/orm.go:86]: commit transaction failed: driver: bad connection
[ERROR] [/lib/http/error.go:54]: {"errors":[{"code":"UNKNOWN","message":"unknown: driver: bad connection"}]}
[ERROR] [/lib/orm/orm.go:78]: rollback transaction failed: driver: bad connection
[ERROR] [/controller/quota/controller.go:299][requestID="c5be4a76ee93e8f79beaf362a309f1de"]: failed to calculate quota usage for project 37, error: driver: bad connection
The database has been scaled up to cope with this; it is currently running with 16 vCPUs and 122 GB of RAM, dedicated to Harbor.
Could this suggest an issue with the way Harbor manages the database?
About this issue
- State: closed
- Created 3 years ago
- Reactions: 10
- Comments: 27 (11 by maintainers)
Hey @heww, thanks for your input.
During yesterday's cleanup, which lasted 337 minutes, there were push actions being performed. But at the time of today's logs that we've shared (2021-06-08 07:28:25 UTC) there was no cleanup action ongoing.
Nginx-Ingress HTTP PUT requests forwarded to Harbor, last 24h:
Nginx-Ingress HTTP PUT requests forwarded to Harbor, last 7d:
The problem is observed when we get more than 250-300 table locks, when Harbor runs
SELECT "id", "reference", "reference_id", "hard", "creation_time", "update_time" FROM "quota" WHERE "reference" = $1 AND "reference_id" = $2 FOR UPDATE
on the database. DB CPU/MEM usage over the same time window as the above graphs seems nominal:
This is a 16 vCPU / 122 GB RAM RDS DB instance, dedicated to Harbor.
Just to make sure we are on the same page, the locking issue is now mitigated with https://github.com/goharbor/harbor/issues/15048#issuecomment-855767153 as proposed.
We now have a new issue concerning the database, with duplicate keys (unique_artifact), which seems to have come up after the cleanup of untagged artifacts. Would you advise opening a new issue about this?
According to the logs, the images beat/repo_name:phobos and beat/repo_name:deimos have the same digest sha256:e1aee8bb773a4615023387b261d1e93feef5e6ed2d903004813032a01464bad1. When pushing images with the same digest into the same repository, it may hit a bug.
@phoinixgrr Thanks for your input. I'll create a PR to fix it.
@heww
Calling the API endpoint /api/internal/switchquota actually fixed our extreme locking! Awesome. It seems to be working as a workaround in our case. Question: should this indicate an issue with the way Harbor manages the database?
Since then, we've performed a cleanup on our largest project by deleting all untagged artifacts (using a TAG RETENTION rule), and we now see a new kind of HTTP 500 error thrown when pushes happen. We've managed to drill down into the logs to identify this new issue.
The Docker client gets an HTTP 500 Internal Server Error from Harbor when pushing:
Jun 8 07:28:25 ip-10-30-2-156 dockerd[541]: level=info msg="Attempting next endpoint for push after error: received unexpected HTTP status: 500 Internal Server Error"
Nginx-Ingress confirms the 500 thrown by Harbor:
Harbor logs suggest database-related issues:
2021-06-08T07:28:25Z [ERROR] [/lib/http/error.go:54]: {"errors":[{"code":"UNKNOWN","message":"unknown: pq: current transaction is aborted, commands ignored until end of transaction block"}]}
PostgreSQL logs show duplicate key / unique constraint violations.
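To see which rows are involved, a query along these lines can be used; this is a sketch only, and the assumption that unique_artifact covers (repository_name, digest) should be checked against the actual constraint definition (e.g. with \d artifact in psql):

```sql
-- Find artifact rows that would collide on the assumed (repository_name, digest) key.
SELECT repository_name, digest, count(*) AS copies
FROM artifact
GROUP BY repository_name, digest
HAVING count(*) > 1;
```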
In conclusion, we've managed to work around our main issue concerning the extreme database locking. Unfortunately, it seems the cleanup performed on the untagged artifacts has introduced new issues with the database, related to duplicate keys (unique_artifact).
@paulliss There is no observed IOWAIT on any of the subsystems, neither Harbor itself nor the RDS database. Storage is performant.
Other than that, things are not looking good. When this extreme locking of the database occurs, Harbor starts throwing 5xx errors on image pushes. The following query has been identified as producing the locking:
@aladjadj Quotas are reported as disabled (-1) from the API:
Hello, if you do not use the quota feature, you can disable it from the API to mitigate your errors; see the Swagger API.
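For completeness, the quota rows that the contended FOR UPDATE statement locks can also be inspected directly in the database; the column names come from the query in the report, while the reference value 'project' is an assumption about how project-level quotas are stored:

```sql
-- Show recent project quota rows; per the API output above, a hard limit of -1
-- means the quota is effectively disabled/unlimited.
SELECT "id", "reference", "reference_id", "hard", "update_time"
FROM "quota"
WHERE "reference" = 'project'
ORDER BY "update_time" DESC
LIMIT 20;
```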