hydra: Optimize janitor's slow queries on requests tables

Problem description

The new hydra janitor command introduced important database cleanup routines to tackle the intense growth rate of the tokens and requests tables in production environments. The queries simply delete obsolete records from the hydra_oauth2_access and hydra_oauth2_refresh tables, and rejected or timed-out request records from hydra_oauth2_authentication_request and hydra_oauth2_consent_request. While the DELETE queries on the tokens tables look quite efficient as long as they run periodically on tables that are actually cleaned up and whose sizes stay bounded, the requests tables grow indefinitely, making DELETEs on those tables slower and slower every day.
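To make the problem concrete, the current cleanup style boils down to one timestamp-conditioned DELETE per table. Here is a minimal, hedged sketch using sqlite3 as a stand-in database; the table name comes from the issue, but the exact WHERE clause and the helper name are illustrative assumptions, not hydra's actual implementation:

```python
import sqlite3

def delete_expired(conn, table, cutoff):
    # One big DELETE whose condition depends on the requested_at
    # timestamp: a single long write transaction that must scan
    # (or range-walk) an ever-growing table. Condition is assumed.
    cur = conn.execute(
        f"DELETE FROM {table} WHERE requested_at < ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount  # number of rows removed
```

The larger the table grows, the longer this single transaction runs and the longer its locks are held.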

Main solution

I think the problem lies in the conditions of the queries, which are mainly based on the timestamp column requested_at, as one can see here. If we could avoid making the DELETE queries depend directly on that field, we would at least lock the tables only for a short amount of time. My first idea is to separate the writing part of the query from the reading part, which is obviously the main reason the query gets slower every day: we could SELECT deletable records with whatever condition we like, and then DELETE ... WHERE id IN (...) using the identifiers (primary keys) from the result set of the first query.
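The two-phase idea above can be sketched as follows, again with sqlite3 as a self-contained stand-in. The requested_at column comes from the issue; the "challenge" primary-key parameter, the batch size, and the function name are assumptions for illustration only:

```python
import sqlite3

BATCH = 100  # assumed batch size; would be tuned for production

def two_phase_cleanup(conn, table, pk, cutoff):
    """First SELECT the deletable primary keys (the slow, read-only
    part), then DELETE by id in small batches so each write
    transaction stays short and locks are released quickly."""
    cur = conn.execute(
        f"SELECT {pk} FROM {table} WHERE requested_at < ?", (cutoff,)
    )
    ids = [row[0] for row in cur.fetchall()]
    for i in range(0, len(ids), BATCH):
        chunk = ids[i:i + BATCH]
        placeholders = ",".join("?" * len(chunk))
        conn.execute(
            f"DELETE FROM {table} WHERE {pk} IN ({placeholders})", chunk
        )
        conn.commit()  # release write locks between batches
    return len(ids)
```

The timestamp scan still happens, but only in the read phase; the writes touch rows by primary key and commit often, so no single transaction holds the table for long.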

Alternatives

The proposed solution does not actually optimize the queries, but it prevents long-lasting locks on those tables, which completely block the hydra service for hours as the tables grow. A real optimization would probably require an explicit indexed column that marks obsolete records for deletion, thus implementing a logical delete.
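The logical-delete alternative could look roughly like this sqlite3 sketch: mark records via an indexed flag, then purge using that index instead of scanning timestamps. The `obsolete` column, the index name, and the helper names are all illustrative assumptions, not an actual hydra schema change:

```python
import sqlite3

def setup_logical_delete(conn, table):
    # Assumed schema change: an indexed flag column so the physical
    # purge can use the index rather than a timestamp scan.
    conn.execute(f"ALTER TABLE {table} ADD COLUMN obsolete INTEGER DEFAULT 0")
    conn.execute(f"CREATE INDEX idx_{table}_obsolete ON {table} (obsolete)")

def mark_obsolete(conn, table, cutoff):
    # The logical delete: cheap UPDATE flipping the flag.
    conn.execute(
        f"UPDATE {table} SET obsolete = 1 WHERE requested_at < ?", (cutoff,)
    )
    conn.commit()

def purge_marked(conn, table):
    # The physical delete, driven entirely by the indexed flag.
    cur = conn.execute(f"DELETE FROM {table} WHERE obsolete = 1")
    conn.commit()
    return cur.rowcount
```

A design note: the marking step can run often and cheaply, while the purge can be scheduled for low-traffic windows, decoupling "record is dead" from "record is physically removed".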

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 23 (23 by maintainers)

Most upvoted comments

Hi @pharman, I am already working on this in my fork. I had some issues replicating the safe-delete logic in a separate query, but I am working on it. I am still experiencing some regressions, but as soon as all the tests pass, I will create a draft PR and you can comment / contribute there!

Hey @flavioleggio,

I could maybe help you with some pointers, just message me on slack 😃