delayed_job: With 500K jobs in the delayed_jobs table, it gets really slow
With around 500K jobs in the delayed_jobs table, it gets really slow, to the point of being unusable. My slow query log is filled with these:
SET timestamp=1379681371;
UPDATE delayed_jobs SET locked_at = '2013-09-20 12:49:20', locked_by = 'delayed_job host:node1365 pid:20668' WHERE ((run_at <= '2013-09-20 12:49:20' AND (locked_at IS NULL OR locked_at < '2013-09-20 08:49:20') OR locked_by = 'delayed_job host:node1365 pid:20668') AND failed_at IS NULL) ORDER BY priority ASC, run_at ASC LIMIT 1;
# Time: 130920 8:49:36
# Query_time: 4.683968 Lock_time: 0.000069 Rows_sent: 0 Rows_examined: 1
SET timestamp=1379681376;
DELETE FROM delayed_jobs WHERE delayed_jobs.id = 5233997;
# Time: 130920 8:50:55
# Query_time: 6.367763 Lock_time: 0.000082 Rows_sent: 3 Rows_examined: 595935
SET timestamp=1379681455;
SELECT COUNT(*) AS count_all, priority AS priority FROM delayed_jobs WHERE (run_at < '2013-09-20 12:50:48' and failed_at is NULL) GROUP BY priority;
I have the following indexes (added some more to speed up some queries that were showing up in the slow query log):

mysql> show index from delayed_jobs;
+--------------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table        | Non_unique | Key_name              | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| delayed_jobs |          0 | PRIMARY               |            1 | id          | A         |      628375 |     NULL | NULL   |      | BTREE      |         |
| delayed_jobs |          1 | delayed_jobs_priority |            1 | priority    | A         |          16 |     NULL | NULL   | YES  | BTREE      |         |
| delayed_jobs |          1 | delayed_jobs_priority |            2 | run_at      | A         |       24168 |     NULL | NULL   | YES  | BTREE      |         |
| delayed_jobs |          1 | delayed_jobs_lock     |            1 | locked_at   | A         |           9 |     NULL | NULL   | YES  | BTREE      |         |
| delayed_jobs |          1 | delayed_jobs_lock     |            2 | locked_by   | A         |           9 |      100 | NULL   | YES  | BTREE      |         |
| delayed_jobs |          1 | locked_by             |            1 | locked_by   | A         |           9 |     NULL | NULL   | YES  | BTREE      |         |
| delayed_jobs |          1 | failed_at             |            1 | failed_at   | A         |          97 |     NULL | NULL   | YES  | BTREE      |         |
| delayed_jobs |          1 | run_at                |            1 | run_at      | A         |       13660 |     NULL | NULL   | YES  | BTREE      |         |
+--------------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
But still, it is extremely slow.
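For reference, one way to see which of these indexes MySQL actually picks for the job-reservation query is to EXPLAIN the equivalent SELECT (a diagnostic sketch; the timestamps and worker name are simply the values from the log excerpt above):

EXPLAIN SELECT id FROM delayed_jobs
WHERE ((run_at <= '2013-09-20 12:49:20' AND (locked_at IS NULL OR locked_at < '2013-09-20 08:49:20') OR locked_by = 'delayed_job host:node1365 pid:20668') AND failed_at IS NULL)
ORDER BY priority ASC, run_at ASC
LIMIT 1;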
About this issue
- State: open
- Created 11 years ago
- Comments: 28 (3 by maintainers)
In Postgres, you can significantly speed it up just with 2 trivial steps:
- Use SELECT ... FOR UPDATE SKIP LOCKED instead of just SELECT ... FOR UPDATE (the code is here: https://github.com/collectiveidea/delayed_job_active_record/blob/master/lib/delayed/backend/active_record.rb).
- Add this index:
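As a rough sketch of what those two steps can look like in PostgreSQL (the query shape and the index name below are illustrative assumptions, not necessarily the exact code or index proposed in the linked comment):

-- Reservation query using SKIP LOCKED so concurrent workers do not block on the
-- same row (available in PostgreSQL 9.5+ and MySQL 8.0+); the 4-hour cutoff mirrors
-- the locked_at comparison in the slow-log query above:
SELECT id FROM delayed_jobs
WHERE failed_at IS NULL
  AND run_at <= now()
  AND (locked_at IS NULL OR locked_at < now() - interval '4 hours')
ORDER BY priority ASC, run_at ASC
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- Hypothetical partial index covering the columns that query filters and sorts on:
CREATE INDEX delayed_jobs_ready_idx ON delayed_jobs (priority, run_at) WHERE failed_at IS NULL;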
@albus522 isn't it time to reconsider it?
In less than a year, 9.5 will become the oldest community-supported version of Postgres (see https://www.postgresql.org/support/versioning/). Most of the versions currently supported by PGDG already have the SKIP LOCKED feature, and MySQL has also had it since version 8.
Have you checked the index I proposed in https://github.com/collectiveidea/delayed_job/issues/581#issuecomment-289920478? I believe it is helpful if not in all cases, then in the majority of them.
Indexing is definitely the way to go. Here’s what ours looks like:
With about 200,000 rows, our queries went from about 500ms to 0.7ms, a 714x improvement.
Glad that I could be of help! 😃
@amitsaxena Ha! You're right. I got the same performance improvement just by using a single deleted_at column in the index, which makes sense because that filters things down to single-digit records instead of 200,000+. Thanks for the tip and reminder!
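For reference, a single-column index like the one described might be created as follows (deleted_at appears to be this commenter's own column; it is not part of delayed_job's default schema):

CREATE INDEX index_delayed_jobs_on_deleted_at ON delayed_jobs (deleted_at);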
@albus522 thank you for your response, but it was not a change request; it was advice for those who use DJ with Postgres (I hope with 9.6 or at least 9.5, otherwise one more piece of advice: upgrade!).
Without those two improvements, you will have big performance issues if you need to process millions of jobs per day.
I considered creating a pull request but changed my mind when I saw how many PRs have been left unprocessed for years in this repo.
I added an index on delayed_jobs(priority, run_at) and switched to the default_sql strategy, which is pretty stable now with about 1M records.
For MySQL 5.6, adding an index on the failed_at column helped considerably.
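For reference, the two indexes mentioned just above could be created like this (the index names are illustrative assumptions, not taken from the thread):

-- Composite index matching the reservation query's ORDER BY priority, run_at:
CREATE INDEX idx_delayed_jobs_priority_run_at ON delayed_jobs (priority, run_at);
-- Single-column index so MySQL 5.6 can filter out failed jobs cheaply:
CREATE INDEX idx_delayed_jobs_failed_at ON delayed_jobs (failed_at);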
You can try Delayed::Backend::ActiveRecord.configuration.reserve_sql_strategy = :default_sql. Using that can affect performance both ways, but for some people it helps. Also, is that 280k active jobs, or is that mostly old failed jobs?
If all of those are active, what is your average job runtime? DJ is not the ideal candidate if you are running massive numbers of very fast jobs.
If they are old failed jobs, do you really still need them?
Can we learn something from que?