delayed_job: With 500K jobs in the delayed_jobs table, it gets really slow

With around 500K jobs in the delayed_jobs table, it gets really slow, to the point of being unusable. My slow query log is filled with queries like these:

SET timestamp=1379681371;
UPDATE delayed_jobs SET locked_at = '2013-09-20 12:49:20', locked_by = 'delayed_job host:node1365 pid:20668' WHERE ((run_at <= '2013-09-20 12:49:20' AND (locked_at IS NULL OR locked_at < '2013-09-20 08:49:20') OR locked_by = 'delayed_job host:node1365 pid:20668') AND failed_at IS NULL) ORDER BY priority ASC, run_at ASC LIMIT 1;
# Time: 130920  8:49:36
# Query_time: 4.683968  Lock_time: 0.000069  Rows_sent: 0  Rows_examined: 1
SET timestamp=1379681376;
DELETE FROM delayed_jobs WHERE delayed_jobs.id = 5233997;
# Time: 130920  8:50:55
# Query_time: 6.367763  Lock_time: 0.000082  Rows_sent: 3  Rows_examined: 595935
SET timestamp=1379681455;
SELECT COUNT(*) AS count_all, priority AS priority FROM delayed_jobs WHERE (run_at < '2013-09-20 12:50:48' and failed_at is NULL) GROUP BY priority;

I have the following indexes (added some more to speed up some queries that were showing up in the slow query log):

mysql> show index from delayed_jobs;
+--------------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table        | Non_unique | Key_name              | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| delayed_jobs |          0 | PRIMARY               |            1 | id          | A         |      628375 |     NULL | NULL   |      | BTREE      |         |
| delayed_jobs |          1 | delayed_jobs_priority |            1 | priority    | A         |          16 |     NULL | NULL   | YES  | BTREE      |         |
| delayed_jobs |          1 | delayed_jobs_priority |            2 | run_at      | A         |       24168 |     NULL | NULL   | YES  | BTREE      |         |
| delayed_jobs |          1 | delayed_jobs_lock     |            1 | locked_at   | A         |           9 |     NULL | NULL   | YES  | BTREE      |         |
| delayed_jobs |          1 | delayed_jobs_lock     |            2 | locked_by   | A         |           9 |      100 | NULL   | YES  | BTREE      |         |
| delayed_jobs |          1 | locked_by             |            1 | locked_by   | A         |           9 |     NULL | NULL   | YES  | BTREE      |         |
| delayed_jobs |          1 | failed_at             |            1 | failed_at   | A         |          97 |     NULL | NULL   | YES  | BTREE      |         |
| delayed_jobs |          1 | run_at                |            1 | run_at      | A         |       13660 |     NULL | NULL   | YES  | BTREE      |         |
+--------------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

But still, it is extremely slow.

About this issue

  • State: open
  • Created 11 years ago
  • Comments: 28 (3 by maintainers)

Most upvoted comments

In Postgres, you can significantly speed it up just with 2 trivial steps:

  1. Use SELECT ... FOR UPDATE SKIP LOCKED instead of just SELECT ... FOR UPDATE (the code is here https://github.com/collectiveidea/delayed_job_active_record/blob/master/lib/delayed/backend/active_record.rb)

  2. Add this index:

CREATE INDEX i_delayed_jobs_priority_partial
  ON delayed_jobs USING btree (priority, run_at, queue) WHERE failed_at IS NULL;
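
If you manage your schema through migrations, the same partial index can be expressed roughly like this (a sketch, not from the original comment; it assumes the Rails PostgreSQL adapter, which supports the where: option for partial indexes, and the class name and migration version tag are illustrative):

class AddPartialIndexToDelayedJobs < ActiveRecord::Migration[5.2]
  def change
    # Only unfailed jobs are indexed, mirroring the
    # WHERE failed_at IS NULL clause of the raw CREATE INDEX above.
    add_index :delayed_jobs, [:priority, :run_at, :queue],
              where: "failed_at IS NULL",
              name: "i_delayed_jobs_priority_partial"
  end
end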

@albus522 isn’t it time to reconsider it?

SKIP LOCKED was introduced in postgres 9.5, so that feature requires a relatively new postgres install

In less than a year, 9.5 will become the oldest community-supported version of Postgres, see https://www.postgresql.org/support/versioning/. Already now, most of the versions supported by PGDG have the SKIP LOCKED feature. MySQL also has it since version 8.

Indices are probably something we need to revisit, but results have not been consistent for any suggested index update so far.

Have you checked the index I proposed in https://github.com/collectiveidea/delayed_job/issues/581#issuecomment-289920478? I believe it helps in the majority of cases, if not all of them.

Indexing is definitely the way to go. Here’s what ours looks like:

add_index "delayed_jobs", ["deleted_at", "locked_at", "failed_at", "queue", "run_at", "locked_by"], name: "deleted_at_locked_at_failed_at_queue_run_at_locked_by", using: :btree

With about 200,000 rows, our queries went from about 500ms to 0.7ms, a 714x improvement.

Glad that I could be of help! 😃

@amitsaxena Ha! You’re right. I got the same performance improvement just by using a single deleted_at column in the index, which makes sense because that filters things down to single digit records instead of 200,000+.
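
For reference, that single-column variant in the same migration style as the composite index above (a sketch; deleted_at is a custom soft-delete column from our schema, not a stock delayed_jobs column):

# Sketch: index only deleted_at, which already narrows the scan to a handful of rows
add_index "delayed_jobs", ["deleted_at"], name: "index_delayed_jobs_on_deleted_at", using: :btree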

Thanks for the tip and reminder!

@albus522 thank you for your response, but it was not a change request; it was advice for those who use DJ with Postgres (I hope with 9.6 or at least 9.5, otherwise one more piece of advice – upgrade!).

Without those two improvements, you will have big performance issues if you need to process millions of jobs per day.

I considered creating a pull request but changed my mind when I saw how many PRs have been left unprocessed for years in this repo.

I added an index on delayed_jobs(priority,run_at) and switched to the default_sql strategy, which is pretty stable now with about 1M records.

Delayed::Backend::ActiveRecord.configuration.reserve_sql_strategy = :default_sql
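
As a rough sketch of where those two pieces might live in a Rails app (file paths, class name, and migration version tag are illustrative, not from the original comment):

# db/migrate/xxxxxxxxxxxxxx_add_priority_run_at_index_to_delayed_jobs.rb
class AddPriorityRunAtIndexToDelayedJobs < ActiveRecord::Migration[5.2]
  def change
    # Matches the ORDER BY priority ASC, run_at ASC used when picking up jobs
    add_index :delayed_jobs, [:priority, :run_at]
  end
end

# config/initializers/delayed_job.rb
Delayed::Backend::ActiveRecord.configuration.reserve_sql_strategy = :default_sql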

For MySQL 5.6, adding an index on the failed_at column helped considerably.
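
In migration form that would be something like this (sketch only):

# Sketch: single-column index so the failed_at IS NULL filter can use an index on MySQL 5.6
add_index :delayed_jobs, :failed_at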

You can try Delayed::Backend::ActiveRecord.configuration.reserve_sql_strategy = :default_sql. Using that can affect performance either way, but for some people it helps.

Also, is that 280k active jobs, or is that mostly old failed jobs?

If all those are active what is your average job runtime? DJ is not the ideal candidate if you are running massive numbers of very fast jobs.

If they are old failed jobs, do you really still need them?

Can we learn something from que?