good_job: Work is not being picked up at the expected rate

Hi @bensheldon

I recently switched from the default queue configuration (*) to dedicated thread pools for separate queues. I can’t say for certain that this change introduced what I’m seeing now, but I’m fairly confident it did.

My workers are very slow to pick up new jobs, and it looks like not all of the threads are actually working. This is my config/initializers/good_job.rb:

Rails.application.configure do
  # Use the same logger as rails is using
  config.good_job.logger = Rails.logger

  # Set the parent class for GoodJob's Active Record models
  config.good_job.active_record_parent_class = "DirectConnection"

  # To begin with, we keep the job records for inspection,
  # but we should stop preserving them once we're confident
  # everything is running smoothly.
  config.good_job.preserve_job_records = true

  # If GoodJob receives an error we want to log it to Sentry.
  config.good_job.retry_on_unhandled_error = false
  config.good_job.on_thread_error = ->(exception) do
    Sentry.capture_exception(exception)
  end

  # The plus sign means that the queues are prioritized from first to last
  # The star at the end will perform any other queue last. This catches, e.g., jobs enqueued
  # without a queue name, which end up in the 'default' queue.
  # The number specifies how many threads are dedicated to that queue. The default thread count is 5.
  # This configuration will result in 2 + 2 + 5 = 9 threads.
  config.good_job.queues = "+high_priority:2;low_priority:2;*"

  config.good_job.enable_cron = true
  config.good_job.cron = {
    # 10 cron entries removed for brevity
  }

  # Use high_priority queue for emails
  config.action_mailer.deliver_later_queue_name = "high_priority"
end

# Any configuration in ApplicationJob will have to
# be duplicated on ActionMailer::MailDeliveryJob because
# ActionMailer uses a custom class, ActionMailer::MailDeliveryJob,
# which inherits from ActiveJob::Base, rather than your
# application's ApplicationJob.
ActionMailer::MailDeliveryJob.retry_on(
  StandardError,
  wait: :exponentially_longer,
  attempts: 10,
)

# Log all mail delivery errors in jobs to Sentry
ActionMailer::MailDeliveryJob.around_perform do |_job, block|
  block.call
rescue StandardError => e
  Sentry.capture_exception(e)
  raise
end
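To make the thread arithmetic in the config comments concrete, here is a minimal sketch that splits a GoodJob-style queue string into pools. The parse_pools helper, the Pool struct and the default of 5 threads are assumptions for illustration, not GoodJob's own parser, and it only handles the pieces used above (";" between pools, ":N" thread counts, a leading "+" and "*"). As far as I understand GoodJob's README, "+" orders queues within a single pool and "*" matches every queue, so each of these three pools polls independently and the wildcard pool can also take high_priority and low_priority jobs.

```ruby
# Hypothetical helper, not part of GoodJob's API: splits a GoodJob-style
# queue string into pools and reports each pool's thread count.
Pool = Struct.new(:queues, :threads, :ordered, keyword_init: true)

# Assumption: the default thread count stated in the config comment.
DEFAULT_THREADS = 5

def parse_pools(spec, default_threads: DEFAULT_THREADS)
  spec.split(";").map do |pool_spec|
    # "low_priority:2" -> ["low_priority", "2"]; "*" -> ["*"]
    queues_part, threads_part = pool_spec.strip.split(":", 2)
    ordered = queues_part.start_with?("+")
    queues = queues_part.delete_prefix("+").split(",").map(&:strip)
    # Pools without an explicit ":N" fall back to the default thread count.
    threads = threads_part ? Integer(threads_part) : default_threads
    Pool.new(queues: queues, threads: threads, ordered: ordered)
  end
end

pools = parse_pools("+high_priority:2;low_priority:2;*")
pools.each do |pool|
  puts "#{pool.queues.join(',')} -> #{pool.threads} threads#{' (ordered)' if pool.ordered}"
end
puts "total: #{pools.sum(&:threads)}" # → total: 9
```

The total of 9 matches the 2 + 2 + 5 in the comment; the "(ordered)" flag on the first pool has no practical effect because that pool contains only one queue.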

Locally I run in :external execution mode, and I can see the single worker process as expected:

[Screenshot 2023-01-16 at 20.13.21]

But in production I don’t see any processes:

[Screenshot 2023-01-16 at 20.12.28]

Also notice that there is only 1 running job, while many queued jobs are waiting to be picked up. With my configuration I would expect around 7 running jobs (2 from the threads dedicated to the low_priority queue and 5 from the wildcard (*) pool).

I’m running one worker instance (replica). I also tried running 3 worker pods, and the number of running jobs did not change significantly.
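The expectation above can be written down as a rough upper bound: a pool contributes all of its threads when at least one queue it serves has waiting jobs, and every replica runs the same pools. This is a hypothetical back-of-the-envelope helper; expected_running, pool_has_work? and the backlog numbers are made up for illustration and are not read from GoodJob. It assumes each busy queue has more waiting jobs than the pool has threads.

```ruby
# Hypothetical estimate, not a GoodJob API: an upper bound on concurrently
# running jobs, assuming every queue with a backlog has more waiting jobs
# than the pool has threads.

# Pool sizes mirror "+high_priority:2;low_priority:2;*" with 5 default threads.
POOLS = [
  { queues: ["high_priority"], threads: 2 },
  { queues: ["low_priority"], threads: 2 },
  { queues: ["*"], threads: 5 } # "*" matches jobs from any queue
].freeze

# True if any queue this pool serves has at least one waiting job.
def pool_has_work?(pool, backlog)
  pool[:queues].any? do |queue|
    if queue == "*"
      backlog.values.any?(&:positive?)
    else
      backlog.fetch(queue, 0).positive?
    end
  end
end

# Each replica runs the same pools, so capacity scales with the replica count.
def expected_running(backlog, replicas: 1)
  per_process = POOLS.sum { |pool| pool_has_work?(pool, backlog) ? pool[:threads] : 0 }
  per_process * replicas
end

# Made-up backlog: nothing in high_priority, work waiting elsewhere.
backlog = { "high_priority" => 0, "low_priority" => 50, "default" => 200 }
puts expected_running(backlog)              # → 7
puts expected_running(backlog, replicas: 3) # → 21
```

Under these assumptions, one replica should reach about 7 running jobs and three replicas about 21, which is why a steady count of 1 suggests threads are idle rather than saturated.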

Do you have any clues on what could be going on?

About this issue

  • State: closed
  • Created a year ago
  • Comments: 16 (16 by maintainers)

Most upvoted comments

I literally discovered that the moment you wrote that post!