bullmq: Worker doesn't start processing after reconnect

I have been getting the following error lately:

You have triggered an unhandledRejection, you may have forgotten to catch a Promise rejection:
Error: Missing Job 17568662 when trying to move from active to delayed
     at Function.moveToDelayed (/usr/app/analytics-server/node_modules/bullmq/dist/classes/scripts.js:178:23)
     at runMicrotasks (<anonymous>)
     at processTicksAndRejections (internal/process/task_queues.js:97:5)
 exited with code [1] via signal [SIGINT]
 starting in -cluster mode-
 Listening for monitoring :4443
 Error: getaddrinfo ENOTFOUND private-db-redis-XXXXXXXXX.db.ondigitalocean.com
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:66:26) {
   errno: 'ENOTFOUND',
   code: 'ENOTFOUND',
   syscall: 'getaddrinfo',
   hostname: 'ENOTFOUND private-db-redis-XXXXXXXXX.db.ondigitalocean.com'
 }

After the container restarts, it can’t connect to Redis anymore. BullMQ doesn’t even recognize that it can’t reconnect, so the process stays alive.

What is the best way to handle this?

How can I make sure that there is no unhandledRejection in that case?
Is there a way to reconnect a worker?

My healthcheck calls queue.getWaitingCount(), and this seems to work fine after the restart, but the worker won’t start processing again.

Thanks!

Version: bullmq@1.15.1

About this issue

State: open
Created 3 years ago
Reactions: 1
Comments: 18 (9 by maintainers)

Most upvoted comments

UPDATE: It was my fault. I had a typo in the name of the QueueScheduler; the names for the Worker, Queue and QueueScheduler must match.
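
For illustration, one way to rule out this kind of typo is to define the queue name once and pass it to all three constructors. This is only a minimal sketch, assuming the bullmq 1.x constructors `new Queue(name, opts)`, `new QueueScheduler(name, opts)` and `new Worker(name, processor, opts)`. The helper `createQueueParts`, the name `analytics` and the injected `bullmq` argument are hypothetical and not taken from the original code.

```javascript
// Hypothetical queue name, defined once so it cannot drift between classes.
const QUEUE_NAME = 'analytics';

// `bullmq` is the module object (e.g. require('bullmq')). It is passed in
// rather than required here so the helper can be exercised with stand-ins.
function createQueueParts(bullmq, connection, processor) {
  const opts = { connection };
  return {
    queue: new bullmq.Queue(QUEUE_NAME, opts),
    scheduler: new bullmq.QueueScheduler(QUEUE_NAME, opts),
    worker: new bullmq.Worker(QUEUE_NAME, processor, opts),
  };
}
```

With the real library this would presumably be called as `createQueueParts(require('bullmq'), connection, async (job) => { ... })`, where `connection` holds your own Redis settings.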

I am testing the restart of the jobs by manually killing the process (CTRL+C), because my server will restart the job every day. On shutdown I properly close the Queue, QueueScheduler, Worker and Redis connection. When I restart the process, I can query the job and see it in the active state, but it does not seem to be re-processed.

Is there something else I have to do to get the jobs restarted?
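
For reference, the "proper closing" described above might look roughly like the sketch below. It assumes only that the Worker, QueueScheduler and Queue each expose an async `close()` method; the helpers `closeAll` and `installShutdown` are hypothetical names, and the close order is a design choice rather than something the thread prescribes.

```javascript
// Close the worker first so it stops picking up jobs, then the scheduler,
// then the queue. Each object is assumed to expose an async close().
async function closeAll({ worker, scheduler, queue }) {
  const errors = [];
  for (const part of [worker, scheduler, queue]) {
    try {
      await part.close();
    } catch (err) {
      errors.push(err); // keep closing the rest even if one close fails
    }
  }
  return errors;
}

// Hypothetical wiring: run the shutdown once when CTRL+C (SIGINT) arrives.
function installShutdown(parts, exit = process.exit) {
  process.once('SIGINT', async () => {
    const errors = await closeAll(parts);
    errors.forEach((err) => console.error(err));
    exit(errors.length === 0 ? 0 : 1);
  });
}
```

Collecting the errors instead of throwing on the first one means a failing `close()` on one object does not leave the others open.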

eltoroit on Apr 23, 2021

I cannot find anything wrong in BullMQ, but I wonder whether you are attaching an error handler to your worker, like this:

myworker.on('error', (err) => console.error(err));

Because of the way Node.js works, if no such listener is attached, an emitted 'error' event is thrown as an exception and the process exits with an error (yes, I also think this is awkward behaviour).
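
The behaviour can be reproduced with a plain `EventEmitter` from the Node.js standard library, without Redis or BullMQ. In this small sketch the two emitters are stand-ins for a Worker (which is also an `EventEmitter`), and the error message is made up for the example.

```javascript
const { EventEmitter } = require('events');

// No 'error' listener: emit() throws the error itself. Outside a
// try/catch this would be an uncaught exception and end the process.
const withoutListener = new EventEmitter();
let threw = false;
try {
  withoutListener.emit('error', new Error('connection lost'));
} catch (err) {
  threw = true;
}

// With an 'error' listener: the error is handed to the listener and
// emit() returns normally.
const withListener = new EventEmitter();
const seen = [];
withListener.on('error', (err) => seen.push(err.message));
withListener.emit('error', new Error('connection lost'));

console.log({ threw, seen });
```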

manast on Apr 20, 2021

OK, so the problem is not reconnection, as the title of the issue says; it is that, despite being connected, the worker does not process jobs anymore, right?

manast on Apr 20, 2021