kue: Jobs Getting Stuck in Active State

Hi @behrad, I am still seeing jobs get stuck in the active state, similar to issue #391.

I am using kue@0.8.11 and redis 2.8.17.

I have tried to carefully set up all of the recommended steps to avoid this. Here are some code snippets. If I’m doing something wrong, I would love to hear about it. I’ve pulled out just the bits of source that deal with kue/jobs and redacted anything that wasn’t relevant.

producer – separate node process

var jobs = kue.createQueue({ disableSearch: true, redis: config.redis });

var dying = false;
process.on('uncaughtException', function (err) {
  log.error('uncaught exception', err.stack);
  die();
  dying = true;
});

process.on('SIGTERM', function () {
  log.error('SIGTERM');
  die();
  dying = true;
});

function die() {
  if (!dying) {
    jobs.shutdown(function (err) {
      if (err) { log.error('Kue DID NOT shutdown gracefully', err); }
      else { log.info('Kue DID shutdown gracefully'); }
      process.exit(1);
    });
  }
}

// createJob( ) is called on an interval as needed

function createJob(data) {
  var job = jobs.create('deliver:sub', { title: 'Deliver Subscription', sub: data })
    .removeOnComplete(true)
    .attempts(5)
    .backoff({ type: 'exponential' })
    .save(function (err) {
      if (err) { return log.error('Error creating deliver:sub', err); }
      log.info(util.format('job[%s] created', job.id));
    });

  job.on('complete', function() {
    /* logic here */
  });

  job.on('failed', function () {
    log.error(util.format('job[%s] failed ', job.id));
  });
}

consumer – separate node process

var jobs = kue.createQueue({ disableSearch: true, redis: config.redis });

try {
  kue.app.listen(3001);
} catch (err) {
  log.error('Error: could not start kue express server', err);
}

// called in master b/c we use job retries
jobs.promote();

jobs.watchStuckJobs();

var dying = false;
process.on('uncaughtException', function (err) {
  log.error('uncaught exception', err.stack);
  die();
  dying = true;
});

process.on('SIGTERM', function () {
  log.error('SIGTERM');
  die();
  dying = true;
});

function die() {
  if (!dying) {
    jobs.shutdown(function (err) {
      if (err) { log.error('Kue DID NOT shutdown gracefully ', err); }
      else { log.info('Kue DID shutdown gracefully'); }
      process.exit(1);
    });
  }
}

jobs.process('deliver:sub', 5, function (job, done) {
  var domain = require('domain').create();

  domain.on('error', function (err) {
    log.error('domain error for deliver:sub', err);
    done(err);
  });

  domain.run(function () {
    process.nextTick(function () {
      /* logic here that ends with either */
      done();
      /* OR */
      done(err);
    });
  });
});

Most upvoted comments

99% of the time, a job gets stuck in the ACTIVE state because the user application fails to call done. So first you should try to trace what happened to your stuck job, your node.js process, and your worker 😃
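One way to catch a handler that never calls done is to wrap it in a guard. The following is only a sketch: `withDoneGuard` and `timeoutMs` are hypothetical names, not part of kue, and it assumes that failing the job after a timeout is acceptable for your workload.

```javascript
// Hypothetical helper (not part of kue): wraps a process handler so that
// `done` is forwarded at most once, and the job is failed if the handler
// throws synchronously or does not call `done` within `timeoutMs`.
function withDoneGuard(handler, timeoutMs) {
  return function (job, done) {
    var finished = false;
    var timer = null;

    function finish() {
      if (finished) { return; }       // ignore any second call to done
      finished = true;
      clearTimeout(timer);
      done.apply(null, arguments);    // forward err/result unchanged
    }

    // Fail the job instead of leaving it active if done is never called.
    timer = setTimeout(function () {
      finish(new Error('job ' + job.id + ' did not call done within ' + timeoutMs + 'ms'));
    }, timeoutMs);

    try {
      handler(job, finish);
    } catch (err) {
      finish(err);
    }
  };
}
```

It could be used as `jobs.process('deliver:sub', 5, withDoneGuard(function (job, done) { /* logic */ }, 30000));`. The guard does not stop work that is still running after the timeout. It only makes sure the job leaves the active state and that the cause is logged as an error.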

I had the same problem for a while: when the server is stopped or crashes while a job is being processed, the job stays in the active state forever after the server restarts, and subsequent jobs are never processed.

I tried, on server initialization, to put all crashed active jobs back in the inactive state as described in Programmatic Job Management, but it didn’t work very well for me. It unblocked the queue, but the crashed jobs that were put in the inactive state were never processed, even when I changed their “attempts” option to 10.

Here is how I fixed it:

On initialization, I look for any active jobs using queue.active() or kue.Job.rangeByState(). For every active job found, I create a new job with the same data and call job.complete() on the old one. Note that you need to do this on server initialization, before queue.process() is called.

Here is the code I use to do so:

kue.Job.rangeByState('active', 0, 1000, 'asc', function (err, jobs) {
  if (err) { throw err; }
  jobs.forEach(function (job) {
    // mark the stale job complete and enqueue a fresh copy of it
    job.complete();
    queue.create(job.type, job.data).save();
  });
});

Note that this code does not wait for the complete() and save() calls to finish. If you call queue.process() soon afterwards, you’ll need to wait for those callbacks first, so that all active jobs have been re-queued before the queue starts running.
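A version that waits before starting the workers might look like the sketch below. `requeueActiveJobs` is a hypothetical helper, not part of kue. It assumes that `job.complete()` accepts a callback in the kue version you run, and it saves the replacement job before completing the old one so that a failed save does not lose the job.

```javascript
// Hypothetical helper (not part of kue). `Job` is expected to be kue.Job and
// `queue` the object returned by kue.createQueue(). Calls back with
// (err, count) once every active job has been re-created and completed.
function requeueActiveJobs(Job, queue, callback) {
  Job.rangeByState('active', 0, 1000, 'asc', function (err, jobs) {
    if (err) { return callback(err); }
    var pending = jobs.length;
    var failed = false;
    if (pending === 0) { return callback(null, 0); }

    function fail(e) {
      if (!failed) { failed = true; callback(e); }
    }

    jobs.forEach(function (job) {
      // Save the replacement first, then retire the stale job.
      queue.create(job.type, job.data).save(function (saveErr) {
        if (saveErr) { return fail(saveErr); }
        // Assumption: complete() takes a callback in your kue version.
        job.complete(function (completeErr) {
          if (completeErr) { return fail(completeErr); }
          pending -= 1;
          if (pending === 0 && !failed) { callback(null, jobs.length); }
        });
      });
    });
  });
}
```

Calling `queue.process(...)` from inside the callback, for example `requeueActiveJobs(kue.Job, queue, function (err) { if (err) { throw err; } queue.process(/* ... */); });`, keeps the workers from starting until the re-queue has finished. As in the original snippet, only the first 1000 active jobs are handled.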

I think what is causing problems is the call to KEYS in watchStuckJobs.

Here’s the problematic part of the Lua script:

var script =
'local msg = redis.call( "keys", "' + prefix + ':jobs:*:inactive" )\n\

The KEYS command shouldn’t be run in a production environment. For more information, check the warning in the command’s documentation.
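The Redis documentation recommends SCAN (available since Redis 2.8.0) as the incremental alternative to KEYS. The sketch below shows what a client-side scan for the same key pattern could look like. It is not how kue implements watchStuckJobs, which runs inside a Lua script. `scanKeys` is a hypothetical helper, and `client` is assumed to be a node_redis-style client whose `scan` reply is `[nextCursor, keys]`.

```javascript
// Hypothetical helper (not part of kue): collects all keys matching
// `pattern` by iterating SCAN, rather than blocking Redis with one KEYS call.
function scanKeys(client, pattern, callback) {
  var found = [];

  function step(cursor) {
    client.scan(cursor, 'MATCH', pattern, 'COUNT', 100, function (err, reply) {
      if (err) { return callback(err); }
      var next = String(reply[0]);
      found = found.concat(reply[1]);
      // A returned cursor of "0" means the iteration is complete.
      if (next === '0') { return callback(null, found); }
      step(next);
    });
  }

  step('0');
}
```

SCAN can return the same key more than once and gives no snapshot guarantee, so callers that need unique keys should de-duplicate the result.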