bullmq: Parent job does not execute when child job fails
I have a FlowProducer that runs a parent job after a set of child jobs have been executed. There are about 10k children running concurrently. I've set the option removeOnFail to true on both the children and the parent, but it seems that if a child fails, the execution of the parent just hangs.
Could this be a bug or is it a configuration issue on my side?
Here is what the code that creates the jobs looks like:
import { FlowProducer } from 'bullmq';

// firstTokenID, contractAddress, redisClient and PROJECT_INIT_QUEUE are
// defined elsewhere in the application.
export const initializeCollectionStats = async () => {
  const flowProducer = new FlowProducer({ connection: redisClient, sharedConnection: true });
  const totalSupply = 10000;

  // One child job per token.
  const tokenJobs = [];
  for (let i = firstTokenID; i <= totalSupply; i++) {
    tokenJobs.push({
      name: 'child',
      data: {
        hello: 'world'
      },
      queueName: 'queueName',
      opts: {
        removeOnFail: true,
        attempts: 2,
        timeout: 3000
      }
    });
  }

  // The parent job should be processed once all of its children are done.
  const initJob = await flowProducer.add({
    name: 'initProject',
    data: {
      contractAddress
    },
    queueName: PROJECT_INIT_QUEUE,
    children: tokenJobs,
    opts: {
      attempts: 1
    }
  });
  return initJob;
};
and the worker that runs the child jobs looks like this:
import { Worker } from 'bullmq';

const worker = new Worker(QUEUE_NAME, async (job) => {
  const { hello } = job.data;
  try {
    const metadata = await doSomethingAsync(hello);
    return 'done';
  } catch (e) {
    // The error is logged and swallowed, so the job completes (with a
    // null return value) rather than failing.
    logger.error(e);
    return null;
  }
}, { connection: redisClient, concurrency: 300, sharedConnection: true });
About this issue
- State: open
- Created 3 years ago
- Comments: 28 (10 by maintainers)
This is in my pending list; I can work on this feature this week 👀
Hi! Thanks for your quick answer!
It does, in part.
There are still use cases where it would be nice to allow child jobs to fail when they are not critical to the success (or partial success) of the parent. Users would then be expected to handle this "partial" state in their business logic. This is useful for keeping a history of failed child jobs while still allowing the parent job to execute.
Some sort of job report can then be generated by the app, listing the successful and failed child tasks.
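For illustration, a rough sketch of such a report built in the parent's processor with getChildrenValues(), which maps each child job key to its return value. The { ok } result shape and the report format are assumptions, not part of the issue; redisClient and PROJECT_INIT_QUEUE are reused from the code above:

import { Worker, Job } from 'bullmq';

// Hypothetical parent processor building a report from the children's
// return values. Assumes non-critical children resolve with { ok: false }
// instead of throwing.
const parentWorker = new Worker(PROJECT_INIT_QUEUE, async (job: Job) => {
  const values = await job.getChildrenValues<{ ok: boolean } | null>();

  const report = { succeeded: [] as string[], failed: [] as string[] };
  for (const [childKey, value] of Object.entries(values)) {
    (value && value.ok ? report.succeeded : report.failed).push(childKey);
  }
  // The app can persist or log this report however it likes.
  return report;
}, { connection: redisClient });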
Hi, is there any way to solve this? I have the same problem: if the child job fails or stops, the parent job does not run.
Currently, by design, the parent job will not be processed until all of its child jobs have completed.
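Concretely, while any child is still pending the parent just sits in the 'waiting-children' state and is never handed to a worker. A minimal sketch with placeholder queue names:

import { FlowProducer } from 'bullmq';

const flowProducer = new FlowProducer({ connection: redisClient });

// Same shape as the flow in the issue, with placeholder queue names.
const flow = await flowProducer.add({
  name: 'parent',
  queueName: 'parentQueue',
  children: [{ name: 'child', queueName: 'childQueue', data: {} }]
});

// Until every child has completed, the parent is never handed to a worker:
console.log(await flow.job.getState()); // 'waiting-children'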
The workaround you suggest is precisely what we do.
Then we have some helper utils that check for the special "succeeded but actually failed" return value, and allow the parent to continue or fail based on things like the percentage of child jobs that truly succeeded or failed.
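A rough sketch of such a helper; the sentinel value and the 90% threshold are illustrative, only getChildrenValues() is real BullMQ API:

import { Job } from 'bullmq';

// Assumed convention: children resolve with this marker instead of throwing.
const FAILED_MARKER = 'succeeded-but-actually-failed';

// Hypothetical helper: let the parent proceed only if enough of its
// children truly succeeded.
async function shouldParentContinue(parent: Job, minSuccessRatio = 0.9): Promise<boolean> {
  const values = Object.values(await parent.getChildrenValues());
  if (values.length === 0) return true;
  const succeeded = values.filter((v) => v !== FAILED_MARKER).length;
  return succeeded / values.length >= minSuccessRatio;
}

// In the parent processor:
// if (!(await shouldParentContinue(job))) throw new Error('too many children failed');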
Hey @theDanielJLewis, we have a PR for it: https://github.com/taskforcesh/bullmq/pull/1953. You can also take a look at that one; @manast and I are evaluating this new feature.
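Without asserting exactly what that PR contains, recent BullMQ versions expose an ignoreDependencyOnFailure child option along these lines: the child can fail without leaving the parent stuck. A sketch, reusing the names from the issue:

await flowProducer.add({
  name: 'initProject',
  queueName: PROJECT_INIT_QUEUE,
  data: { contractAddress },
  children: [
    {
      name: 'child',
      queueName: 'queueName',
      data: { hello: 'world' },
      // The failed child no longer blocks the parent; the parent is
      // processed even though this child ended in the failed state.
      opts: { ignoreDependencyOnFailure: true }
    }
  ]
});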
I’m in the same boat as everyone else.
It seems odd that the only two options right now for when a child fails are:
- failParentOnFailure, where the parent will always fail when a child fails, even if there were successful children to process; or
- the default behavior, where the parent does nothing and never even appears in the queue, which just seems strange and not an obvious result.

I wish we had, either as the default or a different option, something like continueParentOnFailure, which would allow children to fail but still add the parent to the queue and let it process separately. Adding this option to children would allow us to let some children actually fail the parent, while letting others not affect the parent.
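To make this concrete, here is a hypothetical flow mixing the existing failParentOnFailure child option with the proposed continueParentOnFailure one. Only failParentOnFailure is real BullMQ API; the other is the wished-for option:

const flow = await flowProducer.add({
  name: 'parent',
  queueName: 'parentQueue',
  children: [
    {
      name: 'critical-child',
      queueName: 'childQueue',
      data: {},
      // Real BullMQ option: a failure here fails the parent immediately.
      opts: { failParentOnFailure: true }
    },
    {
      name: 'optional-child',
      queueName: 'childQueue',
      data: {},
      // Wished-for option from this comment; it does NOT exist in BullMQ.
      // Idea: this child may fail, yet the parent still gets queued.
      opts: { continueParentOnFailure: true }
    }
  ]
});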
We would like to keep a history of recently failed jobs within Redis. Otherwise removeOnFail would work, I guess.
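Note that removeOnFail also accepts a number (the maximum count of failed jobs to keep) or an { age, count } object, so a bounded failure history can be kept in Redis. For example, with an assumed BullMQ Queue instance named queue:

await queue.add('child', { hello: 'world' }, {
  // Keep at most 1000 failed jobs, each for at most 24 hours (age is in
  // seconds), instead of removing every failed job immediately.
  removeOnFail: { age: 24 * 3600, count: 1000 }
});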