dagger: Engine fails, hangs or becomes really slow when several operations are chained
Some users (cc @cpuguy83) have observed that chaining several WithMountedDirectory calls sometimes makes the engine extremely slow or completely unresponsive.
Discord thread: https://discord.com/channels/707636530424053791/1075544944388882522
@cpuguy83 managed to come up with a simple repro here.
Repro snippet: https://gist.github.com/cpuguy83/b61f22d05fd05a6407008d944f5858f4
cc @sipsma
About this issue
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 16 (14 by maintainers)
Commits related to this issue
- Upgrade to buildkit w/ performance fix. The fix in question is for the problem here: https://github.com/dagger/dagger/issues/4620 Signed-off-by: Erik Sipsma <erik@dagger.io> — committed to sipsma/dagger by sipsma a year ago
- Upgrade to buildkit w/ performance fix. (#4817) * Upgrade to buildkit w/ performance fix. The fix in question is for the problem here: https://github.com/dagger/dagger/issues/4620 Signed-off-... — committed to dagger/dagger by sipsma a year ago
Okay, I found this commit, introduced in buildkit v0.11, which, when reverted, makes the problem go away. With it reverted I can run the reproducer w/ count=20 in 10 seconds: https://github.com/sipsma/buildkit/commit/bf809b76f6f76dc83b5b882109115a59a131bf12 cc @cpuguy83
Going to go try to understand exactly what regressed there, but I feel reasonably confident this is the root cause and that it should be fixable upstream.
This looks very related to an issue Kyle and I dove into back in December: https://discord.com/channels/707636530424053791/1054474081593995355/1054474081593995355
I also found that if we reduce the chaining by using intermediate containers and just grab the rootfs and put it into the original container it helps a lot.
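The shape of that workaround can be sketched roughly as below. This is a toy, in-memory model and not the Dagger SDK: the Container and Dir types, the chained counter, and the applyInBatches helper are all hypothetical, and the method names are only modeled loosely on the SDK and should be checked against it. It shows the call pattern only (run a few steps in an intermediate container, take its rootfs, put it back into the original); it does not model why this helps the engine.

```go
package main

import "fmt"

// Dir is a hypothetical stand-in for a root filesystem snapshot.
// It only remembers which steps produced it.
type Dir struct{ steps []string }

// Container is a hypothetical in-memory stand-in for an SDK container.
// chained counts operations applied since the rootfs was last replaced;
// it is an illustration device, not a model of engine internals.
type Container struct {
	rootfs  Dir
	chained int
}

// WithExec returns a new container with one more step applied.
func (c Container) WithExec(cmd string) Container {
	steps := append(append([]string{}, c.rootfs.steps...), cmd)
	return Container{rootfs: Dir{steps: steps}, chained: c.chained + 1}
}

// Rootfs returns the container's current root filesystem.
func (c Container) Rootfs() Dir { return c.rootfs }

// WithRootfs returns a new container that uses d as its root filesystem.
func (c Container) WithRootfs(d Dir) Container {
	return Container{rootfs: d, chained: 0}
}

// applyInBatches runs steps in intermediate containers of at most batch
// steps each, copying each intermediate rootfs back into the original.
func applyInBatches(orig Container, steps []string, batch int) Container {
	if batch < 1 {
		batch = 1 // guard against a non-advancing loop
	}
	for start := 0; start < len(steps); start += batch {
		end := start + batch
		if end > len(steps) {
			end = len(steps)
		}
		// The intermediate container starts from the original's rootfs.
		tmp := Container{}.WithRootfs(orig.Rootfs())
		for _, s := range steps[start:end] {
			tmp = tmp.WithExec(s)
		}
		// Grab the intermediate rootfs and put it into the original.
		orig = orig.WithRootfs(tmp.Rootfs())
	}
	return orig
}

func main() {
	steps := []string{"a", "b", "c", "d", "e"}
	out := applyInBatches(Container{}, steps, 2)
	fmt.Println(out.Rootfs().steps, out.chained) // prints "[a b c d e] 0"
}
```

In this toy, every step still ends up in the final rootfs in order, while no single container ever carries more than batch chained operations.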
I have been having this issue as well, and it seems to happen when I have many chained WithExec calls. I’ll try to put together a minimal reproduction if it will help diagnose.
doesn’t seem to be the case. I’m on 5.19.0-31-generic and no errors in dmesg