mamba: Micromamba sporadically hangs (QEMU + ARM)
Edit: Summary
Initial report
Over at Jupyter Docker Stacks, we have been trying to create the initial Conda environment for our base-notebook using Micromamba instead of Miniforge. We have been plagued by sporadic hangs in our multi-arch Docker build process on GH Actions: roughly 20% of all runs hang when, for the aarch64 image, we RUN the command
./micromamba install --root-prefix=/opt/conda --prefix=/opt/conda -vv --yes python=3.9 mamba notebook jupyterhub jupyterlab
The hang occurs sometime between the end of extraction and the beginning of linking. The last message printed during an unsuccessful run is something like
#29 188.0 debug libmamba Extracted to '/opt/conda/pkgs/mistune-0.8.4-py39h14843e3_1005'
without proceeding to a line like
#29 178.0 Linking pandoc-2.17.1.1-h8af1aa0_0
Execution then hangs without any further output until the 6-hour GitHub Actions job time limit is reached.
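Because a hung run otherwise burns the whole job budget, one possible stopgap is to bound each attempt with GNU coreutils `timeout` and retry. The following is a minimal sketch under that assumption; `run_with_retries` is a hypothetical helper, not part of micromamba or of the docker-stacks build. It relies on `timeout` exiting with status 124 when it kills a command for running too long.

```shell
#!/usr/bin/env bash
# Sketch only: bound each attempt with GNU coreutils `timeout` so a hung
# install fails fast instead of consuming the full CI job time limit.
# run_with_retries is a hypothetical helper, not part of micromamba.
run_with_retries() {
  local limit=$1 attempts=$2
  shift 2
  local i status
  for ((i = 1; i <= attempts; i++)); do
    timeout "$limit" "$@"
    status=$?
    # timeout exits with 124 when it killed the command for exceeding the limit.
    if [ "$status" -eq 124 ]; then
      echo "attempt $i/$attempts timed out after $limit" >&2
      continue
    fi
    # Success, or a genuine (non-timeout) failure: stop retrying.
    return "$status"
  done
  # Every attempt timed out.
  return 124
}
```

For example, `run_with_retries 30m 3 ./micromamba install --root-prefix=/opt/conda --prefix=/opt/conda --yes python=3.9 mamba notebook jupyterhub jupyterlab`. This only masks the symptom, and a retry after a hang may start from a partially populated /opt/conda, so it is no substitute for fixing the deadlock.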
References
(in chronological order)
- Mamba sporadically hangs when building jupyter/docker-stacks arm64 image
- Original PR in jupyter/docker-stacks to replace MiniForge with Micromamba.
- Partially understood/isolated the problem
- This bug report
- @jonashaag patches my jupyter/docker-stacks PR to disable concurrency, and I confirm that it appears to work
- @jonashaag patches libmamba to disable subprocesses when extract_threads=1
- Discussion returns to this thread
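With a libmamba build that includes the patch above, single-threaded extraction would be requested through the extract_threads setting. The sketch below writes that key into an rc file in the root prefix. It assumes that the micromamba version in use reads `.mambarc` from the root prefix and honours `extract_threads` there; both are assumptions to verify against that version's configuration docs. `write_single_thread_rc` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: request single-threaded extraction via an rc file.
# Assumption: this micromamba/libmamba version reads <root_prefix>/.mambarc
# and honours the extract_threads key (verify for your version).
# write_single_thread_rc is a hypothetical helper name.
write_single_thread_rc() {
  local root_prefix=$1
  mkdir -p "$root_prefix"
  # Overwrites any existing .mambarc in that prefix.
  printf 'extract_threads: 1\n' > "$root_prefix/.mambarc"
}
```

In the setup from the report this would be called as `write_single_thread_rc /opt/conda` before running the install command.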
About this issue
- State: closed
- Created 2 years ago
- Comments: 61 (33 by maintainers)
Commits related to this issue
- Comprehensive tests that workarounds work <https://github.com/mamba-org/mamba/issues/1611#issuecomment-1112970540> — committed to maresb/docker-stacks by maresb 2 years ago
I can reproduce this; now looking into the causes.
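Because the hang is sporadic (about 20% of aarch64 runs in the initial report), a single run proves little either way. One rough way to estimate the rate is to repeat the command many times under GNU coreutils `timeout` and count how many attempts hit the limit. This is a sketch under the assumption that each run can start from a clean state (for example, a fresh container or prefix); `count_hangs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: estimate how often a command hangs by running it repeatedly
# under GNU coreutils `timeout` and counting attempts that hit the limit.
# count_hangs is a hypothetical helper name.
count_hangs() {
  local runs=$1 limit=$2
  shift 2
  local i hangs=0
  for ((i = 1; i <= runs; i++)); do
    # Discard the command's own output; only the exit status matters here.
    timeout "$limit" "$@" >/dev/null 2>&1
    # 124 means timeout killed the command, i.e. we treat it as a hang.
    if [ "$?" -eq 124 ]; then
      hangs=$((hangs + 1))
    fi
  done
  echo "$hangs"
}
```

Comparing the count from, say, 20 runs with and without a candidate workaround gives a rough signal. With a 20% base rate, a handful of clean runs is still weak evidence.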
Confusion party! Yes, indeed I assumed that with the latest master no workarounds would be required anymore. Will start a few more runs to debug further…
Maybe it is because we aren’t using reproc::sink::thread_safe::string(out) here: https://github.com/mamba-org/mamba/blob/d1a568b72f451e0b560fc45f32dcd33e4c2211d4/libmamba/src/core/package_handling.cpp#L454-L455
Can you please open another issue? This one is very specific to running in QEMU.
Can confirm that the deadlock happens in reproc::process.start.