bioconda-recipes: Conda solver slowdown FAQ and recommendations
Hi all,
this issue is intended to keep the community up-to-date about the recent state of the conda solver, how you can improve things, and what we are working on to make it better.
What is the problem?
Conda currently uses an SAT (boolean satisfiability) solver to figure out the correct, and hopefully working, set of packages required to construct a functional environment. This means downloading the package index, cutting down the search-space, iterating the graph, inspecting the pinnings and so on.
Conda/Bioconda is special in that we have thousands of Python and R packages. Recently, we’ve begun adding entire Bioconductor releases, with thousands of packages. Conda supports mixed environments, like Python+R+Perl, and does not remove old packages from the index. On the one hand, this enables reproducibility in the future (Need an old version of an R package or deepTools? No problem.); on the other hand, it results in an incredibly large search space for the dependency solver to traverse. So in contrast to other package managers, the Conda package index is constantly growing, and we are currently not cutting out dead wood.
So we do face a special situation in Conda. Please take this into account when considering Conda’s performance. Yes, Conda is slow and will probably never be as fast as other package managers because Conda is vastly larger and supports scientific use-cases that others do not support. However, we are aware of this and multiple people are working on it. See our tips below.
How to improve solver performance
Conda is especially slow if R is involved. This has historical reasons, as most of the packages are in all 3 supported channels (anaconda, conda-forge, bioconda). This was our fault. However, things should improve dramatically if you install the latest version available, e.g. bioconductor-deseq2=1.22.1. We’ve learned from past issues and now pin to one particular R version. However, old packages are still around for the sake of reproducibility.
Use pins, install packages with versions. Even conda create -n foo python=3 deeptools will help. You will magically solve all your R envs by simply adding r-base=3.5.1 to your package install list.
Recommendations
A few recommendations, especially for environments with R inside:
- use conda >=4.6.x
- For bioconda packages, use the recommended channel order
- try the new experimental pycryptosat solver (https://www.anaconda.com/conda-4-6-release …)
  conda install pycryptosat
  conda config --set sat_solver pycryptosat
- use --strict-channel-priority
  conda config --set channel_priority strict
- Do not use conda install, use conda create
- Use environment.yaml files wherever you can. These include exact package versions, removing much of the solver’s workload and drastically speeding things up.
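
As a minimal sketch of such a file, the example below reuses the pins mentioned earlier in this issue (r-base=3.5.1 and bioconductor-deseq2=1.22.1). The environment name is hypothetical, and the channel order is an assumption based on the advice later in this thread to list conda-forge before bioconda.

```yaml
# environment.yaml (illustrative sketch, not an official Bioconda template)
name: deseq2-example        # hypothetical environment name
channels:                   # assumed order: conda-forge before bioconda
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - r-base=3.5.1                 # pinning R narrows the search space considerably
  - bioconductor-deseq2=1.22.1   # exact version taken from the text above
```

An environment can then be created from the file with conda env create -f environment.yaml, which follows the advice above to create environments rather than install into existing ones.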
Different people from the community are trying to improve the solver or are using different strategies to improve the situation. This is, and will probably always be, a work in progress. Conda will grow, and Anaconda and the community will improve things as we go.
Cutting down the search space
Please have a look at https://github.com/regro/conda-metachannel. Conda Metachannels are a work in progress but will allow users to specify the portion of the graph they care about upfront. It is very rare that users will actually need ALL of the packages in bioconda/conda-forge. Think of it as a constrained channel: only a specific set of your packages appears in this special channel. All others are not available, so you cannot recreate a three-year-old environment with this channel. However, if you have this use case, you can just switch back to the normal channels.
Maybe we should have this at some point for our community. One idea would be to keep all recent (~2 years) packages in this space while leaving all others available to reproduce old envs. Start a discussion!
Bioconda is prepared
Very early on we recognised the special challenges that Conda is trying to face and we are prepared for the special use-case of long-term reproducibility - BioContainers. The containers are frozen sets of conda environments. A BioContainer is created for every Bioconda package, but you can also create your own. https://usegalaxy.eu is maintaining 1034 environments currently using BioContainers and it works well in that demanding environment. Read more about this in our manuscript.
I recommend BioContainers for static/reproducible environments. For flexible environments we could use a metachannel in the future if we want to maintain this.
That said, I use conda on a daily basis, and with the above recommendations I do not need a metachannel, as the normal conda solver is fast enough for me. However, I believe the conda community is prepared for the future.
Feedback
We would like to get feedback; benchmarks and examples do help us. What does slow mean? Considering what Conda is doing for you behind the scenes, is 30s or a minute really slow? Please provide numbers and the exact installation command.
Last but not least I would like to thank the conda-forge team, Anaconda and the @bioconda/core team that are constantly working on all the packages and trying to keep things fast and reliable even with 100k packages.
About this issue
- State: open
- Created 5 years ago
- Reactions: 40
- Comments: 19 (16 by maintainers)
Commits related to this issue
- Follow conda solver recommendations. From: https://github.com/bioconda/bioconda-recipes/issues/13774 — committed to broadinstitute/viral-ngs by yesimon 5 years ago
- bioconda priorities According to https://github.com/bioconda/bioconda-recipes/issues/13774 this should speed up conda resolutions — committed to gdv/dotfiles by gdv 5 years ago
- Follow conda solver recommendations. (#931) From: https://github.com/bioconda/bioconda-recipes/issues/13774 — committed to broadinstitute/viral-ngs by yesimon 5 years ago
- Force strict channel priority when building images This is best practice as described here: https://github.com/bioconda/bioconda-recipes/issues/13774 The following command: conda install -c conda... — committed to rhpvorderman/galaxy by rhpvorderman 3 years ago
mamba also does a great job of speeding up the solving step. For
conda create -y --quiet --override-channels --channel iuc --channel conda-forge --channel bioconda --channel defaults --name __rpy2@2.9.4 rpy2=2.9.4
I lost patience after some minutes and tried
mamba create -y --quiet --override-channels --channel iuc --channel conda-forge --channel bioconda --channel defaults --name __rpy2@2.9.4 rpy2=2.9.4
and the solving step finished immediately.
Are there any plans to move the Bioconda build system to mamba, like conda-forge did?
Give conda 4.7 a try. https://www.anaconda.com/how-we-made-conda-faster-4-7/
Make sure to put conda-forge before BioConda in your channel list.
@jakevc the recommended channels order changed a few months back when using bioconda and conda-forge, could you reorder your channels as below?
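
The channel list that originally followed this comment is not preserved here. As a hedged sketch only, a .condarc consistent with the advice elsewhere in this thread (conda-forge before bioconda, strict channel priority) might look like the following; the exact order the commenter posted is an assumption.

```yaml
# ~/.condarc (illustrative sketch; the order is assumed from the advice in this thread)
channels:
  - conda-forge   # highest priority
  - bioconda
  - defaults      # lowest priority
channel_priority: strict   # same effect as: conda config --set channel_priority strict
```

Running conda config --show channels afterwards prints the active order, which is useful to include when reporting solver timings.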