pip: pip uses backtracking when dependency installation fails

Description

When installation of a dependency fails, pip uses its backtracking feature to try other versions of the package, even when the failure is not due to a version conflict.

Expected behavior

I understand that backtracking is useful for resolving version conflicts. Trying different versions when the installation fails for a reason other than a version conflict is, IMO, not useful most of the time, as such a failure often indicates a missing system package.

I find this particularly annoying during CI tests, as it takes forever before the test actually fails. If this is intended behavior, it would be great to have a flag to disable it.

pip version

21.0.1

Python version

3.7.10

OS

arch linux

How to Reproduce

As an example, I install scikit-bio into a clean environment, which fails because the package doesn’t properly declare its numpy build dependency:

conda create -n test_skbio python=3.7 pip
conda activate test_skbio
pip install scikit-bio

Output

Collecting scikit-bio
  Using cached scikit-bio-0.5.6.tar.gz (8.4 MB)
    ERROR: Command errored out with exit status 1:
     command: /home/sturm/anaconda3/envs/test_skbio/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_e1c29669eff64467acdb675f656b2ef2/setup.py'"'"'; __file__='"'"'/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_e1c29669eff64467acdb675f656b2ef2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /home/sturm/tmp/pip-pip-egg-info-95p85io3
         cwd: /home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_e1c29669eff64467acdb675f656b2ef2/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_e1c29669eff64467acdb675f656b2ef2/setup.py", line 20, in <module>
        import numpy as np
    ModuleNotFoundError: No module named 'numpy'
    ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/66/b0/054ef21e024d24422882958072973cd192b492e004a3ce4e9614ef173d9b/scikit-bio-0.5.6.tar.gz#sha256=48b73ec53ce0ff2c2e3e05f3cfcf93527c1525a8d3e9dd4ae317b4219c37f0ea (from https://pypi.org/simple/scikit-bio/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
  Using cached scikit-bio-0.5.5.tar.gz (8.3 MB)
    ERROR: Command errored out with exit status 1:
     command: /home/sturm/anaconda3/envs/test_skbio/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_ef40087a894243eea6e9ba7506c90c26/setup.py'"'"'; __file__='"'"'/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_ef40087a894243eea6e9ba7506c90c26/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /home/sturm/tmp/pip-pip-egg-info-p8h2qvwu
         cwd: /home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_ef40087a894243eea6e9ba7506c90c26/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_ef40087a894243eea6e9ba7506c90c26/setup.py", line 20, in <module>
        import numpy as np
    ModuleNotFoundError: No module named 'numpy'
    ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/2d/ff/3a909ae8c212305846f7e87f86f3902408b55b958eccedf5d4349e76c671/scikit-bio-0.5.5.tar.gz#sha256=9fa813be66e88a994f7b7a68b8ba2216e205c525caa8585386ebdeebed6428df (from https://pypi.org/simple/scikit-bio/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
  Using cached scikit-bio-0.5.4.tar.gz (8.3 MB)
    ERROR: Command errored out with exit status 1:
     command: /home/sturm/anaconda3/envs/test_skbio/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_aa90daa04e0549fbbd36b29262ef299e/setup.py'"'"'; __file__='"'"'/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_aa90daa04e0549fbbd36b29262ef299e/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /home/sturm/tmp/pip-pip-egg-info-d6wu69n6
         cwd: /home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_aa90daa04e0549fbbd36b29262ef299e/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_aa90daa04e0549fbbd36b29262ef299e/setup.py", line 20, in <module>
        import numpy as np
    ModuleNotFoundError: No module named 'numpy'
    ----------------------------------------

Code of Conduct

I agree to follow the PSF Code of Conduct.

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 20 (13 by maintainers)

Most upvoted comments

I’ll keep working with the same isolated example: one package is pinned to an old version, while another, more recent package depends on that same package but requires a newer version.

I agree that the behaviour here is bad, but I don’t have enough information to understand why it’s happening yet. Let’s come back to that, though.

Regarding your proposed solution:

Have a global counter of the number of steps performed in backtracking

We have that. It’s called max_rounds and is set here, to 2,000,000. That may seem like a lot, but it was originally set much lower, and we got users complaining that pip gave up too soon. We found that, in terms of time spent, the number of rounds could be increased a lot without the runtime being affected too badly, so we increased it to the current value.
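To make the idea concrete, here is a toy sketch (in Python) of a backtracking resolver whose total work is capped by a round counter. This is not pip’s or resolvelib’s actual code; the helpers candidates and conflicts are invented for the example, and only the max_rounds concept comes from the discussion above.

# Toy sketch only -- not pip's or resolvelib's actual implementation.
# It shows a backtracking resolver whose total work is capped by a round
# counter, the same idea as pip's max_rounds limit. The helpers `candidates`
# and `conflicts` are invented for this example.
from typing import Callable, Dict, List, Optional


def resolve(
    candidates: Dict[str, List[str]],             # package -> candidate versions, newest first
    conflicts: Callable[[Dict[str, str]], bool],  # True if the partial pin set is inconsistent
    max_rounds: int = 2_000_000,                  # global cap on resolution steps
) -> Optional[Dict[str, str]]:
    packages = list(candidates)
    pins: Dict[str, str] = {}
    rounds = 0

    def backtrack(i: int) -> bool:
        nonlocal rounds
        if i == len(packages):
            return True                           # every package has a pinned version
        name = packages[i]
        for version in candidates[name]:
            rounds += 1
            if rounds > max_rounds:
                raise RuntimeError("resolution exceeded max_rounds, giving up")
            pins[name] = version
            if not conflicts(pins) and backtrack(i + 1):
                return True
            del pins[name]                        # discard this choice, try the next version
        return False

    return pins if backtrack(0) else None

The point of the sketch is that the limit bounds how many pin/discard steps the resolver may take, not how long any individual step (such as building a source distribution to get its metadata) takes.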

The problem you have appears to be that, in your case, the time is not being spent on too many rounds. But we don’t know what it is being spent on.

So we need more information. If you were to profile your case, and identify:

  1. Where the time is actually being spent.
  2. How many times pip does a step that gets thrown away by backtracking (please be careful here, we need details - trying to build 100 dependencies, finding a conflict and throwing them away is one backtrack, even if it takes many hours and 100 package builds were thrown away).
  3. In particular, what proportion of time did pip spend building stuff just to extract metadata (dependency information)? Our best theory at the moment for all of these “pip takes ages” cases is that pip is building heaps of stuff, because the only way to get dependency information for an sdist is to build it.
  4. What information pip has available when a backtrack occurs, and how much help that is in “pruning” the list of options remaining (hint: we’ve done this, and it’s really hard - see previous comment about “pip doesn’t know that a build failed because of missing system headers”)

Then, we might be able to determine where the problem lies in your case. Without trying to pre-judge, I’m fairly certain that the answer won’t be something pip can address easily (typically, it’s builds that take a long time to complete).
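As a rough sketch of how one might capture such a profile (just the standard library’s cProfile driving pip through runpy; the command being profiled and the pip-install.prof file name are arbitrary examples, nothing pip-specific):

# Rough sketch: profile a pip run with the standard library's cProfile.
# runpy executes the pip package the same way "python -m pip ..." would.
# The command being profiled and the output file name are just examples.
import cProfile
import pstats
import runpy
import sys

sys.argv = ["pip", "install", "scikit-bio"]        # the pip command to profile
cProfile.run('runpy.run_module("pip", run_name="__main__")', "pip-install.prof")

stats = pstats.Stats("pip-install.prof")
stats.sort_stats("cumulative").print_stats(25)     # top 25 entries by cumulative time

Sorting by cumulative time should make it reasonably clear whether the bulk of the run is spent waiting on build subprocesses (metadata preparation for sdists) or inside the resolver itself.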

Some workarounds which I’m sure aren’t acceptable, but may give you some food for thought:

  1. Hit CTRL-C after the install has been going for 30 minutes. At that point, as a first step, you can assume that pip has gone into some sort of backtracking spiral, so add constraints to fix that. If you can’t work out how to do that, even with pip’s verbose log information, consider why you believe pip can. Equally, if you don’t know whether 30 minutes is the right length of time to wait, consider how pip could know any better than you.
  2. Pre-build any packages you might need for the install; then pip can just install wheels, which is extremely unlikely to be slow (see the sketch below). That might be a pain, because you have to track dependencies to work out what’s needed - but that’s what pip has to do, so maybe that’s where the cost lies?
  3. A combination - kill the process, look at what pip needed to do, prebuild stuff, repeat.

None of these will fix the issue, but they may give you insights, and possibly even suggest a way forward. If you produce a proof of concept fix from that which helps your issue, we’d love to know.
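For workaround 2, a minimal sketch of what the pre-build step could look like (pip’s wheel and install commands driven from Python; the requirements.txt file and the wheels/ directory are assumptions for the example):

# Minimal sketch of the "pre-build wheels" workaround. requirements.txt and
# the wheels/ directory are assumptions for this example.
import subprocess
import sys


def run_pip(*args: str) -> None:
    subprocess.run([sys.executable, "-m", "pip", *args], check=True)


# The slow part, done once (and easy to cache in CI): resolve and build wheels.
run_pip("wheel", "--wheel-dir", "wheels", "-r", "requirements.txt")

# The install itself then only unpacks pre-built wheels, without hitting the index.
run_pip("install", "--no-index", "--find-links", "wheels", "-r", "requirements.txt")

If requirements.txt pins exact versions, there is also nothing left for the resolver to backtrack over, which ties in with the “add constraints” suggestion in workaround 1.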

Maybe I haven’t noticed cases where backtracking was active and helpful because it Just Worked®?

Quite probably. We have many millions of people using pip daily. And we’ve had people comment that the new resolver was a significant benefit for them. Honestly, do you really think we would have released the new resolver if we’d had feedback that it was a net loss? This is probably the most extensively publicised feature pip has ever released, and we did more user research on it than we ever had before (thanks to the funding we received). So yes, I’m afraid you are in a small minority here. I know that’s no help to you personally, but as pip maintainers we have to look at the wider picture.

When you find that people have benefited from backtracking, how many steps have typically run?

We have no idea. Nobody tells us anything when things work well. Maybe you can imagine how demoralising that can be? Particularly when people who raise issues assume we have all that information to hand 🙁

I think we’re just going round in circles now (ironic, really 😉). I suggest that if you want to make progress with this, you profile where pip is spending its time, as I suggested above, and give us some feedback on precisely what pip (or the build tool) is doing in all that time.

I see! I still believe something like --fail-fast would be useful for CI builds.

This is basically an impossible problem. The first design was actually to fail the entire installation on build failures, and we got a flood of requests for the current behaviour, so we implemented it. Judging from the initial backlash (and the quietness after we released the change until this issue), I am assuming more people find the current behaviour more useful.

Merging into #10655 since it covers this problem, and we don’t really need two issues on this.

I have also reported this problem, and I don’t understand: there is no reason not to fail fast, in any circumstance, when the outcome is going to be a failure anyway:

  • it’s better for the quick understanding and fix of the issue,
  • it’s better for your cloud budget,
  • it’s better for the planet.

When other people complained, it was probably because several issues around the new resolver were still intermixed, both in the code and in people’s minds. When you can already detect that the installation will fail for one reason, there is no need to keep looking for other reasons.

Like in chess: if I make that move I will lose my king… but maybe I can still take his queen?

Yeah, a flag like that makes a lot of sense. Once the legacy resolver is removed entirely, we can start implementing various “strategy flags” for the resolver, and this is one of the first I’m looking to have as well.