pip: Pip fails to install some satisfiable requirements since extras optimization
Description
In Pip 23.3, an extras optimization was introduced to improve the speed of backtracking when dealing with extras. However, under specific circumstances, this optimization can lead to Pip failing to install valid requirements.
PR https://github.com/kedro-org/kedro/pull/3182 cites Pip issue https://github.com/pypa/pip/issues/12317 as the cause of a resolution failure, but on investigating I found:
- The issue started occurring in Pip 23.3, not Pip 23.1.
- After examining the code relevant to sarugaku/resolvelib#134, it did not seem to be responsible for triggering this error, so the error the PR is working around is not issue #12317.
To further investigate I tested various Pip branches, and I believe the problem arises when the following warning is encountered:
WARNING: <package> <version> does not provide the extra '<extra>'
I hypothesize that the optimization is being applied when no extra actually exists. This occurs because extras are optional, and Pip will simply emit a warning when an extra is not present.
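As a rough illustration of the "warn, don't fail" behavior described above, here is a minimal sketch of checking requested extras against the extras a distribution declares. This is not Pip's implementation: the function names and the sample metadata are hypothetical, and only the name normalization rule (PEP 685) is taken from the packaging standards.

```python
import re


def normalize_extra(name: str) -> str:
    """Normalize an extra name per PEP 685: lowercase, runs of -, _ and . become -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def missing_extras(requested, provided):
    """Return the requested extras that the distribution does not declare."""
    declared = {normalize_extra(extra) for extra in provided}
    return sorted(
        normalize_extra(extra)
        for extra in requested
        if normalize_extra(extra) not in declared
    )


# Hypothetical metadata for illustration: a distribution without a 'toml' extra.
provided_extras = ["docs", "dev"]
for extra in missing_extras(["toml"], provided_extras):
    # A missing extra is reported as a warning; the install is not aborted.
    print(f"WARNING: example-package 1.0 does not provide the extra '{extra}'")
```

For an installed distribution, the declared extras can be read from the standard library with importlib.metadata.metadata(name).get_all("Provides-Extra"), which returns None when no extras are declared.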
Expected behavior
Optimization should not cause failure
pip version
pip 23.3.1
Python version
3.11
OS
Linux
How to Reproduce
- git clone https://github.com/kedro-org/kedro
- cd kedro
- python3.11 -m venv .venv
- source .venv/bin/activate
- python -m pip install pip --upgrade
- pip install --dry-run .[test]
Output
...
WARNING: import-linter 1.8.0 does not provide the extra 'toml'
ERROR: Cannot install dask[complete]==2021.12.0 and kedro[test]==0.18.14 because these package versions have conflicting dependencies.
The conflict is caused by:
kedro[test] 0.18.14 depends on dask~=2021.10; extra == "test"
dask[complete] 2021.12.0 depends on dask 2021.12.0 (from https://files.pythonhosted.org/packages/15/6d/99c63be3ea8a4a651d845addeea1f1b3bb8e5c6730bc26cfb6176631adf7/dask-2021.12.0-py3-none-any.whl (from https://pypi.org/simple/dask/) (requires-python:>=3.7))
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
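Note that the two requirements listed under "The conflict is caused by" are compatible with each other: under PEP 440, ~=2021.10 is equivalent to >=2021.10, ==2021.*, which dask 2021.12.0 satisfies. The sketch below checks this with a deliberately simplified, standard-library-only version comparison (plain dotted numeric releases only; the helper names are made up for this example). A real check would use the packaging library's SpecifierSet instead.

```python
def parse_release(version: str) -> tuple:
    """Parse a plain dotted release such as '2021.12.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(spec: str, version: str) -> bool:
    """Simplified `~=spec` check: version >= spec, and equal to spec on every
    release component except the last one given in spec."""
    base = parse_release(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two release components")
    candidate = parse_release(version)
    prefix = base[:-1]
    # Pad with zeros so that 2021.10 and 2021.10.0 compare as equal.
    width = max(len(base), len(candidate))
    padded_base = base + (0,) * (width - len(base))
    padded_candidate = candidate + (0,) * (width - len(candidate))
    return padded_candidate >= padded_base and candidate[: len(prefix)] == prefix


# The pair of requirements from the error message above.
print(compatible_release("2021.10", "2021.12.0"))  # True
```

So the reported pair is not the real source of the failure, which is consistent with the requirements being satisfiable.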
Code of Conduct
- I agree to follow the PSF Code of Conduct.
About this issue
- State: closed
- Created 8 months ago
- Reactions: 2
- Comments: 15 (15 by maintainers)
Okay, to finally round this out, I remembered there was another set of requirements (tested on Python 3.8 Linux) that failed to install prior to the extras optimization but installed after it:
"aicsimageio==4.9.1" "napari-aicsimageio==0.7.2" "napari"
I created another branch to test with the extras optimization removed: https://github.com/notatallshaw/pip/compare/main...no-extras-optimization
If I test this I get:
IMO then the extras optimization alters what requirements surface the correctness issue in resolvelib (seemingly caused by the backjumping optimization), but does not necessarily even make it more likely, and the work to fix this should be on the resolvelib side.
Ultimately it’s up to Pip maintainers, but I am going to close this in favor of https://github.com/pypa/pip/issues/12317
I don’t disagree it’s possible, but to summarize what we’ve found so far:
So IMO, given this and the debugging work done, I don’t think it’s valid to point the finger at the extras optimization unless something new is found.
I created a branch which reverted out the backjump optimization and for me this now resolves: https://github.com/notatallshaw/pip/compare/main...remove-backjump
@sanderr when you get a chance would you mind testing this branch with your examples you give in your comment: https://github.com/notatallshaw/pip/tree/remove-backjump
I’m now of the opinion this is a correctness issue with the backjump optimization added to resolvelib, and not with the extras optimization added to Pip. And unless anyone objects we can mark this a duplicate of https://github.com/pypa/pip/issues/12317.
I’m going to add a comment to https://github.com/sarugaku/resolvelib/issues/134 and spend a little time seeing if I can spot where the backjump is incorrectly skipping over a correct state.
A note on this: my understanding is that get_preferences should be able to produce any preference and the backtrack should eventually complete.
I looked into this for some time and I think I understand more or less what’s going on. First, this package has a huge and complicated dependency tree with many upper bounds, resulting in a complicated resolution process. The dependency conflict seems to be an interplay between constraints on adlfs, dask and gcsfs, each of which has its own constraints on fsspec.
I am relatively confident that the root cause is that resolvelib cannot reliably do the proper backtracking required to come to a satisfactory solution. That said, the specific case that resolvelib has trouble with used to be one that would only arise when dependencies with and without an extra are present for the same package. Pip now inserts those itself, meaning that this behavior is triggered more easily. So it should probably be addressed.
Some more details, and a smaller reproduction follow below.
If you strip everything but the dependencies on adlfs, dask and gcsfs from the kedro setup.py and pyproject.toml, you can still reproduce the issue with pip install --dry-run .[test]. If you then add fsspec>=2021.3, <=2023.1 to the dependencies, the issue disappears. This additional constraint makes resolvelib backtrack on other packages, leading to a successful resolution (I only half understand the exact logic behind this so I can’t really go into more details about that).
On pip 23.2.1, if you add dask~=2021.10 (without extra) to the dependencies, you get the same issue. This highlights that the issue is only triggered by the presence of both the extra and extra-less package. Other than that, the root cause has always been there.
#12095 made an attempt to couple extra and extra-less dependencies more tightly. However, it still has some limitations. One of them is that resolvelib still considers each a “name” to be resolved in its own right, even though both have to be the exact same version. Therefore, when a version is picked for both, resolvelib can no longer backtrack on just one of them; it would have to backtrack on both. I believe (not entirely certain) it now tries to backtrack on the one with the extra, but none of the other candidates for it are valid because the extra-less package is still pinned.
To be clear: I believe the only part of #12095 that really affects this regression is the injection of the implicit extra-less dependency. The rest of the pull request (tighter coupling of extra and extra-less dependencies) is unrelated as far as I can see.
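To make the coupling between the extra and extra-less names concrete, here is a toy model. It is a sketch under loud assumptions: the package names come from this thread, but every version and dependency edge in INDEX is invented for illustration, and the search is a brute-force enumeration, not resolvelib's backtracking algorithm. It shows two things: an exhaustive search finds a valid set of pins, and once both dask and dask[complete] are pinned, neither can be changed on its own.

```python
from itertools import product

# Hypothetical toy index (NOT real metadata):
# name -> {version: {dependency name: set of allowed versions}}.
INDEX = {
    "dask": {"2021.12.0": {}, "2021.10.0": {}},
    "dask[complete]": {
        # The extra variant requires the extra-less package at the same version.
        "2021.12.0": {"dask": {"2021.12.0"}},
        "2021.10.0": {"dask": {"2021.10.0"}},
    },
    "fsspec": {"2023.1.0": {}, "2021.11.0": {}},
    "gcsfs": {
        "2023.1.0": {"fsspec": {"2023.1.0"}},
        "2021.11.0": {"fsspec": {"2021.11.0"}},
    },
    "adlfs": {"2021.10.0": {"fsspec": {"2021.11.0"}}},
}


def is_consistent(pins):
    """True if every pinned package's dependencies are met by the other pins."""
    for name, version in pins.items():
        for dep, allowed in INDEX[name][version].items():
            if pins[dep] not in allowed:
                return False
    return True


def solve_exhaustively():
    """Try every combination of versions; return the first consistent one."""
    names = list(INDEX)
    for versions in product(*(INDEX[name] for name in names)):
        pins = dict(zip(names, versions))
        if is_consistent(pins):
            return pins
    return None


def alternatives(pins, name):
    """Other versions of `name` that stay valid while all other pins are kept."""
    return [
        version
        for version in INDEX[name]
        if version != pins[name] and is_consistent({**pins, name: version})
    ]


solution = solve_exhaustively()
print(solution)
# With the extra-less package still pinned, the extra variant has no other
# valid candidate (and vice versa): both would have to move together.
print(alternatives(solution, "dask[complete]"))
print(alternatives(solution, "dask"))
```

In this toy index the newest versions of everything do not work together, so a solution exists only after stepping back on fsspec and gcsfs. A resolver that treats each name independently has to discover that the two dask names can only be revisited as a pair.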
I do not have a solution in mind yet. There are still some details that are not entirely clear to me. Perhaps some changes to the get_preferences implementation might do the trick.
I don’t think so, which is why I didn’t add this example to that issue. And testing your branch I get the same error:
CC: @uranusjr @sanderr @sbidoul
FYI to be clear this is a regression since Pip 23.3, not sure how critical you consider it.