pip: Pip fails to install some satisfiable requirements since extras optimization

Description

In Pip 23.3, an extras optimization was introduced to improve the speed of backtracking when dealing with extras. However, under specific circumstances, this optimization can lead to Pip failing to install valid requirements.

In PR https://github.com/kedro-org/kedro/pull/3182 it linked to a Pip issue https://github.com/pypa/pip/issues/12317 as a cause of resolution failure, but investigating I found:

  1. The issue started occurring in Pip 23.3, not Pip 23.1.
  2. After examining the relevant code sarugaku/resolvelib#134, it seemed it was not responsible for triggering this error and therefore the error the PR is working around is not issue #12317

To further investigate I tested various Pip branches, and I believe the problem arises when the following warning is encountered:

WARNING: <package> <version> does not provide the extra '<extra>'

I hypothesize that the optimization is being applied when no extra actually exists. This occurs because extras are optional, and Pip will simply emit a warning when an extra is not present.

Expected behavior

Optimization should not cause failure

pip version

pip 23.3.1

Python version

3.11

OS

Linux

How to Reproduce

  1. git clone https://github.com/kedro-org/kedro
  2. cd kedro
  3. python3.11 -m venv .venv
  4. source .venv/bin/activate
  5. python -m pip install pip --upgrade
  6. pip install --dry-run .[test]

Output

...
WARNING: import-linter 1.8.0 does not provide the extra 'toml'
ERROR: Cannot install dask[complete]==2021.12.0 and kedro[test]==0.18.14 because these package versions have conflicting dependencies.

The conflict is caused by:
    kedro[test] 0.18.14 depends on dask~=2021.10; extra == "test"
    dask[complete] 2021.12.0 depends on dask 2021.12.0 (from https://files.pythonhosted.org/packages/15/6d/99c63be3ea8a4a651d845addeea1f1b3bb8e5c6730bc26cfb6176631adf7/dask-2021.12.0-py3-none-any.whl (from https://pypi.org/simple/dask/) (requires-python:>=3.7))

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Code of Conduct

About this issue

  • Original URL
  • State: closed
  • Created 8 months ago
  • Reactions: 2
  • Comments: 15 (15 by maintainers)

Most upvoted comments

Okay, to finally round this out, I remembered there was another set of requirements (tested on Python 3.8 Linux) that failed to installed prior to the extas optimization but installed after it: "aicsimageio==4.9.1" "napari-aicsimageio==0.7.2" "napari"

I created another branch to test with the extras optimization removed: https://github.com/notatallshaw/pip/compare/main...no-extras-optimization

If I test this I get:

  • Pip main: installs
  • Pip with extras optimization but without backjumping: installs
  • Pip without extras optimization but with backjumping: ResolutionImpossible message (that is incorrect)

IMO then the extras optimization alters what requirements surface the correctness issue in resolvelib (seemingly caused by the backjumping optimization), but does not necessarily even make it more likely, and the work to fix this should be on the resolvelib side.

Ultimately it’s up to Pip maintainers, but I am going to close this in favor of https://github.com/pypa/pip/issues/12317

FWIW, I think it can also be an inaccuracy in how the extras optimisation works.

I don’t disagree it’s possible, but to summarize what we’ve found so far:

  1. The backjumping optimization causes failures to install satisfiable requirements without extras involved
  2. Removing backjumping but keeping extras optimization allows every example involving extras found to install

So IMO given this, and debugging work done, I don’t think it’s valid to point the finger at extras optimization unless something new is found

I created a branch which reverted out the backjump optimization and for me this now resolves: https://github.com/notatallshaw/pip/compare/main...remove-backjump

@sanderr when you get a chance would you mind testing this branch with your examples you give in your comment: https://github.com/notatallshaw/pip/tree/remove-backjump

I’m now of the opinion this is a correctness issue with the backjump optimization added to resolvelib, and not with the extras optimization added to Pip. And unless anyone objects we can mark this a duplicate of https://github.com/pypa/pip/issues/12317.

I’m going to add a comment to https://github.com/sarugaku/resolvelib/issues/134 and spend a little time seeing if I can spot where the backjump is incorrectly skipping over a correct state.

I do not have a solution in mind yet. There are still some details that are not entirely clear to me. Perhaps some changes to the get_preferences implementation might do the trick.

A note on this, my understanding is get_preferences should be able to produce any preference and the backtrack should eventually complete.

I looked into this for some time and I think I understand more or less what’s going on. First, this package has a huge and complicated dependency tree with many upper bounds, resulting in complicated resolution process. The dependency conflict seems to be an interplay between constraints on adlfs, dask and gcsfs, each of which have their own constraints on fsspec.

I am relatively confident that the root cause is that resolvelib can not reliably do the proper backtracking required to come to a satisfactory solution. That said, the specific case that resolvelib has trouble with used to be one that would only arise when dependencies with and without extra are present for the same package. Pip now inserts those itself, meaning that this behavior is triggered more easily. So it should probably be addressed.

Some more details, and a smaller reproduction follow below.

If you strip everything but the dependencies on adlfs, dask and gcsfs from the kedro setup.py and pyproject.toml, you can still reproduce the issue with pip install --dry-run .[test]. If you then add fsspec>=2021.3, <=2023.1 to the dependencies, the issue disappears. This additional constraint makes resolvelib backtrack on other packages, leading to a successful resolution (I only half understand the exact logic behind this so I can’t really go into more details about that).

On pip 23.2.1, if you add dask~=2021.10 (without extra) to the dependencies, you get the same issue. This highlights that the issue is only triggered by the presence of both the extra and extra-less package. Other than that, the root cause has always been there.

#12095 made an attempt to couple extra and extra-less dependencies more tightly. However, it still has some limitations. One of them is that resolvelib still considers each a “name” to be resolved in its own right, even though both have to be the exact same version. Therefore, when a version is picked for both, resolvelib can no longer backtrack on just one of them, it would have to backtrack on both. I believe (not entirely certain) it now tries to backtrack on the one with the extra, but none of the other candidates for it are valid because the extra-less package is still pinned.

To be clear: I believe the only part of #12095 that really affects this regression is the injection of the implicit extra-less dependency. The rest of the pull request (tighter coupling of extra and extra-less dependencies) is unrelated as far as I can see.

I do not have a solution in mind yet. There are still some details that are not entirely clear to me. Perhaps some changes to the get_preferences implementation might do the trick.

I can not try for myself right now but this looks to me like the same root cause as #12372 (comment). I already have a potential fix to it (linked in that discussion).

I don’t think so, which is why I didn’t add this example to that issue. And testing your branch I get the same error:

$ pip install https://github.com/sanderr/pip/archive/refs/heads/issue/12373-bugfix-install-local-extra.zip
...
$ pip install --dry-run .[test]
...
WARNING: import-linter 1.8.0 does not provide the extra 'toml'
ERROR: Cannot install dask[complete]==2021.12.0 and kedro[test]==0.18.14 because these package versions have conflicting dependencies.

The conflict is caused by:
    kedro[test] 0.18.14 depends on dask~=2021.10; extra == "test"
    dask[complete] 2021.12.0 depends on dask 2021.12.0 (from https://files.pythonhosted.org/packages/15/6d/99c63be3ea8a4a651d845addeea1f1b3bb8e5c6730bc26cfb6176631adf7/dask-2021.12.0-py3-none-any.whl (from https://pypi.org/simple/dask/) (requires-python:>=3.7))

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

CC: @uranusjr @sanderr @sbidoul

FYI to be clear this is a regression since Pip 23.3, not sure how critical you consider it.