pip: Resolver performance regression in 21.2
Description
As of 21.2, the resolver decides it needs to download every possible version of multiple dependencies to determine which version is compatible with other requirements. Note that this works fine for some dependencies (coverage, pytest_cov), as it only downloads a few versions. For the dependencies botocore and boto3, however, it downloads every possible version.
A minimal reproducible example can be found here: https://github.com/bblommers/pipresolverbug
Not sure what the offending dependency/installation method is - I could only reproduce it for this specific workflow.
The CI for this project runs this specific example for both 21.1 and 21.2. Version 21.1 completes in 15 seconds - 21.2 is still running after 20 minutes.
The CI logs can be found here: https://github.com/bblommers/pipresolverbug/actions/runs/1064350454
Expected behavior
To only download the latest compatible version of each dependency
pip version
21.2
Python version
3.7
OS
Linux (Ubuntu)
How to Reproduce
git clone git@github.com:bblommers/pipresolverbug.git
pip install -r req.txt
Output
Collecting botocore>=1.12.201
Using cached botocore-1.21.2-py3-none-any.whl (7.7 MB)
Collecting boto3>=1.9.201
Using cached boto3-1.18.1-py3-none-any.whl (131 kB)
Collecting botocore>=1.12.201
Using cached botocore-1.21.1-py3-none-any.whl (7.7 MB)
Collecting boto3>=1.9.201
Using cached boto3-1.18.0-py3-none-any.whl (131 kB)
Collecting botocore>=1.12.201
Using cached botocore-1.21.0-py3-none-any.whl (7.7 MB)
Collecting boto3>=1.9.201
Using cached boto3-1.17.112-py2.py3-none-any.whl (131 kB)
Collecting s3transfer<0.5.0,>=0.4.0
Using cached s3transfer-0.4.2-py2.py3-none-any.whl (79 kB)
Collecting botocore>=1.12.201
Using cached botocore-1.20.112-py2.py3-none-any.whl (7.7 MB)
INFO: pip is looking at multiple versions of boto3 to determine which version is compatible with other requirements. This could take a while.
Collecting boto3>=1.9.201
Using cached boto3-1.17.111-py2.py3-none-any.whl (131 kB)
Collecting botocore>=1.12.201
About this issue
- State: closed
- Created 3 years ago
- Reactions: 8
- Comments: 101 (77 by maintainers)
Yes, of the ones I could reproduce. Some I simply could never reproduce, and some have had their dependencies fixed (in those cases I’ve done my best to keep the test cases working by adding {package} <= {last known bad version} to my test requirement files).
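For instance, a cap of that shape in a test requirements file might look like the following (the version here is hypothetical, purely for illustration):

# requirements file for a backtracking reproducer
botocore<=1.20.112   # last known bad version; newer fixed releases won't mask the repro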
I agree on closing, and once #10481 lands I think all pip issues related to heavy backtracking should be closed (e.g. #10373 #9187 #9215), as any new reports should be tested against that, and if they are still experiencing heavy backtracking it will take further analysis and the reasoning will not be the same.
I have summarized my proposal to assist with dependency trees like this and linked the PRs here: https://github.com/pypa/pip/issues/10479
If anyone has any ideas of how to improve or where I need to make a stronger argument let me know.
This new resolver just destroyed our CI process, which went from 4 minutes to timing out at 6 hours.
Not taken the wrong way at all; I would really like to test this and find examples. But a couple of points: 1) if the resolver never backtracks there will be no difference; 2) as my latest version only changes the preference order, the worst possible times are the same.
In theory, I think, the worst possible case for my optimization (and therefore presumably slower than the current resolver) would be: Package A version n depends on Package B and Package C, but they are both 100% incompatible; Package A version n-1 depends on two different packages, Package D and Package E, which are also incompatible; and so on, until eventually Package A version 1 installs. Packages B, C, D, E, … have similar dependency incompatibilities to Package A.
But I don’t think this represents a real-world dependency tree; the assumption of the optimization is that if package A version n depends on package B, then package A version n-1 probably also depends on package B. Where this doesn’t hold the optimization will be slower, but I think changing dependencies usually only happens once in a while, and when the dependencies do change the user presumably changes them to a valid installable set of dependencies, in which case my optimization should still be faster.
I have been backtracking through open and closed issues that look performance-based, trying to find any real-world reproducible examples. There are shockingly few; most users just write something to the effect of “takes too long” and don’t provide an actual reproducible example.
If you have specific cases that improved from 21.1 to 21.2, I would love to test them. But I would like to point out that my optimization only adds one extra parameter to the pinning preference; most of the improvements that came with 21.2 should also be included in my optimization (in fact, even better, I fixed a bug in one of the improvements: https://github.com/pypa/pip/issues/10415 )
An idea I had was trying to build a tool that takes a real-world test case and turns it into an offline repeatable test case. I imagine this would be pretty tricky, but hopefully possible.
But as of right now I cannot get pip’s test suite to even start to run on my machine; I spent an hour trying yesterday and tox/pytest just keep throwing cryptic errors at me. I’ll give it another shot at the weekend when I have more time to work on it.
@edmorley I think the issue is having any kind of benchmark suite right now; yes, the performance should probably be measured in iterations, not time.
@bblommers Hmm, interesting, there must be something different about your exact projects vs. the way I’ve built the reproducible examples. I will take a look when I get a chance. This is a very hacky approach I am taking; once I formalize my ideas a bit and have time to better understand the pip code base, I will maybe have a better test.
@pradyunsg Thanks, appreciated. I don’t think I’ll be able to schedule anything fixed this weekend or next, but I’ll take a look at my calendar and book/msg you.
One thing that I don’t think is commonly known¹ and which might be useful for people hitting problems with long resolution times involving boto, is that if you have a constraint file containing version limits for a project, those limits are applied when pip fetches the list of versions from the package index, i.e., before the resolution and backtracking step. Conversely, version limits that are in a requirements file and/or individual project dependencies, are only applied when the dependency is first encountered (which may be after the version list is fetched).
So as a rule of thumb: version limits you want applied up front, before resolution even starts, belong in a constraints file; limits in a requirements file or in project dependencies only take effect once the dependency is encountered.

So, for example, in the case here, try adding the relevant botocore/boto3 version limits to a constraints.txt file, and see if that helps.

¹ By which I mean that I thought it would be a useful option to add, went and looked at the code and discovered that I’d implemented it when I first added constraint support to the new resolver, but had forgotten about it 🙂
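To make this concrete, here is a minimal sketch (the bounds below are hypothetical, chosen only for illustration; the original comment’s exact suggestion is not preserved here):

# constraints.txt -- hypothetical lower bounds that prune old releases up front
botocore>=1.20
boto3>=1.17

pip install -r req.txt -c constraints.txt

Because constraints are applied when the version list is fetched, pip never even considers the older botocore/boto3 releases, so there is far less to backtrack through.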
I want to emphasize again: please post what you are installing and the output you get. Dropping a comment without any details won’t get your issue fixed. We are developers here, not psychics.
I put together another commit that focuses only on Hypothesis 2, i.e. it tries to resolve the current conflicts before anything else.
It prefers the causes, and parents of causes, of the last known conflicts. In my testing it fixes the OP’s issue and the other test cases I have to hand. The hypothesis here is that “if package A version n depends on package B, then package A version n-1 probably also depends on package B”. Therefore, “if packages A and C both depend on package B but on different versions of package B, the resolution will probably be faster if it focuses on packages A and C and not on some package X which does not depend on package B”.
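As a minimal Python sketch of the idea (this is not pip’s actual code; the names and data shapes are made up for illustration):

# Sort key for candidate ordering: packages involved in the last known
# conflict (as the failing requirement or its parent) are revisited first.
def prefers_conflict_causes(identifier, backtrack_causes):
    involved = any(
        identifier == cause["name"] or identifier == cause.get("parent")
        for cause in backtrack_causes
    )
    return (not involved, identifier)  # False sorts before True

# Hypothetical conflict: b and c disagree about d, while x is unrelated.
causes = [{"name": "d", "parent": "b"}, {"name": "d", "parent": "c"}]
print(sorted(["x", "d", "b", "c"], key=lambda i: prefers_conflict_causes(i, causes)))
# -> ['b', 'c', 'd', 'x']: conflict-related packages come first

Because this only reorders preferences, a resolution that never conflicts is untouched, which matches the worst-case claim above.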
Feel free to try it here; I make no promises it will improve your particular case:
python -m pip install git+git://github.com/notatallshaw/pip@third_attempt_at_prefer_non_conflicts
Code diff: https://github.com/notatallshaw/pip/compare/21.2.3...notatallshaw:third_attempt_at_prefer_non_conflicts
Compared to before, it doesn’t try to hack the resolution steps of pip, so its worst case is the same as the current worst case. I think this approach is now simple enough that it’s worth trying to work up a real PR rather than just PoC commits. FYI it also includes a fix for #10415, which I think may also help some use cases.
Hi, thanks all for offering up potential additional test cases. I will take a look at them over the weekend when I have spare time.
Just a quick question though, what version of pip are you using?
If you are using the tag I created there is no guarantee it will perform as expected; it is very much doing hacky things to the internals of the pip resolver, and I have made no attempt to verify correctness.
Thanks so much! I’ll take a closer look at the logic you implemented, but my intuition is this may be worthwhile to pursue, if I understand your logic correctly from a high level. The main criterion should be whether it would be easy enough for a user to identify and avoid the worst case scenario. The current logic does not perform very well because (using the importlib-metadata example in https://github.com/pypa/pip/issues/9215#issuecomment-897259778) the resolver is working on seemingly unrelated stuff, and the user cannot easily tell what exactly is causing the backtracks before the resolver stops to emit an error. So it’d be worthwhile to have pip’s output on those made-up examples, to see if it can point users to the issue. Pip’s test suite has some useful functions you can use to create made-up packages (create_basic_wheel-something) for this purpose.

Okay, I had a bit of time today and I’ve developed a proof of concept for this strategy!
Hypothesis: when resolving the requirements of a group of packages and there is a conflict, it is likely that a solution will be found faster by delaying new candidates of packages that are unrelated to the found conflicts.
Here is my PoC commit. It is based on pip 21.2.1 and the code is very rough and definitely has some bugs in the approach, but it is just supposed to showcase the approach for the two reproducible test cases (listed below): https://github.com/notatallshaw/pip/commit/d085824b52bdb11efd6b46be92fe60a9c4040bd4
Reproducible test case 1:
Reproducible test case 2 (python 3.7 only):
For those who wish to see if it improves their performance regression, you can install it with the following command (there are zero guarantees it will not crash):
python -m pip install git+git://github.com/notatallshaw/pip@first_attempt_at_prefer_non_conflicts
Constructed counter example: I want to highlight that this hypothesis is probabilistic; you can construct cases where this approach is slower, but all the constructions I have been able to come up with seem highly esoteric.
For example, let’s say you have 4 packages: a (v1 - v3), b (v1 - v10), c (v1 - v10), d (v1 - v3). Your requirements are a, b, c, and all versions of b and c require d. Now b v2 - v9 depends on d==v3, and c v2 - v9 depends on d==v2, but both b v1 and c v1 depend on d==v1. Finally, a v2 and v3 don’t depend on anything, but a v1 depends on b==v1 and c==v1. Therefore the only solution to this is a v1, b v1, c v1, and d v1.

The current resolver method would backtrack twice on a to get to a v1, find the requirements for b==v1 and c==v1, and then find their requirement for d==v1 and return the solution. However, the resolver method I am proposing would not backtrack on a, as it does not present any conflicts until b and c have been exhausted, therefore being much slower in this constructed example.
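To make the constructed counter example concrete, here is a minimal sketch of it as data (the shape is hypothetical, and b/c v10 are assumed to pin d like v2 - v9 so that the conflict is forced):

# (package, version) -> list of requirement strings
index = {
    ("a", 3): [], ("a", 2): [],      # a v2/v3 have no dependencies
    ("a", 1): ["b==1", "c==1"],
    ("b", 1): ["d==1"],
    ("c", 1): ["d==1"],
    ("d", 1): [], ("d", 2): [], ("d", 3): [],
}
for v in range(2, 11):               # b/c v2 - v10 pin conflicting d versions
    index[("b", v)] = ["d==3"]
    index[("c", v)] = ["d==2"]

# Root requirements: a, b, c. Since b wants d==3 and c wants d==2, the only
# consistent pick is a==1, b==1, c==1, d==1; but a itself never conflicts, so
# a conflict-focused heuristic exhausts b and c before ever revisiting a.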
@uranusjr Given my hypothesis, PoC commit, and my admitted example where this approach would actually be slower, is it worth pursuing this further to the point where I (or someone else) can try and submit a PR? Is this something you would be interested in? At this point I think this optimization would be a significant amount of work to implement in a way that isn’t super hacky, so there’s no point proceeding if you don’t agree with the approach. That said, I am personally of the opinion this would be a significant optimization for pip in many situations where it currently performs very badly. I actually also think it might negate the setuptools hack in almost all situations (though I do not have any data for this yet).

I haven’t been able to take a good look into this, but a couple of notes:
I’m not part of the pip team, but I’d like to point out this isn’t true; I have lots of requirements that don’t pin a specific version of a package and I don’t have this issue.
As already stated by others, if you don’t post specifically what you’re doing (giving the command and requirements), then your problem probably won’t be helped.
If no one else does by this evening (EST) I’ll attempt to reproduce the issue and then try adding those packages to the list and report the results.
We only resolve packages with “tighter constraints” first, but don’t actually count the number of releases. I think the main reason is there’s currently no mechanism to do that; the information is of course available on the index page, but we don’t have a way to bubble it up past PackageFinder.
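For illustration, a rough sketch of what such a mechanism might feed into (everything here is hypothetical, not pip’s actual API; it assumes the per-package candidate counts could be bubbled up past the finder):

# Resolve identifiers with fewer available candidates first: fewer candidates
# means a tighter constraint, so conflicts would surface sooner.
def preference_key(identifier, is_pinned, candidate_counts):
    return (not is_pinned, candidate_counts.get(identifier, float("inf")))

# Hypothetical counts taken from the index pages:
counts = {"botocore": 900, "s3transfer": 30, "coverage": 120}
print(sorted(counts, key=lambda i: preference_key(i, False, counts)))
# -> ['s3transfer', 'coverage', 'botocore']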