uv: "Slow" resolution performance for requirements
Honestly, this isn’t that slow, so feel free to close this as “working as expected”, but throwing all the difficult real-world resolution cases I have at uv, this is the slowest set of requirements I have found so far (at least until https://github.com/astral-sh/uv/issues/1342 is fixed):
uv pip compile requirements.in --exclude-newer 2023-10-01T00:00:00Z --no-cache
Where requirements.in is:
click >= 7.0
click-loglevel ~= 0.2
dandi >= 0.24.0
psutil ~= 5.9
pyyaml
selenium
On my machine this takes 1m 21s.
If the issue is similar to why Pip is slow for these requirements, then the problem may be that urllib3 2.0 is pinned early in the resolution process while boto3/botocore put an upper bound on urllib3, forcing backtracking through increasingly early versions of boto3/botocore, of which there are hundreds to collect. If uv instead backtracked on urllib3, it would save having to collect metadata from hundreds of versions of boto3/botocore. I do wonder if there’s some heuristic that could help here.
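To make the idea concrete, here is a toy sketch in Python (my own illustration; the helper and the version counts are made up, and this is not how uv or pip actually pick a backtrack target): when a conflict is hit, prefer to backtrack the package that is cheapest to re-pin rather than the one with hundreds of remaining versions whose metadata would need fetching.

    def choose_backtrack_target(conflict_packages, remaining_versions):
        # Hypothetical helper: among the packages involved in a conflict, pick
        # the one with the fewest untried candidate versions left to fetch.
        return min(conflict_packages, key=lambda pkg: remaining_versions[pkg])

    # Re-pinning urllib3 (a handful of versions) is far cheaper than walking
    # back through hundreds of boto3/botocore releases.
    print(choose_backtrack_target(
        ["urllib3", "botocore"],
        {"urllib3": 5, "botocore": 400},
    ))  # -> urllib3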
For Pip I have a speculative solution for encountering this kind of issue, https://github.com/pypa/pip/issues/12523, although I have not had a chance to make a PR, so I don’t yet have evidence it actually helps. And I’ve not looked at uv’s resolution process, so I don’t know if this idea translates at all.
uv seems to have made significant speed improvements since this was originally posted. Running my original command on different versions of uv today:

0.1.1: ~72s
0.1.29: ~27s
0.1.31: ~11s

That’s a ~6.5x performance improvement since I opened this ticket, with the latest jump coming from https://github.com/astral-sh/uv/pull/2452. That’s very impressive from the uv team!
At this point I’m not sure what to do with this issue: while I feel that resolution could be improved a lot by 1) intelligently collecting fewer packages, and 2) avoiding picking “bad” resolutions, my original example has at least been mitigated, if not in fact fixed.
Upon finding a good example I will open a new issue for “2)”, so the nuance of what is “bad” can be discussed more directly, and if I can find an even more pathological example of my original requirements I may open a new issue for “1)”.
The pubgrub paper suggestion is based on advice in “answer set programming”, which itself is generalized common wisdom. Generally, when looking at a backtracking algorithm, “fewest versions first” is exponentially faster than “most versions first”, but “generally” is doing a lot of work in that statement. For one thing, loading in data is at most linearly expensive, but in this use case the linear term is often bigger than the potentially exponential term.
You absolutely want to make sure “no available options” is resolved first, as it’s a conflict. There is no advantage in delaying “one available version” as it’s inevitable. It is easy to generalize that to “fewest versions first”, but beyond those first two cases there’s lots of good room for experimentation.
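As a tiny illustration of that ordering (a sketch only; the package names and candidate counts are made up, and this is not pubgrub-rs’s actual API):

    def decision_order(candidate_counts):
        # Zero-candidate packages are conflicts and must surface first,
        # one-candidate packages are inevitable, then ascending by count.
        return sorted(candidate_counts, key=candidate_counts.get)

    print(decision_order({"boto3": 300, "urllib3": 0, "click": 1}))
    # -> ['urllib3', 'click', 'boto3']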
pubgrub-rs should make it extremely easy for uv to experiment with this based on any index-based heuristic it would like. It does not currently provide useful diagnostics about which packages are “causing problems”, nor a method for a user to say “let’s try backtracking further than strictly necessary”. I would love to add such an API/diagnostics, but it is still at the brainstorming stage.

I’d also prefer the default to be to not add upper bounds. If uv’s high-level project workflow involves a distinction between applications and libraries, maybe the default of adding upper bounds or not could depend on that (since false-positive upper bounds in libraries are often only felt by the rest of the ecosystem, not the authors).
FYI, adding upper bounds on a library is strongly discouraged in the Python community. You’ve linked this blog post so you probably already know but the tl;dr is:
I would be unhappy to use uv as a project management tool if upper bounds were added by default. And in general, while some Python users use project management tools, especially software engineering teams, my personal experience is that a lot of Python users are not so advanced in this area and find project management tools too complicated, so none of this will help them.
As a smaller side note, assuming semver for bound requirements seems pretty unrealistic; I don’t know any Python project that strictly follows semver.
I strongly disagree that, in general, false positives are better than false negatives, for many reasons:
I believe this approach of “let’s be over-restrictive” works well in languages that can install multiple versions of the same library (e.g. Rust and Node.js), where it is common practice and accepted by the respective ecosystems, but I do not believe it would work well in the Python packaging world.
We’ll soon be in the situation that we will have to set defaults in uv, so let me try to answer this as we would in our docs:
When you start a new project or add a new dependency, the default should be starting with the current version, let’s call it x.y.z, and its compatible version range. When using uv add foo, for simplicity we’ll add foo >=x.y.z,<(x+1).0.0, but you are encouraged to check against the project’s versioning policy.

For the project structure we recommend a minimal-versions CI job; our project templating will generate one. Your policy on bumping the minimal version can differ depending on the project: most will bump whenever they want to use a newer feature, while core libraries and frameworks will have policies for wide ranges. If you add compatibility for foo x+1, you are encouraged to keep the old lower bound if it is compatible, in our example foo >=x.y.z,<(x+2).0.0.
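As an illustration of that default range (a minimal sketch under my own assumptions; the helper name is made up and this is not uv’s actual implementation):

    def default_requirement(name, version):
        # Given the current version x.y.z, produce ">=x.y.z,<(x+1).0.0".
        major = int(version.split(".")[0])
        return f"{name} >={version},<{major + 1}.0.0"

    print(default_requirement("foo", "1.4.2"))  # foo >=1.4.2,<2.0.0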
Similar to testing minimal dependency versions, we recommend testing on all Python versions you support, or at least the min and max versions, so the min-version job catches the problem. Our lockfiles will have the ability to model using different versions of a dependency based on the Python version if the case arises (but with the recommendation to pick dependencies that support at least as many Python versions as you do). The CLI will make it easy to test a different version, something like uv run -p 3.9 pytest, and we’ll try to make the new venv creation blazingly fast 😉

Let me go back to the numpy >= 2 example: the core problem is that a new major release of a fundamental library causes an ecosystem split. Let us generalize this to a library B that has a major update and a library A that requires B in version range R. We can model this as a predictor where, for a given version V, we want to predict the target “A would break” with the predictor “V is outside the version range R”. If we do this for all libraries A that require B, we get a confusion matrix (sorry for pulling in statistics here, but it’s the clearest way for me to express this tradeoff). For context, the usual usage is:
When everybody puts bounds on their dependencies (this kinda applies to both lower and upper bounds), we get false positives (“That library clearly works on numpy 1 and numpy 2, why is this breaking my dependency resolution?”); if nobody puts bounds, we get false negatives (“Why did updating torch suddenly break my app?”). The remedy for too-strong bounds is overrides; the remedy for missing bounds is constraint files (or just adding them as direct dependencies). (And before overrides existed, caps couldn’t be fixed by users, so no bounds was really the only option.)
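To spell out the confusion-matrix framing (an illustrative sketch only; the wording of each outcome is mine):

    def classify(outside_range, actually_breaks):
        # Predictor: "version V of B is outside A's declared range R".
        # Target: "A actually breaks on V".
        if outside_range and actually_breaks:
            return "true positive: the bound correctly blocks a breaking version"
        if outside_range:
            return "false positive: the bound blocks a version that would have worked"
        if actually_breaks:
            return "false negative: the resolver picks a version that breaks A"
        return "true negative: an allowed version that works"

    print(classify(outside_range=True, actually_breaks=False))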
From my personal development style, I prefer having bounds, even if they are too strong. It means I can generally just update dependencies and only need to worry about incorrect constraints when resolution fails (and I get the offending package in the resolver error). Without bounds, I would have to carefully check all package updates in my dependency trees for compatibility and make sure we didn’t get into a pathological backtracking case picking wrong versions. I prefer the confidence of maintainer-declared compatibility, even if it means more work overriding it when it fails.
No. But it seems a plausible idea.
Apart from anything else there is an argument that backtracking boto3 (say) in the hope of finding a version that doesn’t declare itself incompatible with recent urllib3 (say) is likely to give bad answers - Is ye-olde-version of boto3 really compatible with recent urllib3, or is it just that they didn’t know at the time that it would turn out not to be?
This issue has fractured into many different discussions; I feel it’s difficult to even respond at this point in a useful way. To summarize the different threads:
Personally, I think it would be more fruitful to continue these discussions in specific issues, as they are brought up, with real examples or as/when uv adds relevant features. So I’m going to close this issue, but if there is disagreement on this I am not going to object to it being reopened.
re lower bounds: your tone suggests that you think you are disagreeing with me, but I am not sure this is the case. The points you make are fully consistent with what I wrote.
re upper bounds: I think we are saying nothing new at this point. Adding a bound, or not, is a guess about future compatibility which will certainly sometimes be wrong - with negative consequences either way.
In practice I find that it makes very little difference when dealing with maintained projects - either a bound will be relaxed, or an incompatibility will be fixed. It is the unmaintained projects that are problematic and - as earlier - one should prefer to avoid those anyway.
On the pain of backtracking to ancient versions: I’d also be happier with lower bounds being auto-added using some heuristic, say the oldest version released in the last ~2-3 years. Lower bounds tend to cause fewer issues. They’re not always correct, but I’d prefer them over automatic upper bounds. Part of it is that backtracking to very old versions is risky in its own ways as a default.
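A minimal sketch of that heuristic, assuming release dates are available from the index (purely hypothetical; this is not an existing uv or pip feature):

    from datetime import datetime, timedelta, timezone

    def auto_lower_bound(release_dates, window_years=2):
        # release_dates: mapping of version string -> upload datetime (UTC).
        cutoff = datetime.now(timezone.utc) - timedelta(days=365 * window_years)
        recent = {v: d for v, d in release_dates.items() if d >= cutoff}
        if not recent:
            return None  # nothing recent enough; fall back to no lower bound
        return min(recent, key=recent.get)  # oldest version inside the window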
Yeah, that specific assumption has a tendency to leave poetry-based projects with too-tight/fake dependency constraints. I’ve seen a lot of libraries have the next major version as an upper bound even though it works fine in practice on our unit/integration test suite. I’m very distrustful of Python library upper bounds. Sometimes they are a real bound based on an actual error/issue, but too often I dig into the repository and find no explanation for the bound, and it behaves fine when relaxed. It’s caused several fake dependency conflicts where I wind up forking the wheel and rewriting the dependency metadata to be more relaxed to make pip happy. UV doing it by default on the uv add command would just increase my distrust of upper bounds as meaningful.

Even “real” upper bounds for actual bugs/crashes vary in impact. Sometimes an upper bound is for an actual major change that will lead to crashes most of the time. I’ve occasionally seen major libraries add an upper bound for an issue that only appears on a certain platform (Windows, for example), and as a mac/linux user the upper bound was entirely unnecessary. Other times an upper bound only matters for very specific usage patterns of that library which my tests confirm are not a concern. It’s difficult to know why an upper bound constraint was added unless you read the repository history directly, and even there the PR often adds the upper bound with no commit explanation/description.
UV does have the advantage that an overrides file exists, so I can more easily ignore conflicts caused by upper bounds. It does sort of mean, though, that if one user relies on uv add, a different user who uses pip, where overrides are not a thing, will have trouble dealing with these assumed upper bounds.
edit:

I have no familiarity with this specific package, but a package trying to follow semver means very little in my experience. Different packages have inconsistent opinions of what counts as breaking enough. Many libraries will deprecate/remove uncommonly/rarely used functions in minor versions while claiming breaking changes are reserved for major versions. Or the breaking change is a bug fix itself. My own experience is that most upgrades work, but the upgrades that fail are a big toss-up in what the version change was, and often inconsistent with the library’s claims of compatibility. The article linked on upper bounds goes into this in more depth: the notion of semver/true breaking changes is very inconsistent in the Python ecosystem.
Okay, I’ve made a very hacky experimental proof-of-concept branch on the pip side that implements the idea of preferring upper-bounded requirements: https://github.com/pypa/pip/compare/main...notatallshaw:pip:backjump-to-upper-bounded
For the set of requirements

click >= 7.0
click-loglevel ~= 0.2
dandi >= 0.24.0
psutil ~= 5.9
pyyaml
selenium

at midnight of 1st October 2023, it reduces the number of packages pip has to collect while backtracking from 831 down to 111; on my machine it reduces the time pip takes to resolve from ~6 mins 20 seconds to ~1 min 20 seconds. Bear in mind this is via pypi-timemachine, which adds a huge amount of latency to every HTTP call, so in effect, for this set of requirements on this date, this branch of pip is faster than uv without a cache.

I did some mild testing with other requirements and found that this branch was at least able to successfully resolve them.
That said, to get this to work for resolvelib I had to add a completely new API, which substantially changes the path of the algorithm, and it would need to be carefully checked that it can still soundly resolve any given requirement, so I don’t expect I’ll ever be able to contribute this to either the resolvelib or pip projects.
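Roughly, the flavor of “prefer upper-bounded requirements” can be sketched like this (my own illustration using packaging’s Requirement API; this is not the code in the branch):

    from packaging.requirements import Requirement

    def has_upper_bound(req):
        # Operators that cap the allowed version range.
        return any(spec.operator in ("<", "<=", "==", "~=") for spec in req.specifier)

    reqs = [Requirement("boto3>=1.28"), Requirement("urllib3<2"), Requirement("pyyaml")]
    # Visit upper-bounded requirements first so their conflicts surface early.
    ordered = sorted(reqs, key=has_upper_bound, reverse=True)
    print([str(r) for r in ordered])  # ['urllib3<2', 'boto3>=1.28', 'pyyaml']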
Edit: Also, I should mention I did have to hack resolvelib into understanding a bit about extras, as part of the core issue here is that there are requirements on both urllib3 and urllib3[socks]. Not sure how you handle extras and optimizations in resolutions.

A little bit of attribution: this upper-bound idea was recently mentioned by the rip devs thinking about it (https://github.com/prefix-dev/rip/issues/191), and the example I give above comes from a real-world use case reported to Pip (https://github.com/pypa/pip/issues/12274); although issues with boto3 and many other packages have come up many times, this one was fairly easy to reproduce and particularly egregious.
Poetry fairly recently adopted the heuristic “first try to decide the package with the most available versions”, motivated by this case.
This is the opposite of what the pubgrub paper suggests; some discussion at https://github.com/python-poetry/poetry/pull/8255.
(It doesn’t always work; there are still examples where the solver resolves urllib3 before it even learns that boto3 is going to be wanted.)