uv: "Slow" resolution performance for requirements
Honestly, this isn’t that slow, so feel free to close this as “working as expected”, but throwing all the difficult real-world resolution cases I have at uv, this is the slowest set of requirements I have found so far (at least until https://github.com/astral-sh/uv/issues/1342 is fixed):
uv pip compile requirements.in --exclude-newer 2023-10-01T00:00:00Z --no-cache
Where requirements.in is:
click >= 7.0
click-loglevel ~= 0.2
dandi >= 0.24.0
psutil ~= 5.9
pyyaml
selenium
On my machine this takes 1m 21s.
If the issue is similar to why Pip is slow for these requirements, then the problem may be that urllib3 2.0 is pinned early in the resolution process while boto3/botocore put an upper bound on urllib3, forcing backtracking through increasingly early versions of boto3/botocore, of which there are hundreds to collect. If uv instead backtracked on urllib3, it would save having to collect metadata from hundreds of versions of boto3/botocore. I do wonder if there’s some heuristic that could help here.
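To make the idea concrete, here is a toy sketch in Python (my own illustration; the helper and the version counts are made up, and this is not how uv or pip actually pick a backtrack target): when a conflict is hit, prefer to backtrack the package that is cheapest to re-pin rather than the one with hundreds of remaining versions whose metadata would need fetching.

    def choose_backtrack_target(conflict_packages, remaining_versions):
        # Hypothetical helper: among the packages involved in a conflict, pick
        # the one with the fewest untried candidate versions left to fetch.
        return min(conflict_packages, key=lambda pkg: remaining_versions[pkg])

    # Re-pinning urllib3 (a handful of versions) is far cheaper than walking
    # back through hundreds of boto3/botocore releases.
    print(choose_backtrack_target(
        ["urllib3", "botocore"],
        {"urllib3": 5, "botocore": 400},
    ))  # -> urllib3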
For Pip I have a speculative solution for encountering this kind of issue, https://github.com/pypa/pip/issues/12523, although I have not had a chance to make a PR, so I don’t yet have evidence it actually helps. And I’ve not looked at uv’s resolution process, so I don’t know if this idea translates at all.
uv seems to have made significant speed improvements since this was originally posted. Running my original command on different versions of uv today:

0.1.1: ~72s
0.1.29: ~27s
0.1.31: ~11s

That’s a ~6.5x performance improvement since I opened this ticket, with the latest jump coming from https://github.com/astral-sh/uv/pull/2452. That’s very impressive from the uv team!
At this point I’m not sure what to do with this issue: while I feel that resolution could be improved a lot by 1) intelligently collecting fewer packages, and 2) avoiding picking “bad” resolutions, my original example has at least been mitigated, if not in fact fixed.
Upon finding a good example I will open a new issue for “2)”, so the nuance of what is “bad” can be discussed more directly, and if I can find an even more pathological example of my original requirements I may open a new issue for “1)”.
The pubgrub paper suggestion is based on advice in “answer set programming”, which itself is generalized common wisdom. Generally, when looking at a backtracking algorithm, “fewest versions first” is exponentially faster than “most versions first”, but “generally” is doing a lot of work in that statement. For one thing, loading in data is at most linearly expensive, but in this use case the linear term is often bigger than the potentially exponential term.
You absolutely want to make sure “no available options” is resolved first, as it’s a conflict. There is no advantage in delaying “one available version” as it’s inevitable. It is easy to generalize that to “fewest versions first”, but beyond those first two cases there’s lots of good room for experimentation.
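As a tiny illustration of that ordering (a sketch only; the package names and candidate counts are made up, and this is not pubgrub-rs’s actual API):

    def decision_order(candidate_counts):
        # Zero-candidate packages are conflicts and must surface first,
        # one-candidate packages are inevitable, then ascending by count.
        return sorted(candidate_counts, key=candidate_counts.get)

    print(decision_order({"boto3": 300, "urllib3": 0, "click": 1}))
    # -> ['urllib3', 'click', 'boto3']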
pubgrub-rs should make it extremely easy for uv to experiment with this based on any index-based heuristic it would like. It does not currently provide useful diagnostics about which packages are “causing problems”, nor a method for a user to say “let’s try backtracking further than strictly necessary”. I would love to add such an API/diagnostics, but it is still at the brainstorming stage.

I’d also prefer the default to be to not add upper bounds. If uv’s high-level project workflow involves a distinction between applications and libraries, maybe the default of adding upper bounds or not could depend on that (since false-positive upper bounds in libraries are often only felt by the rest of the ecosystem, not the authors).
FYI, adding upper bounds on a library is strongly discouraged in the Python community. You’ve linked this blog post so you probably already know but the tl;dr is:
I would be unhappy to use uv as a project management tool if upper bounds were added by default. And in general, while some Python users use project management tools, especially software engineering teams, my personal experience is that a lot of Python users are not so advanced in this area and find project management tools too complicated, so none of this will help them.
As a smaller side note, assuming semver for bound requirements seems pretty unrealistic; I don’t know any Python project that strictly follows semver.
I strongly disagree that, in general, false positives are better than false negatives, for many reasons:
I believe this approach of “let’s be over-restrictive” works well in languages that can install multiple versions of the same library (e.g. Rust and Node.js), where it is common practice and accepted by the respective ecosystems, but I do not believe it would work well in the Python packaging world.
We’ll soon be in the situation that we will have to set defaults in uv, so let me try to answer this as we would in our docs:
When you start a new project or add a new dependency, the default should be starting with the current version, let’s call it x.y.z, and its compatible version range. When using uv add foo, for simplicity we’ll add foo >=x.y.z,<(x+1).0.0, but you are encouraged to check against the project’s versioning policy.

For the project structure we recommend a minimal-versions CI job; our project templating will generate one. Your policy on bumping the minimal version can differ depending on the project: most will bump whenever they want to use a newer feature, while core libraries and frameworks will have policies for wide ranges. If you add compatibility for foo x+1, you are encouraged to keep the old lower bound if it is compatible, in our example foo >=x.y.z,<(x+2).0.0.
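As an illustration of that default range (a minimal sketch under my own assumptions; the helper name is made up and this is not uv’s actual implementation):

    def default_requirement(name, version):
        # Given the current version x.y.z, produce ">=x.y.z,<(x+1).0.0".
        major = int(version.split(".")[0])
        return f"{name} >={version},<{major + 1}.0.0"

    print(default_requirement("foo", "1.4.2"))  # foo >=1.4.2,<2.0.0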
Similar to testing minimal dependency versions, we recommend testing on all Python versions you support, or at least the min and max versions, so the min-version job catches the problem. Our lockfiles will have the ability to model using different versions of a dependency based on the Python version if the case arises (but with the recommendation to pick dependencies that support at least as many Python versions as you do). The CLI will make it easy to test a different version, something like uv run -p 3.9 pytest, and we’ll try to make the new venv creation blazingly fast 😉

Let me go back to the numpy >= 2 example: the core problem is that a new major release of a fundamental library causes an ecosystem split. Let us generalize this to a library B that has a major update and a library A that requires B in version range R. We can model this as a predictor where, for a given version V, we want to predict the target “A would break” with the predictor “V is outside the version range R”. If we do this for all libraries A that require B, we get a confusion matrix (sorry for pulling in statistics here, but it’s the clearest way for me to express this tradeoff). For context, the usual usage is:
When everybody puts bounds on their dependencies (this kinda applies to both lower and upper bounds), we get false positives (“That library clearly works on numpy 1 and numpy 2, why is this breaking my dependency resolution?”); if nobody puts bounds, we get false negatives (“Why did updating torch suddenly break my app?”). The remedy for too-strong bounds is overrides; the remedy for missing bounds is constraint files (or just adding them as direct dependencies). (And before overrides existed, caps couldn’t be fixed by users, so no bounds was really the only option.)
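To spell out the confusion-matrix framing (an illustrative sketch only; the wording of each outcome is mine):

    def classify(outside_range, actually_breaks):
        # Predictor: "version V of B is outside A's declared range R".
        # Target: "A actually breaks on V".
        if outside_range and actually_breaks:
            return "true positive: the bound correctly blocks a breaking version"
        if outside_range:
            return "false positive: the bound blocks a version that would have worked"
        if actually_breaks:
            return "false negative: the resolver picks a version that breaks A"
        return "true negative: an allowed version that works"

    print(classify(outside_range=True, actually_breaks=False))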
From my personal development style, I prefer having bounds, even if they are too strong. It means I can generally just update dependencies and only need to worry about incorrect constraints when resolution fails (and I get the offending package in the resolver error). Without bounds, I would have to carefully check all package updates in my dependency trees for compatibility and make sure we didn’t get into a pathological backtracking case picking wrong versions. I prefer the confidence of maintainer-declared compatibility, even if it means more work overriding it when it fails.
No. But it seems a plausible idea.
Apart from anything else there is an argument that backtracking boto3 (say) in the hope of finding a version that doesn’t declare itself incompatible with recent urllib3 (say) is likely to give bad answers - Is ye-olde-version of boto3 really compatible with recent urllib3, or is it just that they didn’t know at the time that it would turn out not to be?
This issue has fractured into many different discussions; I feel it’s difficult to even respond at this point in a useful way. To summarize the different threads:
Personally, I think it would be more fruitful to continue these discussions in specific issues, as they are brought up, with real examples or as/when uv adds relevant features. So I’m going to close this issue, but if there is disagreement on this I am not going to object to it being reopened.
re lower bounds: your tone suggests that you think you are disagreeing with me, but I am not sure this is the case. The points you make are fully consistent with what I wrote.
re upper bounds: I think we are saying nothing new at this point. Adding a bound, or not, is a guess about future compatibility which will certainly sometimes be wrong - with negative consequences either way.
In practice I find that it makes very little difference when dealing with maintained projects - either a bound will be relaxed, or an incompatibility will be fixed. It is the unmaintained projects that are problematic and - as earlier - one should prefer to avoid those anyway.
On the pain of backtracking to ancient versions: I’d also be happier with lower bounds being auto-added using some heuristic, say the oldest version released in the last ~2-3 years. Lower bounds tend to cause fewer issues. They’re not always correct, but I’d prefer them over automatic upper bounds. Part of it is that backtracking to very old versions is risky in its own ways as a default.
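A minimal sketch of that heuristic, assuming release dates are available from the index (purely hypothetical; this is not an existing uv or pip feature):

    from datetime import datetime, timedelta, timezone

    def auto_lower_bound(release_dates, window_years=2):
        # release_dates: mapping of version string -> upload datetime (UTC).
        cutoff = datetime.now(timezone.utc) - timedelta(days=365 * window_years)
        recent = {v: d for v, d in release_dates.items() if d >= cutoff}
        if not recent:
            return None  # nothing recent enough; fall back to no lower bound
        return min(recent, key=recent.get)  # oldest version inside the window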
Yeah, that specific assumption has a tendency to leave poetry-based projects with too-tight/fake dependency constraints. I’ve seen a lot of libraries have the next major version as an upper bound even though it works fine in practice on our unit/integration test suite. I’m very distrustful of Python library upper bounds. Sometimes they are a real bound based on an actual error/issue, but too often I dig into the repository and find no explanation for the bound, and it behaves fine when relaxed. It’s caused several fake dependency conflicts where I wind up forking the wheel and rewriting the dependency metadata to be more relaxed to make pip happy. UV doing it by default on the uv add command would just increase my distrust of upper bounds as meaningful.

Even “real” upper bounds for actual bugs/crashes vary in impact. Sometimes an upper bound is for an actual major change that will lead to crashes most of the time. I’ve occasionally seen major libraries add an upper bound for an issue that only appears on a certain platform (Windows, for example), and as a mac/linux user the upper bound was entirely unnecessary. Other times an upper bound only matters for very specific usage patterns of that library which my tests confirm are not a concern. It’s difficult to know why an upper bound constraint was added unless you read the repository history directly, and even there the PR often adds the upper bound with no commit explanation/description.
UV does have the advantage that an overrides file exists, so I can more easily ignore conflicts caused by upper bounds. It does sort of mean, though, that if one user relies on uv add, a different user who uses pip, where overrides are not a thing, will have trouble dealing with these assumed upper bounds.
edit:

I have no familiarity with this specific package, but a package trying to follow semver means very little in my experience. Different packages have inconsistent opinions of what counts as breaking enough. Many libraries will deprecate/remove uncommonly/rarely used functions in minor versions while claiming breaking changes are reserved for major versions. Or the breaking change is a bug fix itself. My own experience is that most upgrades work, but the upgrades that fail are a big toss-up in what the version change was, and often inconsistent with the library’s claims of compatibility. The article linked on upper bounds goes into this in more depth: the notion of semver/true breaking changes is very inconsistent in the Python ecosystem.
Okay, I’ve made a very hacky experimental proof-of-concept branch on the pip side that implements the idea of preferring upper-bounded requirements: https://github.com/pypa/pip/compare/main...notatallshaw:pip:backjump-to-upper-bounded
For the set of requirements

click >= 7.0
click-loglevel ~= 0.2
dandi >= 0.24.0
psutil ~= 5.9
pyyaml
selenium

at midnight of 1st October 2023, it reduces the number of packages pip has to collect while backtracking from 831 down to 111; on my machine it reduces the time pip takes to resolve from ~6 mins 20 seconds to ~1 min 20 seconds. Bear in mind this is via pypi-timemachine, which adds a huge amount of latency to every HTTP call, so in effect, for this set of requirements on this date, this branch of pip is faster than uv without a cache.

I did some mild testing with other requirements and found that this branch was at least able to successfully resolve them.
That said, to get this to work for resolvelib I had to add a completely new API, which substantially changes the path of the algorithm, and it would need to be carefully checked that it can still soundly resolve any given requirement, so I don’t expect I’ll ever be able to contribute this to either the resolvelib or pip projects.
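Roughly, the flavor of “prefer upper-bounded requirements” can be sketched like this (my own illustration using packaging’s Requirement API; this is not the code in the branch):

    from packaging.requirements import Requirement

    def has_upper_bound(req):
        # Operators that cap the allowed version range.
        return any(spec.operator in ("<", "<=", "==", "~=") for spec in req.specifier)

    reqs = [Requirement("boto3>=1.28"), Requirement("urllib3<2"), Requirement("pyyaml")]
    # Visit upper-bounded requirements first so their conflicts surface early.
    ordered = sorted(reqs, key=has_upper_bound, reverse=True)
    print([str(r) for r in ordered])  # ['urllib3<2', 'boto3>=1.28', 'pyyaml']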
Edit: Also, I should mention I did have to hack resolvelib into understanding a bit about extras, as part of the core issue here is that there are requirements on both urllib3 and urllib3[socks]. Not sure how you handle extras and optimizations in resolutions.

A little bit of attribution: this upper-bound idea was recently mentioned by the rip devs thinking about it (https://github.com/prefix-dev/rip/issues/191), and the example I give above comes from a real-world use case reported to Pip (https://github.com/pypa/pip/issues/12274); although issues with boto3 and many other packages have come up many times, this one was fairly easy to reproduce and particularly egregious.
Poetry fairly recently adopted the heuristic “first try to decide the package with the most available versions”, motivated by this case.
This is the opposite of what the pubgrub paper suggests; some discussion at https://github.com/python-poetry/poetry/pull/8255.
(It doesn’t always work; there are still examples where the solver resolves urllib3 before it even learns that boto3 is going to be wanted.)