uv: [perf] uv pip install resolution is slow when installing from VCS
Ciao! I am installing a library from VCS and I feel it should be faster.
The library in question is functime, a forecasting library I maintain. I first installed a version from a PR I am about to merge, but then I realised it’s just as slow if I install straight from VCS.
I took the install command from the official pip docs.
uv pip install --no-cache -- "functime[plot,lgb] @ git+https://github.com/functime-org/functime.git@refs/pull/181/head"
Updated https://github.com/functime-org/functime.git (7c699c2)
Resolved 21 packages in 37.10s
Built functime @ git+https://github.com/functime-org/functime.git@7c699c2118c96a7799b9e2bbb077b25149beeac2
Built lightgbm==4.3.0
Downloaded 21 packages in 1m 49s
Installed 21 packages in 295ms
+ cloudpickle==3.0.0
+ flaml==2.1.2
+ functime==0.9.5 (from git+https://github.com/functime-org/functime.git@7c699c2118c96a7799b9e2bbb077b25149beeac2)
+ holidays==0.47
+ joblib==1.4.0
+ kaleido==0.2.1
+ lightgbm==4.3.0
+ numpy==1.26.4
+ packaging==24.0
+ pandas==2.2.2
+ plotly==5.21.0
+ polars==0.20.22
+ python-dateutil==2.9.0.post0
+ pytz==2024.1
+ scikit-learn==1.4.2
+ scipy==1.13.0
+ six==1.16.0
+ tenacity==8.2.3
+ threadpoolctl==3.4.0
+ tqdm==4.66.2
+ tzdata==2024.1
And here is how much it takes from the regular git repo (default branch):
uv pip install --no-cache -- "functime[plot,lgb] @ git+https://github.com/functime-org/functime.git"
Updated https://github.com/functime-org/functime.git (0608c78)
Resolved 21 packages in 37.95s
Built functime @ git+https://github.com/functime-org/functime.git@0608c78118b5defd42df73b812
Built lightgbm==4.3.0
Downloaded 21 packages in 1m 48s
Installed 21 packages in 282ms
+ cloudpickle==3.0.0
+ flaml==2.1.2
+ functime==0.9.5 (from git+https://github.com/functime-org/functime.git@0608c78118b5defd42df73b81288e0e7b32fdb59)
+ holidays==0.47
+ joblib==1.4.0
+ kaleido==0.2.1
+ lightgbm==4.3.0
+ numpy==1.26.4
+ packaging==24.0
+ pandas==2.2.2
+ plotly==5.21.0
+ polars==0.20.22
+ python-dateutil==2.9.0.post0
+ pytz==2024.1
+ scikit-learn==1.4.2
+ scipy==1.13.0
+ six==1.16.0
+ tenacity==8.2.3
+ threadpoolctl==3.4.0
+ tqdm==4.66.2
+ tzdata==2024.1
As a comparison, here is how much it takes from PyPI, no cache:
uv pip install --no-cache -- 'functime[plot,lgb]'
Resolved 21 packages in 417ms
Built lightgbm==4.3.0
Downloaded 21 packages in 30.98s
Installed 21 packages in 284ms
+ cloudpickle==3.0.0
+ flaml==2.1.2
+ functime==0.9.5
+ holidays==0.47
+ joblib==1.4.0
+ kaleido==0.2.1
+ lightgbm==4.3.0
+ numpy==1.26.4
+ packaging==24.0
+ pandas==2.2.2
+ plotly==5.21.0
+ polars==0.20.22
+ python-dateutil==2.9.0.post0
+ pytz==2024.1
+ scikit-learn==1.4.2
+ scipy==1.13.0
+ six==1.16.0
+ tenacity==8.2.3
+ threadpoolctl==3.4.0
+ tqdm==4.66.2
+ tzdata==2024.1
About this issue
- State: open
- Created 2 months ago
- Comments: 18 (12 by maintainers)
Yes, you could use sparse checkout here to speed things up quite substantially. Normally you’d do it in the CLI the following way:
git clone --filter=blob:none --depth 1 --sparse git@github.com:functime-org/functime.git
and then, inside the cloned repo, keep only the pyproject.toml’s:
git sparse-checkout set '**/pyproject.toml'
Note that --sparse is needed so the initial checkout contains only top-level content. --filter=blob:none is still needed to keep blobs out (this is what pip does) and saves a decent amount of space. --depth=1 (a shallow clone) is still nice to keep, as it saves more space depending on the history size. After that, the git sparse-checkout step filters the checkout down to only the pyproject.toml’s in the repo (if any).
This reduces the clone size of functime down to:
Receiving objects: 100% (11/11), 15.56 KiB | 1.56 MiB/s, done.
The sparse checkout then reduces the functime working tree to virtually 3.4k, since there’s a single pyproject.toml. I’ve been using a similar technique on very large repos at work in CI/CD for quite some time.
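The steps above can be sketched end-to-end. This is a hypothetical, self-contained demo: it builds a throwaway local repo (standing in for functime, so it runs offline) and assumes git ≥ 2.35 for `sparse-checkout set --no-cone`.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Throwaway source repo: a top-level pyproject.toml plus a ~1 MiB blob in a subdir.
git init -q src
(
  cd src
  printf '[project]\nname = "demo"\nversion = "0.1.0"\n' > pyproject.toml
  mkdir data && head -c 1048576 /dev/zero > data/big.bin
  git add -A
  git -c user.email=demo@example.com -c user.name=demo commit -qm init
  # Local (file://) transports need this to honor --filter.
  git config uploadpack.allowfilter true
)

# Step 1: blobless, shallow, sparse clone (checks out top-level files only).
git clone -q --filter=blob:none --depth 1 --sparse "file://$tmp/src" dst

# Step 2: narrow the checkout to just the pyproject.toml files.
git -C dst sparse-checkout set --no-cone '**/pyproject.toml'

ls dst
```

After the sparse clone, `data/big.bin` is never fetched or checked out; `ls dst` lists only `pyproject.toml`.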
If we have a Git dependency, the first step is that we need to clone the repo. Then we read the pyproject.toml, if it exists. Perhaps in theory we could try to clone only the pyproject.toml (we can’t know whether it exists in advance, but we could try); I don’t know if it’s even possible to fetch a single file from Git, though. Maybe if we know it’s on GitHub, we could add a fast path that downloads the file directly rather than using a Git client.
Yeah, not sure how much performance impact this will have in general, but one advantage of enabling it is that it is well tested in pip, where it has been enabled for ~2.5 years now.
For that particular repo (functime.git @ main), it seems to be downloading 285 MB for a regular clone and 231 MB for --filter=blob:none. That’s a surprisingly small difference.
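The kind of savings --filter=blob:none gives can be reproduced offline with a small experiment: a throwaway local repo whose history contains a deleted 5 MiB blob, which a regular clone fetches but a blobless clone skips. (The exact numbers for functime will of course differ.)

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Repo whose history carries a 5 MiB blob that is gone from the tip tree.
git init -q src
(
  cd src
  head -c 5242880 /dev/urandom > big.bin
  git add big.bin
  git -c user.email=demo@example.com -c user.name=demo commit -qm 'add blob'
  git rm -q big.bin
  git -c user.email=demo@example.com -c user.name=demo commit -qm 'drop blob'
  # Local (file://) transports need this to honor --filter.
  git config uploadpack.allowfilter true
)

# A full clone fetches every blob in history; a blobless clone skips them.
git clone -q --bare "file://$tmp/src" full.git
git clone -q --bare --filter=blob:none "file://$tmp/src" blobless.git

du -sk full.git blobless.git
```

The full clone carries the ~5 MiB of (incompressible) history, while the blobless bare clone stays tiny, which mirrors the 285 MB vs 231 MB gap reported above.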
(But I confirmed that we do read the metadata from pyproject.toml, and we don’t build the wheel, which is good.)
Mmm, I think for me basically the entire time is spent cloning the repo. That’s a bummer. Maybe a datapoint for @ibraheemdev when it comes to seeing if we can make clones any faster.
Ahh, I see! Let me take a look; we should be able to clone the repo and read the metadata without building the wheel in this case. (But we do need to clone it: we don’t do selective reads from Git, e.g., checking out just the pyproject.toml.)
Ciao Charlie, thank you very much for the prompt reply.
Yes, we have some Rust plugins for Polars!
Indeed! I should’ve been more precise, sorry. What bugged me was the resolution time:
From VCS:
From PyPI:
You say that in the first case it’s 38s because it has to download and build the binary? Couldn’t uv try to fetch pyproject.toml first to perform the resolution? I guess the overall time would not change, since the build would need to happen anyway.
Feel free to close the issue.