uv: Small reads in uv_distribution::distribution_database::download producing v. slow download performance when installing torch
Hey, this is a successor to https://github.com/astral-sh/uv/pull/1978 as I’m still trying to integrate uv
with Modal’s internal mirror. But this time I don’t yet have a solution to this problem 🙂.
I’ve observed extremely degraded install performance when installing PyTorch against our mirror. The install command is `uv pip install torch`.
On my test machine, installing without our mirror is fast, taking around 7 seconds.
But when I get our mirror involved, performance tanks and an install takes over 5 minutes. The command I’m running is this:
```
RUST_LOG="uv=trace,uv_extract=trace,async_zip=debug" \
UV_NO_CACHE=true UV_INDEX_URL="http://172.21.0.1:5555/simple" \
uv --verbose pip install torch
```
Investigating…
Using debug logs on our mirror I see that there is a vast (~64x) difference in read sizes between `uv pip install` and `pip install`.
Our mirror serves `.whl` files off disk like this:
```rust
use hyper::{Body, Response};
use tokio_util::codec::{BytesCodec, FramedRead};
// ...
let file = tokio::fs::File::open(&filepath).await?;
debug!("cache hit: serving {} from disk cache", filepath.display());
let size = file.metadata().await.ok().map(|m| m.len());
// Stream the file from disk as the response body, one read per frame.
let stream = FramedRead::new(file, BytesCodec::new());
let body = Body::wrap_stream(stream);
let mut response = Response::new(body);
```
By enabling debug logs (`RUST_LOG=debug`) I can see that when using `pip install`, read chunks are large:
We can see in the screenshot of the debug logs that reads of up to 128 KiB are happening. Performance of the `pip install` command is good; the wheel download takes around 3 seconds.
The `pip` command here was

```
PIP_TRUSTED_HOST="172.21.0.1:5555" \
PIP_NO_CACHE_DIR=off PIP_INDEX_URL="http://172.21.0.1:5555/simple" \
pip install torch
```

and `pip --version` is `pip 23.3.1 from /home/ubuntu/modal/venv/lib/python3.11/site-packages/pip (python 3.11)`.
The read sizes of `pip` contrast strongly with `uv pip install`, where I’m seeing read sizes between 11 and 1120 bytes. I believe these relatively tiny read sizes when downloading are contributing to the slow download performance on the 720.55 MiB torch `.whl` file.
Looking at the `trace` logs, this is the region of code I’m in: https://github.com/astral-sh/uv/blob/043d72646d37a5d740edb2c68ebd26a78dc5eb08/crates/uv-distribution/src/distribution_database.rs#L161
uv version (`uv --version`): uv 0.1.14

OS (`uname -a`): Linux ip-10-1-1-198 5.15.0-1052-aws #57~20.04.1-Ubuntu SMP Mon Jan 15 17:04:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
About this issue
- State: closed
- Created 4 months ago
- Comments: 28 (26 by maintainers)
Commits related to this issue
- fix: #2220 — committed to thundergolfer/uv by thundergolfer 4 months ago
- fix: #2220 — committed to thundergolfer/uv by thundergolfer 4 months ago
- Address #2220 (slow download perf against PyPi mirror) (#2319) ## Summary Addressing the extremely slow performance detailed in https://github.com/astral-sh/uv/issues/2220. There are two changes ... — committed to astral-sh/uv by thundergolfer 4 months ago
I tested the fix on my slow large wheel and it works great. Installation time is now fast for that one wheel, and I guess my index server had the same issue. Thanks to both of you for continuing to improve uv’s performance.
By the way, I can get this released pretty quickly.
Makes sense. Looks like I could even check this myself by reading BENCHMARKS.md and going from there?

Think this is enough to make a PR. It’d be two things: `identity`, in the spirit of https://github.com/pypa/pip/pull/1688

I think this is another `content-encoding` related issue. I added some extra `trace!` logs into hyper when it was running `encode_headers` and deciding on the encoding strategy.

uv:

pip:

Seeing `gzip` in the headers cued me to think this was just the same issue as https://github.com/astral-sh/uv/pull/1978 showing up in a different place. By doing the same header change that’s done in https://github.com/astral-sh/uv/pull/1978, torch installs much, much faster against the mirror!
A 15.64s download against the mirror is still slower than what uv gets against PyPI (around 10s), so there may be some additional change I can make to bring our mirror's performance to parity with PyPI. It really should be faster, as it's serving over the local network.

Checking the pip repository for its history of dealing with `content-encoding`, I see this: https://github.com/pypa/pip/pull/1688

We don’t use range requests for the wheel download IIRC, just for fetching wheel metadata, so hopefully not a factor here.