pants: Pants test timeout includes venv creation overhead causing timeouts

Describe the bug When a python_tests has heavy dependencies (e.g. pytorch) a test could timeout on a “cold” execution of the test, as during first execution of the test some additional overhead actions are performed to create the venv. This can be significant, we observe a ‘test time’ >30s for a test that takes 3s with a warm cache.

One solution would be to exclude this additional venv creation time for the test timeout, and only measure the actual execution time of the python process. This would make the timeout argument for python_tests more useful when large complex dependencies are used.

Not sure if there is a performance component that can be solved, potentially that should then be an issue in the pex repository.

Pants version 2.16.0 and 2.17.0rc5

OS MacOS + Linux

Additional info See gist for example: https://gist.github.com/JoostvDoorn/9c0f63ed5198544a36b477502eeac4fb

To test:

rm -rf ~/.cache/pants/named_caches/
pants test :: --keep-sandboxes=on_failure --test-force
✓ //test_file.py:root succeeded in 23.09s.

Second execution using pants test :: --keep-sandboxes=on_failure --test-force is significantly faster:

✓ //test_file.py:root succeeded in 3.12s.

The bug here is that the test execution timing should reflect the actual time it takes to execute the test, and not the creation of the virtual environment.

For the gist example the expectation is that would always work regardless of the state of the cache:

python_tests(
    name="root",
    timeout=15,
)

About this issue

  • Original URL
  • State: open
  • Created 10 months ago
  • Comments: 28 (24 by maintainers)

Commits related to this issue

Most upvoted comments

Ok, the lion’s share (~90%), of the remaining time is creating the (compressed) --layout packed wheel zips.

  1. Avoid re-packing:
$ rm -rf ~/.pex/{bootstrap_zips,built_wheels,installed_wheels,interpreters,isolated,venvs}
$ du -shc ~/.pex/*
709M    /home/jsirois/.pex/downloads
709M    /home/jsirois/.pex/packed_wheels
743M    /home/jsirois/.pex/pip
2.2G    total
$ time pex -v --exe exe.py --lock lock.json --python python3.10 --layout packed -o example.pex
...
pex:         Installing 58 distributions: 10896.7ms
Saving PEX file to example.pex
Previous binary unexpectedly exists, cleaning: example.pex

real    0m12.915s
user    0m42.100s
sys     0m5.047s
  1. Force re-packing with compression (the default - Pants does this):
$ rm -rf ~/.pex/{bootstrap_zips,built_wheels,installed_wheels,interpreters,isolated,venvs,packed_wheels}
$ du -shc ~/.pex/*
709M    /home/jsirois/.pex/downloads
743M    /home/jsirois/.pex/pip
1.5G    total
$ time pex -v --exe exe.py --lock lock.json --python python3.10 --layout packed -o example.pex
...
pex:         Installing 58 distributions: 10871.3ms
Saving PEX file to example.pex

real    1m8.286s
user    1m35.345s
sys     0m6.424s
  1. Force re-packing without compression:
$ rm -rf ~/.pex/{bootstrap_zips,built_wheels,installed_wheels,interpreters,isolated,venvs,packed_wheels}
$ du -shc ~/.pex/*
709M    /home/jsirois/.pex/downloads
743M    /home/jsirois/.pex/pip
1.5G    total
$ time pex -v --exe exe.py --lock lock.json --python python3.10 --layout packed -o example.pex --no-compress
...
pex:         Installing 58 distributions: 10971.2ms
Saving PEX file to example.pex

real    0m16.502s
user    0m45.362s
sys     0m6.502s

So the lion’s share of the overall time is dominated by the long pole of the download of torch in a parallel download of all needed wheels and the zip compression of packed wheel zips. The latter is serial; so that could be parallelized - but stepping back and trying a native zip of just the torch installed wheel shows:

$ time zip -r --quiet torch-packed.zip ~/.pex/installed_wheels/8ced00b3ba471856b993822508f77c98f48a458623596a4c43136158781e306a/torch-2.0.1-
cp310-cp310-manylinux1_x86_64.whl/

real    0m47.460s
user    0m46.773s
sys     0m0.680s

So there is not much to be gained there in this case.

Ok, makes sense from a clean Pex cache. I’m not sure there is any way to speed up venv creation from a cold start save for populating the venv site-packages in parallel. That step is currently serial. I can give that a quick try in the am and report timing differences for your requirement set case in the gist.