pipenv: Checking if installed packages are up to date is slow and uses lots of CPU

I’ve been trying to migrate a project to use Pipenv, but I’m somewhat blocked by how much longer it takes Pipenv to check whether the installed dependencies are up to date, compared to pointing pip install at a requirements file.

In our setup we run tests inside Docker containers. The image we run the tests on comes pre-installed with the dependencies our project has at the time the image is built. Before we run any tests we then make sure the dependencies are up to date, in case new dependencies are needed for any new code/tests that might have been added. For this we have just been using pip install -r requirements.txt, which normally completes in around 30 seconds when there are no new dependencies to install.
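
Roughly, the setup has this shape (the Dockerfile, image name and paths below are only illustrative, not our exact configuration):

# Dockerfile: bake in the dependencies known at build time
FROM python
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt

# in CI, before the tests run, sync anything added since the image was built
pip install -r requirements.txt    # ~30 seconds when nothing new is needed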

I then tried to switch this to Pipenv and pre-installed the dependencies in the image using a Pipfile and Pipfile.lock, running pipenv install --deploy --dev --system against them. That works fine and I got an image created, but the problem comes when we want to run tests and first want to check that the dependencies are up to date. I’ve done this using the same pipenv install --deploy --dev --system command, and instead of 30 seconds it now takes 5 minutes and 30 seconds! On top of that, the CPU usage is much, much higher.
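
The Pipenv variant is essentially the same shape (again, only a sketch with illustrative paths):

# Dockerfile: bake in the locked dependencies at build time
FROM python
COPY Pipfile Pipfile.lock /app/
WORKDIR /app
RUN pip install pipenv && pipenv install --deploy --dev --system

# in CI, before the tests run, the same command again to pick up any lockfile changes
pipenv install --deploy --dev --system    # now ~5m30s even when nothing has changed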

I’ve made a small test with the Pipfile and Pipfile.lock we are using (only slightly modified): https://github.com/Tenzer/pipenv-test. A simple test that can be run with it is, for instance, to first install the dependencies, afterwards check that they are up to date in the local environment, and then see how much time and CPU the second operation takes:

$ docker run -it --rm -v $(pwd):/test python bash
root@9f6ecaf12cf8:/# cd /test
root@9f6ecaf12cf8:/test# pip install pipenv
[...]
root@9f6ecaf12cf8:/test# pipenv install --deploy --dev --system
Installing dependencies from Pipfile.lock (f4e26d)…
Ignoring appnope: markers 'sys_platform == "darwin"' don't match your environment
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 212/212 — 00:02:25
root@9f6ecaf12cf8:/test# time pipenv install --deploy --dev --system
Installing dependencies from Pipfile.lock (f4e26d)…
Ignoring appnope: markers 'sys_platform == "darwin"' don't match your environment
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 212/212 — 00:01:04

real	1m7.166s
user	1m49.520s
sys	0m15.000s

Note that this was run on my laptop rather than our CI system, and with a slightly simpler Pipfile, hence it’s faster than what I described above. For comparison, here is checking that all packages are installed with pip:

root@9f6ecaf12cf8:/test# pip freeze > requirements.txt
root@9f6ecaf12cf8:/test# time pip install -r requirements.txt
[...]
real	0m1.836s
user	0m1.610s
sys	0m0.130s

So according to this non-scientific test, Pipenv is taking 36 times as long and using 94 times more CPU than pip.

I know that there’s a big difference between what’s going on under the hood, but my point here is that the vastly longer time and resource usage may be a deal breaker for some with lots of dependencies.

While digging into this, I noticed that Pipenv spawns one pip process for each package, and I wonder how much of a slowdown that is compared to pip doing everything inside one process. Would it potentially make sense to split the list of dependencies into 16 batches (or whatever PIPENV_MAX_SUBPROCESS is set to), in order to avoid having to spawn 212 pip processes, as is the case here?
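
As a rough shell illustration of the idea (the file names and the use of a plain requirements file are just for the example; Pipenv would presumably do the equivalent internally from Pipfile.lock):

# one pip process per batch instead of one per package
split -n l/16 requirements.txt /tmp/req-batch-    # 16 batches instead of 212 single packages
for f in /tmp/req-batch-*; do
    pip install --no-deps -r "$f" &    # --no-deps, since the lock file already pins the full set
done
wait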

It might also be that this all comes down to pip, and to making pip faster for the operations that Pipenv runs. I just thought I would start here and see if there could be some possible optimisations on the Pipenv side of things.

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 14
  • Comments: 18 (14 by maintainers)

Most upvoted comments

Would adding a flag or environment variable to change the behaviour be acceptable? That way the current behaviour is kept as the default, and people who want the speed boost instead of the progress bar can switch to a batched behaviour instead.

It could be thought of as a feature flag and perhaps help assess how big a difference it makes to the package installation speed.
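
For example (the variable name here is purely hypothetical), something along these lines:

# default: current per-package behaviour with the progress bar
pipenv install --deploy --dev --system

# opt-in: batch the pip invocations and trade the progress bar for speed
PIPENV_BATCH_INSTALL=1 pipenv install --deploy --dev --system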