pipenv: Package locking is crazy slow for scikit-learn
I’m sorry I don’t have a reduced test case for this, but it’s so slow that it’s hard to actually debug.
Steps to reproduce
- Clone https://github.com/HearthSim/hearthsim-vagrant
- Inside that directory, clone https://github.com/HearthSim/HSReplay.net
- Run `docker-compose build django` (this builds an image based on the python-stretch docker image, which will also install the latest pipenv systemwide, cf. Dockerfile).
- Finally, run `docker-compose run django`, which runs `pipenv install --dev`.
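For convenience, the steps above can be run end to end roughly as follows (a sketch; it assumes docker and docker-compose are installed and network access is available):

```shell
git clone https://github.com/HearthSim/hearthsim-vagrant
cd hearthsim-vagrant
git clone https://github.com/HearthSim/HSReplay.net

# Builds an image based on python-stretch and installs the latest
# pipenv systemwide (cf. the repository's Dockerfile).
docker-compose build django

# Runs `pipenv install --dev` inside the container.
docker-compose run django
```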
On Linux, this stays stuck at `Locking [packages]` for over 15 minutes, with no output even when run with `--verbose`. Then, after ~15 minutes, it gives me the full output of everything it has been doing for all that time.
When run outside of docker, that step is still slow, but it takes at most 1-2 minutes. I have a pretty beefy CPU and SSD, so I don’t know why it would take this long in the first place.
I also see a lot of `Warning: Error generating hash for ...` in the verbose output; I don’t know if that’s related.
Any idea? How can I debug this further?
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 10
- Comments: 43 (35 by maintainers)
you shouldn’t be running `lock` in docker in the first place… 😃
I fully understand what pipenv brings to the table. Just to explain why I’m using it in docker:
With that said, I’m not interested in solving my problem; I already solved it by adding `--skip-lock`. I’m interested in solving, or helping solve, the egregious difference in performance between inside and outside of the container. Or at least coming out of this with a “there’s a very good reason for this difference and here it is”.
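For reference, the workaround mentioned above is simply the following (a sketch, not a recommendation, since it bypasses the lockfile entirely):

```shell
# Install straight from Pipfile without resolving the dependency graph
# or writing Pipfile.lock — avoids the slow locking step in docker.
pipenv install --dev --skip-lock
```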
But yarn is also running inside that same container, managing 1-2 orders of magnitude more dependencies than pipenv, so I think we can do better. And if that takes me PRing `setup.py`/`setup.cfg` fixes to 30 different projects, so be it 😃
@gsemet Docker is my dev env. I’m not using pipenv in production at the moment (and once I am, I will be following that workflow indeed).
I agree with the premise and with documenting it, but let’s keep this issue on topic. The whole Python community and I want pipenv to be blazing fast if we’re going to use it daily 😃
Waiting to hear some thoughts re. adding dependencies to the lockfile.
I’ll invite you to look at the environment I posted in the original issue. It is about system dependencies.
Happy to discuss docker further by email if you have questions but at this point I’d like to ask people to keep it out of this particular Github thread and stay on topic.
@jleclanche the thing is, explicitly calling `lock` is a specific instruction to pipenv to inform it that the dependency graph needs to be recalculated. By nature that requires that we ask our index for updated dependencies. If you want to trust the lockfile as-is and only install a new package, then you shouldn’t explicitly call `pipenv lock` but rather `pipenv install`.
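The distinction above can be summarized as follows (a sketch; exact behavior depends on the pipenv version):

```shell
# Explicitly recompute the full dependency graph against the index —
# this is the slow path described in this issue.
pipenv lock

# Trust the existing Pipfile.lock where possible and only resolve
# what has changed in Pipfile (the faster, usual path).
pipenv install --dev
```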
Now what I said in the paragraph above was that explicit calls to `pipenv lock` are a way of specifically telling pipenv to re-download and recalculate the dependency graph. I’m not completely sure about this, so I am going to summon the magical genie @ncoghlan – do you have any thoughts or concerns about storing the dependency graph in a nested format (I know we already agreed to stop doing that), but this time organized hierarchically by top-level dependency? You’ve thought a lot more about this than I have; can we safely store some info about top-level packages such that we can trust their dependency graph if re-locking wouldn’t update that specific package? That would save the long setup times folks are seeing dealing with ephemeral `~/.cache` folders in docker containers.

The concern I would have here is that if we ever decide to flatten the dependency graph to sub-dependencies, we’re right back to square one.
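In the meantime, one way to mitigate the ephemeral-cache problem is to bind-mount a host directory over the container’s cache so downloads survive across runs. A sketch (the `/root/.cache` path assumes the container runs as root; adjust for your image):

```shell
# Persist pip/pipenv caches across `docker-compose run` invocations by
# mounting the host's cache directory into the container.
docker-compose run -v "$HOME/.cache:/root/.cache" django
```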