pipenv: Package locking is crazy slow for scikit-learn
I’m sorry I don’t have a reduced test case for this, but it’s so slow that it’s hard to actually debug.
Steps to reproduce
- Clone https://github.com/HearthSim/hearthsim-vagrant
- Inside that directory, clone https://github.com/HearthSim/HSReplay.net
- Run `docker-compose build django` (this builds an image based on the python-stretch docker image, which will also install the latest pipenv systemwide, cf. Dockerfile).
- Finally, run `docker-compose run django`, which runs `pipenv install --dev`.
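For convenience, the steps above can be run end to end roughly as follows (a sketch; it assumes docker and docker-compose are installed and network access is available):

```shell
git clone https://github.com/HearthSim/hearthsim-vagrant
cd hearthsim-vagrant
git clone https://github.com/HearthSim/HSReplay.net

# Builds an image based on python-stretch and installs the latest
# pipenv systemwide (cf. the repository's Dockerfile).
docker-compose build django

# Runs `pipenv install --dev` inside the container.
docker-compose run django
```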
On Linux, this stays stuck at `Locking [packages]` for over 15 minutes, with no output even when run with `--verbose`. Then, after ~15 minutes, it gives me the full output of everything it has been doing for all that time.
When run outside of docker, that step is still slow, but it takes at most 1-2 minutes. I have a pretty beefy CPU and SSD, so I don’t know why it would take this long in the first place.
I also see a lot of `Warning: Error generating hash for ...` in the verbose output; I don’t know if that’s related.
Any idea? How can I debug this further?
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 10
- Comments: 43 (35 by maintainers)
you shouldn’t be running `lock` in docker in the first place… 😃
I fully understand what pipenv brings to the table. Just to explain why I’m using it in docker:
With that said, I’m not interested in solving my problem; I already solved it by adding `--skip-lock`. I’m interested in solving, or helping solve, the egregious difference in performance between inside and outside of the container. Or at least coming out of this with a “there’s a very good reason for this difference and here it is”.
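For reference, the workaround mentioned above is simply the following (a sketch, not a recommendation, since it bypasses the lockfile entirely):

```shell
# Install straight from Pipfile without resolving the dependency graph
# or writing Pipfile.lock — avoids the slow locking step in docker.
pipenv install --dev --skip-lock
```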
But yarn is also running inside that same container, managing 1-2 orders of magnitude more dependencies than pipenv, so I think we can do better. And if that takes me PRing `setup.py`/`setup.cfg` fixes to 30 different projects, so be it 😃
@gsemet Docker is my dev env. I’m not using pipenv in production at the moment (and once I am, I will be following that workflow indeed).
I agree with the premise and with documenting it, but let’s keep this issue on topic. The whole Python community and I want pipenv to be blazing fast if we’re going to use it daily 😃
Waiting to hear some thoughts re. adding dependencies to the lockfile.
I’ll invite you to look at the environment I posted in the original issue. It is about system dependencies.
Happy to discuss docker further by email if you have questions but at this point I’d like to ask people to keep it out of this particular Github thread and stay on topic.
@jleclanche the thing is, explicitly calling `lock` is a specific instruction to pipenv to inform it that the dependency graph needs to be recalculated. By nature that requires that we ask our index for updated dependencies. If you want to trust the lockfile as-is and only install a new package, then you shouldn’t explicitly call `pipenv lock` but rather `pipenv install`.
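The distinction above can be summarized as follows (a sketch; exact behavior depends on the pipenv version):

```shell
# Explicitly recompute the full dependency graph against the index —
# this is the slow path described in this issue.
pipenv lock

# Trust the existing Pipfile.lock where possible and only resolve
# what has changed in Pipfile (the faster, usual path).
pipenv install --dev
```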
Now what I said in the paragraph above was that explicit calls to `pipenv lock` are a way of specifically telling pipenv to re-download and recalculate the dependency graph. I’m not completely sure about this, so I am going to summon the magical genie @ncoghlan – do you have any thoughts or concerns about storing the dependency graph in a nested format (I know we already agreed to stop doing that), but this time organized hierarchically by top-level dependency? You’ve thought a lot more about this than I have; can we safely store some info about top-level packages such that we can trust their dependency graph if re-locking wouldn’t update that specific package? That would save the long setup times folks are seeing dealing with ephemeral `~/.cache` folders in docker containers.

The concern I would have here is that if we ever decide to flatten the dependency graph to sub-dependencies, we’re right back to square one.
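In the meantime, one way to mitigate the ephemeral-cache problem is to bind-mount a host directory over the container’s cache so downloads survive across runs. A sketch (the `/root/.cache` path assumes the container runs as root; adjust for your image):

```shell
# Persist pip/pipenv caches across `docker-compose run` invocations by
# mounting the host's cache directory into the container.
docker-compose run -v "$HOME/.cache:/root/.cache" django
```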