statsmodels: missing `requires_dist` in PyPI JSON makes installation slow
Bug description
Python package managers such as poetry (and presumably pip, now that it resolves dependencies as of v20.3) rely on packages listing their dependencies in the `requires_dist` field of the JSON returned by the PyPI API. Most packages provide this information, but the statsmodels package does not:
> curl -s https://pypi.org/pypi/scipy/json | jq ".info.requires_dist"
[
"numpy (<1.25.0,>=1.17.3)"
]
> curl -s https://pypi.org/pypi/patsy/json | jq ".info.requires_dist"
[
"six",
"numpy (>=1.4)",
"pytest ; extra == 'test'",
"pytest-cov ; extra == 'test'",
"scipy ; extra == 'test'"
]
> curl -s https://pypi.org/pypi/statsmodels/json | jq ".info.requires_dist"
null
Without this information, package managers must download and inspect the wheels of every version of statsmodels in order to extract their dependencies and then resolve them against any current constraints. This can make installation of statsmodels very slow: resolving its dependencies against the others in my environment using poetry took me ~2.5 hours.
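For illustration, this is roughly the download-and-inspect step a resolver has to repeat for each candidate version when the JSON metadata is missing (the version number and download directory below are just placeholders):
> pip download statsmodels==0.13.2 --no-deps -d /tmp/wheels
> unzip -p /tmp/wheels/statsmodels-0.13.2-*.whl "*.dist-info/METADATA" | grep "^Requires-Dist"
The Requires-Dist lines in the wheel's METADATA are the same information that `requires_dist` in the PyPI JSON is supposed to expose without any download.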
References
- See https://github.com/aws/aws-cli/issues/5701 for a similar situation in which this was a problem.
- See the poetry documentation and https://github.com/python-poetry/poetry/issues/2094 for a more detailed explanation of why this is a problem for poetry.
- See https://github.com/pypa/pip/issues/9187#issuecomment-736318672 for a discussion of why this is a problem for pip.
Cause of bug
Based on steps 6 and 8 of the statsmodels maintainer notes, it appears that the dev team uses twine to upload an sdist to PyPI before uploading its wheels, so https://github.com/pypa/twine/issues/761 may be the cause of the issue.
If so, the fix is simple: change the order and upload the wheels before the sdist.
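Assuming the upload order is indeed the culprit, a minimal sketch of the reordered release step with twine (assuming the built artifacts are in dist/) would be:
> twine upload dist/*.whl      # wheels first, so the first file PyPI sees carries Requires-Dist metadata
> twine upload dist/*.tar.gz   # then the sdist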
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 16 (9 by maintainers)
Looks to be fixed by the 0.13.5 release.
Just to add to the history:
I ran into the problem around 2013 (I think that's around the time pip started to replace easy_install); see https://stackoverflow.com/questions/15280896/how-to-prevent-tox-from-deleting-installed-packages. The oldest issue comment I can find is https://github.com/statsmodels/statsmodels/issues/1267#issuecomment-31142004.
At that time I had 3 to 5 local virtualenvs to do all the testing and debugging of different versions of dependencies. numpy and especially scipy were still much more buggy and less stable at the time.
All this has changed a lot in the last 9 years.