statsmodels: missing `requires_dist` in PyPI JSON makes installation slow

Bug description

Python package managers like poetry (and presumably pip, now that it ships a full dependency resolver as of v20.3) rely on packages listing their dependencies in the requires_dist field of the JSON returned by the PyPI API. Most packages provide this information, but the statsmodels package does not.

> curl -s https://pypi.org/pypi/scipy/json | jq ".info.requires_dist"
[
  "numpy (<1.25.0,>=1.17.3)"
]
> curl -s https://pypi.org/pypi/patsy/json | jq ".info.requires_dist"
[
  "six",
  "numpy (>=1.4)",
  "pytest ; extra == 'test'",
  "pytest-cov ; extra == 'test'",
  "scipy ; extra == 'test'"
]
> curl -s https://pypi.org/pypi/statsmodels/json | jq ".info.requires_dist"
null

Without this information, package managers must download and inspect the wheels of every version of statsmodels in order to extract their dependencies and then resolve them against any current constraints. This can make installation of statsmodels very slow; it took me ~2.5 hours to resolve its dependencies against the others in my environment using poetry.
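For a sense of what that per-version inspection involves, here is roughly the manual equivalent (the pinned version and download directory are just illustrative examples, not anything prescribed by the tools): the wheel has to be fetched and its METADATA read to recover the Requires-Dist entries that the JSON API would otherwise have provided.

> pip download statsmodels==0.13.2 --no-deps -d /tmp/sm
> unzip -p /tmp/sm/statsmodels-0.13.2-*.whl "*.dist-info/METADATA" | grep "^Requires-Dist"

A resolver that finds requires_dist missing has to repeat something like this for every candidate version it considers.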

References

See https://github.com/aws/aws-cli/issues/5701 for a similar situation in which this was a problem. See the poetry documentation and https://github.com/python-poetry/poetry/issues/2094 for a more detailed explanation of why this is a problem for poetry. See https://github.com/pypa/pip/issues/9187#issuecomment-736318672 for a discussion of why this is a problem for pip.

Cause of bug

Based on steps 6 and 8 of the statsmodels maintainer notes, it seems the dev team uses twine to upload an sdist to PyPI before uploading the wheels, so https://github.com/pypa/twine/issues/761 may be the cause of the issue.

The solution in that case is pretty simple: just change the order and upload the wheels before the sdist.
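A minimal sketch of the corrected ordering, assuming the built artifacts sit in dist/ (the paths are illustrative, not the project's actual release script); per the twine issue linked above, uploading a wheel first should let PyPI populate requires_dist from the wheel's metadata:

> twine upload dist/*.whl
> twine upload dist/*.tar.gz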

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16 (9 by maintainers)

Most upvoted comments

Looks to be fixed by the release of 0.13.5.

> curl -s https://pypi.org/pypi/statsmodels/json | jq ".info.requires_dist"
[
  "pandas (>=0.25)",
  "patsy (>=0.5.2)",
  "packaging (>=21.3)",
  "scipy (>=1.3) ; (python_version > \"3.9\" or platform_system != \"Windows\" or platform_machine != \"x86\") and python_version < \"3.12\"",
  "numpy (>=1.17) ; python_version != \"3.10\" or platform_system != \"Windows\" or platform_python_implementation == \"PyPy\"",
  "numpy (>=1.22.3) ; python_version == \"3.10\" and platform_system == \"Windows\" and platform_python_implementation != \"PyPy\"",
  "scipy (<1.8,>=1.3) ; python_version == \"3.7\"",
  "scipy (<1.9,>=1.3) ; python_version == \"3.8\" and platform_system == \"Windows\" and platform_machine == \"x86\"",
  "scipy (<1.9,>=1.3) ; python_version == \"3.9\" and platform_system == \"Windows\" and platform_machine == \"x86\"",
  "cython (>=0.29.32) ; extra == 'build'",
  "cython (>=0.29.32) ; extra == 'develop'",
  "cython (<3.0.0,>=0.29.32) ; extra == 'develop'",
  "setuptools-scm[toml] (~=7.0.0) ; extra == 'develop'",
  "oldest-supported-numpy (>=2022.4.18) ; extra == 'develop'",
  "matplotlib (>=3) ; extra == 'develop'",
  "colorama ; extra == 'develop'",
  "joblib ; extra == 'develop'",
  "Jinja2 ; extra == 'develop'",
  "pytest (~=7.0.1) ; extra == 'develop'",
  "pytest-randomly ; extra == 'develop'",
  "pytest-xdist ; extra == 'develop'",
  "flake8 ; extra == 'develop'",
  "isort ; extra == 'develop'",
  "pywinpty ; (os_name == \"nt\") and extra == 'develop'",
  "sphinx ; extra == 'docs'",
  "nbconvert ; extra == 'docs'",
  "jupyter-client ; extra == 'docs'",
  "ipykernel ; extra == 'docs'",
  "matplotlib ; extra == 'docs'",
  "nbformat ; extra == 'docs'",
  "numpydoc ; extra == 'docs'",
  "pandas-datareader ; extra == 'docs'"
]

Just to add to the history:

I ran into the problem more like around 2013 (I think that's around the time pip started to replace easy_install): https://stackoverflow.com/questions/15280896/how-to-prevent-tox-from-deleting-installed-packages. The oldest issue comment I can find is https://github.com/statsmodels/statsmodels/issues/1267#issuecomment-31142004.

At that time I had 3 to 5 local virtualenvs to do all the testing and debugging of different versions of dependencies. numpy and especially scipy were still much more buggy and less stable at the time.

All this has changed a lot in the last 9 years.