pip: Build dependencies doesn't use correct pinned version, installs numpy twice during build-time
Environment
- pip version: 21.0.1
- Python version: 3.9.0
- OS: linux
Description
Using pyproject.toml build dependencies installs the latest version of a library, even if the same pip command installs a pinned version. In some cases (binary compilation) this can lead to errors like the one below when trying to import the dependency.
RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "xxx/.venv/lib/python3.9/site-packages/utils_find_1st/__init__.py", line 3, in <module>
from .find_1st import find_1st
ImportError: numpy.core.multiarray failed to import
Expected behavior
The build process should use the pinned version of numpy (1.19.5) instead of the latest version (1.20.0 at the time of writing). That way the installation process is consistent, and problems like this cannot occur.
How to Reproduce
- create new environment
- install numpy and py_find_1st (both with pinned dependencies)
python -m venv .venv
. .venv/bin/activate
pip install -U pip
pip install --no-cache numpy==1.19.5 py_find_1st==1.1.4
python -c "import utils_find_1st"
# To make the above work, upgrade numpy to the latest version (which is the one py_find_1st is compiled against).
pip install -U numpy
Output
$ python -m venv .venv
$ . .venv/bin/activate
$ pip install -U pip
Collecting pip
Using cached pip-21.0.1-py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 20.2.3
Uninstalling pip-20.2.3:
Successfully uninstalled pip-20.2.3
Successfully installed pip-21.0.1
$ pip install --no-cache numpy==1.19.5 py_find_1st==1.1.4
Collecting numpy==1.19.5
Downloading numpy-1.19.5-cp39-cp39-manylinux2010_x86_64.whl (14.9 MB)
|████████████████████████████████| 14.9 MB 10.4 MB/s
Collecting py_find_1st==1.1.4
Downloading py_find_1st-1.1.4.tar.gz (8.7 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing wheel metadata ... done
Building wheels for collected packages: py-find-1st
Building wheel for py-find-1st (PEP 517) ... done
Created wheel for py-find-1st: filename=py_find_1st-1.1.4-cp39-cp39-linux_x86_64.whl size=30989 sha256=c1fa1330f733111b2b8edc447bec0c54abf3caf79cd5f386f5cbef310d41885c
Stored in directory: /tmp/pip-ephem-wheel-cache-94uzfkql/wheels/1e/11/33/aa4db0927a22de4d0edde2a401e1cc1f307bc209d1fdf5b104
Successfully built py-find-1st
Installing collected packages: numpy, py-find-1st
Successfully installed numpy-1.19.5 py-find-1st-1.1.4
$ python -c "import utils_find_1st"
RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/xmatt/development/cryptos/freqtrade_copy/.venv/lib/python3.9/site-packages/utils_find_1st/__init__.py", line 3, in <module>
from .find_1st import find_1st
ImportError: numpy.core.multiarray failed to import
In verbose mode, the installation of numpy 1.20.0 can be observed; however, even with “-v”, the output is very verbose.
....
changing mode of /tmp/pip-build-env-js9tatya/overlay/bin/f2py3.9 to 755
Successfully installed numpy-1.20.0 setuptools-52.0.0 wheel-0.36.2
Removed build tracker: '/tmp/pip-req-tracker-9anxsz9d'
Installing build dependencies ... done
....
A full verbose log is attached (created with pip install --no-cache numpy==1.19.5 py_find_1st==1.1.4 -v &> numpy_install.txt).
About this issue
- Original URL
- State: open
- Created 3 years ago
- Reactions: 18
- Comments: 35 (16 by maintainers)
Links to this issue
Commits related to this issue
- use oldest-supported-numpy as build requirements Otherwise due to a pip-bug, at build time, the latest numpy is installed, which can break if the environment uses a older pinned version Explanation:... — committed to xmatthias/py_find_1st by xmatthias 3 years ago
- Use oldest-supported-numpy as build dependency Fixes incompatible numpy versions build vs runtime, as NumPy v1.20 is binary incompatible with older versions. See https://github.com/pypa/pip/issues/9... — committed to lrvdijk/hdmedians by lrvdijk 3 years ago
- Use oldest-supported-numpy as build dependency Fixes build errors due to incompatible NumPy versions. See https://github.com/pypa/pip/issues/9542 — committed to broadinstitute/StrainGE by lrvdijk 3 years ago
- Require oldest-supported-numpy when building This doesn't actually fix the problem I am seeing, but is a useful change https://github.com/pypa/pip/issues/9542 — committed to DougBurke/BHJet by DougBurke 2 years ago
- Add workaround to fix invalid build results due to a bug in pip Pip installs the deps as specified in the requirements file. When the cython modules are built, however, pip installs the latest versio... — committed to UM-RMRS/raster_tools by fbunt 2 years ago
- Fix compatibility issue with NumPy API There is a potential API compatibility issue with the NumPy. When building binary wheels, the latest NumPy is used. However, the user may already have installed... — committed to lofar-astron/PyBDSF by gmloose 2 years ago
- Fix numpy api compatibility issue (#175) * Fix compatibility issue with NumPy API There is a potential API compatibility issue with the NumPy. When building binary wheels, the latest NumPy is used... — committed to lofar-astron/PyBDSF by gmloose 2 years ago
- Depend on oldest-supported-numpy metapackage Encountered numpy ABI incompatibility during pip install, [this issue](https://github.com/pypa/pip/issues/9542), and resolved using oldest-supported-numpy... — committed to agriff86/rd-deconvolve by agriff86 2 years ago
- Fix our pyproject.toml to deal with the numpy build issue. Basically what happens is this: pyproject.toml specifies what versions of each library we need to build against. The requirements.txt file d... — committed to APrioriInvestments/typed_python by braxtonmckee 2 years ago
- Fix our pyproject.toml to deal with the numpy build issue. Basically what happens is this: pyproject.toml specifies what versions of each library we need to build against. The requirements.txt file d... — committed to APrioriInvestments/typed_python by braxtonmckee 2 years ago
- tests: build isolation issues with C/C++ ABI dependencies This commit provides a simple test that demonstrates the issues a resolver-unaware build isolation imposes on packages with C/C++ ABI depende... — committed to d1saster/pip by d1saster a year ago
Neither of those suggestions works super cleanly, and both are actually more difficult to understand and explain than “isolated builds are isolated”. As of today, you have two options: carefully pin the build dependencies, or tell pip to not do build isolation (i.e. you’ll manage the build dependencies in the environment).
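A rough sketch of what those two options could look like in practice (the pins here are illustrative, taken from the report above, not a recommendation for any particular project):

# Option 1: the package author pins the build dependency in pyproject.toml
[build-system]
requires = ["setuptools", "wheel", "numpy==1.19.5"]
build-backend = "setuptools.build_meta"

# Option 2: the user manages build dependencies in the environment and disables isolation
pip install numpy==1.19.5 setuptools wheel
pip install --no-build-isolation py_find_1st==1.1.4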
Beyond that, I’m not excited by the idea of additional complexity in the dependency resolution process that makes isolated builds depend on existing environment details – both of your suggestions require adding additional complexity to the already NP-complete problem of dependency resolution, and that code is already complex enough. And they’re “solutions” operating with incomplete information, which will certainly miss certain use cases (e.g. a custom-compiled package that wasn’t installed via a wheel).
At the end of the day, pip isn’t going to be solving every use case perfectly, and this is one of those imperfect cases at the moment. For now, that means additional work on the user’s side, and I’m fine with that because we don’t have a good way to have the user communicate the complete complexity of build dependencies to pip.
Even with build isolation, numpy is downloaded at least twice (so it’s a bug in pip), which wouldn’t be necessary.
Both 1.19.5 and 1.20.0 are perfectly valid numpy versions to satisfy the build dependencies, so if I instruct pip to download 1.19.5 - why download 1.20.0 too (and, on top of that, cause a potential build-compatibility issue alongside it)?
edit: I think there should be the following behaviour:
- If the build dependency is pinned (e.g. numpy==1.20.0) - then the build installation should use that dependency, and install whatever is given otherwise in the “regular” environment.
- If the build dependency is a range (e.g. numpy>=1.13.0) - it should use as build dependency what’s installed in the same command, and ONLY fall back to the latest version if that dependency is not correctly installed to begin with.

Here’s a similar, but slightly different case:
- Package A does not list numpy in its pyproject.toml, as it’s only a runtime dependency
- Package A pins numpy==1.19.4 in a requirements.txt
- Package B needs numpy as a build dependency (has C code in the package)
- The install ends up with incompatible numpy versions

The above scenario produces the same results: the pinned version of numpy==1.19.4 in Package A is not used to build the dependency Package B that does need numpy. The same error results.
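A minimal sketch of the layout described above (package names and files are placeholders, not from an actual project):

# Package A - requirements.txt (runtime pins only; numpy is not one of A's build dependencies)
package_b
numpy==1.19.4

# Package B - pyproject.toml (needs numpy to compile its C code)
[build-system]
requires = ["setuptools", "wheel", "numpy"]
build-backend = "setuptools.build_meta"

# Installing A's requirements builds B in an isolated environment,
# which pulls the latest numpy instead of the pinned 1.19.4
pip install -r requirements.txt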
I had the same issue today. Since the release of numpy 1.20.0 yesterday, there is a new dimension to this problem.
For instance, I (mostly my users and CI services) usually install the package dclab alongside other pinned packages, e.g. with pip install dclab[all] tensorflow.
dclab comes with a few cython extensions that need to be built during installation, which is a perfectly normal use-case. This is not one of those imperfect cases.
Now, the problem is that during installation of dclab, pip downloads numpy 1.20.0 and builds the extensions against it. But in the environment (env), pip installs numpy 1.19.5 (pinned by tensorflow). When I then try to import dclab, I get this error on GH Actions (the numpy ABI error quoted in the issue description).

As far as I can see, I have only three choices:
- use oldest-supported-numpy in pyproject.toml (which actually works for dclab)
- ...
- have users install with --no-build-isolation, which I cannot really expect from my users.

The best solution to this problem, as far as I can see, would be for pip to be smart about choosing which version of the build dependency in pyproject.toml to install:
If the pip install command already came up with a certain version range for numpy, use the highest available version within that range (e.g. pip install dclab[all] tensorflow would tell pip that tensorflow needs numpy 1.19.5, so it makes sense to use that when building the extensions). I know that pinning versions is not good, but tensorflow is doing it apparently, and many people use tensorflow.
[EDIT: found out that oldest-supported-numpy works for me]
numpy has a mechanism for dealing with this situation, that I mentioned in the first comment for this issue:
oldest-supported-numpy. Anyone who isn’t using that: use it, and please push your upstream libraries to use it as well.

Other than that, I’ve also stated that this is a very difficult problem theoretically, even if we ignore the implementation complexities here. Looking at the votes on that comment, BTW, I should note that I’d be very happy to receive a bunch of PRs that solve this without creating a special case for numpy. I think you can also submit that as your PhD thesis, while you’re at it. 😃
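For reference, declaring it in a package's pyproject.toml looks roughly like this (a sketch; the exact requirement list depends on the package's build backend, and Cython here is only illustrative):

[build-system]
requires = ["setuptools", "wheel", "Cython", "oldest-supported-numpy"]
build-backend = "setuptools.build_meta"
# oldest-supported-numpy is a meta-package that resolves to the oldest numpy
# release still supporting the current Python version and platform, so wheels
# built against it keep working with newer numpy at runtime (numpy's ABI is
# forward-compatible).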
No, sadly this isn’t possible today, mostly because no one has stepped up to design + implement this. I’m pretty sure this has been discussed elsewhere in this tracker back when PEP 518 was being implemented, but I can’t find it now. 😦
So as a followup to my above comment, is there a way, as a consumer of package B that I do not maintain but do depend on, to control what version of a build dependency gets used by pip in the isolated build? Concretely, is there actually any way to control which version of numpy is used in the isolated build env for a package that lists numpy as a build dependency in its pyproject.toml?
@rgommers OK, fair enough. But I still think it’s right for build isolation to be the default, and where non-isolated builds are better, people should opt in. That’s the comment I was uncomfortable with. I agree 100% that we need better protection for users who don’t have the knowledge to set things up for non-isolated builds, so that they don’t get dumped with a complex build they weren’t expecting. But I don’t want anyone to have to install setuptools before they can install some simple pure-python package that the author just hasn’t uploaded a wheel for.
There hasn’t been much discussion in this issue lately, but for future reference I want to add that this is not only an issue for numpy and its ecosystem of dependent packages, but also for other packages. In helmholtz-analytics/mpi4torch#7 we face a similar issue with pytorch and I don’t think that the purported solution of creating a meta package like
oldest-supported-numpy would rectify the situation in our case, simply because pytorch is much more lenient regarding API/ABI compatibility across versions. So for me this issue mostly reads like “the current build isolation implementation in pip breaks C/C++ ABI dependencies across different packages.”

To be fair, pip’s behavior probably is fully PEP 517/518 compliant, since these PEPs only specify “minimal build dependencies” and how to proceed with building a single package. What we are asking for is more: we want pip to install “minimal build dependencies compatible with the to-be-installed set of other packages”.
This got me thinking: given that pip calls itself to install the build dependencies in build_env.py, couldn’t one add something like “weak constraints” (weak in the sense that build dependencies according to PEP 517/518 always take precedence) that contain the selected, version-specified set of the other to-be-installed packages?
However, and that is probably where the snake bites its tail, the build environments AFAIK already need to be prepared for potential candidates of to-be-installed packages? As such we would not have the final set of packages available, and even simply iterating over candidate sets can be expected to become expensive, with probably some nasty corner cases on top. @pradyunsg Is this the issue you are referring to in your comment? If so, do you have an idea on how to fix this?
Structurally this needs better tooling: either the build-time numpy needs to be pinned low, or the build process needs to generate wheels with updated requirements.
However, none of that tooling belongs in pip; this is a topic for numpy, setuptools and the build backends as far as I can tell.
No additional metadata is needed, I believe. Right now, the example from the issue description (pip install --no-cache numpy==1.19.5 py_find_1st==1.1.4) should error out - if the runtime dependencies are correct in the py_find_1st wheel - with an understandable error message. Something like: “the numpy==1.19.5 and numpy>=1.24.1 (coming from py_find_1st==1.1.4) constraints are incompatible”. Bonus points for pointing to the two possible solutions: using PIP_CONSTRAINT or removing the explicit ==1.19.5 pin.

There is no way to “fix” this in pip by automatically changing something - the two constraints are actually incompatible.
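A sketch of the PIP_CONSTRAINT workaround mentioned here, using the versions from the original report (where the build requirement was still numpy>=1.13.0, so the 1.19.5 pin is compatible). It assumes the environment variable is inherited by the nested pip call that populates the isolated build environment:

# constraints.txt
numpy==1.19.5

# the constraint applies to the runtime install and, via the environment
# variable, to the build dependencies installed for the isolated build
PIP_CONSTRAINT=$(pwd)/constraints.txt pip install --no-cache numpy==1.19.5 py_find_1st==1.1.4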
To be honest, I’ve lost the thread of what’s going on here. And a PR including just a test that claims to demonstrate “the problem”, without clearly explaining what the problem is in isolation (i.e., without expecting the reader to have followed this whole discussion), isn’t of much help here.
If someone can add a comment to the PR describing a way to reproduce the issue it’s trying to demonstrate, in terms of how to manually write a package that shows the problem, with step-by-step explanations, that would help a lot. I tried and failed to reverse-engineer the logic of the test (the unquoted_string business lost me).

@pfmoore no typo, this really does work better without build isolation. Build isolation is a tradeoff: some things get better, some things get worse. Dealing with numpy-like ABI constraints is certainly worse (as this issue shows). There are other cases too - for example, when using pip install . in a conda/spack/nix env, you never want build isolation if you deal with native dependencies. Same for editable installs, which arguably should disable build isolation.
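A sketch of that non-isolated workflow in a conda environment; the particular build dependencies (numpy, Cython, setuptools, wheel) are illustrative and depend on the package being built:

# build dependencies come from the conda environment, not an isolated build env
conda install numpy cython setuptools wheel
# build and install the local package against exactly those versions
pip install . --no-build-isolation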
No worries, I am not planning to propose any changes to how things work today. You just have to be aware that it’s not clear-cut and there are some conceptual issues with the current design.
On the contrary - I do it all the time, and so do the many users whose bug reports on NumPy and SciPy I deal with.
I believe at the first level it’s fair to say that any package that builds binary wheels with stricter dependencies than the source package, without having the wheel reflect those stricter requirements, is fundamentally wrong.
At the second level, I think there is a need for a PEP that allows packages to communicate that to tools and resolvers.
I don’t think there are general docs on this, as it requires specific knowledge of the individual project. To give a broad summary, though:
For runtime dependencies, pip freeze does what you want. For build dependencies, you’ll need to use --no-build-isolation, and then create a new virtual environment of your own where you do the build. Then manually extract the data from the build requirements in pyproject.toml and use that as a starting point for your build environment. Modify and test as needed until you have a working build environment (you’ll have to do this, as the whole point here is that pyproject.toml isn’t correctly capturing some details of the build environment that’s needed). Then use pip freeze to capture the build environment details, and put that in a requirements.txt file that you’ll use in future to set up the build environment. Maintain that requirements file going forward as you would with any other requirements file.

Agreed, this is a lot of manual work, but it’s basically what was needed before PEP 518 and pyproject.toml, so it shouldn’t be that surprising that you need it if pyproject.toml and isolated builds aren’t sufficient, I guess.
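A sketch of that workflow, with hypothetical file names and pins (build-requirements.txt is just a requirements file you maintain yourself, not something pip knows about):

# one-off: set up a dedicated build environment by hand
python -m venv .build-env
. .build-env/bin/activate
# start from the build requirements listed in pyproject.toml, adjust until the build works
pip install setuptools wheel "numpy==1.19.5"
# snapshot the working build environment
pip freeze > build-requirements.txt

# from now on: recreate the build environment from the snapshot and build without isolation
pip install -r build-requirements.txt
pip install --no-build-isolation py_find_1st==1.1.4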
Interesting. There’s a leaky abstraction or two in there somewhere. A Dockerfile aims to be a repeatable build, but these steps inside it:
....
now quietly build a broken Python environment just because numpy==1.20 was released.
- Why is --no-build-isolation not the default? Does the build process need to install other packages temporarily?
- Is there a way to build cvxpy using the same release of numpy that’s installed and named in the current pip install command?
- When installing with --no-binary=numpy (which compiles numpy from source, e.g. in order to link to a specific OpenBLAS), would the cvxpy build temporarily install numpy the same way? If not, could that also break the cvxpy installation?

For me it installs numpy 1.17.3; the oldest-supported-numpy package on PyPI states:
....
I just checked with a pip install numpy==1.20.0 in my environment. Pip complains about tensorflow being incompatible with it, but dclab imports and the tests run just fine. I assume that is because of the backwards compatibility (https://pypi.org/project/oldest-supported-numpy/).

I don’t think you can point it to how the pyproject.toml is done.

It’s pip that’s installing numpy twice (1.20.0 for building, and 1.19.5 as the final version), so this can also happen with any other package combination in theory.
It works fine if you install numpy FIRST, and then the package depending on numpy, as then pip recognizes that a compatible version is available, and doesn’t install it again.
If it weren’t numpy but some other random package, you couldn’t point to “oldest-supported-numpy” either.
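Spelled out, the two-step ordering described above (install numpy first, then the dependent package) is simply the reproduction from the report split into two pip invocations; this mirrors the reporter's observation rather than a documented guarantee:

python -m venv .venv
. .venv/bin/activate
# install the pinned numpy first...
pip install numpy==1.19.5
# ...then the package that needs numpy at build time
pip install py_find_1st==1.1.4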
The build-dependency is specified as "numpy>=1.13.0", which allows every numpy version >= 1.13.0. Using oldest-supported-numpy might even make it worse, as according to the documentation that would pin the build version to numpy==1.13.0 - which would break the install completely.

In short, it’s pip that should resolve the build-dependency, detect that it’s a dependency that’s going to be installed anyway, and install numpy first (using that numpy installation for the build of the other package).
Please use numpy’s oldest-supported-numpy helper for declaring the dependency on numpy in pyproject.toml.