scipy: BUG: meson fails to properly detect numpy, pybind11 and (sort of) pythran when cross compiling

The new dependency detection in scipy/meson.build works suitably for native builds of SciPy, but does not correctly detect build-arch dependencies for cross compilation. The meson rules invoke the host Python interpreter, there called py3, to import the numpy, pybind11 and pythran packages (all installed for the host arch), deduce the proper include paths and, in the case of NumPy, infer a search path for the npymath and npyrandom libraries.
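The detection in question is roughly of this shape (a paraphrased sketch, not the exact scipy/meson.build contents):

```meson
# Sketch of the problematic pattern: ask the *host* interpreter where
# numpy lives, even though the compiled objects target the build arch.
py3 = import('python').find_installation()
incdir_numpy = run_command(py3,
  ['-c', 'import numpy; print(numpy.get_include())'],
  check: true
).stdout().strip()
inc_np = include_directories(incdir_numpy)
```

Because py3 is the host interpreter, every path that falls out of this belongs to the host arch.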

NOTE: the Pythran search is incorrect but doesn’t really cause a problem, because it is only used on the host to render files; the include path returned by the problematic search is never referenced by any subsequent meson build logic. The entire incdir_pythran block could be removed without adverse effects.

If the requisite dependencies are installed for the host arch (whether or not they are also installed for the build arch), the detection will return paths for the host arch, which can cause subtle problems in header files and fails outright when the linker attempts to link build-arch object files against the native npymath or npyrandom libraries. If the dependencies are installed only for the build arch, the interpreter will fail to import them at all and meson will never even configure the build.

Resolution attempts

I have tried a couple simple workarounds, to no avail:

  1. Install the dependencies only for the build arch. On Void Linux, we set a number of environment variables to tell the host Python to use the sysconfig data for the build arch and also add the build root to PYTHONPATH, allowing the host Python to find these modules and grab relevant information (field sizes, shlib suffixes, etc.) for the build arch rather than the host. This would work with the existing meson detection for pybind11 and pythran, but does not work for numpy because import numpy triggers a bunch of shared-object loads and the build-arch libraries are incompatible with the host. (A more targeted import, such as from numpy.__config__ import get_include, might work in this situation, but I haven’t bothered to try.)
  2. Install dependencies for both the host and build arches, use the detection to find paths for the host, but then manually prepend the build root to the returned paths. This almost works on Void Linux because we install packages for the build arch under a /usr/<triple> prefix that otherwise mirrors the native layout. In general, most of the compilation commands include the right -I flags to find headers for the build arch. However, some commands still include paths to the host interpreter, and the find_library calls to identify npymath and npyrandom somehow still pick up the host versions and trigger a linker failure. I don’t know enough about how meson sets up the Python environment when searching for it to understand how these host paths are creeping in, or why find_library still seems to prefer the host paths even though I add the correct paths to the search paths in that call. (I wouldn’t really expect find_library to dig into the numpy tree to find the libraries, so it seems an additional search path is creeping in before I add one explicitly.)
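The targeted-import idea from attempt 1 can be taken further: the standard library can locate a package without executing it at all, which sidesteps the shared-object loads entirely. A minimal sketch of the idea, not something SciPy's build currently does; the core/include layout is an assumption about how numpy arranges its headers:

```python
import importlib.util
import os
from typing import Optional

def locate_include(package: str) -> Optional[str]:
    """Locate a package's header directory without importing it,
    so no shared objects for an incompatible arch are ever loaded."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # Assumed layout: headers live under <pkg>/core/include, as in numpy
    return os.path.join(os.path.dirname(spec.origin), "core", "include")

print(locate_include("numpy"))
```

With PYTHONPATH pointed at the build root, find_spec would resolve the build-arch copy without tripping over its incompatible extension modules.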

Possible fixes

Although I’m speculating, it seems a few approaches could be taken to resolve this issue, in order of decreasing “niceness”:

  1. Convince numpy (and, probably, pybind11) to ship pkg-config files. This is probably desirable, was mentioned in the [related meson issue], and sidesteps a lot of problems. The trouble with pybind11 is that it wants to be entirely self-contained within the Python package tree; however, even if it ships a .pc file within its package tree instead of in a system-specific path, Void can probably work around the issue with relative ease. (Void already wraps pkg-config for cross builds so that it loads descriptors for the build arch and manipulates the paths appropriately.) The determination for pythran should just be dropped altogether (also, rather than invoking a Python interpreter to read SCIPY_USE_PYTHRAN from the environment, Pythran should just be a meson build option). The trouble here is backwards compatibility; if SciPy will build with old versions of NumPy or pybind11, it will still need fallback detection. Hence…
  2. Existing logic can be improved, even if only as a fallback for older versions of dependencies. I’m not sure what this should look like, but reading variables from the environment might make sense (e.g., NUMPY_ROOT, PYBIND11_INCLUDE_DIR); when these variables are defined, they are used as-is; otherwise, the existing interpreter invocations can provide sensible defaults for native builds. Of course, this assumes that find_library can be made to find NumPy libraries for the build arch even though it now prefers the host versions.
  3. The existing search for a Python interpreter could allow a custom path rather than always using the default, which seems to be the same interpreter that is running meson. This might allow some clever wrapping of the interpreter but is probably an incomplete (and maybe completely ineffective) solution. For example, no amount of sensible wrapping will allow print(numpy.get_include()) to dump some modified path; however, a successful from numpy.__config__ import get_include using the build-arch NumPy might make this workable if ugly.
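For fix 2, a sketch of what an environment-variable fallback might look like in meson; NUMPY_ROOT is a hypothetical variable name, py3 is the interpreter object the build already found, and the core/include suffix is an assumption about numpy's layout:

```meson
# Hypothetical: honor NUMPY_ROOT from the environment when set,
# otherwise fall back to the existing interpreter-based detection.
numpy_root = run_command(py3,
  ['-c', 'import os; print(os.environ.get("NUMPY_ROOT", ""))'],
  check: true
).stdout().strip()
if numpy_root != ''
  incdir_numpy = numpy_root / 'core' / 'include'
else
  incdir_numpy = run_command(py3,
    ['-c', 'import numpy; print(numpy.get_include())'],
    check: true
  ).stdout().strip()
endif
```

A cross build would export NUMPY_ROOT pointing into the build root, while native builds keep the current behavior untouched.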

Related issue

This issue was opened in response to https://github.com/mesonbuild/meson/issues/9598#issuecomment-1201475862 as a means to track SciPy specifics and provide a link target for inclusion in https://github.com/scipy/scipy/issues/14812.

cc: @rgommers

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 24 (22 by maintainers)

Most upvoted comments

I wouldn’t say there aren’t places where you can use it (pybind11_global does use it!), but it’s somewhat discouraged due to the ability to do exactly what it was supposed to do - install to any arbitrary place at the root of your python environment - which includes / if you are not in a venv. That’s why we have a “safe” pybind11 package and a “you know what you are doing” pybind11_global package.

I think different people have different ideas of what it should mean, probably.

For example, I would consider the data directory an officially sanctioned method for messing with the base system on the grounds that it is my desire to mess with the base system.

Obviously, pip install package installs things outside of site-packages and people are okay with this, because console_scripts is installed outside of site-packages. It’s just installed to the “scripts” directory instead of the “data” directory, so it’s arbitrarily declared to be acceptable to install to the scripts directory specifically. After all, scripts are really common. Common enough that I suppose brew doctor doesn’t consider it something to complain about.

No, pip install package should not install things outside of site-packages. If a user does this without being inside a virtual environment, it messes with the base system. Try this in brew, then type “brew doctor”, and it will complain about mysterious files that it does not control showing up in the root. This is why pybind11 has a pybind11-global copy - the normal pybind11 doesn’t touch anything outside of site-packages. The “global” package exists entirely for this purpose - it is allowed to put things outside of site-packages. It’s also safe to use as a build requirement, as that’s always in a venv.

There are two better options, IMO, though. One is to use an entry-point for pkg-config. Tools like meson-python and scikit-build-core could check for this entrypoint, and add the folders specified. I briefly had this for cmake files, but removed it while we design a proper name and system. But it did work, there’s a PR for pybind11 and history in scikit-build-core where it was working. (There are 2-3 ways to do this, and we haven’t picked one and settled on a name yet.)
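The build-tool side of that entry-point idea could look roughly like this; both the group name pkg_config and the convention that each entry point resolves to a callable returning a directory are assumptions, since as the comment notes no name has been settled on:

```python
from importlib.metadata import entry_points

# 'pkg_config' is a hypothetical group name; nothing is standardized yet.
GROUP = "pkg_config"

def pkgconfig_dirs():
    """Collect pkg-config search directories advertised by installed
    packages through a (hypothetical) entry-point group."""
    try:
        eps = entry_points(group=GROUP)      # Python 3.10+ keyword form
    except TypeError:
        eps = entry_points().get(GROUP, [])  # older dict-style API
    # Assumed convention: each entry point loads to a callable
    # that returns a directory containing .pc files.
    return [ep.load()() for ep in eps]

print(pkgconfig_dirs())
```

A tool like meson-python could then prepend those directories to PKG_CONFIG_PATH before invoking meson.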

Second option requires pkg-config’s search to be smart enough to detect <package>/... as a pkg-config location - CMake does this (<package>/share/...), so if you add site-packages to the correct CMake variable, it can detect pybind11’s config. This is a bit more limited (the file has to have the same name as the package it’s in), but it works in CMake’s case, and is how scikit-build-core finds pybind11 currently. Not sure if pkg-config has the same search option.

I’ve really only looked at libnpymath for two minutes, but it leaves me scratching my head. It seems like a bunch of convenience wrappers to paper over implementation differences in low-level math functions that should be provided by a C runtime. While NumPy might need this compatibility layer for itself, I fail to understand why this should be exposed as a public library. The purpose of the NumPy C API is to allow compiled Python extensions (like SciPy) to interact with NumPy data. Let those extensions worry about implementation disparities in low-level functions on their own.

In short, I agree with the argument in numpy/numpy#20880 that libnpymath should be stripped way back and become nothing more than a convenience header easily vendored by SciPy.

Re: pybind11, it was only recently that they moved everything into the package tree; earlier releases installed in system locations.

Both options are available. pybind11-global installs to the system locations (intended for use in pyproject.toml). You can get both via pybind11[global].

  1. Convince numpy (and, probably, pybind11) to ship pkg-config files. This is probably desirable, was mentioned in the [related meson issue], and sidesteps a lot of problems. The trouble with pybind11 is that it wants to be entirely self-contained within the Python package tree; however, even if it ships a .pc file within its package tree instead of in a system-specific path, Void can probably work around the issue with relative ease. (Void already wraps pkg-config for cross builds so that it loads descriptors for the build arch and manipulates the paths appropriately.)

By sheer coincidence, I had already submitted https://github.com/pybind/pybind11/pull/4077

Note that pybind11 already installs cmake files, and Meson can pick those up, although again it installs them to the self-contained python package tree, which is a bit painful. There is also a Meson wrap, so dependency('pybind11') will find something via cmake, if you manually set some cmake variables, and otherwise download a private copy. I’m generally hoping to make this better (particularly, you can install pybind11 via cmake ...... && make install, which installs to the system instead and which I believe is vastly superior).