LightGBM: Python package looks for library in wrong path
Trying to build the python package from source without installing it will somehow try to pick the wrong path for system libraries. I get an error about being unable to import scipy.sparse, even though I can import that library in the same session (this is after successfully building lib_lightgbm through the cmake system):
>>> import scipy.sparse
>>> import lightgbm
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/david/del/py_lightgbm/python-package/lightgbm/__init__.py", line 8, in <module>
    from .basic import Booster, Dataset, Sequence, register_logger
  File "/home/david/del/py_lightgbm/python-package/lightgbm/basic.py", line 126, in <module>
    _LIB = _load_lib()
  File "/home/david/del/py_lightgbm/python-package/lightgbm/basic.py", line 117, in _load_lib
    lib = ctypes.cdll.LoadLibrary(lib_path[0])
  File "/home/david/anaconda3/envs/py3/lib/python3.9/ctypes/__init__.py", line 460, in LoadLibrary
    return self._dlltype(name)
  File "/home/david/anaconda3/envs/py3/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/david/anaconda3/envs/py3/lib/python3.9/site-packages/scipy/sparse/../../../../libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/david/del/py_lightgbm/lib_lightgbm.so)
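For anyone who wants to check which GLIBCXX versions a particular `libstdc++.so.6` actually provides, here is a minimal sketch. It assumes the version names appear as plain `GLIBCXX_x.y.z` strings inside the file (roughly what `strings libstdc++.so.6 | grep GLIBCXX` would show). The function names are made up for illustration and are not part of LightGBM.

```python
import re
from pathlib import Path

# Matches version tags such as GLIBCXX_3.4 or GLIBCXX_3.4.30, but not
# unrelated names like GLIBCXX_DEBUG_MESSAGE_LENGTH.
_GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def parse_glibcxx_versions(data):
    """Return the sorted, de-duplicated GLIBCXX versions found in raw bytes."""
    found = {match.decode() for match in _GLIBCXX_TAG.findall(data)}
    return sorted(tuple(int(part) for part in v.split(".")) for v in found)


def glibcxx_versions_in_file(lib_path):
    """Hypothetical helper: scan a libstdc++.so.6 file on disk."""
    return parse_glibcxx_versions(Path(lib_path).read_bytes())


def provides(data, version):
    """True if the given version string (e.g. "3.4.30") appears in the bytes."""
    wanted = tuple(int(part) for part in version.split("."))
    return wanted in parse_glibcxx_versions(data)
```

Running `glibcxx_versions_in_file()` on the copy under the conda environment's `lib/` and on the system copy should show whether `3.4.30` is present in each.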
About this issue
- State: open
- Created 2 years ago
- Comments: 17 (7 by maintainers)
Ok, so! I’ve been able to collect my thoughts on this.
Reproducible Example
Given a Dockerfile pinned to the `ubuntu:latest` (Ubuntu 22.04) image as of two weeks ago, installing `lightgbm` from source and loading it works without issue.
But if you try `conda install`-ing `libstdcxx-ng` first, it will produce the error mentioned in this issue.
The use of a `conda` Python distribution + the presence of a `libstdc++.so.6` anywhere in `conda`’s library paths can cause this error to be thrown.
Root Cause (short description)
When `lightgbm` is compiled, it uses the system `gcc`/`g++` and links against `/usr/lib/x86_64-linux-gnu/libstdc++.so.6`, which contains symbols from versions of GLIBCXX as new as `GLIBCXX_3.4.30`.
When `lib_lightgbm.so` is later loaded in a conda distribution of Python, a `libstdc++.so.6` in conda’s `lib/` directory is found first, and it only contains GLIBCXX symbols up to `GLIBCXX_3.4.29`.
Workarounds with no changes to LightGBM
1. Use conda's CMake and compilers to build LightGBM from source
From https://conda.io/projects/conda-build/en/latest/resources/compiler-tools.html#using-the-compiler-packages
2. Point LD_PRELOAD at the non-conda lib/ directory prior to starting python
NOTE: this cannot be done from inside Python, because the dynamic loader reads `LD_PRELOAD` when the process starts.
3. Modify `lib_lightgbm.so`'s DT_RPATH tag so that it points at the place where it found `libstdc++.so.6`
See https://man7.org/linux/man-pages/man3/dlopen.3.html and https://stackoverflow.com/a/20333550/3986677.
rpath is a way to embed, in a shared object, a hint about where to find the shared libraries it depends on at runtime.
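Workaround 2 can also be driven from Python, as long as the variable is set for a new process rather than the current one. A minimal sketch, assuming the system library lives at `/usr/lib/x86_64-linux-gnu/libstdc++.so.6` (the path from the short root cause description) and noting that `LD_PRELOAD` takes paths to shared objects. The helper names are hypothetical.

```python
import os
import subprocess
import sys

# Assumption: this is where the system (non-conda) libstdc++ lives.
SYSTEM_LIBSTDCXX = "/usr/lib/x86_64-linux-gnu/libstdc++.so.6"


def env_with_preload(lib_path, base_env=None):
    """Return a copy of the environment with lib_path first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    return env


def import_lightgbm_in_child(lib_path=SYSTEM_LIBSTDCXX):
    """Start a fresh interpreter so the loader sees LD_PRELOAD at startup.

    Returns the child's exit code (0 means the import succeeded).
    """
    result = subprocess.run(
        [sys.executable, "-c", "import lightgbm"],
        env=env_with_preload(lib_path),
        check=False,
    )
    return result.returncode
```

Assigning to `os.environ["LD_PRELOAD"]` in an already-running interpreter does not help, since the loader has already done its startup work by then. That is why the sketch launches a child process.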
Root Cause (longer description)
I’ve found this topic very complicated (or at least, new to me), so have been capturing my running notes and example code snippets at https://github.com/jameslamb/lgb-glibc-demo.
What follows is a summary of the issue that is more detailed than “Root Cause (short description)” but less detailed than my notes in that repo.
Whenever `lightgbm` is loaded with `import lightgbm`, it uses `ctypes.cdll.LoadLibrary()` to load its compiled library, `lib_lightgbm.so`.
https://github.com/microsoft/LightGBM/blob/416ecd5a8de1b2b9225ded3c919cb0d40ec0d9bd/python-package/lightgbm/basic.py#L117
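To see which copy of a library a running Python process has actually mapped, one option on Linux is to read `/proc/self/maps`. This is a rough sketch, not something LightGBM does, and `mapped_paths` is a hypothetical helper name.

```python
from pathlib import Path


def mapped_paths(name_fragment, maps_text=None):
    """Return sorted file paths mapped into this process that contain name_fragment.

    maps_text lets callers pass in the contents of a maps file directly;
    by default the current process's /proc/self/maps is read (Linux only).
    """
    if maps_text is None:
        maps_text = Path("/proc/self/maps").read_text()
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and name_fragment in fields[5]:
            paths.add(fields[5])
    return sorted(paths)
```

Calling `mapped_paths("libstdc++")` right after `import scipy.sparse` should show whether conda's copy is already loaded before `lightgbm` is imported.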
The `ctypes` documentation describes this process in detail.
From “Finding shared libraries” in the `ctypes` docs (link)
And from “loading shared libraries” (doc)
“underlying platform’s `dlopen`” here refers to a standard POSIX interface available on Unix-like operating systems.
For example, see https://man7.org/linux/man-pages/man3/dlopen.3.html for Linux.
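From those docs, when searching for a library (or for the dependencies of a library being loaded), the following are checked in order:
1. Directories in the calling object's DT_RPATH tag, if it has one and has no DT_RUNPATH tag.
2. Directories in the LD_LIBRARY_PATH environment variable, as it was set when the program started.
3. Directories in the calling object's DT_RUNPATH tag.
4. The cache file /etc/ld.so.cache.
5. The directories /lib and /usr/lib, in that order.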
`conda` tries very hard to ensure that its directories are searched first when `dlopen()` tries to load a library. One mechanism it uses for this is setting the DT_RUNPATH on its distribution of `python`.
Try this, using the container image built higher up in this description.
Which yields the following.
That says “look in `/root/miniforge/lib/` first when loading libraries”!
If you look at the copy of `python` in a specific conda environment, you’ll see something similar.
That shows the same output.
Which this time means “first look in `/root/miniforge/envs/test-env/lib/` when loading libraries”.
If you `pip install lightgbm` inside that environment, you’ll see it gets linked against a `libstdc++.so.6` outside of conda’s lib/ directories.
That output contains a lot of information, including the following key line:
That says “`lib_lightgbm.so` was linked against `/lib/x86_64-linux-gnu/libstdc++.so.6`”. That directory is outside of the ones `conda` looks in first!!
Changes LightGBM could make to mitigate this
I think the most reliable, portable way for LightGBM to handle this is to attach a DT_RPATH to `lib_lightgbm.so` when it’s compiled. That way, when `dlopen()` loads `lib_lightgbm.so`, it will first look in the same directories that the linker chose when compiling `lib_lightgbm.so`.
See CMake’s docs on this at https://gitlab.kitware.com/cmake/community/-/wikis/doc/cmake/RPATH-handling#default-rpath-settings.
And https://gitlab.kitware.com/cmake/community/-/wikis/doc/cmake/RPATH-handling#always-full-rpath
I haven’t tested that yet, but I think it’s worth exploring to try to mitigate this issue.
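If someone does try this, one way to check whether the built `lib_lightgbm.so` ended up with a DT_RPATH or DT_RUNPATH is to parse the output of `readelf -d`. A sketch, assuming binutils' `readelf` is on `PATH` and that its output uses the usual `(RPATH) Library rpath: [...]` and `(RUNPATH) Library runpath: [...]` lines. The function names are made up.

```python
import re
import shutil
import subprocess

# Matches the RPATH / RUNPATH entries in the dynamic section dump.
_SEARCH_PATH_LINE = re.compile(
    r"\((RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*)\]"
)


def parse_search_paths(readelf_output):
    """Extract RPATH and RUNPATH directory lists from `readelf -d` text."""
    result = {"RPATH": [], "RUNPATH": []}
    for line in readelf_output.splitlines():
        match = _SEARCH_PATH_LINE.search(line)
        if match:
            tag, value = match.groups()
            # Entries are colon-separated, like PATH.
            result[tag].extend(p for p in value.split(":") if p)
    return result


def read_search_paths(shared_object):
    """Hypothetical helper: run readelf on a file and parse the result."""
    readelf = shutil.which("readelf")
    if readelf is None:
        raise RuntimeError("readelf (binutils) was not found on PATH")
    completed = subprocess.run(
        [readelf, "-d", shared_object],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_search_paths(completed.stdout)
```

Both lists coming back empty for `lib_lightgbm.so` would mean the loader falls back to the other search locations when resolving `libstdc++.so.6`.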
While working on #5169, I ran into this issue and have been doing some investigation. I think I’ve identified a fully reproducible example (using a container) and a strategy for fixing this, based on what I’ve learned about the way that `ctypes` loads libraries.
Will post those details here in the next few days, when I have time. Just wanted to post to let others here know I’m actively looking into this.
Thanks for that information. I’m not familiar with that pattern for Python projects, will look around for some examples and documentation on it. As discussed in #5061, I think it’s possible that the package’s strategy for compiling `lib_lightgbm` might need to change substantially in the future.
BUT…I also think, based on my investigation above, that setting `DT_RPATH` on `lib_lightgbm.so` to point to the locations where the linker found libraries at compile time might be a quick way to make the issue reported here less likely.
@jameslamb Wow, brilliant investigation, thanks a lot!
If attaching DT_RPATH to `lib_lightgbm.so` is just a tip and not a strict rule, I think we can investigate this approach.
I remember this conda behavior was the reason why we statically link `libstdc++` on Windows when compiling with MinGW: #899 https://github.com/microsoft/LightGBM/blob/6de9bafaeb4de46b22c81e7199bb5de8b28e6174/CMakeLists.txt#L323-L325