LightGBM: Python package looks for library in wrong path

Building the Python package from source without installing it somehow picks up the wrong path for system libraries. I get an error that points at a path inside scipy/sparse, even though I can import scipy.sparse in the same session (this is after successfully building lib_lightgbm through the CMake build system):

>>> import scipy.sparse
>>> import lightgbm
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/david/del/py_lightgbm/python-package/lightgbm/__init__.py", line 8, in <module>
    from .basic import Booster, Dataset, Sequence, register_logger
  File "/home/david/del/py_lightgbm/python-package/lightgbm/basic.py", line 126, in <module>
    _LIB = _load_lib()
  File "/home/david/del/py_lightgbm/python-package/lightgbm/basic.py", line 117, in _load_lib
    lib = ctypes.cdll.LoadLibrary(lib_path[0])
  File "/home/david/anaconda3/envs/py3/lib/python3.9/ctypes/__init__.py", line 460, in LoadLibrary
    return self._dlltype(name)
  File "/home/david/anaconda3/envs/py3/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/david/anaconda3/envs/py3/lib/python3.9/site-packages/scipy/sparse/../../../../libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/david/del/py_lightgbm/lib_lightgbm.so)

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 17 (7 by maintainers)

Most upvoted comments

Ok, so! I’ve been able to collect my thoughts on this.


Reproducible Example

Consider the following Dockerfile, pinned to the ubuntu:latest (Ubuntu 22.04) image as of two weeks ago.

Dockerfile
# pinning to specific version of ubuntu:22.04
FROM ubuntu@sha256:2a7dffab37165e8b4f206f61cfd984f8bb279843b070217f6ad310c9c31c9c7c

ENV CONDA=/root/miniforge \
    DEBIAN_FRONTEND=noninteractive \
    LANG="en_US.UTF-8" \
    LGB_COMMIT=416ecd5a8de1b2b9225ded3c919cb0d40ec0d9bd \
    LGB_SOURCE_DIR=/usr/local/src/LightGBM \
    PATH="/root/miniforge/bin:${PATH}" \
    PYTHON_VERSION=3.10

RUN apt-get update && \
    apt-get install \
        --no-install-recommends \
        -y \
            sudo && \
    sudo apt-get install \
        --no-install-recommends \
        -y \
            locales \
            software-properties-common && \
    sudo locale-gen ${LANG} && \
    sudo update-locale LANG=${LANG} && \
    sudo apt-get install \
        --no-install-recommends \
        -y \
            apt-utils \
            build-essential \
            ca-certificates \
            cmake \
            curl \
            git \
            iputils-ping \
            jq \
            libicu-dev \
            libcurl4 \
            libssl-dev \
            libunwind8 \
            locales \
            netcat \
            unzip \
            zip && \
    # install conda
    curl \
        -sL \
        -o miniforge.sh \
        https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-$(uname -m).sh && \
    sh miniforge.sh -b -p ${CONDA} && \
    conda config --set always_yes yes --set changeps1 no && \
    conda update -q -y conda && \
    git clone \
        --recursive \
        https://github.com/microsoft/LightGBM.git \
        "${LGB_SOURCE_DIR}" && \
    cd "${LGB_SOURCE_DIR}" && \
    git checkout ${LGB_COMMIT}

WORKDIR "${LGB_SOURCE_DIR}"

Build the image:

docker build \
    --no-cache \
    -t lgb-glibc-demo:local \
    - < ./Dockerfile

Installing lightgbm from source and loading it works without issue.

docker run \
    --rm \
    --workdir /usr/local/src/LightGBM/python-package \
    -it lgb-glibc-demo:local \
    /bin/bash -c "pip install . && python -c 'import lightgbm'"

But if you conda install libstdcxx-ng first, you get the error mentioned in this issue.

docker run \
    --rm \
    --workdir /usr/local/src/LightGBM/python-package \
    -it lgb-glibc-demo:local \
    /bin/bash -c "conda install -y -n base libstdcxx-ng && pip install . && python -c 'import lightgbm'"

OSError: /root/miniforge/bin/…/lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /usr/local/src/LightGBM/python-package/compile/lib_lightgbm.so)

Using a conda distribution of Python while a libstdc++.so.6 is present anywhere in conda’s library paths can cause this error.


Root Cause (short description)

When lightgbm is compiled, it uses the system gcc / g++ and links against /usr/lib/x86_64-linux-gnu/libstdc++.so.6, which contains symbols from versions of GLIBCXX as new as GLIBCXX_3.4.30.

When lib_lightgbm.so is later loaded in a conda distribution of Python, a libstdc++.so.6 in conda’s lib/ directory is found first, and it only contains GLIBCXX symbols up to GLIBCXX_3.4.29.
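One rough way to see this mismatch is to list the GLIBCXX version tags embedded in each copy of libstdc++.so.6, similar to running `strings libstdc++.so.6 | grep GLIBCXX`. The Python sketch below assumes a plain byte scan is good enough for a quick comparison; it does not parse the ELF symbol-version tables, and every helper name in it is hypothetical.

```python
import re
from pathlib import Path

# Matches version tags such as b"GLIBCXX_3.4.30". Names like b"GLIBCXX_DEBUG"
# have no digits after the underscore, so they are skipped.
_GLIBCXX_RE = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def parse_version(tag):
    """Turn "GLIBCXX_3.4.30" (or just "3.4.30") into a comparable tuple like (3, 4, 30)."""
    return tuple(int(part) for part in tag.split("_")[-1].split("."))


def glibcxx_versions_in(data):
    """Return the sorted, de-duplicated GLIBCXX versions found in raw bytes."""
    found = {m.group(1).decode("ascii") for m in _GLIBCXX_RE.finditer(data)}
    return sorted(parse_version(v) for v in found)


def glibcxx_versions(lib_path):
    """Byte-scan a library file on disk (hypothetical helper, not an ELF parser)."""
    return glibcxx_versions_in(Path(lib_path).read_bytes())


def provides(versions, required):
    """True if a required tag such as "GLIBCXX_3.4.30" is among `versions`."""
    return parse_version(required) in versions
```

For example, comparing `max(glibcxx_versions(path))` for /usr/lib/x86_64-linux-gnu/libstdc++.so.6 and for the copy in conda’s lib/ directory against `parse_version("GLIBCXX_3.4.30")` would show which copy can satisfy lib_lightgbm.so.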


Workarounds with no changes to LightGBM

1. Use conda's CMake and compilers to build LightGBM from source

From https://conda.io/projects/conda-build/en/latest/resources/compiler-tools.html#using-the-compiler-packages

Instead of gcc, the executable name of the compiler you use will be something like x86_64-conda_cos6-linux-gnu-gcc.

Many build tools such as make and CMake search by default for a compiler named simply gcc, so we set environment variables to point these tools to the correct compiler.

We set these variables in conda activate.d scripts, so any environment in which you will use the compilers must first be activated so the scripts will run. Conda-build does this activation for you using activation hooks installed with the compiler packages in CONDA_PREFIX/etc/conda/activate.d.

# install the problematic library
conda install -y -n base \
    libstdcxx-ng

# confirm that it results in a `libstdc++.so.6` being added in conda env
find / -name 'libstdc++.so.6'
# /root/miniforge/lib/libstdc++.so.6
# /root/miniforge/pkgs/libstdcxx-ng-11.2.0-he4da1e4_16/lib/libstdc++.so.6
# /usr/lib/x86_64-linux-gnu/libstdc++.so.6

# get conda compilers
conda install -y -n base \
    cmake \
    gcc_linux-64 \
    gxx_linux-64

# it's important to activate the target conda env, to set
# the relevant environment variables pointing to conda's compilers
source activate base

# you can see the effect of this by checking env variables
echo $CC
# /root/miniforge/bin/x86_64-conda-linux-gnu-cc

echo $CXX
# /root/miniforge/bin/x86_64-conda-linux-gnu-c++

cd /usr/local/src/LightGBM
pip uninstall -y lightgbm
rm -rf ./build
rm -f ./lib_lightgbm.so

cd ./python-package
pip install .

# confirm that importing works
python -c "import lightgbm; print(lightgbm.__version__)"
# 3.3.2.99

# confirm that the maximum GLIBCXX version is less than
# the one from the error message, and that the libstdc++.so.6 linked
# is the one from /root/miniforge
LIB_LIGHTGBM_IN_CONDA=$(
    find /root/miniforge -name 'lib_lightgbm.so' \
    | head -1
)
ldd -v \
    "${LIB_LIGHTGBM_IN_CONDA}"
2. Point LD_PRELOAD at the non-conda libstdc++.so.6 before starting Python
# install the problematic library
conda install -y -n base \
    libstdcxx-ng

# confirm that it resulted in a `libstdc++.so.6` being added in conda env
find / -name 'libstdc++.so.6'
# /root/miniforge/lib/libstdc++.so.6
# /root/miniforge/pkgs/libstdcxx-ng-11.2.0-he4da1e4_16/lib/libstdc++.so.6
# /usr/lib/x86_64-linux-gnu/libstdc++.so.6

# build LightGBM from source
cd /usr/local/src/LightGBM
pip uninstall -y lightgbm
rm -rf ./build
rm -f ./lib_lightgbm.so
cd ./python-package
pip install .

# try loading lightgbm (this will fail)
python -c "import lightgbm; print(lightgbm.__version__)"

# try loading lightgbm with LD_PRELOAD pointing at the system libstdc++.so.6
# that lib_lightgbm.so was linked against (this succeeds)
LD_PRELOAD="${LD_PRELOAD}:/usr/lib/x86_64-linux-gnu/libstdc++.so.6" \
python -c "import lightgbm; print(lightgbm.__version__)"

NOTE: this cannot be done from inside Python, because the dynamic loader only reads LD_PRELOAD when the process starts. The following code will fail.

import os
os.environ["LD_PRELOAD"] = "/usr/lib/x86_64-linux-gnu/libstdc++.so.6"
import lightgbm
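If the preload has to be arranged from Python anyway (for example, in a wrapper script), one option is to start a new interpreter with LD_PRELOAD already present in its environment, since setting it in the running process comes too late. A minimal sketch under that assumption; `preload_env` and `run_with_preload` are hypothetical helpers, and the library path is the one from the shell example above.

```python
import os
import re
import subprocess
import sys

# Same system library as in the shell example above.
SYSTEM_LIBSTDCXX = "/usr/lib/x86_64-linux-gnu/libstdc++.so.6"


def preload_env(lib, base=None):
    """Return a copy of `base` (default: os.environ) with `lib` appended to LD_PRELOAD."""
    env = dict(os.environ if base is None else base)
    # glibc accepts ':' or whitespace between LD_PRELOAD entries.
    entries = [e for e in re.split(r"[:\s]+", env.get("LD_PRELOAD", "")) if e]
    if lib not in entries:
        entries.append(lib)
    env["LD_PRELOAD"] = ":".join(entries)
    return env


def run_with_preload(code, lib=SYSTEM_LIBSTDCXX):
    """Run `code` in a fresh interpreter, so the loader sees LD_PRELOAD at startup."""
    return subprocess.run([sys.executable, "-c", code], env=preload_env(lib), check=False)


# Example (not run here):
#   result = run_with_preload("import lightgbm; print(lightgbm.__version__)")
#   print(result.returncode)
```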
3. Modify `lib_lightgbm.so`'s DT_RPATH tag so that it points at the place where it found `libstdc++.so.6`

See https://man7.org/linux/man-pages/man3/dlopen.3.html and https://stackoverflow.com/a/20333550/3986677.

rpath is a way to embed, in a shared object, a hint about which directories to search for the shared libraries it depends on.

cd /root/miniforge/lib/python3.9/site-packages/lightgbm/
cp lib_lightgbm.so lib_lightgbm2.so

# shows no rpath
chrpath -l lib_lightgbm2.so

# fails
python -c \
    "import ctypes; ctypes.cdll.LoadLibrary('/root/miniforge/lib/python3.9/site-packages/lightgbm/lib_lightgbm.so')"

# patch the rpath
patchelf --set-rpath '/usr/lib/x86_64-linux-gnu' lib_lightgbm2.so

# shows rpath
chrpath -l lib_lightgbm2.so

# succeeds!
python -c \
    "import ctypes; ctypes.cdll.LoadLibrary('/root/miniforge/lib/python3.9/site-packages/lightgbm/lib_lightgbm2.so')"

Root Cause (longer description)

I’ve found this topic very complicated (or at least, new to me), so have been capturing my running notes and example code snippets at https://github.com/jameslamb/lgb-glibc-demo.

Below is a summary of the issue that is more detailed than Root Cause (short description) but less detailed than my notes in that repo.

Much longer description

Whenever lightgbm is loaded with import lightgbm, it uses ctypes.cdll.LoadLibrary() to load its compiled library, lib_lightgbm.so.

https://github.com/microsoft/LightGBM/blob/416ecd5a8de1b2b9225ded3c919cb0d40ec0d9bd/python-package/lightgbm/basic.py#L117

The ctypes documentation describes this process in detail.

From “Finding shared libraries” in the ctypes docs:

When programming in a compiled language, shared libraries are accessed when compiling/linking a program, and when the program is run.

…the ctypes library loaders act like when a program is run, and call the runtime loader directly.

And from “Loading shared libraries” in the ctypes docs:

If you have an existing handle to an already loaded shared library, it can be passed as the handle named parameter, otherwise the underlying platform’s dlopen or LoadLibrary function is used to load the library into the process, and to get a handle to it.

“underlying platform’s dlopen” here refers to dlopen(), a standard C interface available on POSIX systems such as Linux and macOS (on Windows, ctypes uses LoadLibrary instead).

For example, see https://man7.org/linux/man-pages/man3/dlopen.3.html for Linux.
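Because ctypes hands the name straight to dlopen(), a failed load surfaces as an OSError carrying the loader’s message, as in the traceback at the top of this issue. The sketch below probes a library the same way the `python -c` checks in workaround 3 do, using a fresh interpreter each time so that one attempt cannot leave a libstdc++.so.6 loaded for the next; `loads_cleanly` is a hypothetical helper.

```python
import subprocess
import sys

# Child-process code: ctypes.CDLL() calls dlopen() and raises OSError on failure.
_PROBE = "import ctypes, sys; ctypes.CDLL(sys.argv[1])"


def loads_cleanly(lib_path):
    """Try to dlopen `lib_path` via ctypes in a fresh interpreter.

    Returns (ok, stderr). On failure, stderr ends with the OSError message.
    """
    proc = subprocess.run(
        [sys.executable, "-c", _PROBE, lib_path],
        capture_output=True,
        text=True,
        check=False,
    )
    return proc.returncode == 0, proc.stderr.strip()


# Example (not run here), comparing the original and patched copies from workaround 3.
# The "./" matters: a name with no slash is searched for, not opened from the cwd.
#   for path in ("./lib_lightgbm.so", "./lib_lightgbm2.so"):
#       print(path, loads_cleanly(path))
```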

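From those docs, when searching for a library whose name contains no slash (which covers the libstdc++.so.6 dependency of lib_lightgbm.so, even though lib_lightgbm.so itself is loaded by full path), the following are checked in order: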

(ELF only) If the calling object (i.e., the shared library or executable from which dlopen() is called) contains a DT_RPATH tag, and does not contain a DT_RUNPATH tag, then the directories listed in the DT_RPATH tag are searched.

If, at the time that the program was started, the environment variable LD_LIBRARY_PATH was defined to contain a colon-separated list of directories, then these are searched.

(ELF only) If the calling object contains a DT_RUNPATH tag, then the directories listed in that tag are searched.

The cache file /etc/ld.so.cache (maintained by ldconfig(8)) is checked to see whether it contains an entry for filename.

The directories /lib and /usr/lib are searched (in that order).
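The order above can be modelled as a small function, which makes it easy to see that a DT_RPATH entry is searched before everything else, while a DT_RUNPATH entry comes after LD_LIBRARY_PATH and disables DT_RPATH entirely. This is a simplified sketch, not how glibc is implemented: `CallerInfo` and `dlopen_search_dirs` are hypothetical, and the ld.so.cache lookup (really keyed by filename) is approximated as a list of directories.

```python
from dataclasses import dataclass, field


@dataclass
class CallerInfo:
    """Hypothetical model of the calling object's dynamic-section tags."""

    rpath: list = field(default_factory=list)  # DT_RPATH directories
    runpath: list = field(default_factory=list)  # DT_RUNPATH directories


def dlopen_search_dirs(caller, ld_library_path="", cache_dirs=(), default_dirs=("/lib", "/usr/lib")):
    """Return directories in the order dlopen(3) lists for a name with no slash.

    Simplified: the real cache lookup is by filename, and details such as
    secure-execution mode (which ignores LD_LIBRARY_PATH) are left out.
    """
    dirs = []
    if caller.rpath and not caller.runpath:
        dirs.extend(caller.rpath)  # 1. DT_RPATH, only if there is no DT_RUNPATH
    dirs.extend(d for d in ld_library_path.split(":") if d)  # 2. LD_LIBRARY_PATH
    dirs.extend(caller.runpath)  # 3. DT_RUNPATH
    dirs.extend(cache_dirs)  # 4. /etc/ld.so.cache, approximated as directories
    dirs.extend(default_dirs)  # 5. /lib, then /usr/lib
    return dirs
```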

conda tries very hard to ensure that its directories are searched first when dlopen() tries to load a library. One mechanism it uses for this is setting a DT_RPATH on its distribution of python.

Try this, using the container image built higher up in this description.

docker run \
    --rm \
    --workdir /usr/local/src/LightGBM/python-package \
    -it lgb-glibc-demo:local \
    /bin/bash

readelf -d /root/miniforge/bin/python \
| grep RPATH

Which yields the following.

 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/../lib]

That says “look in /root/miniforge/lib/ first when loading libraries”!
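$ORIGIN is replaced at load time by the directory containing the object that carries the tag, which is why the same entry can point at a different lib/ directory for each copy of python. A small sketch that pulls RPATH/RUNPATH entries out of `readelf -d` output and expands $ORIGIN, assuming the output format shown above; the helper names are hypothetical, and symlinks are not resolved.

```python
import os
import re

# Matches readelf -d lines such as:
#  0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/../lib]
_DYNAMIC_RE = re.compile(r"\((RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[([^\]]*)\]")


def expand_origin(entry, binary_path):
    """Replace $ORIGIN / ${ORIGIN} with the directory containing `binary_path`."""
    origin = os.path.dirname(os.path.abspath(binary_path))
    expanded = entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
    return os.path.normpath(expanded)


def dynamic_search_paths(readelf_output, binary_path):
    """Collect RPATH and RUNPATH directories from `readelf -d` output, $ORIGIN expanded."""
    found = {"RPATH": [], "RUNPATH": []}
    for tag, value in _DYNAMIC_RE.findall(readelf_output):
        found[tag].extend(expand_origin(p, binary_path) for p in value.split(":") if p)
    return found
```

With the RPATH line above and /root/miniforge/bin/python, `dynamic_search_paths` returns /root/miniforge/lib as the only RPATH entry; with /root/miniforge/envs/test-env/bin/python it returns /root/miniforge/envs/test-env/lib.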

If you look at the copy of python in a specific conda environment, you’ll see something similar.

conda create --name test-env python=3.9
readelf -d /root/miniforge/envs/test-env/bin/python \
| grep RPATH

That shows the same output.

 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/../lib]

Which this time means “first look in /root/miniforge/envs/test-env/lib/ when loading libraries”.

If you pip install lightgbm inside that environment, you’ll see it gets linked against a libstdc++.so.6 outside of conda’s lib/ directories.

source activate test-env
cd /usr/local/src/LightGBM/python-package
pip install .
LIB_LIGHTGBM_IN_CONDA=$(
    find /root/miniforge -name 'lib_lightgbm.so' \
    | head -1
)
# /root/miniforge/envs/test-env/lib/python3.9/site-packages/lightgbm/lib_lightgbm.so

ldd -v ${LIB_LIGHTGBM_IN_CONDA}

That output contains a lot of information, including the following key line:

libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f9ee66e2000)

That says “lib_lightgbm.so was linked against /lib/x86_64-linux-gnu/libstdc++.so.6”. That directory is outside of the ones conda looks in first!!
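To check this without reading through the full `ldd -v` output, the sketch below extracts where each dependency resolved and which GLIBCXX versions the inspected library requires from libstdc++.so.6, then tests whether a path sits under a prefix such as the active environment’s `sys.prefix`. It assumes glibc’s usual `ldd -v` layout (the main `name => path (0xADDRESS)` lines, then a "Version information:" section that lists the inspected file first); all helper names are hypothetical.

```python
import os
import re
import sys

# Main ldd lines, e.g. "libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f9ee66e2000)"
_RESOLVED_RE = re.compile(r"^\s*(\S+) => (\S+) \(0x[0-9a-fA-F]+\)\s*$")
# Version entries, e.g. "libstdc++.so.6 (GLIBCXX_3.4.30) => /lib/x86_64-linux-gnu/libstdc++.so.6"
_VERSION_RE = re.compile(r"^\s*(\S+) \(([^)]+)\) => (\S+)\s*$")


def resolved_libraries(ldd_output):
    """Map each dependency name to the path ldd resolved it to."""
    matches = (_RESOLVED_RE.match(line) for line in ldd_output.splitlines())
    return {m.group(1): m.group(2) for m in matches if m}


def required_versions(ldd_v_output, soname="libstdc++.so.6"):
    """Map version tags the inspected file needs from `soname` to the providing path.

    Assumes the inspected file is the first object under "Version information:";
    later blocks describe its dependencies and are ignored.
    """
    result, in_versions, objects_seen = {}, False, 0
    for line in ldd_v_output.splitlines():
        stripped = line.strip()
        if stripped == "Version information:":
            in_versions = True
            continue
        if not in_versions:
            continue
        if stripped.endswith(":") and "=>" not in stripped:
            objects_seen += 1  # an object header such as "/path/to/lib_lightgbm.so:"
            if objects_seen > 1:
                break
            continue
        match = _VERSION_RE.match(line)
        if match and match.group(1) == soname:
            result[match.group(2)] = match.group(3)
    return result


def newest_glibcxx(tags):
    """Largest GLIBCXX_* version among `tags` as a tuple, or None if there are none."""
    versions = [
        tuple(int(p) for p in tag.split("_", 1)[1].split("."))
        for tag in tags
        if tag.startswith("GLIBCXX_")
    ]
    return max(versions, default=None)


def is_inside(path, prefix=sys.prefix):
    """True if `path` lives under `prefix` (by default, the running Python's prefix)."""
    path, prefix = os.path.abspath(path), os.path.abspath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


# Example (not run here):
#   import subprocess
#   out = subprocess.run(["ldd", "-v", path], capture_output=True, text=True).stdout
#   lib = resolved_libraries(out)["libstdc++.so.6"]
#   print(lib, is_inside(lib), newest_glibcxx(required_versions(out)))
```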


Changes LightGBM could make to mitigate this

I think the most reliable, portable way for LightGBM to handle this is to attach a DT_RPATH to lib_lightgbm.so when it’s compiled. That way, when dlopen() loads lib_lightgbm.so, it will first look in the same directories that the linker chose when compiling lib_lightgbm.so.

See CMake’s docs on this at https://gitlab.kitware.com/cmake/community/-/wikis/doc/cmake/RPATH-handling#default-rpath-settings.

By default if you don’t change any RPATH related settings, CMake will link the executables and shared libraries with full RPATH to all used libraries in the build tree. When installing, it will clear the RPATH of these targets so they are installed with an empty RPATH.

And https://gitlab.kitware.com/cmake/community/-/wikis/doc/cmake/RPATH-handling#always-full-rpath

CMAKE_INSTALL_RPATH_USE_LINK_PATH…If this option is enabled, all these directories except those which are also in the build tree will be added to the install RPATH automatically.

I haven’t tested that yet, but I think it’s worth exploring to try to mitigate this issue.

While working on #5169, I ran into this issue and have been doing some investigation. I think I’ve identified a fully reproducible example (using a container) and a strategy for fixing this, based on what I’ve learned about the way that ctypes loads libraries.

Will post those details here in the next few days, when I have time. Just wanted to post to let others here know I’m actively looking into this.

Thanks for that information. I’m not familiar with that pattern for Python projects, but I’ll look around for some examples and documentation on it. As discussed in #5061, I think it’s possible that the package’s strategy for compiling lib_lightgbm might need to change substantially in the future.

BUT…I also think, based on my investigation above, that setting DT_RPATH on lib_lightgbm.so to point to the locations where the linker found libraries at compile time might be a quick way to make the issue reported here less likely.

@jameslamb Wow, brilliant investigation, thanks a lot!

If attaching DT_RPATH to lib_lightgbm.so is just a tip and not a strict rule, I think we can investigate this approach.

I remember this conda behavior was the reason why we statically link libstdc++ on Windows when compiling with MinGW: #899 https://github.com/microsoft/LightGBM/blob/6de9bafaeb4de46b22c81e7199bb5de8b28e6174/CMakeLists.txt#L323-L325