LightGBM: LightGBM is incompatible with libomp 12 and 13 on macOS

Description

LightGBM cannot be used to fit multiple models in parallel using threads with the latest libomp. On a 2014 MacBook Pro:

OMP: Error #13: Assertion failure at kmp_runtime.cpp(3689).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
[1]    17358 abort      python myfile2.py

On a 2019 MacBook Pro:

OMP: Error #131: Thread identifier invalid.

Setting nthreads=1 doesn’t solve the problem.

Reproducible example

from lightgbm import LGBMClassifier
import numpy as np
from concurrent.futures import ThreadPoolExecutor

x = np.random.random((200, 4))
y = x.sum(axis=1) >= 2


def myfunc(a=7):
    test = LGBMClassifier().fit(x, y)
    print(test.predict(x))


with ThreadPoolExecutor(20) as tpe:
    print(list(tpe.map(myfunc, range(20))))

Environment info

LightGBM version or commit hash: 3.1.1 (with python 3.7.3) and 3.2.1 (with python 3.9.4)

brew install libomp

libomp: stable 12.0.0 (bottled)
LLVM's OpenMP runtime library
https://openmp.llvm.org/
/usr/local/Cellar/libomp/12.0.0 (9 files, 1.5MB) *
Poured from bottle on 2021-04-26 at 11:06:26
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/libomp.rb

Command(s) you used to install LightGBM

pip install lightgbm

Additional Comments

The code does work with libomp version 11. Downgraded using

wget https://raw.githubusercontent.com/Homebrew/homebrew-core/fb8323f2b170bd4ae97e1bac9bf3e2983af3fdb0/Formula/libomp.rb
brew unlink libomp
brew install libomp.rb

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 23
  • Comments: 26

Most upvoted comments

A new major LLVM version, 13, was released 4 days ago: https://github.com/llvm/llvm-project/releases/tag/llvmorg-13.0.0. The latest Homebrew libomp formula now points to that version: https://github.com/Homebrew/homebrew-core/blob/4343aee9c28d28b9ed3208b5933df54c29b916fb/Formula/libomp.rb#L4.

Unfortunately, this bug (https://github.com/microsoft/LightGBM/issues/4229#issuecomment-855839996) wasn’t fixed in the stable 13 release. I’m going to reflect this fact in the issue’s title.

I have the same issue and did some testing: libomp 12.0 works on Catalina but results in a segfault on Big Sur. Downgrading to 11.1 worked on Big Sur (tested on an Intel MBP and on an M1 MBP via Rosetta 2).

One workaround suggested in the upstream bug report, which avoids downgrading libomp, is to set these environment variables:

LIBOMP_USE_HIDDEN_HELPER_TASK=0
LIBOMP_NUM_HIDDEN_HELPER_THREADS=0

https://bugs.llvm.org/show_bug.cgi?id=50579#c1
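
For anyone who would rather set these from Python than from the shell, here is a minimal sketch. The helper name apply_libomp_workaround is made up for illustration, and it assumes libomp reads the variables when it initializes, so it has to run before the first import of lightgbm (or anything else that loads libomp). I have not confirmed that timing against the libomp source.

```python
import os

# Variable names and values are taken from the upstream bug report above.
LIBOMP_WORKAROUND = {
    "LIBOMP_USE_HIDDEN_HELPER_TASK": "0",
    "LIBOMP_NUM_HIDDEN_HELPER_THREADS": "0",
}


def apply_libomp_workaround(env=os.environ):
    """Set the workaround variables unless a value was already chosen.

    Hypothetical helper, not part of LightGBM or libomp.
    """
    for name, value in LIBOMP_WORKAROUND.items():
        env.setdefault(name, value)
    return {name: env[name] for name in LIBOMP_WORKAROUND}


# Assumption: this must run before `import lightgbm`, because setting the
# variables after libomp has been loaded is unlikely to have any effect.
settings = apply_libomp_workaround()
print(settings)
```

Using setdefault means a value exported in the shell still wins over the script.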

Moving the import statement import lightgbm as lgb to line 1 in my file actually got rid of the error, as suggested in https://github.com/dmlc/xgboost/issues/7039#issuecomment-860910066

libomp version /usr/local/Cellar/libomp/12.0.0

Error dump when loading booster model. Putting it out here in case it is useful:

Process:               Python [6481]
Path:                  /Library/Frameworks/Python.framework/Versions/3.7/Resources/Python.app/Contents/MacOS/Python
Identifier:            Python
Version:               3.7.3 (3.7.3)
Code Type:             X86-64 (Native)
Parent Process:        zsh [511]
Responsible:           iTerm2 [403]
User ID:               501

Date/Time:             2021-07-02 10:35:36.911 +0800
OS Version:            macOS 11.4 (20F71)
Report Version:        12
Bridge OS Version:     5.4 (18P4663)

Time Awake Since Boot: 14000 seconds
Time Since Wake:       1500 seconds

System Integrity Protection: enabled

Crashed Thread:        41

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000048
Exception Note:        EXC_CORPSE_NOTIFY

Termination Signal:    Segmentation fault: 11
Termination Reason:    Namespace SIGNAL, Code 0xb
Terminating Process:   exc handler [6481]

VM Regions Near 0x48:
--> 
    __TEXT                      10388e000-10388f000    [    4K] r-x/rwx SM=COW  /Library/Frameworks/Python.framework/Versions/3.7/Resources/Python.app/Contents/MacOS/Python

Thread 0:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	0x00007fff204dc206 _kernelrpc_mach_vm_protect_trap + 10
1   libsystem_kernel.dylib        	0x00007fff204df1da mach_vm_protect + 33
2   libsystem_pthread.dylib       	0x00007fff20512589 _pthread_create + 533
3   libomp.dylib                  	0x0000000183c99568 __kmp_create_worker + 264
4   libomp.dylib                  	0x0000000183c6f2a4 __kmp_allocate_thread + 954
5   libomp.dylib                  	0x0000000183c6ac21 __kmp_allocate_team + 1311
6   libomp.dylib                  	0x0000000183c6c51c __kmp_fork_call + 5365
7   libomp.dylib                  	0x0000000183c61295 __kmpc_fork_call + 293
8   lib_lightgbm.so               	0x00000001838d5036 LightGBM::ParallelPartitionRunner<int, false>::ParallelPartitionRunner(int, int) + 118
9   lib_lightgbm.so               	0x00000001838c9379 LightGBM::GBDT::GBDT() + 777
10  lib_lightgbm.so               	0x00000001838be0f1 LightGBM::Boosting::CreateBoosting(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, char const*) + 1745
11  lib_lightgbm.so               	0x0000000183abf490 LightGBM::Booster::Booster(char const*) + 400

On XGBoost we are also facing issues with the updated libomp. It fails with an internal error: https://github.com/dmlc/xgboost/pull/6912/checks?check_run_id=2459890229

FYI (not sure if this is common knowledge yet): when developing on LightGBM on Apple Silicon, I never turned off OpenMP but used gcc instead of clang for compilation (for me, that was):

export CXX=g++-13 CC=gcc-13

This fixed any problems I had 😅

Unfortunately, the LLVM developers haven’t fixed this bug (https://github.com/microsoft/LightGBM/issues/4229#issuecomment-855839996) in the 12.0.1 release.

Thanks @borchero, that’s helpful!

Looking into this a bit today, I also think that some of these failures might not actually be about incompatibility with particular versions of OpenMP, but rather related to #5106.

Fixing the search paths embedded in lib_lightgbm.so on macOS might eliminate some of these cases where programs segfault because multiple versions of libomp have been loaded.

Tried the following today on my Intel Mac:

  • OS: macOS 14.1.2 (Sonoma)
  • CPU: Intel chip
  • compiler: AppleClang 13.0.0
  • Python: 3.11.7
  • OpenMP: 17.0.6
  1. build lib_lightgbm
rm -rf ./build
mkdir ./build
cd ./build
cmake ..
make -j2 _lightgbm
cd ..
  2. check what it linked against
# check what it's linked to
otool -L lib_lightgbm.so
../lib_lightgbm.so:
    @rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
    /usr/local/opt/libomp/lib/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1200.3.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)

Notice that even though I was building in an active conda environment, it found Homebrew’s OpenMP, /usr/local/opt/libomp/lib/libomp.dylib.

  3. install the Python library
sh build-python.sh install --precompile
  4. run an example

This segfaults, I think because it’s finding the llvm-openmp from conda:

python ./examples/python-guide/logistic_regression.py
Performance of `binary` objective with binary labels:
Segmentation fault: 11

Running with some debugging stuff set… it looks like that’s exactly what’s happening. 2 versions of OpenMP are being loaded.

DYLD_PRINT_LIBRARIES=1 \
python examples/python-guide/logistic_regression.py 2>&1 \
| grep libomp
dyld[32037]: <891B2F9B-F926-3D67-AA9C-D58D47668AFB> /Users/jlamb/mambaforge/envs/lgb-dev/lib/libomp.dylib
dyld[32037]: <C91365F6-6644-300A-9277-1946696E9E86> /usr/local/Cellar/libomp/17.0.4/lib/libomp.dylib

Looking a bit more closely, it seems that scikit-learn comes with an sklearn/utils/_openmp_helpers.cpython-311-darwin.so which has an RPATH entry that causes conda’s libomp.dylib to be loaded.

otool -L /Users/jlamb/mambaforge/envs/lgb-dev/lib/python3.11/site-packages/sklearn/utils/_openmp_helpers.cpython-311-darwin.so
/Users/jlamb/mambaforge/envs/lgb-dev/lib/python3.11/site-packages/sklearn/utils/_openmp_helpers.cpython-311-darwin.so:
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)
	@rpath/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
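
The two otool -L checks above can also be scripted. This is only a sketch: the helper names linked_libraries and hardcoded_libomp are made up, and it assumes the otool -L layout shown in this comment (one header line, then one indented entry per library). It flags libomp entries recorded as an absolute path rather than through @rpath.

```python
def linked_libraries(otool_output):
    """Return the install names listed by `otool -L`, skipping the header line."""
    names = []
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if line:
            # Each entry looks like "<install name> (compatibility version ...)".
            names.append(line.split(" (compatibility", 1)[0])
    return names


def hardcoded_libomp(otool_output):
    """Return libomp entries that use an absolute path instead of @rpath."""
    return [
        name
        for name in linked_libraries(otool_output)
        if name.endswith("libomp.dylib") and name.startswith("/")
    ]


# Sample text copied from the lib_lightgbm.so output earlier in this comment.
sample = """\
../lib_lightgbm.so:
    @rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
    /usr/local/opt/libomp/lib/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1200.3.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
"""

print(hardcoded_libomp(sample))
```

For the scikit-learn helper above, the same function would return an empty list, since its libomp entry already goes through @rpath.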

After patching lib_lightgbm’s corresponding entry so that it no longer loads a different version, the example runs without segfaulting.

install_name_tool \
    -change /usr/local/opt/libomp/lib/libomp.dylib \
    @rpath/libomp.dylib \
    /Users/jlamb/mambaforge/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/lib/lib_lightgbm.so
otool -L \
    /Users/jlamb/mambaforge/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/lib/lib_lightgbm.so
/Users/jlamb/mambaforge/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/lib/lib_lightgbm.so:
	@rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1200.3.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
python examples/python-guide/logistic_regression.py
Performance of `binary` objective with binary labels:
{'time': 0.031093120574951172, 'correlation': 0.6012584922759894, 'logloss': 0.15545640415178236}
Performance of `xentropy` objective with binary labels:
{'time': 0.0031642913818359375, 'correlation': 0.6012584922759894, 'logloss': 0.15545640415178236}
Performance of `xentropy` objective with probability labels:
{'time': 0.006477832794189453, 'correlation': 0.884189150816587, 'logloss': 0.1551448517607808}
Best `binary` time: 0.002405881881713867
Best `xentropy` time: 0.0023250579833984375

Just stopping here for now to post my notes. I’ll continue working on this.

I found tonight that after upgrading to the latest libomp shipped by Homebrew (v15.0.6), I was able to compile LightGBM, build the Python package, and run all of its tests without issue on my MacBook (Intel chip, macOS 12.2.1).

brew install libomp
cd ./python-package
pip install .
cd ..
pytest tests/python_package_tests

MacBook Air (M1, 2020) running macOS Monterey version 12.5.1.

Trying to fit LightGBMModel gives me [1] 36565 segmentation fault python test_lightgbm.py.

libomp:

brew info libomp
==> libomp: stable 14.0.6 (bottled)
LLVM's OpenMP runtime library
https://openmp.llvm.org/
/usr/local/Cellar/libomp/11.1.0 (9 files, 1.4MB)
  Poured from bottle on 2022-09-13 at 15:01:22
/usr/local/Cellar/libomp/14.0.6 (7 files, 1.6MB)
  Poured from bottle on 2022-09-13 at 15:05:49

I tried:

wget https://raw.githubusercontent.com/Homebrew/homebrew-core/fb8323f2b170bd4ae97e1bac9bf3e2983af3fdb0/Formula/libomp.rb
brew unlink libomp
brew install libomp.rb

But it gives me:

Error: Failed to load cask: libomp.rb
Cask 'libomp' is unreadable: wrong constant name #<Class:0x00007fa9e92bb340>
Warning: Treating libomp.rb as a formula.
Warning: libomp 11.1.0 is already installed, it's just not linked.
To link this version, run:
  brew link libomp