scikit-learn: ImportError: dlopen: cannot load any more object with static TLS with torch built with gcc 5.5
I am not sure if this is a PyTorch bug, a scikit-learn bug or a numba bug, but this used to work in scikit-learn 0.20.3 and stopped working in the 0.21.0 series, so for now I am going to venture a guess that it is a regression in scikit-learn.
The following program, a minimized version of my original `import librosa`, fails to load:
import torch
import soundfile
import scipy.signal
import numba
import sklearn
with
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/sklearn/__check_build/__init__.py", line 44, in <module>
from ._check_build import check_build # noqa
ImportError: dlopen: cannot load any more object with static TLS
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test_torch.py", line 5, in <module>
import sklearn
File "/opt/conda/lib/python3.6/site-packages/sklearn/__init__.py", line 75, in <module>
from . import __check_build
File "/opt/conda/lib/python3.6/site-packages/sklearn/__check_build/__init__.py", line 46, in <module>
raise_build_error(e)
File "/opt/conda/lib/python3.6/site-packages/sklearn/__check_build/__init__.py", line 41, in raise_build_error
%s""" % (e, local_dir, ''.join(dir_content).strip(), msg))
ImportError: dlopen: cannot load any more object with static TLS
___________________________________________________________________________
Contents of /opt/conda/lib/python3.6/site-packages/sklearn/__check_build:
_check_build.cpython-36m-x86_64-linux-gnu.so __pycache__ __init__.py
setup.py
___________________________________________________________________________
It seems that scikit-learn has not been built correctly.
If you have installed scikit-learn from source, please do not forget
to build the package before using it: run `python setup.py install` or
`make` in the source directory.
If you have used an installer, please check that it is suited for your
Python version, your operating system and your platform.
Downgrading to scikit-learn 0.20.3 makes the problem go away.
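Because the failure depends on which shared objects are already loaded in the process, each attempt needs a fresh interpreter. The sketch below uses a hypothetical helper (`run_imports` is not part of scikit-learn, PyTorch or numba) that runs an import sequence in a subprocess and checks stderr for the static TLS message; the module list mirrors the minimized reproduction above.

```python
import subprocess
import sys
from typing import List, Tuple

# Message the glibc dynamic loader reports when static TLS space runs out.
STATIC_TLS_MARKER = "cannot load any more object with static TLS"


def run_imports(modules: List[str], python: str = sys.executable) -> Tuple[bool, str]:
    """Import `modules` in order inside a fresh interpreter.

    Returns (succeeded, stderr). A new process is used for every call
    because the TLS state depends on what was loaded before.
    """
    code = "\n".join("import " + name for name in modules)
    proc = subprocess.run(
        [python, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    return proc.returncode == 0, proc.stderr


def hits_static_tls(stderr: str) -> bool:
    """True if the captured stderr looks like the static TLS exhaustion error."""
    return STATIC_TLS_MARKER in stderr


if __name__ == "__main__":
    # Same order as the failing program in the report.
    order = ["torch", "soundfile", "scipy.signal", "numba", "sklearn"]
    ok, err = run_imports(order)
    if ok:
        print("imports succeeded")
    elif hits_static_tls(err):
        print("reproduced: static TLS exhaustion")
    else:
        print("imports failed for another reason:")
        print(err)
```

Running the check in a subprocess also keeps a failed attempt from polluting the calling interpreter, so the same script can be pointed at both scikit-learn 0.20.3 and 0.21.x environments.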
Versions
jenkins@260bf77532d0:~/workspace/test$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn; sklearn.show_versions()
System:
python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
executable: /opt/conda/bin/python
machine: Linux-4.15.0-29-generic-x86_64-with-debian-jessie-sid
BLAS:
macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
lib_dirs: /opt/conda/lib
cblas_libs: mkl_rt, pthread
Python deps:
pip: 19.1.1
setuptools: 41.0.1
sklearn: 0.21.2
numpy: 1.16.4
scipy: 1.1.0
Cython: None
pandas: None
Also, you may be interested in:
jenkins@260bf77532d0:~/workspace/test$ pip list | grep numba
numba 0.43.1
jenkins@260bf77532d0:~/workspace/test$ pip list | grep torch
torch 1.2.0a0+ab800ad
This problem only occurs when torch is built with gcc 5.5.0; builds made with other versions of gcc are known not to trigger it.
For ease of reproduction, you can use the Docker image ezyang/scikit-learn-tls-repro:1 (https://cloud.docker.com/repository/registry-1.docker.io/ezyang/scikit-learn-tls-repro). Once inside, follow the reproduction instructions described above. (EDIT: At the time of writing, the Docker image is still uploading. It should be done soon.)
About this issue
- State: closed
- Created 5 years ago
- Reactions: 22
- Comments: 21 (7 by maintainers)
Commits related to this issue
- requirements: Bumped sklearn to 0.20.3 Related to: - https://github.com/scikit-learn/scikit-learn/issues/14485 - https://discuss.mxnet.io/t/import-mxnet-throws-importerror-dlopen-cannot-load-any-more... — committed to spinalcordtoolbox/spinalcordtoolbox by jcohenadad 5 years ago
- .travis.yml: Allow failure for ubuntu 14.04 because of: https://github.com/scikit-learn/scikit-learn/issues/14485 — committed to spinalcordtoolbox/spinalcordtoolbox by jcohenadad 5 years ago
- Increased sensitivity of dependency testing (#2522) * sct_check_dependencies: Now importing PyQt5.QtCore to be sensitive to missing lib * sct_check_dependencies: Added comment * sct_check_depen... — committed to spinalcordtoolbox/spinalcordtoolbox by alexfoias 5 years ago
- Drop support for 2014-era Linux. Merging pytorch (6aa1f4e) caused > OSError: dlopen: cannot load any more object with static TLS on Debian 8 and Ubuntu Trusty. This is a known issue with pytorch, s... — committed to spinalcordtoolbox/spinalcordtoolbox by kousu 4 years ago
pip install scikit-learn
I solved it by importing sklearn first, then tensorflow. The import order is what triggers this error.
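Since reordering imports worked here, a brute-force search can look for an order that loads cleanly on a given machine. This is a workaround sketch only, with hypothetical helper names (`imports_cleanly`, `find_working_order`); it does not address the underlying TLS limit, and `max_tries` is an arbitrary cap because the number of orders grows factorially.

```python
import itertools
import subprocess
import sys
from typing import List, Optional, Sequence


def imports_cleanly(modules: Sequence[str]) -> bool:
    """Import `modules` in order in a fresh interpreter; True on success."""
    code = "\n".join("import " + name for name in modules)
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode == 0


def find_working_order(modules: Sequence[str], max_tries: int = 50) -> Optional[List[str]]:
    """Return the first import order that succeeds, or None.

    itertools.permutations yields the given order first, so an order that
    already works is returned unchanged. At most `max_tries` orders are tried.
    """
    for order in itertools.islice(itertools.permutations(modules), max_tries):
        if imports_cleanly(order):
            return list(order)
    return None
```

For example, `find_working_order(["tensorflow", "sklearn"])` would try the failing order first and then the sklearn-first order that worked for this commenter.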
You should use `pip install scikit-learn==0.20.3`; before that, you may want to run `pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/`. That works for me.
Going through the threads for other projects related to this issue, I see that a common resolution is to upgrade `glibc` to >= 2.21, which includes this bug fix. Users on Ubuntu >= 16.04 or Debian >= 9 get `glibc` >= 2.21 by default.
@nscozzaro What version of `glibc` is running on your system? (`ldd --version`)
Thanks, this comment helped me solve the issue.
Working, thank you.
The issue @ezyang linked has a bunch of information on this TLS (thread-local storage) issue. Here's some info I dug up before: https://github.com/pytorch/pytorch/issues/2575#issuecomment-369892859
TL;DR: Something in the chain of imports was not compiled (C/C++) with the `-fPIC` flag. Importing that library causes a problem that turns all imports into "static TLS" ones. There is a maximum number of such "static TLS" slots (the names I use here are surely incorrect). The exact number of slots depends on the OS and how it was compiled. In the linked PyTorch issue 2575, it is mentioned that OpenMP, compiled without the flag, is what causes the cascade. This scikit-learn issue might be due to some newly introduced library, or some other change, eating just a few more static TLS slots.
Note: I'm not a real expert. There might be other sources for this error than "one or more libraries compiled without the `-fPIC` flag". I haven't found one, though.
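One way to look for candidate libraries is to check each shared object's dynamic section for the `STATIC_TLS` flag (ELF `DF_STATIC_TLS`), which marks objects using the initial-exec TLS model; when such an object is loaded with `dlopen()`, its TLS block must fit into the loader's fixed static TLS reserve. This sketch assumes binutils `readelf` is on `PATH` (a missing binary raises `FileNotFoundError`); the helper names are hypothetical, and a flagged library is only a suspect, not proof of the cause.

```python
import subprocess
from typing import Iterable, List


def has_static_tls_flag(readelf_dynamic: str) -> bool:
    """True if `readelf -d` output lists STATIC_TLS in the (FLAGS) entry.

    The (FLAGS_1) entry is a different tag and is deliberately not matched.
    """
    for line in readelf_dynamic.splitlines():
        if "(FLAGS)" in line and "STATIC_TLS" in line.split("(FLAGS)", 1)[1]:
            return True
    return False


def libraries_needing_static_tls(paths: Iterable[str]) -> List[str]:
    """Run `readelf -d` on each path and return the ones flagged STATIC_TLS."""
    flagged = []
    for path in paths:
        proc = subprocess.run(
            ["readelf", "-d", path],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
        # Non-ELF files make readelf fail; those are skipped.
        if proc.returncode == 0 and has_static_tls_flag(proc.stdout):
            flagged.append(path)
    return flagged
```

A typical use is to feed it every `*.so*` file found with `glob.glob(..., recursive=True)` under the environment's `site-packages` and `lib` directories, then compare the flagged set between the working and failing environments.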
@ezyang you may want to share the `Dockerfile`, if that's possible.
If anyone is interested in reproducing this error, the right docker incantation to use is something like this:
Note that you need to specify the tag (i.e. `1`) explicitly; otherwise you get a cryptic error message (the `latest` tag does not exist):
I have no idea why this would happen, but I have seen numerous bug reports related to this, e.g. with PyTorch and OpenCV (https://github.com/pytorch/pytorch/issues/2083) or OpenCV and TensorFlow (https://github.com/tensorflow/models/issues/523). All in all, I would guess that this is not a scikit-learn bug.
The fact that it depends on the order of imports is fishy; for example, this works in your docker image: