scikit-learn: ImportError: dlopen: cannot load any more object with static TLS with torch built with gcc 5.5

I am not sure if this is a PyTorch bug, a scikit-learn bug or a numba bug, but this used to work in scikit-learn 0.20.3 and stopped working in the 0.21.0 series, so for now I am going to venture a guess that it is a regression in scikit-learn.

When I run the following series of imports (minimized from the original, which was `import librosa`), the program fails:

import torch
import soundfile
import scipy.signal
import numba
import sklearn

with the following error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/sklearn/__check_build/__init__.py", line 44, in <module>
    from ._check_build import check_build  # noqa
ImportError: dlopen: cannot load any more object with static TLS

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_torch.py", line 5, in <module>
    import sklearn
  File "/opt/conda/lib/python3.6/site-packages/sklearn/__init__.py", line 75, in <module>
    from . import __check_build
  File "/opt/conda/lib/python3.6/site-packages/sklearn/__check_build/__init__.py", line 46, in <module>
    raise_build_error(e)
  File "/opt/conda/lib/python3.6/site-packages/sklearn/__check_build/__init__.py", line 41, in raise_build_error
    %s""" % (e, local_dir, ''.join(dir_content).strip(), msg))
ImportError: dlopen: cannot load any more object with static TLS
___________________________________________________________________________
Contents of /opt/conda/lib/python3.6/site-packages/sklearn/__check_build:
_check_build.cpython-36m-x86_64-linux-gnu.so__pycache__               __init__.py
setup.py
___________________________________________________________________________
It seems that scikit-learn has not been built correctly.

If you have installed scikit-learn from source, please do not forget
to build the package before using it: run `python setup.py install` or
`make` in the source directory.

If you have used an installer, please check that it is suited for your
Python version, your operating system and your platform.

Downgrading to scikit-learn 0.20.3 makes the problem go away.
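To pin down which import in the sequence above actually fails, one option is to run it in a fresh interpreter and have the child print a marker after each successful import. This is a minimal sketch using only the standard library; the helper name `first_failing_import` is hypothetical, and detecting the failure by matching the glibc message text is an assumption rather than an API of any of these packages.

```python
import subprocess
import sys
from typing import List, Optional, Tuple

# Substring of the glibc error seen in the traceback above.
STATIC_TLS_MARKER = "cannot load any more object with static TLS"


def first_failing_import(modules: List[str]) -> Tuple[Optional[str], bool]:
    """Import `modules` in order in a fresh interpreter.

    Returns (first module whose import failed, or None if the child exited
    cleanly; whether the child's stderr contains the static TLS error).
    """
    # The child prints each module name right after importing it, so the
    # first name missing from stdout is the one that failed.
    script = "\n".join(f"import {m}; print({m!r}, flush=True)" for m in modules)
    proc = subprocess.run(
        [sys.executable, "-c", script],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode; also works on Python 3.6
    )
    if proc.returncode == 0:
        return None, False
    imported = proc.stdout.split()
    failed = modules[len(imported)] if len(imported) < len(modules) else None
    return failed, STATIC_TLS_MARKER in proc.stderr


if __name__ == "__main__":
    failed, is_tls = first_failing_import(
        ["torch", "soundfile", "scipy.signal", "numba", "sklearn"]
    )
    if failed is None:
        print("all imports succeeded")
    else:
        print(f"import of {failed} failed (static TLS error: {is_tls})")
```

Running each sequence in a child process matters here: once a module is imported, it is cached in `sys.modules` and its shared libraries stay loaded, so retrying in the same interpreter would not reproduce the original conditions.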

Versions

jenkins@260bf77532d0:~/workspace/test$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn; sklearn.show_versions()

System:
    python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)  [GCC 7.3.0]
executable: /opt/conda/bin/python
   machine: Linux-4.15.0-29-generic-x86_64-with-debian-jessie-sid

BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /opt/conda/lib
cblas_libs: mkl_rt, pthread

Python deps:
       pip: 19.1.1
setuptools: 41.0.1
   sklearn: 0.21.2
     numpy: 1.16.4
     scipy: 1.1.0
    Cython: None
    pandas: None

Also, you may be interested in:

jenkins@260bf77532d0:~/workspace/test$ pip list | grep numba
numba                  0.43.1         
jenkins@260bf77532d0:~/workspace/test$ pip list | grep torch
torch                  1.2.0a0+ab800ad

The build of torch must be done with gcc 5.5.0 to cause this problem; other versions of gcc are known not to cause this problem.
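Since the failure reportedly depends on the compiler used to build torch, it can help to confirm which GCC produced a given shared library. GCC normally records a string such as `GCC: (GNU) 5.5.0` in the ELF `.comment` section; the sketch below simply scans the raw bytes for such strings instead of parsing ELF sections, so treat it as a heuristic. The `lib/*.so*` layout inside the torch package is an assumption about how the wheel is laid out, and the function names are hypothetical.

```python
import glob
import importlib.util
import os
import re
from typing import Dict, List

# Matches e.g. b"GCC: (GNU) 5.5.0" up to the terminating NUL byte.
GCC_TAG = re.compile(rb"GCC: \([^\x00)]*\) [^\x00]{1,80}")


def gcc_versions(path: str) -> List[str]:
    """Return the distinct GCC identification strings found in a binary."""
    with open(path, "rb") as f:
        data = f.read()  # reads the whole file; fine for a one-off check
    return sorted({m.group().decode("ascii", "replace") for m in GCC_TAG.finditer(data)})


def scan_package_libs(package_dir: str) -> Dict[str, List[str]]:
    """Map each shared object under <package_dir>/lib to its GCC strings."""
    pattern = os.path.join(package_dir, "lib", "*.so*")
    return {path: gcc_versions(path) for path in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    # find_spec locates torch without importing it, so the scan itself
    # cannot trigger the static TLS failure.
    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        print("torch is not installed")
    else:
        for lib, versions in scan_package_libs(os.path.dirname(spec.origin)).items():
            print(os.path.basename(lib), versions)
```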

For ease of reproduction, you can use the Docker image ezyang/scikit-learn-tls-repro:1 (https://cloud.docker.com/repository/registry-1.docker.io/ezyang/scikit-learn-tls-repro). Once inside, follow the reproduction instructions described above. (EDIT: At the time of writing, the Docker image is still uploading. It should be done soon.)

About this issue

State: closed
Created 5 years ago
Reactions: 22
Comments: 21 (7 by maintainers)

Commits related to this issue

requirements: Bumped sklearn to 0.20.3 Related to: - https://github.com/scikit-learn/scikit-learn/issues/14485 - https://discuss.mxnet.io/t/import-mxnet-throws-importerror-dlopen-cannot-load-any-more... — committed to spinalcordtoolbox/spinalcordtoolbox by jcohenadad 5 years ago
.travis.yml: Allow failure for ubuntu 14.04 because of: https://github.com/scikit-learn/scikit-learn/issues/14485 — committed to spinalcordtoolbox/spinalcordtoolbox by jcohenadad 5 years ago
Increased sensitivity of dependency testing (#2522) * sct_check_dependencies: Now importing PyQt5.QtCore to be sensitive to missing lib * sct_check_dependencies: Added comment * sct_check_depen... — committed to spinalcordtoolbox/spinalcordtoolbox by alexfoias 5 years ago
Drop support for 2014-era Linux. Merging pytorch (6aa1f4e) caused > OSError: dlopen: cannot load any more object with static TLS on Debian 8 and Ubuntu Trusty. This is a known issue with pytorch, s... — committed to spinalcordtoolbox/spinalcordtoolbox by kousu 4 years ago

Most upvoted comments

pip install scikit-learn

+56

ezyang on Jul 26, 2019

I solved it by importing sklearn first, then tensorflow. The import order causes this error.

+27

Ningshiqi on May 25, 2020

> pip install scikit-learn

You should use `pip install scikit-learn==0.20.3`, and before that you'd better run `pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/`. That works for me.

+13

Magic-oak on Jan 17, 2021

Going through the threads for other projects related to this issue, I see a common resolution is to upgrade glibc to >= 2.21, which includes this bug fix. Users who have Ubuntu >= 16.04 or Debian >= 9 would get glibc >= 2.21 by default.

@nscozzaro What version of glibc do you have running on your system? (ldd --version)

thomasjpfan on Apr 19, 2021
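Besides `ldd --version`, the glibc version can be read from Python: on glibc-based Linux, `os.confstr("CS_GNU_LIBC_VERSION")` returns a string such as `glibc 2.19`. A small sketch that compares it against the >= 2.21 threshold mentioned in the comment above; the threshold comes from this thread, not from glibc documentation, and the helper names are hypothetical.

```python
import os
from typing import Optional, Tuple

# Threshold taken from the comment above; not an official glibc statement.
MIN_GLIBC = (2, 21)


def parse_glibc_version(text: str) -> Tuple[int, int]:
    """Parse 'glibc 2.19' (or just '2.19') into (2, 19)."""
    major, minor = text.split()[-1].split(".")[:2]
    return int(major), int(minor)


def current_glibc_version() -> Optional[Tuple[int, int]]:
    """Return the running glibc version, or None if it cannot be determined."""
    try:
        text = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        # AttributeError: no os.confstr (Windows);
        # ValueError: name unknown on this platform (e.g. macOS).
        return None
    return parse_glibc_version(text) if text else None


def meets_threshold(version: Tuple[int, int]) -> bool:
    # Tuple comparison orders by major first, then minor.
    return version >= MIN_GLIBC


if __name__ == "__main__":
    version = current_glibc_version()
    if version is None:
        print("glibc not detected")
    else:
        status = "OK" if meets_threshold(version) else "older than 2.21"
        print(f"glibc {version[0]}.{version[1]}: {status}")
```

Comparing integer tuples rather than version strings avoids the classic trap where `"2.9" > "2.21"` as strings.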

> Going through the threads for other projects related to this issue, I see a common resolution is to upgrade their version of glibc to be >=2.21 which includes this bug fix. Users that have Ubuntu >= 16.04 or Debian >= 9 would get glibc>=2.21 by default.
>
> @nscozzaro What version of glibc do you have running on your system? (ldd --version)

Thanks, this comment helped me solve the issue.

Th3CracKed on Nov 29, 2021

> > pip install scikit-learn
>
> you should use ‘pip install scikit-learn==0.20.3’, and before that you’d better use ‘pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/’ . That works for me

Working, thank you.

Arunpoochelvan on Mar 11, 2021

The issue @ezyang linked has a bunch of information on this TLS (thread-local storage) issue. Here's some info I dug up before: https://github.com/pytorch/pytorch/issues/2575#issuecomment-369892859

TL;DR: Something in the chain of imports was not C/C++ compiled with the -fPIC flag. Importing that library causes a problem that turns all imports to “static TLS”. There is a maximum number of such “static TLS” slots (the names I use here are surely incorrect). The exact number of slots depends on the OS and how it was compiled.

In the linked pytorch issue 2575, there is a mention that OpenMP was compiled without the flag, causing the cascade. This scikit-learn issue might be due to some newly introduced library or some other change eating just a few more static TLS slots.

Note: Not a real expert. There might be other sources for this error than “one/some lib missing the `-fPIC` flag when it was compiled”. Haven't found one though.

lautjy on Aug 17, 2019
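The static TLS mechanism discussed in the comment above can be inspected directly in the ELF files. Shared objects that use the initial-exec TLS model carry the `DF_STATIC_TLS` flag in their dynamic section (what `readelf -d` shows as `FLAGS STATIC_TLS`), and the `PT_TLS` program header gives the size of the object's TLS block. The sketch below is a minimal reader that handles only 64-bit little-endian ELF files (e.g. x86-64) and lists flagged libraries among those mapped into the current process via `/proc/self/maps`, so it is Linux-specific. Reading the dynamic section through the `PT_DYNAMIC` segment's file offset is an assumption that holds for typical shared libraries, and the function names are hypothetical.

```python
import os
import struct
from typing import List, Tuple

PT_DYNAMIC = 2        # program header type of the dynamic section
PT_TLS = 7            # program header type of the TLS template
DT_NULL = 0           # terminates the dynamic array
DT_FLAGS = 30         # dynamic tag holding the DF_* flags
DF_STATIC_TLS = 0x10  # object requires static TLS


def tls_info(path: str) -> Tuple[int, bool]:
    """Return (TLS block size in bytes, whether DF_STATIC_TLS is set)."""
    with open(path, "rb") as f:
        data = f.read()
    # EI_CLASS == 2 (64-bit), EI_DATA == 1 (little-endian).
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError(f"not a 64-bit little-endian ELF file: {path}")
    (phoff,) = struct.unpack_from("<Q", data, 32)           # e_phoff
    phentsize, phnum = struct.unpack_from("<HH", data, 54)  # e_phentsize, e_phnum
    tls_size, static_tls = 0, False
    for i in range(phnum):
        p_type, _, p_offset, _, _, p_filesz, p_memsz, _ = struct.unpack_from(
            "<IIQQQQQQ", data, phoff + i * phentsize
        )
        if p_type == PT_TLS:
            tls_size = p_memsz
        elif p_type == PT_DYNAMIC:
            # Each Elf64_Dyn entry is a signed 8-byte tag plus an 8-byte value.
            for off in range(p_offset, p_offset + p_filesz, 16):
                d_tag, d_val = struct.unpack_from("<qQ", data, off)
                if d_tag == DT_NULL:
                    break
                if d_tag == DT_FLAGS and d_val & DF_STATIC_TLS:
                    static_tls = True
    return tls_size, static_tls


def loaded_shared_objects() -> List[str]:
    """Paths of shared objects mapped into this process (Linux only)."""
    try:
        with open("/proc/self/maps") as maps:
            lines = maps.readlines()
    except OSError:
        return []
    # Fields: address perms offset dev inode pathname
    fields = (line.split() for line in lines)
    return sorted({f[5] for f in fields if len(f) >= 6 and ".so" in f[5]})


if __name__ == "__main__":
    # Import the suspect packages first (e.g. torch) so that their
    # libraries are mapped before scanning.
    for lib in loaded_shared_objects():
        try:
            size, static = tls_info(lib)
        except (OSError, ValueError, struct.error):
            continue
        if static:
            print(f"{os.path.basename(lib)}: STATIC_TLS, TLS block of {size} bytes")
```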

@ezyang you may want to share the Dockerfile if that’s possible.

If anyone is interested in reproducing this error, the right docker incantation to use is something like this:

docker run -it ezyang/scikit-learn-tls-repro:1 bash

Note that you need to specify the tag (i.e. 1) explicitly; otherwise you get a cryptic error message (the ‘latest’ tag does not exist):

Unable to find image 'ezyang/scikit-learn-tls-repro:latest' locally
docker: Error response from daemon: manifest for ezyang/scikit-learn-tls-repro:latest not found.

I have no idea why this would happen, but I have seen numerous bug reports related to this, e.g. with pytorch and OpenCV https://github.com/pytorch/pytorch/issues/2083 or OpenCV and Tensorflow https://github.com/tensorflow/models/issues/523. All in all, I would guess that this is not a scikit-learn bug.

The fact that it depends on the order of imports is fishy; for example, this works in your docker image:

python -c 'import torch; import sklearn; import soundfile; import scipy.signal; import numba'

lesteve on Aug 1, 2019
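Because the outcome depends on import order, a small harness that tries several orders, each in its own fresh interpreter, makes the comparison repeatable. A sketch under the assumption that matching the glibc message text is enough to classify the failure; the helper names are hypothetical, and the two orders listed are the failing one from the original report and the one shown to work in the command above.

```python
import subprocess
import sys
from typing import Dict, Sequence, Tuple

STATIC_TLS_MARKER = "cannot load any more object with static TLS"


def import_order_result(modules: Sequence[str]) -> Tuple[bool, bool]:
    """Import `modules` in order in a fresh interpreter.

    Returns (succeeded, stderr mentions the static TLS error).
    """
    proc = subprocess.run(
        [sys.executable, "-c", "; ".join(f"import {m}" for m in modules)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode == 0, STATIC_TLS_MARKER in proc.stderr


def compare_orders(
    orders: Sequence[Sequence[str]],
) -> Dict[Tuple[str, ...], Tuple[bool, bool]]:
    # Each order needs its own process: imported modules are cached in
    # sys.modules and their shared libraries stay loaded.
    return {tuple(order): import_order_result(order) for order in orders}


if __name__ == "__main__":
    orders = [
        ["torch", "soundfile", "scipy.signal", "numba", "sklearn"],  # original report
        ["torch", "sklearn", "soundfile", "scipy.signal", "numba"],  # command above
    ]
    for order, (ok, tls) in compare_orders(orders).items():
        status = "ok" if ok else ("static TLS error" if tls else "other error")
        print(f"{status:>16}: {' -> '.join(order)}")
```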

Collecting scikit-learn
  Using cached https://files.pythonhosted.org/packages/85/04/49633f490f726da6e454fddc8e938bbb5bfed2001681118d3814c219b723/scikit_learn-0.21.2-cp36-cp36m-manylinux1_x86_64.whl

ezyang on Jul 26, 2019