scikit-learn: BUG: ArgKmin64 on Windows with scipy 1.13rc1 or 1.14.dev times out

In MNE-Python our Windows pip-pre job on Azure has started reliably timing out (and a second example):

mne/preprocessing/tests/test_interpolate.py::test_find_centroid PASSED   [ 38%]
##[error]The Operation will be canceled. The next steps may not contain expected logs.
Fatal Python error: PyThreadState_Get: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)
Python runtime state: initialized

...
Thread 0x000014fc (most recent call first):
  File "C:\hostedtoolcache\windows\Python\3.11.8\x64\Lib\site-packages\sklearn\metrics\_pairwise_distances_reduction\_dispatcher.py", line 278 in compute
  File "C:\hostedtoolcache\windows\Python\3.11.8\x64\Lib\site-packages\sklearn\neighbors\_base.py", line 850 in kneighbors
  File "C:\hostedtoolcache\windows\Python\3.11.8\x64\Lib\site-packages\sklearn\neighbors\_lof.py", line 291 in fit
  File "C:\hostedtoolcache\windows\Python\3.11.8\x64\Lib\site-packages\sklearn\base.py", line 1474 in wrapper
  File "C:\hostedtoolcache\windows\Python\3.11.8\x64\Lib\site-packages\sklearn\neighbors\_lof.py", line 256 in fit_predict
  File "D:\a\1\s\mne\preprocessing\_lof.py", line 89 in find_bad_channels_lof
  File "<decorator-gen-627>", line 12 in find_bad_channels_lof
  File "D:\a\1\s\mne\preprocessing\tests\test_lof.py", line 31 in test_lof
...

Our code just calls the following (and hasn’t been changed):

    clf = LocalOutlierFactor(n_neighbors=n_neighbors, metric=metric)
    clf.fit_predict(data)

which, according to the traceback, eventually ends up at this line:

https://github.com/scikit-learn/scikit-learn/blob/e5ce4bc0f6eb8fe21cd4e3dcebabefcc5485f907/sklearn/metrics/_pairwise_distances_reduction/_dispatcher.py#L278

18 hours ago all our tests passed in 40 minutes; then 3 hours ago the job started hanging 38% of the way through the tests and hitting the 70-minute timeout, even though it reaches that point only ~27 minutes into the build:

[screenshot omitted]

This suggests that the latest scientific-python-nightly-wheels upload of scikit-learn (and/or NumPy) 11 hours ago caused something in here to hang, so probably some recent PR to sklearn or NumPy is the culprit.

Not exactly an MWE (I’m not on Windows at the moment but could switch at some point), but maybe someone has an idea about why it’s happening?

About this issue

  • State: closed
  • Created 4 months ago
  • Reactions: 1
  • Comments: 24 (23 by maintainers)

Most upvoted comments

There was a bump to OpenBLAS 0.3.26 in https://github.com/scipy/scipy/pull/20215, maybe that’s it?

Hmmm maybe hard to tell … I am putting below what I learned so far, to be continued.

Here is a simpler scikit-learn snippet reproducing the hang; this seems to be related to ArgKmin in the pairwise distances reductions. cc @jeremiedbb and @jjerphan in case they have some insights into this.

from sklearn.metrics._pairwise_distances_reduction import ArgKmin
import numpy as np
import threadpoolctl

# Uncommenting the next line fixes it, a similar line with OpenMP fixes it as well I think
# threadpoolctl.threadpool_limits(limits=1, user_api='blas')
X = np.zeros((20, 14000))
ArgKmin.compute(X=X, Y=X, k=10, metric='euclidean')
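
For completeness, a sketch (untested on the affected CI, and not necessarily what MNE-Python ended up doing) of applying the same workaround at the caller level: threadpool_limits also works as a context manager, so BLAS can be limited only around the offending call rather than globally.

# Workaround sketch: limit BLAS threads only for the duration of the call that
# hangs, using threadpool_limits as a context manager. Shapes mirror the
# ArgKmin reproducer above; this is an assumed usage pattern, not verified
# against the failing Azure job.
import numpy as np
import threadpoolctl
from sklearn.neighbors import LocalOutlierFactor

data = np.zeros((20, 14000))  # same shape as the reproducer above
clf = LocalOutlierFactor(n_neighbors=10, metric="euclidean")
with threadpoolctl.threadpool_limits(limits=1, user_api="blas"):
    labels = clf.fit_predict(data)  # hangs without the limit on affected setups
print(labels)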

The sklearn show_versions output:

System:
    python: 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:42:31) [MSC v.1937 64 bit (AMD64)]
executable: C:\Users\rjxQE\AppData\Local\miniforge3\envs\lof-issue\python.exe
   machine: Windows-10-10.0.19045-SP0

Python dependencies:
      sklearn: 1.4.1.post1
          pip: 24.0
   setuptools: 69.2.0
        numpy: 1.26.4
        scipy: 1.14.0.dev0
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.3.2
threadpoolctl: 3.3.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 4
         prefix: vcomp
       filepath: C:\Users\rjxQE\AppData\Local\miniforge3\envs\lof-issue\Lib\site-packages\sklearn\.libs\vcomp140.dll
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 4
         prefix: libopenblas
       filepath: C:\Users\rjxQE\AppData\Local\miniforge3\envs\lof-issue\Lib\site-packages\numpy.libs\libopenblas64__v0.3.23-293-gc2f4bdbb-gcc_10_3_0-2bde3a66a51006b2b53eb373ff767a3f.dll
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 4
         prefix: libopenblas
       filepath: C:\Users\rjxQE\AppData\Local\miniforge3\envs\lof-issue\Lib\site-packages\scipy.libs\libopenblas_v0.3.26-gcc_10_3_0-75ebbb8345f75277878db24d649d8b7e.dll
        version: 0.3.26
threading_layer: pthreads
   architecture: Haswell

Actually, debugging a bit further, it seems like this is due to OpenBLAS 0.3.26; I can reproduce the hang with conda-forge packages, see https://github.com/scipy/scipy/issues/20294#issuecomment-2009203677

I guess this will need to be reported to OpenBLAS, although putting together some kind of reproducer will be a bit of work.
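
As a starting point for such a report, something like the following sketch could serve as a self-contained script: it combines the ArgKmin reproducer above with a dump of the loaded BLAS/OpenMP runtimes, assuming the pip-installed nightly-wheel environment described in this issue (scipy dev bundling OpenBLAS 0.3.26 on Windows).

# Sketch of a self-contained script for an upstream report: print which
# BLAS/OpenMP runtimes are loaded, then run the call that hangs. Untested
# outside the environment described in this issue.
from pprint import pprint

import numpy as np
import scipy.linalg  # noqa: F401  -- ensures scipy's bundled OpenBLAS is loaded
from threadpoolctl import threadpool_info

from sklearn.metrics._pairwise_distances_reduction import ArgKmin

pprint(threadpool_info())  # lists the OpenBLAS builds and the vcomp OpenMP runtime

X = np.zeros((20, 14000))
print("calling ArgKmin.compute ...")
ArgKmin.compute(X=X, Y=X, k=10, metric="euclidean")  # hangs on affected setups
print("done")  # never reached when the hang occurs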

In my VM:

$ python -c "import numpy; import scipy.linalg; import sklearn.neighbors; from threadpoolctl import threadpool_info; from pprint import pprint; pprint(threadpool_info())"
[{'architecture': 'Haswell',
  'filepath': 'C:\\Users\\tester\\mne-python\\1.6.1_0\\Lib\\site-packages\\numpy.libs\\libopenblas64__v0.3.23-293-gc2f4bdbb-gcc_10_3_0-2bde3a66a51006b2b53eb373ff767a3f.dll',
  'internal_api': 'openblas',
  'num_threads': 4,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.23.dev'},
 {'architecture': 'Haswell',
  'filepath': 'C:\\Users\\tester\\mne-python\\1.6.1_0\\Lib\\site-packages\\scipy.libs\\libopenblas_v0.3.26-gcc_10_3_0-75ebbb8345f75277878db24d649d8b7e.dll',
  'internal_api': 'openblas',
  'num_threads': 4,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.26'},
 {'filepath': 'C:\\Users\\tester\\mne-python\\1.6.1_0\\Lib\\site-packages\\sklearn\\.libs\\vcomp140.dll',
  'internal_api': 'openmp',
  'num_threads': 4,
  'prefix': 'vcomp',
  'user_api': 'openmp',
  'version': None}]

Azure:

[{'architecture': 'SkylakeX',
  'filepath': 'C:\\hostedtoolcache\\windows\\Python\\3.11.8\\x64\\Lib\\site-packages\\scipy.libs\\libopenblas_v0.3.26-gcc_10_3_0-75ebbb8345f75277878db24d649d8b7e.dll',
  'internal_api': 'openblas',
  'num_threads': 2,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.26'},
 {'filepath': 'C:\\hostedtoolcache\\windows\\Python\\3.11.8\\x64\\Lib\\site-packages\\sklearn\\.libs\\vcomp140.dll',
  'internal_api': 'openmp',
  'num_threads': 2,
  'prefix': 'vcomp',
  'user_api': 'openmp',
  'version': None}]

And I can confirm that OPENBLAS_NUM_THREADS=1 fixes it locally, at least.
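
For reference, a sketch of applying that workaround from Python (rather than exporting the variable in the CI job configuration): the variable has to be set before the OpenBLAS DLLs are loaded, so it must come before the numpy/scipy/sklearn imports, e.g. at the very top of a test entry point or conftest.py. The file placement here is an assumption, not something tested on the failing job.

# Environment-variable workaround sketch: set OPENBLAS_NUM_THREADS before any
# module that loads OpenBLAS is imported.
import os

os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")

import numpy as np  # noqa: E402
from sklearn.neighbors import LocalOutlierFactor  # noqa: E402

clf = LocalOutlierFactor(n_neighbors=10, metric="euclidean")
clf.fit_predict(np.zeros((20, 14000)))  # completes with the single-threaded BLAS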