scikit-learn: BUG: ArgKmin64 on Windows with scipy 1.13rc1 or 1.14.dev times out
In MNE-Python our Windows pip-pre job on Azure has started reliably timing out (and a second example):
mne/preprocessing/tests/test_interpolate.py::test_find_centroid PASSED [ 38%]
##[error]The Operation will be canceled. The next steps may not contain expected logs.
Fatal Python error: PyThreadState_Get: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)
Python runtime state: initialized
...
Thread 0x000014fc (most recent call first):
File "C:\hostedtoolcache\windows\Python\3.11.8\x64\Lib\site-packages\sklearn\metrics\_pairwise_distances_reduction\_dispatcher.py", line 278 in compute
File "C:\hostedtoolcache\windows\Python\3.11.8\x64\Lib\site-packages\sklearn\neighbors\_base.py", line 850 in kneighbors
File "C:\hostedtoolcache\windows\Python\3.11.8\x64\Lib\site-packages\sklearn\neighbors\_lof.py", line 291 in fit
File "C:\hostedtoolcache\windows\Python\3.11.8\x64\Lib\site-packages\sklearn\base.py", line 1474 in wrapper
File "C:\hostedtoolcache\windows\Python\3.11.8\x64\Lib\site-packages\sklearn\neighbors\_lof.py", line 256 in fit_predict
File "D:\a\1\s\mne\preprocessing\_lof.py", line 89 in find_bad_channels_lof
File "<decorator-gen-627>", line 12 in find_bad_channels_lof
File "D:\a\1\s\mne\preprocessing\tests\test_lof.py", line 31 in test_lof
...
Our code just calls the following (and hasn’t been changed):
from sklearn.neighbors import LocalOutlierFactor
clf = LocalOutlierFactor(n_neighbors=n_neighbors, metric=metric)
clf.fit_predict(data)
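A self-contained version of that call, for trying it outside MNE-Python, might look like the sketch below. The data shape, n_neighbors=20 and metric="euclidean" are placeholder assumptions, not the values MNE actually passes. fit_predict returns 1 for inliers and -1 for outliers.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical stand-in for MNE's channel data: 64 channels x 1000 samples (float64).
rng = np.random.default_rng(0)
data = rng.standard_normal((64, 1000))

# Placeholder parameters; MNE passes its own n_neighbors and metric.
n_neighbors = 20
metric = "euclidean"

clf = LocalOutlierFactor(n_neighbors=n_neighbors, metric=metric)
labels = clf.fit_predict(data)  # 1 = inlier, -1 = outlier, one label per row
print(labels.shape)
```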
which, per the traceback above, ends up in compute at line 278 of sklearn\metrics\_pairwise_distances_reduction\_dispatcher.py.
18 hours ago all our tests passed in 40 minutes; 3 hours ago they started timing out (70-minute limit) at 38% through the test suite, a point the build reaches only ~27 minutes in.
This suggests that the latest scientific-python-nightly-wheels upload of scikit-learn (and/or NumPy), 11 hours ago, caused something here to hang, so some recent PR to scikit-learn or NumPy is probably the culprit.
This is not exactly an MWE (I'm not on Windows at the moment but could switch at some point), but maybe someone has an idea why it's happening?
About this issue
- State: closed
- Created 4 months ago
- Comments: 24 (23 by maintainers)
Hmm, hard to tell… Below is what I have learned so far; to be continued.
Here is a simpler scikit-learn snippet reproducing the hang; it seems to be related to ArgKmin in the pairwise distances reductions. cc @jeremiedbb and @jjerphan in case they have any insight into this.
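The snippet from that comment is not preserved here. As a hedged stand-in, the sketch below exercises what should be the same code path using only public API: a brute-force Euclidean kneighbors query on float64 data, which scikit-learn routes to ArgKmin (ArgKmin64 being the float64 specialization). The array sizes and k are arbitrary assumptions, and the result is checked against a naive NumPy computation so the example is still useful on machines where it does not hang.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical sizes; float64 inputs select the 64-bit ArgKmin specialization.
rng = np.random.default_rng(42)
X = rng.standard_normal((2000, 50))
k = 5

# algorithm="brute" with a Euclidean metric is the path on which kneighbors
# dispatches to the pairwise-distances-reduction backend (ArgKmin).
nn = NearestNeighbors(n_neighbors=k, algorithm="brute", metric="euclidean").fit(X)
dist, ind = nn.kneighbors(X)

# Naive NumPy reference: full squared-distance matrix, then the k smallest per row.
sq = (X**2).sum(axis=1)
d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
ref = np.sqrt(np.maximum(np.sort(d2, axis=1)[:, :k], 0.0))

print(np.allclose(dist, ref, atol=1e-5))
```

Because X is passed explicitly to kneighbors, each point counts as its own nearest neighbor (distance close to 0), which the naive reference reproduces as well.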
The sklearn.show_versions() info.
Actually, debugging a bit further, it seems this is due to OpenBLAS 0.3.26: I can reproduce the hang with conda-forge packages, see https://github.com/scipy/scipy/issues/20294#issuecomment-2009203677
I guess this will need to be reported to OpenBLAS, although putting together some kind of reproducer will be a bit of work.
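To check whether a given environment loads the OpenBLAS version implicated above, threadpoolctl (a scikit-learn dependency, which also feeds the threadpool section of sklearn.show_versions()) can list the BLAS libraries loaded in the current process. This is a diagnostic sketch: the helper name openblas_libraries is hypothetical, and only libraries that are already loaded (here by importing NumPy and SciPy) are reported.

```python
import numpy  # noqa: F401  importing NumPy loads its bundled BLAS
import scipy.linalg  # noqa: F401  SciPy wheels may bring their own OpenBLAS
from threadpoolctl import threadpool_info


def openblas_libraries():
    """Hypothetical helper: (version, num_threads, filepath) for each loaded OpenBLAS."""
    return [
        (lib.get("version"), lib.get("num_threads"), lib.get("filepath"))
        for lib in threadpool_info()
        if lib.get("internal_api") == "openblas"
    ]


for version, n_threads, path in openblas_libraries():
    flag = "  <- version discussed in this issue" if version == "0.3.26" else ""
    print(f"OpenBLAS {version} ({n_threads} threads) at {path}{flag}")
```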
In my VM:
Azure:
And I can confirm that OPENBLAS_NUM_THREADS=1 fixes it, locally at least.
Maybe related to https://github.com/scipy/scipy/issues/20271
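The single-threaded workaround can be applied either by exporting OPENBLAS_NUM_THREADS=1 before Python starts (OpenBLAS reads it when it is loaded) or, at runtime, with threadpoolctl's threadpool_limits context manager. The sketch below wraps a LocalOutlierFactor call like the MNE one, using placeholder data and parameters (assumptions, not MNE's values). It is a mitigation under those assumptions, not a fix for the underlying OpenBLAS issue.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from threadpoolctl import threadpool_info, threadpool_limits

rng = np.random.default_rng(0)
data = rng.standard_normal((64, 1000))  # hypothetical placeholder data

# Limit every loaded BLAS library (OpenBLAS, MKL, ...) to one thread for this
# block only; this mirrors OPENBLAS_NUM_THREADS=1 without restarting Python.
with threadpool_limits(limits=1, user_api="blas"):
    # Record the thread count each BLAS library reports while the limit is active.
    blas_threads = [lib["num_threads"] for lib in threadpool_info() if lib["user_api"] == "blas"]
    labels = LocalOutlierFactor(n_neighbors=20, metric="euclidean").fit_predict(data)

print(blas_threads, labels.shape)
```

The context-manager form keeps the limit local to the call that hangs, whereas the environment variable affects the whole process.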