scikit-learn: Parallel K-Means hangs on Mac OS X Lion

I first noticed this when running ‘make test’ hung. I tried with both stable and bleeding-edge scipy (I initially thought it was something arpack related).

The test sklearn.cluster.tests.test_k_means.test_k_means_plus_plus_init_2_jobs hangs the process.

Running something like KMeans(init='k-means++', n_jobs=2).fit(np.random.randn(100, 100)) in IPython hangs as well.

I thought maybe there was something wrong with my setup, but cross_val_score works OK with n_jobs=2.

About this issue

State: closed
Created 12 years ago
Comments: 59 (53 by maintainers)

Commits related to this issue

disable kmeans profile to test for #636 — committed to ogrisel/scikit-learn by ogrisel 12 years ago
Skip k-means parallel test on Mac OS X Lion (10.7) There is a bug that occurs in the BLAS DGEMM function after a fork on Mac OS X Lion that causes this test to hang. See issue #636 for details. This ... — committed to njwilson/scikit-learn by njwilson 12 years ago
BUG: Don't test test_k_means_plus_plus_init_2_jobs on Mac OSX >= 10.7 because it's broken. See #636. Closes #1407. — committed to erg/scikit-learn by erg 12 years ago
TST skip `test_k_means_plus_plus_init_2_jobs` on Mac OS X 10.9. See GH-636 — committed to kmike/scikit-learn by kmike 10 years ago

Most upvoted comments

@ogrisel can you remind me of the details with accelerate?

The problem is that multiprocessing does a fork without an exec. Many libraries, such as (some versions of) Accelerate / vecLib, (some versions of) MKL, the OpenMP runtime of GCC, NVIDIA’s CUDA (and probably many others), manage their own internal thread pool. Upon a call to fork, the thread pool state in the child process is corrupted: the thread pool thinks it has many threads, while only the main thread’s state has been forked. It’s possible to change the libraries to make them detect when a fork happens and reinitialize the thread pool in that case: we did that for OpenBLAS (merged upstream in master since 0.2.9) and we contributed a patch (not yet reviewed) to GCC’s OpenMP runtime.
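As a rough illustration of the "only one thread survives the fork" point, here is a minimal sketch using plain Python threads instead of a native BLAS or OpenMP thread pool. It is only an analogy: the function name count_threads_after_fork is made up for this example, it assumes a POSIX system where os.fork is available, and recent Python versions may emit a DeprecationWarning when forking a multi-threaded process.

```python
import os
import threading


def count_threads_after_fork():
    """Return (thread count in parent, thread count seen by the forked child)."""
    stop = threading.Event()
    # A background thread that simply waits until we tell it to stop.
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()

    parent_count = threading.active_count()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: only the thread that called fork() exists here.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)

    # Parent process: read the count reported by the child.
    os.close(write_fd)
    child_count = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)

    stop.set()
    worker.join()
    return parent_count, child_count


if __name__ == "__main__":
    parent_count, child_count = count_threads_after_fork()
    print("threads in parent:", parent_count)
    print("threads in forked child:", child_count)
```

Python’s threading module resets its own bookkeeping after a fork, so the child reports a single thread. A native thread pool that has no such fork handler keeps believing its worker threads still exist, which is the kind of stale state described above.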

In the end, the real culprit is Python’s multiprocessing, which does a fork without an exec (a kind of hack to reduce the overhead of starting and using new Python processes for parallel computing). This is a violation of the POSIX standard, and therefore organizations like Apple refuse to consider the lack of fork-safety in Accelerate / vecLib as a bug.

In Python 3.4+ it’s now possible to configure multiprocessing to use the ‘forkserver’ or ‘spawn’ start methods (instead of the default ‘fork’) to manage the process pools. This should avoid the issue entirely. We don’t use it by default in joblib because it causes some overhead and would make the default behavior slightly different between Python 2.7 and Python 3.4+. Maybe we should change the default to ‘forkserver’ under POSIX so that this problem disappears for Python 3.4+ users.
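For reference, a minimal sketch of selecting a start method with the standard library alone (this does not go through joblib or scikit-learn, and the helper name sqrt_all is made up for this example). It assumes Python 3.4+ and uses ‘spawn’, which is available on every platform; ‘forkserver’ could be passed instead on POSIX systems that support it.

```python
import math
import multiprocessing as mp


def sqrt_all(values, start_method="spawn"):
    """Map math.sqrt over values in a pool created with the given start method."""
    # get_context returns a context bound to one start method and leaves
    # the process-wide default untouched.
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=2) as pool:
        return pool.map(math.sqrt, values)


if __name__ == "__main__":
    # The __main__ guard matters with 'spawn' and 'forkserver': child
    # processes re-import the main module instead of inheriting its state.
    print(mp.get_all_start_methods())
    print(sqrt_all([1.0, 4.0, 9.0]))
```

Because the workers are started as fresh interpreters rather than forked copies, they do not inherit a half-initialized thread pool from the parent. The price is the extra startup overhead mentioned above.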

ogrisel on Aug 25, 2015