scikit-learn: Some processes not working under clustering.MeanShift
Hi, I’m using the parallel version of sklearn.cluster.MeanShift (which, as it happens, I wrote myself). I’ve now noticed that most of the processes are actually “sleeping” and only a few do any work. Even more oddly, this doesn’t always happen:
- the problem is worse on some machines than on others
- the problem doesn’t seem to appear when working with 2 dimensions instead of 4 (see code below).
- changing the code to use multiprocessing instead of joblib makes it work
I have no idea where to start…
Reproduce
When running the code
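from sklearn.cluster import MeanShift
import numpy as np
ndim = 4  # with ndim = 2 the problem does not seem to appear
points = np.random.random([100000, ndim])  # 100000 uniform random points
MS = MeanShift(n_jobs=20, bandwidth=0.1)  # ask for 20 parallel workers
print("Starting.")
MS.fit(points)  # watch the worker processes while this runs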
a call to htop shows most of the worker processes sleeping, with only a few doing any work.
Versions
Linux-2.6.32-573.3.1.el6.x86_64-x86_64-with-redhat-6.6-Carbon
Python 3.4.2 (default, Feb 4 2015, 08:24:27) [GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]
NumPy 1.11.1
SciPy 0.17.1
Scikit-Learn 0.17.1
About this issue
- State: closed
- Created 8 years ago
- Comments: 30 (16 by maintainers)
So it seems like the automatic batching of tasks is not well suited to some machines. I am not exactly sure why yet.
A work-around that works for me is to set joblib.parallel.MIN_IDEAL_BATCH_DURATION to a higher value. If you can test whether this snippet works for you, that’d be great:
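A minimal sketch of this kind of workaround follows. It is not the original snippet from the thread. It assumes the constant is a plain module-level attribute that the batching code reads at run time, so it can be overridden before fit is called. The helper name raise_min_batch_duration and the value of 1 second are illustrative assumptions.

```python
import importlib


def raise_min_batch_duration(seconds=1.0, module_name="joblib.parallel"):
    """Override MIN_IDEAL_BATCH_DURATION on the named module.

    Returns the previous value, or None if the module did not define the
    constant (in which case the override probably has no effect there).
    """
    module = importlib.import_module(module_name)
    previous = getattr(module, "MIN_IDEAL_BATCH_DURATION", None)
    # Assumption: the auto-batching code looks this constant up on the module
    # each time it sizes a batch, so a larger value pushes it towards
    # longer-running batches.
    module.MIN_IDEAL_BATCH_DURATION = seconds
    return previous
```

Call it once before MS.fit(points). scikit-learn 0.17 ships its own copy of joblib, so on that version passing module_name="sklearn.externals.joblib.parallel" is probably what is needed; that is an assumption as well. A return value of None is a hint that the installed version keeps the constant somewhere else.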