scikit-learn: KMeans significantly slower on 0.23
Describe the bug
After upgrading to scikit-learn 0.23, KMeans is significantly slower on small datasets. Computing the clusters takes roughly ten times longer than on 0.22.
Steps/Code to Reproduce
Running the following code takes about 0.015 s on scikit-learn 0.22 and about 0.15 s on scikit-learn 0.23:
import time
import sklearn.cluster
from sklearn import datasets
data = datasets.load_iris()['data']
t = time.time()
sklearn.cluster.KMeans(n_clusters=2).fit(data)
print(time.time() - t)
I also tried a bigger dataset with shape (300, 25), where clustering took 3-4 s with the new version but only milliseconds before.
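A single `time.time()` measurement can be noisy, so a repeated timing may make the comparison between versions easier to trust. The following is only a sketch: the helper names `time_call`, `make_kmeans_fit`, and `summarize` are made up for illustration and are not part of scikit-learn, and the repeat and warm-up counts are arbitrary.

```python
import statistics
import time


def time_call(fn, repeats=5, warmup=1):
    """Return a list of wall-clock timings (seconds), one per timed call of fn()."""
    for _ in range(warmup):
        fn()  # warm-up call, not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def make_kmeans_fit():
    """Build a zero-argument callable that fits KMeans on the iris data."""
    # Imported lazily so time_call() can be reused without scikit-learn.
    from sklearn import datasets
    from sklearn.cluster import KMeans

    data = datasets.load_iris()["data"]  # loaded once, outside the timed region
    return lambda: KMeans(n_clusters=2).fit(data)


def summarize(timings):
    """Reduce a list of timings to its minimum and median."""
    return {"min": min(timings), "median": statistics.median(timings)}
```

Calling `summarize(time_call(make_kmeans_fit()))` under each scikit-learn version should give numbers that can be compared directly. The minimum is usually the least affected by background load.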
Expected Results
Clusters should be computed as fast as in 0.22.
Versions
System:
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:05:27) [Clang 9.0.1 ]
executable: /Users/primoz/miniconda3/envs/orange/bin/python
machine: Darwin-19.0.0-x86_64-i386-64bit
Python dependencies:
pip: 20.1
setuptools: 46.1.3
sklearn: 0.23.0
numpy: 1.18.4
scipy: 1.4.1
Cython: None
pandas: 1.0.3
matplotlib: 3.2.1
joblib: 0.14.1
Built with OpenMP: True
About this issue
- State: closed
- Created 4 years ago
- Comments: 15 (11 by maintainers)
@jeremiedbb thank you for your help. I tested the PR and it now works normally.