scikit-learn: KMeans significantly slower on 0.23

Describe the bug

With the latest changes (scikit-learn 0.23), KMeans is significantly slower on small datasets. Computing clusters takes around ten times longer than on 0.22.

Steps/Code to Reproduce

Timings for the following code, in seconds: scikit-learn 0.22: ~0.015, scikit-learn 0.23: ~0.15

import time

import sklearn.cluster
from sklearn import datasets

data = datasets.load_iris()['data']

t = time.time()
sklearn.cluster.KMeans(n_clusters=2).fit(data)
print(time.time() - t)

I also tried a bigger dataset with shape (300, 25): clustering took 3-4 s with the new version, whereas before it finished in milliseconds.
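For anyone who wants to check the larger case, here is a minimal sketch. It assumes randomly generated data of shape (300, 25) as a stand-in, since the actual dataset is not attached, and `time_kmeans` is just a hypothetical helper name. It takes the best of several fits to reduce noise from a single `time.time()` measurement.

```python
import time

import numpy as np
from sklearn.cluster import KMeans


def time_kmeans(data, n_clusters=2, repeats=5):
    """Return the best wall-clock time (seconds) over several KMeans fits."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        KMeans(n_clusters=n_clusters, random_state=0).fit(data)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Assumption: random data standing in for the (300, 25) dataset mentioned above.
rng = np.random.RandomState(0)
data = rng.rand(300, 25)

elapsed = time_kmeans(data)
print(f"best of 5 fits: {elapsed:.4f} s")
```

Running this under both 0.22 and 0.23 in otherwise identical environments should show whether the slowdown reproduces on a given machine. The absolute numbers will differ from the ones reported here.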

Expected Results

Clusters would be computed as fast as before.

Versions

System:
    python: 3.7.6 | packaged by conda-forge | (default, Jan  7 2020, 22:05:27)  [Clang 9.0.1 ]
executable: /Users/primoz/miniconda3/envs/orange/bin/python
   machine: Darwin-19.0.0-x86_64-i386-64bit
Python dependencies:
       pip: 20.1
setuptools: 46.1.3
   sklearn: 0.23.0
     numpy: 1.18.4
     scipy: 1.4.1
    Cython: None
    pandas: 1.0.3
matplotlib: 3.2.1
    joblib: 0.14.1
Built with OpenMP: True

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (11 by maintainers)

Most upvoted comments

@jeremiedbb thank you for your help. I tested the PR and it now works normally.