scikit-learn: sklearn.cluster.KMeans in 0.23 is much slower than in 0.22.2

Used code:

from sklearn import cluster

# X is the data matrix being clustered; it was not included in the report.
for k in range(1, 15):
    cluster.KMeans(
        n_clusters=k,
        random_state=42,
        n_init=10,
        max_iter=2000,
        algorithm='full',
        init='k-means++',
    ).fit(X)

Expected Results

In v0.22.2 the computation finished in 2 minutes for the whole set of 15 explored values of k.

Actual Results

The computation takes more than 20 minutes with exactly the same data and setup as before. Even k=1 takes a very long time, whereas in the previous version a lower k meant much faster computation.
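To narrow down a regression like this, it helps to time each value of k separately rather than the whole loop. The following is a minimal sketch, not the reporter's actual script: `time_per_k` and `make_kmeans_fit` are hypothetical helper names, and `X` stands for whatever data is being clustered (the original data set was not posted). The `algorithm` argument is left out so the sketch runs on both old and new releases; pass `algorithm='full'` to match the report on 0.22 and 0.23.

```python
import time


def time_per_k(fit_fn, ks):
    """Return {k: seconds} for calling fit_fn(k) once per value of k."""
    timings = {}
    for k in ks:
        start = time.perf_counter()
        fit_fn(k)
        timings[k] = time.perf_counter() - start
    return timings


def make_kmeans_fit(X):
    """Build a fit function using the KMeans settings from the report.

    Hypothetical helper: scikit-learn is imported lazily so that
    time_per_k can also be used with any other callable.
    """
    from sklearn.cluster import KMeans

    def fit(k):
        KMeans(
            n_clusters=k,
            random_state=42,
            n_init=10,
            max_iter=2000,
            init="k-means++",
        ).fit(X)

    return fit
```

Running `time_per_k(make_kmeans_fit(X), range(1, 15))` in one environment per scikit-learn version, with the same `X`, gives per-k timings that can be compared side by side. This would show, for example, whether k=1 alone already accounts for the slowdown.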

Versions

System:
    python: 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)]
    executable: C:\Users\micha\anaconda3\python.exe
    machine: Windows-10-10.0.18362-SP0

Python dependencies:
    pip: 20.0.2
    setuptools: 45.2.0.post20200210
    sklearn: 0.23.0
    numpy: 1.18.1
    scipy: 1.4.1
    Cython: 0.29.15
    pandas: 1.0.3
    matplotlib: 3.1.3
    joblib: 0.14.1

Built with OpenMP: True
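The listing above appears to come from `sklearn.show_versions()`. When comparing a fast and a slow environment, a standard-library-only sketch such as the one below can record the installed package versions in a form that is easy to diff. `installed_versions` is a hypothetical helper name, and the package list is only an example; `importlib.metadata` requires Python 3.8 or later.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Example package list; adjust to the dependencies that matter for the report.
    for name, version in installed_versions(
        ["scikit-learn", "numpy", "scipy", "joblib"]
    ).items():
        print(f"{name}: {version}")
```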

About this issue

State: closed
Created 4 years ago
Comments: 34 (21 by maintainers)

Most upvoted comments

Ok so part of the issue was fixed in #17235. The remainder will be tackled in #17334. Let’s close this.

ogrisel on May 25, 2020

Hello,

I updated all the packages I use today, and the new version (0.23.1) seems to have solved it; it is actually faster than 0.22.2. Thanks!!

MichalRIcar on May 25, 2020