scikit-learn: BisectingKMeans floating point exception fatal error on OSX
Describe the bug
Hi, thanks for this fantastic library 🚀 🎉
I believe I found a bug. In BisectingKMeans, the predict function cannot be used when the data to be predicted is on a different numerical scale than the fitted data. Other clustering methods, e.g. KMeans, don't have this problem.
In the example below, I fitted BisectingKMeans on random data and then multiplied the data to be predicted by 50. This triggers a floating point exception, which is pretty bad because Python exits silently.
Steps/Code to Reproduce
from sklearn.cluster import BisectingKMeans, KMeans
import numpy as np
x = np.random.rand(3000, 10)
bisect_means = BisectingKMeans(n_clusters=10).fit(x)
labels = bisect_means.predict(50*np.random.rand(100, 10))
print(labels)
Expected Results
A list of predicted class labels
Actual Results
Python silently exits with:
[1] 45074 floating point exception python foo.py
In Jupyter/ipython the kernel simply dies.
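Because the crash takes the whole interpreter (or kernel) down, it can help to run the reproducer in a child process and inspect how it died. The following is only a minimal sketch, assuming POSIX semantics where a negative return code means the child was killed by a signal; `run_isolated`, `describe_exit` and `REPRODUCER` are hypothetical names made up for this illustration, not part of scikit-learn.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 120.0) -> subprocess.CompletedProcess:
    """Run `snippet` in a fresh interpreter so a hard crash cannot kill the caller."""
    return subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe_exit(returncode: int) -> str:
    """Translate a child's return code into a readable status."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as return code -N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# The reproducer from this report, passed as source text to the child interpreter.
REPRODUCER = """
import numpy as np
from sklearn.cluster import BisectingKMeans
x = np.random.rand(3000, 10)
model = BisectingKMeans(n_clusters=10).fit(x)
print(model.predict(50 * np.random.rand(100, 10)))
"""
```

Calling `describe_exit(run_isolated(REPRODUCER).returncode)` from a notebook should then report the child's fate as a string instead of killing the kernel. What it reports depends on the installed scikit-learn version and platform.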
Versions
System:
python: 3.9.16 (main, Mar 8 2023, 04:29:44) [Clang 14.0.6 ]
executable: /Users/jab/miniconda3/envs/qondot/bin/python
machine: macOS-10.16-x86_64-i386-64bit
Python dependencies:
sklearn: 1.3.0
pip: 23.0.1
setuptools: 67.8.0
numpy: 1.21.6
scipy: 1.10.1
Cython: 0.29.35
pandas: 1.5.3
matplotlib: 3.7.1
joblib: 1.2.0
threadpoolctl: 3.1.0
Built with OpenMP: True
threadpoolctl info:
user_api: openmp
internal_api: openmp
prefix: libomp
filepath: /Users/jab/miniconda3/envs/qondot/lib/python3.9/site-packages/sklearn/.dylibs/libomp.dylib
version: None
num_threads: 8
user_api: blas
internal_api: openblas
prefix: libopenblas
filepath: /Users/jab/miniconda3/envs/qondot/lib/python3.9/site-packages/numpy/.dylibs/libopenblas.0.dylib
version: 0.3.17
threading_layer: pthreads
architecture: Haswell
num_threads: 4
user_api: blas
internal_api: openblas
prefix: libopenblas
filepath: /Users/jab/miniconda3/envs/qondot/lib/python3.9/site-packages/scipy/.dylibs/libopenblas.0.dylib
version: 0.3.18
threading_layer: pthreads
architecture: SkylakeX
num_threads: 4
About this issue
- State: closed
- Created 10 months ago
- Comments: 21 (13 by maintainers)
Thanks for all the effort @ogrisel.
I ran the reproducer under lldb. The error code (EXC_ARITHMETIC (code=EXC_I386_DIV, subcode=0x0)) seems to suggest a division-by-zero error inside _k_means_lloyd.cpython-311-darwin.so.
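lldb shows the native frame. To see which Python call was active when the signal arrived, the standard-library faulthandler module can dump the Python traceback on SIGFPE. A rough sketch, assuming a POSIX system; `python_traceback_on_crash` and `FAKE_CRASH` are hypothetical names for this illustration, and the stand-in crash below is not the scikit-learn bug itself.

```python
import subprocess
import sys


def python_traceback_on_crash(snippet: str, timeout: float = 120.0) -> str:
    """Run `snippet` in a child interpreter with faulthandler on; return its stderr."""
    result = subprocess.run(
        # -X faulthandler is the CPython option that enables the module at startup.
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stderr


# Stand-in crash for demonstration only: the child sends SIGFPE to itself.
# This is NOT the BisectingKMeans failure, just a way to show the dump.
FAKE_CRASH = "import os, signal; os.kill(os.getpid(), signal.SIGFPE)"

if __name__ == "__main__":
    print(python_traceback_on_crash(FAKE_CRASH))
```

Passing the reproducer from this report as `snippet` instead of `FAKE_CRASH` should print the Python-level call stack (most recent call first) at the moment of the crash, which complements the lldb backtrace.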