scikit-learn: [BUG] Label propagation sometimes produces label_distributions that contain NaN.
Description
Calling fit on LabelSpreading raises "invalid value encountered in true_divide".
After convergence, the label distribution for some samples is all zero, so the variable normalizer
in label_propagation.py:291 contains some zero values, causing the division self.label_distributions_ /= normalizer
to produce NaN.
Maybe there is a connection to #8008? On other datasets, increasing the n_neighbors
parameter above its default value made the issue disappear.
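The failure mode described above can be reproduced in isolation with plain NumPy. The following is only a minimal sketch: the array values are made up for illustration, and the guarded division at the end is one possible workaround assumed here, not necessarily what scikit-learn does.

```python
import numpy as np

# Hypothetical label distributions for 3 samples and 2 classes.
# The second row is all zero, like the samples described above.
label_distributions = np.array([[0.2, 0.6],
                                [0.0, 0.0],
                                [0.5, 0.5]])

# Row sums; the all-zero row gives a normalizer of 0.
normalizer = label_distributions.sum(axis=1, keepdims=True)

# Plain division: 0 / 0 yields NaN for the all-zero row.
with np.errstate(invalid="ignore"):
    naive = label_distributions / normalizer

# Guarded division (an assumption for illustration): replace zero
# normalizers with 1 so that all-zero rows simply stay all zero.
safe_normalizer = np.where(normalizer == 0, 1.0, normalizer)
guarded = label_distributions / safe_normalizer
```

With the guard in place, rows that do sum to a nonzero value are normalized as before, and only the degenerate rows are left untouched.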
Steps/Code to Reproduce
from sklearn.datasets import fetch_mldata
from sklearn.semi_supervised import label_propagation
import numpy
numpy.seterr(all='raise')
mnist = fetch_mldata('MNIST original', data_home="./tmp")
X = mnist.data[1:10000]
y = mnist.target[1:10000]
# Use only 300 labeled examples
y[300:] = -1
lp_model = label_propagation.LabelSpreading(kernel='knn', n_neighbors=7, n_jobs=-1)
lp_model.fit(X,y)
Expected Results
No error is thrown.
Actual Results
File "reproduce.py", line 16, in <module>
lp_model.fit(X,y)
File "...anaconda3/envs/ssl-py3/lib/python3.6/site-packages/sklearn/semi_supervised/label_propagation.py", line 291, in fit
self.label_distributions_ /= normalizer
FloatingPointError: invalid value encountered in true_divide
Versions
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] NumPy 1.13.0 SciPy 0.19.0 Scikit-Learn 0.19.dev0
About this issue
- State: closed
- Created 7 years ago
- Reactions: 2
- Comments: 17 (12 by maintainers)
Commits related to this issue
- Bug fix: Label propagation sometimes produces label_distributions that contain Nan.(#9292) — committed to ThuWangzw/scikit-learn by ThuWangzw 3 years ago
I am just wondering if this issue has been fixed? Any updates? Thanks!
I have replicated this issue when instantiating the LabelSpreading model with the default parameter values, i.e., LabelSpreading(). When I instantiate it with LabelSpreading(gamma=0.25, max_iter=5) instead, the error is not thrown. Even LabelSpreading(gamma=0, max_iter=1) works fine; only leaving those parameters undefined triggers the issue:
label_propagation.py:293: RuntimeWarning: invalid value encountered in divide
self.label_distributions_ /= normalizer
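The original report shows a FloatingPointError while this comment shows a RuntimeWarning. That difference is likely just NumPy's error-handling mode: the reproduction script calls numpy.seterr(all='raise'), whereas NumPy otherwise only warns. A small sketch of the two modes, using np.errstate as a scoped stand-in for seterr and a made-up zero array:

```python
import warnings
import numpy as np

zeros = np.zeros(2)

# Warn mode: 0 / 0 emits a RuntimeWarning and the result is NaN.
with np.errstate(all="warn"), warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = zeros / zeros
warned = any(issubclass(w.category, RuntimeWarning) for w in caught)

# Raise mode (what numpy.seterr(all='raise') sets globally):
# the same division raises FloatingPointError instead.
raised = False
with np.errstate(all="raise"):
    try:
        zeros / zeros
    except FloatingPointError:
        raised = True
```

In both modes the underlying problem is the same zero normalizer; only the way it surfaces changes.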