scikit-learn: [BUG] Label propagation sometimes produces label_distributions that contain NaN.

Description

"Invalid value encountered in true_divide" is raised when calling fit on LabelSpreading.

After convergence, the label distribution for some samples is all zero, so the variable normalizer in label_propagation.py:291 contains some zero values. The division self.label_distributions_ /= normalizer then produces NaN.
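The failure mode described above can be reproduced in isolation with plain NumPy. This is a minimal sketch on a made-up toy array, not the scikit-learn source: the names label_distributions and safe_normalizer are illustrative, and replacing zero normalizers with 1 is only one possible guard, shown here as an assumption about how the all-zero rows could be handled.

```python
import numpy as np

# Toy label distributions (hypothetical): the second sample received no label mass.
label_distributions = np.array([[0.2, 0.6],
                                [0.0, 0.0],
                                [0.5, 0.5]])

# Row sums, shaped (n_samples, 1) so they broadcast over the class axis.
normalizer = label_distributions.sum(axis=1)[:, np.newaxis]

# Naive normalization: 0 / 0 yields NaN for the all-zero row.
with np.errstate(invalid="ignore"):
    naive = label_distributions / normalizer

# One possible guard: divide all-zero rows by 1 so they stay all zero.
safe_normalizer = normalizer.copy()
safe_normalizer[safe_normalizer == 0] = 1
guarded = label_distributions / safe_normalizer

print(np.isnan(naive).any(axis=1))    # which rows became NaN
print(np.isnan(guarded).any())        # whether any NaN remains
```

With the guard, samples without any label mass keep an all-zero distribution instead of NaN; whether that is the desired output for such samples is a separate design question.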

Maybe there is a connection to #8008? On other datasets, increasing the n_neighbors parameter above its default value made the issue disappear.
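One way to probe the n_neighbors observation is to count the connected components of the kNN graph. The sketch below rests on an assumption that the issue does not confirm: that the all-zero rows belong to samples the labeled points cannot reach through the graph. The toy data and the helper count_components are hypothetical; kneighbors_graph and connected_components are the existing scikit-learn and SciPy functions.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import kneighbors_graph

# Toy data (hypothetical): two tight clusters of 5 points each, far apart.
rng = np.random.RandomState(0)
X_toy = np.vstack([rng.normal(0.0, 0.1, size=(5, 2)),
                   rng.normal(10.0, 0.1, size=(5, 2))])


def count_components(X, n_neighbors):
    """Number of connected components of the (undirected) kNN graph of X."""
    graph = kneighbors_graph(X, n_neighbors=n_neighbors,
                             mode="connectivity", include_self=False)
    n_components, _ = connected_components(graph, directed=False)
    return n_components


# With few neighbors, every neighbor lies in the same cluster.
few = count_components(X_toy, n_neighbors=3)
# With more neighbors than cluster-mates, edges must cross between clusters.
many = count_components(X_toy, n_neighbors=6)

print(few, many)
```

If a component contains no labeled sample, nothing can propagate into it, which would be consistent with larger n_neighbors values hiding the problem.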

Steps/Code to Reproduce

from sklearn.datasets import fetch_mldata
from sklearn.semi_supervised import label_propagation
import numpy

numpy.seterr(all='raise')

mnist = fetch_mldata('MNIST original', data_home="./tmp")

X = mnist.data[1:10000]
y = mnist.target[1:10000]

# Use only 300 labeled examples
y[300:] = -1

lp_model = label_propagation.LabelSpreading(kernel='knn', n_neighbors=7, n_jobs=-1)
lp_model.fit(X,y)

Expected Results

No error is thrown.

Actual Results

  File "reproduce.py", line 16, in <module>
    lp_model.fit(X,y)
  File "...anaconda3/envs/ssl-py3/lib/python3.6/site-packages/sklearn/semi_supervised/label_propagation.py", line 291, in fit
    self.label_distributions_ /= normalizer
FloatingPointError: invalid value encountered in true_divide
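The traceback shows a FloatingPointError rather than a RuntimeWarning because the script calls numpy.seterr(all='raise'). A minimal sketch of the difference, using a toy array and the scoped np.errstate context manager instead of the global numpy.seterr:

```python
import numpy as np

zeros = np.zeros(2)

# With invalid-value handling relaxed, 0 / 0 silently evaluates to NaN.
with np.errstate(invalid="ignore"):
    quiet = zeros / zeros

# With invalid="raise", the same division raises FloatingPointError.
raised = False
try:
    with np.errstate(invalid="raise"):
        zeros / zeros
except FloatingPointError:
    raised = True

print(np.isnan(quiet).all(), raised)
```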

Versions

[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.13.0
SciPy 0.19.0
Scikit-Learn 0.19.dev0

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 2
  • Comments: 17 (12 by maintainers)

Most upvoted comments

I am just wondering if this issue has been fixed. Any updates? Thanks!

I have reproduced this issue when instantiating the LabelSpreading model with the default parameter values, i.e., LabelSpreading(). When I instantiate it with LabelSpreading(gamma=0.25, max_iter=5) instead, the error is not thrown. Even gamma=0, max_iter=1 works fine; only leaving those parameters at their defaults triggers the issue:

label_propagation.py:293: RuntimeWarning: invalid value encountered in divide
  self.label_distributions_ /= normalizer