scikit-learn: f1_score and precision_recall_fscore_support throw an error for some class labels

In [86]: from sklearn.metrics import f1_score

In [87]: f1_score([1,-1],[1,-1]) 
Out[87]: 1.0

In [88]: f1_score([2,-2],[2,-2])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-88-4c4d2deb31e4> in <module>()
----> 1 f1_score([2,-2],[2,-2])

/home/erg/python/scikit-learn/sklearn/metrics/metrics.pyc in f1_score(y_true, y_pred, labels, pos_label, average)
   1210     """
   1211     return fbeta_score(y_true, y_pred, 1, labels=labels,
-> 1212                        pos_label=pos_label, average=average)
   1213 
   1214 

/home/erg/python/scikit-learn/sklearn/metrics/metrics.pyc in fbeta_score(y_true, y_pred, beta, labels, pos_label, average)
   1357                                                  labels=labels,
   1358                                                  pos_label=pos_label,
-> 1359                                                  average=average)
   1360     return f
   1361 

/home/erg/python/scikit-learn/sklearn/metrics/metrics.pyc in precision_recall_fscore_support(y_true, y_pred, beta, labels, pos_label, average)
   1719                 return (0., 0., 0., 0)
   1720             raise ValueError("pos_label=%d is not a valid label: %r" %
-> 1721                              (pos_label, labels))
   1722         pos_label_idx = list(labels).index(pos_label)
   1723         return (precision[pos_label_idx], recall[pos_label_idx],

ValueError: pos_label=1 is not a valid label: array([-2,  2])

Related to #1990.

About this issue

State: closed
Created 11 years ago
Comments: 18 (17 by maintainers)

Most upvoted comments

Yes, of course… The special handling of binary problems is necessary, but makes for a lot of issues.

However, I actually think throwing this error is correct behaviour, even if the error message needs expanding:

- If you’re actually doing binary classification, you should set pos_label correctly.
- If you’re doing multiclass classification, you should be warned that your data is being treated as if your problem is binary, and you should set pos_label correctly.

What behaviour did you expect? What error message would be clearer?

(I have suggested elsewhere (#1983), for different reasons, that pos_label should be replaced by neg_label. I now think guessing the negative label in the binary case may be a bad idea, but -1 seems a safe default. With neg_label, rather than testing whether the targets are binary, we simply average over the classes that remain after removing the negative label, so no special-casing is needed.)
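
To make the neg_label idea concrete, here is a minimal pure-Python sketch. It is not scikit-learn code: `neg_label` is only a proposal in this thread, the helper names `f1_for_label` and `f1_without_negative` are hypothetical, and the unweighted (macro) average is an assumption, since the comment does not say which averaging is meant.

```python
def f1_for_label(y_true, y_pred, label):
    """Per-class F1 for one label (hypothetical helper, not sklearn's)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_without_negative(y_true, y_pred, neg_label=-1):
    """Average F1 over every label except neg_label.

    Assumption: a plain unweighted mean over the remaining classes.
    No check for "binariness" of the targets is needed.
    """
    labels = sorted((set(y_true) | set(y_pred)) - {neg_label})
    if not labels:
        return 0.0
    return sum(f1_for_label(y_true, y_pred, l) for l in labels) / len(labels)


# Labels {1, -1}: only class 1 remains after dropping neg_label=-1.
print(f1_without_negative([1, -1], [1, -1]))
# Labels {2, -2}: neither is -1, so both classes are averaged; no error.
print(f1_without_negative([2, -2], [2, -2]))
```

Under these assumptions both label sets from the report go through the same code path, which is the point of dropping the special case.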

jnothman on Jun 25, 2013

"Probably, this function does too much magic. The logic should change to force the user to be more explicit."

+1. With a good error message to be friendly 😃

GaelVaroquaux on Jul 30, 2013

As @jnothman said, and as the docs state:

    pos_label : str or int, 1 by default
        If ``average`` is not ``None`` and the classification target is binary,
        only this class's scores will be returned.

With only two labels, the function treats the problem as binary classification. If you don’t pass pos_label=2, the call fails.
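
The behaviour described above can be illustrated with a small pure-Python sketch. It is a simplified stand-in for the binary special case, not the actual scikit-learn implementation; the function name `binary_f1` is hypothetical, and only the `pos_label` parameter and its default of 1 are taken from the docs quoted above.

```python
def binary_f1(y_true, y_pred, pos_label=1):
    """F1 of the positive class only (simplified sketch, not sklearn's code)."""
    labels = sorted(set(y_true) | set(y_pred))
    if pos_label not in labels:
        # Same failure mode as in the traceback: the default pos_label=1
        # does not occur among the labels [-2, 2].
        raise ValueError(
            "pos_label=%r is not a valid label: %r" % (pos_label, labels)
        )
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(binary_f1([1, -1], [1, -1]))               # default pos_label=1 is present
print(binary_f1([2, -2], [2, -2], pos_label=2))  # positive label given explicitly
try:
    binary_f1([2, -2], [2, -2])                  # default pos_label=1 is absent
except ValueError as exc:
    print(exc)
```

With scikit-learn itself, the corresponding fix is the one named above: pass pos_label=2 to f1_score.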

Probably, this function does too much magic. The logic should change to force the user to be more explicit. I don’t think this issue should be solved for release 0.14.

arjoly on Jul 27, 2013

"Probably, this function does too much magic. The logic should change to force the user to be more explicit."

jnothman on Jul 27, 2013