scikit-learn: f1_score and precision_recall_fscore_support throw an error for some class labels
In [86]: from sklearn.metrics import f1_score
In [87]: f1_score([1,-1],[1,-1])
Out[87]: 1.0
In [88]: f1_score([2,-2],[2,-2])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-88-4c4d2deb31e4> in <module>()
----> 1 f1_score([2,-2],[2,-2])
/home/erg/python/scikit-learn/sklearn/metrics/metrics.pyc in f1_score(y_true, y_pred, labels, pos_label, average)
1210 """
1211 return fbeta_score(y_true, y_pred, 1, labels=labels,
-> 1212 pos_label=pos_label, average=average)
1213
1214
/home/erg/python/scikit-learn/sklearn/metrics/metrics.pyc in fbeta_score(y_true, y_pred, beta, labels, pos_label, average)
1357 labels=labels,
1358 pos_label=pos_label,
-> 1359 average=average)
1360 return f
1361
/home/erg/python/scikit-learn/sklearn/metrics/metrics.pyc in precision_recall_fscore_support(y_true, y_pred, beta, labels, pos_label, average)
1719 return (0., 0., 0., 0)
1720 raise ValueError("pos_label=%d is not a valid label: %r" %
-> 1721 (pos_label, labels))
1722 pos_label_idx = list(labels).index(pos_label)
1723 return (precision[pos_label_idx], recall[pos_label_idx],
ValueError: pos_label=1 is not a valid label: array([-2, 2])
Related to #1990.
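The error comes from the binary special case: when only two labels are observed, the scores are reported for pos_label (default 1), so pos_label must be one of the observed labels. A minimal, self-contained sketch of that validation (check_pos_label is a hypothetical helper, not scikit-learn's actual function):

```python
def check_pos_label(y_true, y_pred, pos_label=1):
    """Sketch of the check inside precision_recall_fscore_support:
    for a two-class problem, pos_label must be an observed label."""
    labels = sorted(set(y_true) | set(y_pred))
    if len(labels) == 2 and pos_label not in labels:
        raise ValueError("pos_label=%d is not a valid label: %r"
                         % (pos_label, labels))
    return labels

check_pos_label([1, -1], [1, -1])  # fine: 1 is among the labels
try:
    check_pos_label([2, -2], [2, -2])  # raises, as in the report above
except ValueError as e:
    print(e)
```

With labels {1, -1} the default pos_label=1 is present, so the first call succeeds; with {2, -2} it is not, reproducing the ValueError in the traceback.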
About this issue
- State: closed
- Created 11 years ago
- Comments: 18 (17 by maintainers)
Yes, of course… The special handling of binary problems is necessary, but makes for a lot of issues.
However, I actually think throwing this error is correct behaviour, even if the error message needs expanding: in the two-class case the scores are reported for pos_label, and here pos_label was not set correctly. What behaviour did you expect? What error message would be clearer?
(I have suggested elsewhere (#1983) and for different reasons that pos_label should be replaced by neg_label, though now I think guessing the negative label in the binary case may be a bad idea, but I think -1 is a safe default. With neg_label, rather than testing for the binariness of the targets, we simply average over the classes that remain after removing the negative label, so it doesn't need special-casing.)

+1. With a good error message to be friendly 😃
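The neg_label idea can be sketched in a few lines: compute per-class F1 and macro-average over every class except the designated negative one, so the binary case needs no special handling. This is a hypothetical API from the comment, not something scikit-learn implements:

```python
def f1_excluding_neg(y_true, y_pred, neg_label=-1):
    """Macro-averaged F1 over all classes except neg_label (sketch of
    the neg_label proposal; not a scikit-learn function)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        if c == neg_label:
            continue  # skip the designated negative class
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

For binary targets this reduces to the usual positive-class F1; for multiclass it is a macro average over the remaining classes, with no binariness test anywhere.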
As said by @jnothman and in the doc:

With only two labels, the function considers that the problem is a binary classification one. If you don't pass pos_label=2, you will get a failure.

Probably, this function does too much magic. The logic should change to force the user to be more explicit. I don't think this issue should be solved for release 0.14.
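Concretely, passing pos_label explicitly tells the metric which of the two observed classes is the positive one, which avoids the error from the original report (assuming a scikit-learn version where f1_score accepts pos_label, as in the traceback above):

```python
from sklearn.metrics import f1_score

# Default pos_label=1 fails for labels {2, -2}; naming the positive
# class explicitly works:
score = f1_score([2, -2], [2, -2], pos_label=2)
print(score)  # 1.0 for a perfect prediction
```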
+1