scikit-learn: Bug in AUC metric when TP = 100%?

As an example, this works correctly:

In [13]: import numpy as np                                                                                                                         

In [14]: from sklearn import metrics                                                                                                                

In [15]: true = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0.99]                                                                                                   

In [16]: pred = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]                                                                                                      

In [17]: fpr, tpr, thresholds = metrics.roc_curve(true, pred)                                                                                       

In [18]: metrics.auc(fpr, tpr)                                                                                                                      
Out[18]: 0.22222222222222221

However, if there are no negative examples in the ground truth (i.e. only one class is present), an error is thrown:

In [19]: true = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

In [20]: pred = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]                                                                                                      

In [21]: fpr, tpr, thresholds = metrics.roc_curve(true, pred)                                                                                       
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-35631f51a7c5> in <module>()
----> 1 fpr, tpr, thresholds = metrics.roc_curve(true, pred)

    132     # ROC only for binary classification
    133     if classes.shape[0] != 2:
--> 134         raise ValueError("ROC is defined for binary classification only")
    135 
    136     y_score = np.ravel(y_score)

ValueError: ROC is defined for binary classification only

Is this the correct behavior?

About this issue

  • State: closed
  • Created 12 years ago
  • Comments: 24 (19 by maintainers)

Most upvoted comments

sklearn.metrics.roc_auc_score() is not defined when there is no positive example in the ground truth for a given label (and, symmetrically, when there is no negative example in the ground truth).

E.g.

import numpy as np
import sklearn.metrics

np.random.seed(seed=1)
# The 5th column of y_true contains no positive example.
y_true = np.array([[0, 0, 0, 1, 0],
                   [0, 0, 1, 0, 0],
                   [1, 1, 0, 1, 0],
                   [1, 1, 1, 1, 0],
                   [1, 1, 1, 0, 0],
                   [1, 1, 0, 0, 0]])
y_score = np.random.random((6, 5))  # random scores for 6 samples x 5 labels
auroc = sklearn.metrics.roc_auc_score(y_true, y_score, average=None)

yields:

Traceback (most recent call last):
  File "C:\test.py", line 9, in <module>
    auroc = sklearn.metrics.roc_auc_score(y_true, y_score, average=None)
  File "C:\Anaconda\lib\site-packages\sklearn\metrics\ranking.py", line 246, in roc_auc_score
    sample_weight=sample_weight)
  File "C:\Anaconda\lib\site-packages\sklearn\metrics\base.py", line 122, in _average_binary_score
    sample_weight=score_weight)
  File "C:\Anaconda\lib\site-packages\sklearn\metrics\ranking.py", line 237, in _binary_roc_auc_score
    raise ValueError("Only one class present in y_true. ROC AUC score "
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.

because the 5th label is always absent in the ground truth (i.e. y_true).
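One quick way to check this on the example above is to look at the column sums of y_true:

y_true.sum(axis=0)
# array([4, 4, 3, 3, 0])   <- the 5th label never appears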

I guess it makes sense that the AUROC is undefined when there is no positive example in the ground truth for a given label:

If the test set contains no positive example for a label, then TP = FN = 0, so the TPR = TP / (TP + FN) is undefined (division by zero). The ROC curve therefore cannot be plotted, and the AUROC is undefined.
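For instance, the same error shows up in the plain binary case as soon as y_true contains a single class (a minimal illustration; the scores here are made up):

import numpy as np
import sklearn.metrics

y_true = np.array([1, 1, 1, 1])            # only the positive class is present
y_score = np.array([0.2, 0.8, 0.5, 0.9])   # arbitrary scores
sklearn.metrics.roc_auc_score(y_true, y_score)
# ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.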

That being said, in the multilabel case it would be nice if sklearn.metrics.roc_auc_score() issued a warning and still returned the AUROCs for the non-problematic labels instead of throwing an error, the way sklearn.metrics.f1_score() does (UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in samples with no true labels. and UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in samples with no predicted labels.). Or perhaps an option such as ignore_monoclass_label=True could be added. That way we wouldn't have to remove the labels with no positive example ourselves before calling sklearn.metrics.roc_auc_score(); a sketch of that manual workaround is shown below.
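Something along these lines could serve as a manual workaround in the meantime (a sketch only, assuming 2-D y_true / y_score arrays; roc_auc_per_label and the nan convention are mine, not part of scikit-learn's API):

import warnings
import numpy as np
import sklearn.metrics

def roc_auc_per_label(y_true, y_score):
    # Per-label AUROC: warn and return nan for labels whose ground truth
    # contains only one class, score the remaining labels normally.
    aurocs = []
    for j in range(y_true.shape[1]):
        col = y_true[:, j]
        if np.unique(col).size < 2:
            warnings.warn("Label %d has only one class in y_true; "
                          "AUROC is undefined, returning nan." % j)
            aurocs.append(np.nan)
        else:
            aurocs.append(sklearn.metrics.roc_auc_score(col, y_score[:, j]))
    return np.array(aurocs)

With the y_true and y_score from the example above, this returns finite AUROCs for the first four labels and nan for the 5th instead of raising.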

The error message could be more explicit in the multilabel case.


I presume it’s a multilabel problem in which some label lacks either positive or negative instances.

On 31 October 2014 23:12, Arnaud Joly notifications@github.com wrote:

Can you give a small test case to reproduce?
