scikit-learn: error in average_precision_score

I believe there is an error in sklearn.metrics.average_precision_score. Here is a script that demonstrates it:

from sklearn.metrics import average_precision_score

y_true = [
       1, 1,
       1, 1,
       1, 1,
       1, -1,      # Single negative here, pos 8
       1, 1
       ]
y_score = list(range(len(y_true)))

for average in ('micro', 'macro', 'weighted', 'samples'):
    print( "Average:", average)
    print(average_precision_score(y_true, y_score, average=average))
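
To see what the function is working with internally, the precision/recall pairs scikit-learn derives from the same inputs can be inspected with precision_recall_curve (which, as far as I can tell, average_precision_score builds on); a minimal, self-contained sketch:

from sklearn.metrics import precision_recall_curve

y_true = [1, 1, 1, 1, 1, 1, 1, -1, 1, 1]
y_score = list(range(len(y_true)))

# Precision/recall at each score threshold, as computed by scikit-learn.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
for p, r in zip(precision, recall):
    print("precision = %.3f  recall = %.3f" % (p, r))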

What we have are essentially ten documents, nine of which are positive; the example at position eight is negative. According to my understanding (and Wikipedia, see https://en.wikipedia.org/wiki/Information_retrieval#Average_precision, the third equation for AveP on that page), the average precision should be:

| Pos | Value | Precision | Note                                     |
| --- | ----- | --------- | ---------------------------------------- |
|   1 |     1 | 1/1       |                                          |
|   2 |     1 | 2/2       |                                          |
|   3 |     1 | 3/3       |                                          |
|   4 |     1 | 4/4       |                                          |
|   5 |     1 | 5/5       |                                          |
|   6 |     1 | 6/6       |                                          |
|   7 |     1 | 7/7       |                                          |
|   8 |     0 | 7/8       | Contributes zero since negative example  |
|   9 |     1 | 8/9       |                                          |
|  10 |     1 | 9/10      |                                          |

The number of true positives (TP) here is 9, so the average precision should be (7 * 1 + 8/9 + 9/10) / TP ≈ 0.976. The value produced by the script (for every averaging scheme) is 0.865. I believe average_precision_score is producing this value because it divides by N instead of by TP.
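
For comparison, here is a minimal sketch of the non-interpolated AveP from the Wikipedia article, dividing the summed precisions by the number of positives rather than by N. Note that it ranks the documents by descending score (which is what scikit-learn does with y_score), so the positions it visits may differ from the list order used in the table above:

import numpy as np

y_true = np.array([1, 1, 1, 1, 1, 1, 1, -1, 1, 1])
y_score = np.arange(len(y_true))

# Rank documents by descending score, as a retrieval system would.
order = np.argsort(-y_score)
relevant = y_true[order] == 1

# Sum precision@k at every rank k where a relevant document appears,
# then divide by the number of relevant documents (TP), not by N.
hits = 0
precisions = []
for k, rel in enumerate(relevant, start=1):
    if rel:
        hits += 1
        precisions.append(hits / k)

ap = sum(precisions) / relevant.sum()
print("Non-interpolated AP:", ap)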

Regards, -Tom

About this issue

  • State: closed
  • Created 9 years ago
  • Comments: 18 (13 by maintainers)

Most upvoted comments

I agree it’d be a good idea to put a disclaimer on the docs page that this doesn’t agree with the most straightforward definition of AP. It took a little while to figure out what was going on.

Tentatively, it would be good to at least document this confusion about average precision. The overestimation of the true mAP as shown in #6377 is a rather critical bug.