scikit-learn: Multilabel ranking coverage error formula

Description

The formula for sklearn.metrics.coverage_error (multilabel ranking) given in the documentation does not match the one in the referenced paper: “Tsoumakas, G., Katakis, I., & Vlahavas, I. (2010). Mining multi-label data. In Data mining and knowledge discovery handbook (pp. 667-685). Springer US.”

The original formula ends with a -1 term, which neither the documented formula nor the current implementation includes.
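
For reference, here is how I read the two definitions side by side. The notation is my own paraphrase (n samples, relevant label set Y_i for sample i, score \hat{f}_{ij} for label j), so treat it as a sketch of the discrepancy, not a verbatim quote of either source:

```latex
% Rank of label j for sample i: number of labels scored at least as high.
\operatorname{rank}_{ij} = \left|\left\{ k : \hat{f}_{ik} \ge \hat{f}_{ij} \right\}\right|

% Documentation / implementation (no -1 term):
\operatorname{coverage}_{\text{sklearn}} = \frac{1}{n} \sum_{i=1}^{n} \max_{j \in Y_i} \operatorname{rank}_{ij}

% Paper (with the -1 term):
\operatorname{coverage}_{\text{paper}} = \frac{1}{n} \sum_{i=1}^{n} \max_{j \in Y_i} \operatorname{rank}_{ij} - 1
```

Under this reading the two quantities always differ by exactly 1.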

Steps/Code to Reproduce

The example given in the sklearn documentation is the following:

import numpy as np
from sklearn.metrics import coverage_error
y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
coverage_error(y_true, y_score)

which returns 2.5.

Expected Results

Following the formula in the paper, I think the expected value should be 1.5 (that is, 2.5 - 1).
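
To show where both numbers come from, here is a minimal sketch that recomputes the metric by hand in plain Python. The helper names `coverage_per_sample` and `mean_coverage` and the `subtract_one` flag are hypothetical and exist only for this illustration; they are not part of scikit-learn. The sketch assumes every sample has at least one relevant label.

```python
def coverage_per_sample(true_row, score_row):
    """How many top-ranked labels are needed to cover all true labels of one sample."""
    # Lowest score among the relevant labels (assumes at least one relevant label).
    min_relevant = min(s for t, s in zip(true_row, score_row) if t)
    # Count every label scored at least that high; ties count toward the rank.
    return sum(1 for s in score_row if s >= min_relevant)


def mean_coverage(y_true, y_score, subtract_one=False):
    """Average coverage over samples; subtract_one=True applies the paper's -1 term."""
    per_sample = [coverage_per_sample(t, s) for t, s in zip(y_true, y_score)]
    mean = sum(per_sample) / len(per_sample)
    return mean - 1 if subtract_one else mean


# Same data as the documentation example.
y_true = [[1, 0, 0], [0, 0, 1]]
y_score = [[0.75, 0.5, 1], [1, 0.2, 0.1]]

sklearn_style = mean_coverage(y_true, y_score)                   # no -1 term
paper_style = mean_coverage(y_true, y_score, subtract_one=True)  # with -1 term
print(sklearn_style, paper_style)  # 2.5 1.5
```

The first sample needs its top 2 labels to reach the relevant one and the second needs all 3, so the mean is 2.5 without the -1 term and 1.5 with it.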

Actual Results

The current sklearn implementation returns 2.5.

Versions

Windows-7-6.1.7601-SP1
Python 3.5.2 |Anaconda 4.0.0 (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
NumPy 1.11.2
SciPy 0.18.0
Scikit-Learn 0.18

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 27 (23 by maintainers)

Most upvoted comments

-1 for a switch.