scikit-learn: Multilabel ranking coverage error formula
Description
The formula for sklearn.metrics.coverage_error (multilabel ranking) given in the documentation does not match the one in the referenced paper “Tsoumakas, G., Katakis, I., & Vlahavas, I. (2010). Mining multi-label data. In Data mining and knowledge discovery handbook (pp. 667-685). Springer US.”
The formula in the paper subtracts 1 at the end. The current implementation does not.
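For reference, the two definitions are written side by side below. The notation is a paraphrase: the sklearn user guide uses rank_ij, and the paper uses r_i(λ) for the rank of label λ for example i over m examples. Check both sources for their exact typesetting.

```latex
% sklearn documentation: mean of the worst-ranked true label, no -1
\mathrm{coverage}(y, \hat{f}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} \max_{j : y_{ij} = 1} \mathrm{rank}_{ij},
\qquad \mathrm{rank}_{ij} = \left|\{ k : \hat{f}_{ik} \ge \hat{f}_{ij} \}\right|

% Tsoumakas et al. (2010): the same quantity minus 1
\mathrm{coverage} = \frac{1}{m} \sum_{i=1}^{m} \max_{\lambda \in Y_i} r_i(\lambda) - 1
```

It does not matter whether the -1 sits inside or outside the average. Since $\frac{1}{m}\sum_i (r_i - 1) = \left(\frac{1}{m}\sum_i r_i\right) - 1$, the paper's value is always the sklearn value minus exactly 1.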
Steps/Code to Reproduce
The example given in the sklearn documentation is the following:
import numpy as np
from sklearn.metrics import coverage_error
y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
coverage_error(y_true, y_score)
which returns 2.5.
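The following NumPy sketch re-implements both conventions directly from the definitions, to make the off-by-one concrete. It is not sklearn's source code, and the function names are hypothetical. In this sketch, a sample with no true labels is assumed to contribute 0.

```python
import numpy as np


def coverage_sklearn_style(y_true, y_score):
    """Mean over samples of the rank of the lowest-scored true label.

    The rank counts every label whose score is >= that label's score,
    so tied labels are included (matching the documented rank_ij).
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    covers = []
    for truth, scores in zip(y_true, y_score):
        if not truth.any():
            # Assumption in this sketch: a sample with no true label contributes 0.
            covers.append(0)
            continue
        lowest_true = scores[truth].min()
        covers.append(int((scores >= lowest_true).sum()))
    return float(np.mean(covers))


def coverage_tsoumakas(y_true, y_score):
    """Paper convention: the same average, minus 1."""
    return coverage_sklearn_style(y_true, y_score) - 1.0


y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
print(coverage_sklearn_style(y_true, y_score))  # 2.5
print(coverage_tsoumakas(y_true, y_score))      # 1.5
```

The -1 commutes with the average. If you only need the paper's convention, subtracting 1 from the value returned by coverage_error gives the same number.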
Expected Results
Following the paper's formula, I think the expected value should be 1.5 (that is, 2.5 - 1).
Actual Results
The current sklearn implementation returns 2.5.
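The gap between the two values can be checked by hand on the documentation example. Each sample has exactly one true label.

```latex
% Sample 1: true label score 0.75; labels scored >= 0.75 are {0.75, 1}
\mathrm{rank}_1 = 2
% Sample 2: true label score 0.1; all three scores are >= 0.1
\mathrm{rank}_2 = 3
\text{sklearn: } \tfrac{1}{2}(2 + 3) = 2.5, \qquad \text{Tsoumakas et al.: } 2.5 - 1 = 1.5
```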
Versions
Windows-7-6.1.7601-SP1
Python 3.5.2 |Anaconda 4.0.0 (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
NumPy 1.11.2
SciPy 0.18.0
Scikit-Learn 0.18
About this issue
- State: closed
- Created 8 years ago
- Comments: 27 (23 by maintainers)
-1 for a switch.