scikit-learn: ndcg_score fails for negative scores

Description

The function ndcg_score from sklearn.metrics returns a value outside [0, 1] when the true relevance scores are negative.

Steps/Code to Reproduce

import numpy as np
from sklearn.metrics import ndcg_score 

y_true  = np.array([-0.89, -0.53, -0.47, 0.39, 0.56]).reshape(1,-1)
y_score = np.array([0.07,0.31,0.75,0.33,0.27]).reshape(1,-1)

ndcg_score(y_true, y_score)  # Should be between 0 and 1 as per the docstring
# Output: 396.0329594603174

ndcg_score(y_true+1, y_score+1)
# Output: 0.7996030755957273

Expected Results

The documentation doesn’t explicitly state that y_true or y_score should be non-negative. The cited Wikipedia article for DCG doesn’t seem to mention a non-negativity assumption either. So either the method should be able to deal with scores regardless of sign, or the documentation should explicitly say otherwise.

Disclosure/Question: I’m not an expert in ranking metrics, but it seems that there might be cases where one might want to compare lists of scores in (-∞,∞) based on their ordering alone (i.e., not MSE or related metrics). Is there any other metric in scikit-learn that is more appropriate for that use case?

Versions

numpy: 1.18.1
scipy: 1.4.1
sklearn: 0.22.1

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 15 (9 by maintainers)

Most upvoted comments

This is what I think of this issue. Please let me know your comments.

NDCG is a metric that should take a value between 0 and 1. It is given by NDCG = actual DCG / Ideal DCG. In other words, NDCG measures where our DCG sits on a scale from 0 to the Ideal DCG. That scale assumes the min DCG is 0, which is a valid lower bound when all relevance values are non-negative, but not when y_true contains negative values. So, correcting the rephrased sentence, NDCG should measure where our DCG sits on a scale from the min DCG to the Ideal DCG.
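To see where the out-of-range value in the issue comes from, here is a minimal sketch in plain Python (not scikit-learn's implementation) of the ratio actual DCG / Ideal DCG for the example data. The helper names dcg and rank_by are illustrative only, and ties in y_score are broken by input order, which is an assumption of this sketch.

```python
import math


def dcg(relevances):
    """DCG of relevances listed in ranked order, using a log2 discount."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))


def rank_by(y_true, y_score):
    """Reorder y_true by descending y_score (ties keep input order: an assumption)."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    return [y_true[i] for i in order]


y_true = [-0.89, -0.53, -0.47, 0.39, 0.56]
y_score = [0.07, 0.31, 0.75, 0.33, 0.27]

actual_dcg = dcg(rank_by(y_true, y_score))
ideal_dcg = dcg(sorted(y_true, reverse=True))

# Both DCG values are negative and the ideal one is very close to zero,
# so the plain ratio is large and positive instead of lying in [0, 1].
ratio = actual_dcg / ideal_dcg
print(actual_dcg, ideal_dcg, ratio)
```

With these inputs the ideal DCG is a small negative number, which is why dividing by it inflates the result.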

So the actual NDCG formula should have been NDCG = (actual DCG - min DCG) / (Ideal DCG - min DCG). When the lower bound (i.e., the minimum) is 0, we get back the formula that we always used.

Next is how to calculate the Ideal DCG and the min DCG given y_true and y_score.

# From the wiki: sorting all relevant documents in the corpus by their
# relative relevance produces the maximum possible DCG through a position,
# also called the Ideal DCG (IDCG) through that position.
# dcg_score ranks y_true by its second argument, so passing y_true as the
# score puts the most relevant documents first.
# y_true is the (1, n) array from the issue.
from sklearn.metrics import dcg_score

max_dcg = dcg_score(y_true, y_true)  # Ideal DCG
# Likewise, to get the min DCG, rank the documents in the opposite direction.
# Negating the second argument orders them from least to most relevant.
min_dcg = dcg_score(y_true, -y_true)

NDCG score calculation:

y_true  = np.array([-0.89, -0.53, -0.47, 0.39, 0.56]).reshape(1,-1)
y_score = np.array([0.07, 0.31, 0.75, 0.33, 0.27]).reshape(1,-1)
max_dcg    = -0.001494970324771916
min_dcg    = -1.0747913396929056
actual_dcg = -0.5920575220247735
ndcg_score = 0.44976749334605975
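
The calculation above can be sketched end to end in plain Python. This is not scikit-learn code: min_max_ndcg is a hypothetical name, ties in y_score are broken by input order, and returning 0.0 when every ranking gives the same DCG is an assumption of this sketch.

```python
import math


def dcg(relevances):
    """DCG of relevances listed in ranked order, using a log2 discount."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))


def min_max_ndcg(y_true, y_score):
    """(actual DCG - min DCG) / (Ideal DCG - min DCG) for a single query."""
    # Rank the true relevances by descending predicted score.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    actual = dcg([y_true[i] for i in order])
    ideal = dcg(sorted(y_true, reverse=True))  # most relevant first
    worst = dcg(sorted(y_true))                # least relevant first
    if ideal == worst:
        # All rankings are equivalent; 0.0 is an arbitrary choice (assumption).
        return 0.0
    return (actual - worst) / (ideal - worst)


y_true = [-0.89, -0.53, -0.47, 0.39, 0.56]
y_score = [0.07, 0.31, 0.75, 0.33, 0.27]
result = min_max_ndcg(y_true, y_score)
print(result)
```

Sorting y_true in ascending order gives the minimum because the discounts shrink with position, so the largest relevances end up with the smallest weights.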

On a final note, y_score values only provide the relative ranks (or positions). The y_score values themselves are not involved in the calculation of DCG (or NDCG); only the positions they induce are used. So shifting the y_score values by a constant amount shouldn’t affect the NDCG score, right (because the positions wouldn’t change)? And with a formula that includes the min DCG, I can see this holding true.

# I have increased all y_score values by 1. y_true is also increased by 1,
# as in the issue's second call; the DCG values below reflect that shift.
y_true  = np.array([-0.89, -0.53, -0.47, 0.39, 0.56]).reshape(1,-1) + 1
y_score = np.array([1.07, 1.31, 1.75, 1.33, 1.27]).reshape(1,-1)
max_dcg    = 2.9469641485546205
min_dcg    = 1.8736677791864862
actual_dcg = 2.3564015968546186
ndcg_score = 0.4497674933460597
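
A short plain-Python sketch of why the shifted numbers lead to the same score, under the same assumptions as the formula above (log2 discount, single query, ties broken by input order). The names dcg_triplet and position are hypothetical. Shifting y_score leaves every DCG untouched, while shifting y_true by a constant adds the same amount (the constant times the sum of the discounts) to the actual, min, and Ideal DCG, which cancels in the normalized score.

```python
import math


def dcg(relevances):
    """DCG of relevances listed in ranked order, using a log2 discount."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))


def dcg_triplet(y_true, y_score):
    """Return (actual, minimum, ideal) DCG for one query."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    actual = dcg([y_true[i] for i in order])
    return actual, dcg(sorted(y_true)), dcg(sorted(y_true, reverse=True))


def position(actual, minimum, ideal):
    """Where the actual DCG sits between the minimum and the ideal DCG."""
    return (actual - minimum) / (ideal - minimum)


y_true = [-0.89, -0.53, -0.47, 0.39, 0.56]
y_score = [0.07, 0.31, 0.75, 0.33, 0.27]
shift = 1.0

base = dcg_triplet(y_true, y_score)
score_shifted = dcg_triplet(y_true, [s + shift for s in y_score])
truth_shifted = dcg_triplet([t + shift for t in y_true], y_score)

# Sum of the discounts 1 / log2(pos + 1): the amount every DCG grows by
# when each relevance is increased by 1.
discount_sum = sum(1 / math.log2(pos + 1) for pos in range(1, len(y_true) + 1))

print(score_shifted == base)
print([t - b for t, b in zip(truth_shifted, base)], discount_sum)
print(position(*base), position(*truth_shifted))
```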

Hi there, the implementation in the branch that mentions this issue raises a “DeprecationWarning”, as suggested by jeromesdockes. This gives the best of both worlds: the current functionality can still be used, but users are warned that ndcg_score does not always return a result between 0 and 1 for negative y_true values.
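
For readers unfamiliar with the pattern, a warning of this kind might look roughly like the sketch below. It is not the code from the branch: the helper name warn_on_negative_relevance and the message text are made up for illustration, and it takes a flat list rather than a scikit-learn array.

```python
import warnings


def warn_on_negative_relevance(y_true):
    """Hypothetical check: warn, but do not fail, on negative relevances."""
    if min(y_true) < 0:
        warnings.warn(
            "ndcg_score may return values outside [0, 1] when y_true "
            "contains negative values.",  # illustrative wording only
            DeprecationWarning,
            stacklevel=2,
        )


# Record warnings so the behaviour can be inspected programmatically.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_on_negative_relevance([-0.89, -0.53, -0.47, 0.39, 0.56])  # warns
    warn_on_negative_relevance([0.11, 0.47, 0.53, 1.39, 1.56])     # silent

print(len(caught), caught[0].category.__name__)
```

Note that Python hides DeprecationWarning by default unless it is triggered by code running in __main__, so users may need to enable it to see the message.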

Feel free to suggest other changes too, if you have another in mind. I’ll open a PR to this repo within a few days, once the docs and tests are done.