recommenders: [BUG] Evaluation function ndcg_at_k returns incorrect values

Description

The function ndcg_at_k returns incorrect values.

Consider the example described in https://en.wikipedia.org/wiki/Discounted_cumulative_gain:

In response to a search query, the system returns 6 documents, labelled from D1 to D6. The user provides the true relevance scores for those documents as well as for two more documents (D7 and D8) as follows: 3, 2, 3, 0, 1, 2, 3, 2.

As described on the Wikipedia page, the correct value of NDCG@6 is 0.785, but ndcg_at_k returns 1.0.
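The expected value can be checked by hand. The following is a minimal sketch of the standard formula from the Wikipedia page (relevance divided by log2(rank + 1)); it is not code from Recommenders, and the helper name dcg is hypothetical.

```python
import math

# True relevance scores for D1..D8, as quoted above.
relevance = [3, 2, 3, 0, 1, 2, 3, 2]
k = 6


def dcg(scores):
    """Discounted cumulative gain with a log2 discount (ranks start at 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(scores, start=1))


# DCG of the ranking actually returned: D1..D6 in order.
dcg_at_k = dcg(relevance[:k])

# Ideal DCG: the k highest scores among all 8 judged documents, best first.
idcg_at_k = dcg(sorted(relevance, reverse=True)[:k])

ndcg = dcg_at_k / idcg_at_k
print(round(ndcg, 3))
```

Under these assumptions the ratio comes out at roughly 0.785, which is the value the reproduction script below expects.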

The cause could be that, in the function's body, DCG and NDCG are always calculated with 1 as the numerator, whereas the formula requires the true relevance score. For example, this is how DCG is calculated:

df_dcg["dcg"] = 1 / np.log1p(df_dcg["rank"])
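To illustrate why a unit numerator would give exactly 1.0 here, the sketch below compares a binary gain with a graded gain on the same data. It assumes that every predicted item present in the ground truth counts as a hit with gain 1, and that the ideal DCG uses the same unit gain over min(k, number of true items) positions. Both are assumptions about the library's internals, not quotes from its source, and all variable names are illustrative.

```python
import numpy as np

# True relevance scores for D1..D8; the prediction ranks D1..D6 in order.
relevance = np.array([3, 2, 3, 0, 1, 2, 3, 2], dtype=float)
k = 6

ranks = np.arange(1, k + 1)
discount = 1.0 / np.log1p(ranks)  # same discount as the line quoted above

# Binary gain (assumed behaviour): all 6 predicted items appear in the
# ground truth, so each one contributes 1 * discount, including D4 (score 0).
binary_dcg = discount.sum()
binary_idcg = discount[: min(k, len(relevance))].sum()
binary_ndcg = binary_dcg / binary_idcg

# Graded gain: weight each position by its true relevance score.
graded_dcg = (relevance[:k] * discount).sum()
ideal_order = np.sort(relevance)[::-1][:k]  # best k scores, descending
graded_idcg = (ideal_order * discount).sum()
graded_ndcg = graded_dcg / graded_idcg

print(binary_ndcg, round(graded_ndcg, 3))
```

With a binary gain, the numerator and denominator are the same sum, so the score is 1.0 regardless of the order of the predictions. The graded variant gives roughly 0.785. The natural logarithm in np.log1p, as opposed to the log2 used on the Wikipedia page, does not affect the result because the base cancels in the ratio.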

In which platform does it happen?

macOS, Python 3.9.12, Recommenders 1.1.0

How do we replicate the issue?

import math

import numpy as np
import pandas as pd
from recommenders.evaluation.python_evaluation import ndcg_at_k

df_true = pd.DataFrame(
    {
        "user_id": np.full(8, 0, dtype=int),
        "item_id": np.arange(8),
        "rating": np.asarray([3, 2, 3, 0, 1, 2, 3, 2]),
    }
)

df_pred = pd.DataFrame(
    {
        "user_id": np.full(6, 0, dtype=int),
        "item_id": np.arange(6),
        "prediction": np.asarray([6, 5, 4, 3, 2, 1]),
    }
)

score = ndcg_at_k(
    rating_true=df_true,
    rating_pred=df_pred,
    col_user="user_id",
    col_item="item_id",
    col_rating="rating",
    col_prediction="prediction",
    k=6,
)

print(score)
# 0.785 is the rounded value from the Wikipedia example, so compare with a loose tolerance.
assert math.isclose(score, 0.785, abs_tol=1e-3)

Expected behavior (i.e. solution)

For the example described above, ndcg_at_k should return approximately 0.785.

Other Comments

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16

Most upvoted comments

@yueguoguo

Thank you for your response. I would like to add one point regarding this remark:

e.g., the function ndcg offers a parameter that allows the user to apply either hits-based calculation or rating-based calculation, depending on whether rating is available or not.

For that, we can simply pass form="identity" in my implementation. I look forward to more feedback. 😃