recommenders: [BUG] xDeepFM output file length less than test file length

Description

After fitting an xDeepFM model using CIN and DNN, I call model.predict() with a test file in FFM format containing 12788 rows of test examples. When I examine the output file it creates, I see only 12689 rows of predictions.

I’ve used different train, validate, test splits and find that I keep seeing fewer rows of predictions than test examples.
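A quick way to confirm the gap for any split is to count the non-empty lines in both files. This is only a minimal sketch: the helper names `count_rows` and `row_gap` are made up for illustration and are not part of recommenders, and it assumes both files are plain text with one record per line.

```python
def count_rows(path):
    """Count the non-empty lines in a text file."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def row_gap(test_path, pred_path):
    """Return (test rows, prediction rows, test rows minus prediction rows)."""
    n_test = count_rows(test_path)
    n_pred = count_rows(pred_path)
    return n_test, n_pred, n_test - n_pred
```

With the numbers reported above, `row_gap` would return a difference of 99 (12788 minus 12689).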

The way the model is implemented, the output file rows do not include information such as userID or itemID. If they did, I could at least see which examples did not have a prediction returned for them.

What might be the cause for this discrepancy? Is there any way to write out the userID, itemID associated with the prediction?

In which platform does it happen?

Windows 10 running Anaconda. Currently using the master branch from recommenders

How do we replicate the issue?

Expected behavior (i.e. solution)

The number of rows in the output file should match the number of rows in the test file.

Ideally, the output file should have the userID, productID pair and the prediction, or some other way of indexing each prediction back to the example presented. It would be nice if this were returned from the model.predict call as a unified dataframe rather than written to an output file.
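Until something like that exists, one possible workaround is to join the predictions back to the IDs by row order. The sketch below is an assumption-heavy illustration, not library code: `attach_ids` is a hypothetical helper, and it assumes I keep a separate CSV with a `userID,itemID` header whose rows are in the same order as the test file, and that the output file holds exactly one numeric prediction per line. It raises an error when the counts differ, which is the situation reported here.

```python
import csv


def attach_ids(ids_path, pred_path, out_path):
    """Write userID,itemID,prediction rows by pairing the two files line by line.

    ids_path: hypothetical CSV with header userID,itemID, in test-file order.
    pred_path: text file assumed to hold one prediction per line.
    """
    with open(ids_path, newline="", encoding="utf-8") as f_ids:
        id_rows = list(csv.DictReader(f_ids))
    with open(pred_path, "r", encoding="utf-8") as f_pred:
        preds = [float(line) for line in f_pred if line.strip()]

    # Row-order pairing is only meaningful if the counts agree.
    if len(id_rows) != len(preds):
        raise ValueError(
            f"{len(id_rows)} ID rows but {len(preds)} predictions; cannot align"
        )

    with open(out_path, "w", newline="", encoding="utf-8") as f_out:
        writer = csv.writer(f_out)
        writer.writerow(["userID", "itemID", "prediction"])
        for row, pred in zip(id_rows, preds):
            writer.writerow([row["userID"], row["itemID"], pred])
    return len(preds)
```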

Other Comments

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15

Most upvoted comments

@yueguoguo - It appears that the fix made in #833 was overwritten by the very next commit (“small fix”). Would it be possible to re-apply the fix to the latest codebase?

@miguelgfierro, @yueguoguo, @Leavingseason - I think this change fixes the problem.

image

After making this change, my output files do not have more than one prediction per line and the number of predictions ties out with the number of examples. I’d like to create a pull request for this fix.

image

@miguelgfierro, @yueguoguo and @Leavingseason - I carefully examined the output file. It looks like some lines contain more than one prediction.
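For anyone who wants to check their own output file, here is a rough sketch of how such lines can be located. The helper names are made up for illustration, and the sketch assumes that predictions sharing a line are separated by whitespace; if they are run together without a separator, this check will not catch them.

```python
def find_multi_prediction_lines(pred_path):
    """Return (line_number, value_count) for every line holding more than one value."""
    suspects = []
    with open(pred_path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            tokens = line.split()  # split on any whitespace
            if len(tokens) > 1:
                suspects.append((line_no, len(tokens)))
    return suspects


def total_predictions(pred_path):
    """Count whitespace-separated values across the whole file."""
    with open(pred_path, "r", encoding="utf-8") as f:
        return sum(len(line.split()) for line in f)
```

If `total_predictions` equals the number of test examples while the line count is lower, that would suggest the predictions are all present but some newlines are missing.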