machinelearning: ML.Ranking LightGBM - Getting error "Value cannot be null. Parameter name: items"
System information
- OS: Windows
- .NET Core 3.0
Issue
- I am trying to generate a simple ranking of candidates based on a few features for a recruitment application.
- However, when I run the trainer I get the following error, which is not very informative: “Value cannot be null. Parameter name: items”
Source code / logs
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

// Training pipeline
IEstimator<ITransformer> dataPipeline = mlContext.Transforms.Categorical.OneHotEncoding("HIGHESTEDUCATION", "HIGHESTEDUCATION")
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("SOURCE", "SOURCE"))
    .Append(mlContext.Transforms.Text.FeaturizeText("SKILLSET", "SKILLSET"))
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("TOWNCITY", "TOWNCITY"))
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("YEARSOFEXPERIENCE", "YEARSOFEXPERIENCE"))
    .Append(mlContext.Transforms.Concatenate("Features", "HIGHESTEDUCATION", "SKILLSET", "SOURCE", "TOWNCITY", "YEARSOFEXPERIENCE"))
    .Append(mlContext.Transforms.Conversion.MapValueToKey("Label", "Label"))
    // VACANCYID is loaded under the column name "GroupId" (see [ColumnName] below),
    // so hash that column in place; there is no "VACANCYID" column in the data view.
    .Append(mlContext.Transforms.Conversion.Hash("GroupId", "GroupId", numberOfBits: 20));

// Set the LightGBM LambdaRank trainer.
IEstimator<ITransformer> trainer = mlContext.Ranking.Trainers.LightGbm(labelColumnName: "Label", featureColumnName: "Features", rowGroupColumnName: "GroupId");
IEstimator<ITransformer> trainerPipeline = dataPipeline.Append(trainer);

// Domain model
public class Candidate
{
    [LoadColumn(0)]
    public string HIGHESTEDUCATION { get; set; }

    [ColumnName("Label"), LoadColumn(1)]
    public float RELEVANCESCORE { get; set; }

    [LoadColumn(2)]
    public string SKILLSET { get; set; }

    [LoadColumn(3)]
    public string SOURCE { get; set; }

    [LoadColumn(4)]
    public string TOWNCITY { get; set; }

    [ColumnName("GroupId"), LoadColumn(5)]
    public string VACANCYID { get; set; }

    [LoadColumn(6)]
    public string YEARSOFEXPERIENCE { get; set; }
}
```
About this issue
- State: open
- Created 4 years ago
- Comments: 15 (11 by maintainers)
Still, it’s noticeable that when I train on your data, I get trees with only one leaf, whose value is “0”. This means that even after this exception is fixed, the predictor will assign rank “0” to any input you give it. I’m not sure how to interpret this, since your training data only contains ranks 1 to 4; perhaps you need more data, or should experiment with other parameters or preprocessing? @najeeb-kazmi, any ideas on this?
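If the single-leaf trees are caused by the very small training set, one thing to try along the lines suggested above is relaxing LightGBM's tree-growth constraints. This is a minimal sketch, assuming the `Microsoft.ML` and `Microsoft.ML.LightGbm` NuGet packages and the column names from the issue; the ranking `LightGbm` overload accepts `numberOfLeaves`, `minimumExampleCountPerLeaf`, `learningRate` and `numberOfIterations`, but the specific values below are illustrative assumptions, not tuned recommendations. For context, native LightGBM's default `min_data_in_leaf` is 20, so if that default is in effect, a data set of only a few dozen rows may leave little room for any split. This is a plausible cause, not a confirmed one.

```csharp
using System;
using Microsoft.ML;

public static class RankingTrainerSketch
{
    // Builds the same LambdaRank trainer as in the issue, but with explicit
    // hyperparameters. The values are assumptions for a very small data set.
    public static IEstimator<ITransformer> CreateTrainer(MLContext mlContext) =>
        mlContext.Ranking.Trainers.LightGbm(
            labelColumnName: "Label",
            featureColumnName: "Features",
            rowGroupColumnName: "GroupId",
            numberOfLeaves: 4,             // keep trees small when there is little data
            minimumExampleCountPerLeaf: 1, // allow a split even when few rows reach a leaf
            learningRate: 0.1,
            numberOfIterations: 50);
}
```

The returned estimator can take the place of `trainer` in the pipeline from the issue, e.g. `dataPipeline.Append(RankingTrainerSketch.CreateTrainer(mlContext))`.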
So this does seem to be a bug in ML.NET. The tree returned by LightGBM has only one node, which is a leaf whose value is 0. Because of this, the following code leaves `tree` as just `new InternalRegressionTree(2);` (i.e. `leafOutput[0]` is 0): https://github.com/dotnet/machinelearning/blob/41c5fc34f30f46541235369064fb5c9ccd3c6587/src/Microsoft.ML.LightGbm/WrappedLightGbmBooster.cs#L264-L275

So this `InternalRegressionTree` constructor is used; notice that `RawThresholds` is never initialized: https://github.com/dotnet/machinelearning/blob/41c5fc34f30f46541235369064fb5c9ccd3c6587/src/Microsoft.ML.FastTree/TreeEnsemble/InternalRegressionTree.cs#L82-L96

Then, when the `RegressionTree` is created here, it tries to use the `RawThresholds` array, which was never initialized, and throws an exception saying that it is null: https://github.com/dotnet/machinelearning/blob/41c5fc34f30f46541235369064fb5c9ccd3c6587/src/Microsoft.ML.FastTree/RegressionTree.cs#L158-L171

I think the fix for this bug should be straightforward, so I will open a PR for it.
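The failure mode described above can be reproduced without ML.NET. The `items` parameter name in the error matches `System.Collections.Immutable.ImmutableArray.Create<T>(T[] items, int start, int length)`, which throws `ArgumentNullException` when `items` is null. The sketch below assumes the copy in `RegressionTree.cs` goes through a call like that. `ToyTree` is a hypothetical, heavily simplified stand-in for `InternalRegressionTree`, and `CopyThresholdsOrEmpty` is one hypothetical guard, not necessarily what the PR does.

```csharp
using System;
using System.Collections.Immutable;

// Hypothetical stand-in for ML.NET's InternalRegressionTree: the threshold array
// is only allocated when a split is made, so a tree that never splits (a single
// leaf with output 0) keeps RawThresholds == null.
public sealed class ToyTree
{
    public int NumLeaves { get; private set; } = 1;
    // A binary tree with k leaves has k - 1 internal (split) nodes.
    public int NumNodes => NumLeaves - 1;
    public double[] LeafOutputs { get; }
    public float[] RawThresholds { get; private set; }

    public ToyTree(int maxLeaves)
    {
        LeafOutputs = new double[maxLeaves];
    }

    // Splits the root into two leaves; this is the only place thresholds are allocated.
    public void SplitRoot(float threshold, double lteOutput, double gtOutput)
    {
        RawThresholds = new[] { threshold };
        LeafOutputs[0] = lteOutput;
        LeafOutputs[1] = gtOutput;
        NumLeaves = 2;
    }
}

public static class ThresholdCopySketch
{
    // Mirrors the assumed failing pattern: copying a null array throws
    // ArgumentNullException with ParamName "items".
    public static ImmutableArray<float> CopyThresholds(ToyTree tree) =>
        ImmutableArray.Create(tree.RawThresholds, 0, tree.NumNodes);

    // Hypothetical guard: treat a tree that never split as having no thresholds.
    public static ImmutableArray<float> CopyThresholdsOrEmpty(ToyTree tree) =>
        tree.RawThresholds is null
            ? ImmutableArray<float>.Empty
            : ImmutableArray.Create(tree.RawThresholds, 0, tree.NumNodes);
}
```

Calling `ThresholdCopySketch.CopyThresholds(new ToyTree(2))` throws an `ArgumentNullException` whose `ParamName` is `"items"`, the same parameter name as in the reported error, while `CopyThresholdsOrEmpty` returns an empty array for the same unsplit tree.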
I was able to reproduce your issue using the data from your screenshot. The stack trace was as follows; I will now look into it.