machinelearning: ML.Ranking LightGBM - Getting error "Value cannot be null. Parameter name: items"

System information

  • OS: Windows
  • .NET version: .NET Core 3.0

Issue

  • I am trying to generate a simple ranking of candidates based on a few features for a recruitment application.
  • However, when I train the model, I get the following error, and the message does not make the cause clear: “Value cannot be null. Parameter name: items”

Source code / logs

[Screenshot of the training data]

// Training pipeline
IEstimator<ITransformer> dataPipeline = mlContext.Transforms.Categorical.OneHotEncoding("HIGHESTEDUCATION", "HIGHESTEDUCATION")
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("SOURCE", "SOURCE"))
    .Append(mlContext.Transforms.Text.FeaturizeText("SKILLSET", "SKILLSET"))
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("TOWNCITY", "TOWNCITY"))
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("YEARSOFEXPERIENCE", "YEARSOFEXPERIENCE"))
    // Combine all featurized columns into the single vector the trainer reads.
    .Append(mlContext.Transforms.Concatenate("Features", "HIGHESTEDUCATION", "SKILLSET", "SOURCE", "TOWNCITY", "YEARSOFEXPERIENCE"))
    .Append(mlContext.Transforms.Conversion.MapValueToKey("Label", "Label"))
    // VACANCYID is loaded under the column name "GroupId" (see [ColumnName] on Candidate),
    // so hash that column in place to get a key-typed group column.
    .Append(mlContext.Transforms.Conversion.Hash("GroupId", "GroupId", numberOfBits: 20));

// Set the LightGBM LambdaRank trainer.
IEstimator<ITransformer> trainer = mlContext.Ranking.Trainers.LightGbm(labelColumnName: "Label", featureColumnName: "Features", rowGroupColumnName: "GroupId");
IEstimator<ITransformer> trainerPipeline = dataPipeline.Append(trainer);

// Domain model
public class Candidate
{
    [LoadColumn(0)]
    public string HIGHESTEDUCATION { get; set; }

    [ColumnName("Label"), LoadColumn(1)]
    public float RELEVANCESCORE { get; set; }

    [LoadColumn(2)]
    public string SKILLSET { get; set; }

    [LoadColumn(3)]
    public string SOURCE { get; set; }

    [LoadColumn(4)]
    public string TOWNCITY { get; set; }

    [ColumnName("GroupId"), LoadColumn(5)]
    public string VACANCYID { get; set; }

    [LoadColumn(6)]
    public string YEARSOFEXPERIENCE { get; set; }
}


About this issue

  • State: open
  • Created 4 years ago
  • Comments: 15 (11 by maintainers)

Most upvoted comments

Still, it’s noticeable that when training on your data I get trees that consist of a single leaf with value 0. This means that, even after this exception is fixed, the predictor will assign rank 0 to any input you give it. I’m not sure how to interpret this, since your training data only contains ranks 1 to 4; perhaps you need more data, different parameters, or different preprocessing? @najeeb-kazmi, any ideas on this?
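Why single-leaf trees make every prediction identical can be sketched in a few lines of C#. This is a simplified, hypothetical model (SketchTree and SketchEnsemble are illustrative names, not ML.NET types), assuming that a boosted ensemble scores an input as the sum of the leaf values its trees route that input to. If every tree is a single leaf with value 0, the score is 0 whatever the features are.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified regression tree used only for illustration:
// either a single leaf, or one threshold split with two child trees.
public sealed class SketchTree
{
    private readonly double _leafValue;
    private readonly int _featureIndex;
    private readonly double _threshold;
    private readonly SketchTree _left;
    private readonly SketchTree _right;

    private SketchTree(double leafValue, int featureIndex, double threshold, SketchTree left, SketchTree right)
    {
        _leafValue = leafValue;
        _featureIndex = featureIndex;
        _threshold = threshold;
        _left = left;
        _right = right;
    }

    // A single-leaf tree ignores its input and always returns the same value.
    public static SketchTree Leaf(double value) => new SketchTree(value, -1, 0.0, null, null);

    // An internal node that sends the input left when features[featureIndex] <= threshold.
    public static SketchTree Split(int featureIndex, double threshold, SketchTree left, SketchTree right)
    {
        if (left == null) throw new ArgumentNullException(nameof(left));
        if (right == null) throw new ArgumentNullException(nameof(right));
        return new SketchTree(0.0, featureIndex, threshold, left, right);
    }

    public double Evaluate(IReadOnlyList<double> features)
    {
        if (_left == null) return _leafValue; // leaf: constant output
        return features[_featureIndex] <= _threshold
            ? _left.Evaluate(features)
            : _right.Evaluate(features);
    }
}

public static class SketchEnsemble
{
    // A boosted ensemble scores an input as the sum of all tree outputs.
    public static double Score(IEnumerable<SketchTree> trees, IReadOnlyList<double> features) =>
        trees.Sum(tree => tree.Evaluate(features));
}
```

For example, an ensemble of three SketchTree.Leaf(0.0) trees scores 0 for any feature vector, while a tree built with SketchTree.Split returns different values on either side of its threshold.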

So this does seem to be a bug in ML.NET. The tree returned by LightGBM has only one node, which is a leaf whose value is 0. Because of this, in the following code tree is left as new InternalRegressionTree(2) (i.e., leafOutput[0] is 0):

https://github.com/dotnet/machinelearning/blob/41c5fc34f30f46541235369064fb5c9ccd3c6587/src/Microsoft.ML.LightGbm/WrappedLightGbmBooster.cs#L264-L275

So this InternalRegressionTree constructor is used; notice that RawThresholds is never initialized: https://github.com/dotnet/machinelearning/blob/41c5fc34f30f46541235369064fb5c9ccd3c6587/src/Microsoft.ML.FastTree/TreeEnsemble/InternalRegressionTree.cs#L82-L96

Then, when the RegressionTree is created here, it tries to copy the RawThresholds array, which was never initialized, so ImmutableArray.Create throws an ArgumentNullException saying that the array is null (“Value cannot be null. Parameter name: items”): https://github.com/dotnet/machinelearning/blob/41c5fc34f30f46541235369064fb5c9ccd3c6587/src/Microsoft.ML.FastTree/RegressionTree.cs#L158-L171
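This failure mode can be reproduced without ML.NET. In this minimal sketch, LeafOnlyTreeSketch is a hypothetical stand-in for an internal tree whose RawThresholds array was never allocated, and CopyThresholds mimics a wrapper that copies it with ImmutableArray.Create(items, start, length), the overload at the top of the stack trace in this thread. Passing a null array makes that call throw an ArgumentNullException whose parameter name is items; the exact message wording differs between .NET versions.

```csharp
using System;
using System.Collections.Immutable;

// Hypothetical stand-in for a single-leaf tree: the leaf output array is
// allocated, but the split-threshold array is left null.
public sealed class LeafOnlyTreeSketch
{
    public double[] LeafValues { get; } = new[] { 0.0 };
    public double[] RawThresholds { get; } = null;

    // A tree with a single leaf has no internal (split) nodes.
    public int NumNodes => LeafValues.Length - 1;
}

public static class ThresholdCopySketch
{
    // Copies the thresholds into an immutable array, as a public wrapper might.
    public static ImmutableArray<double> CopyThresholds(LeafOnlyTreeSketch tree) =>
        ImmutableArray.Create(tree.RawThresholds, 0, tree.NumNodes);

    // Returns the ParamName of the ArgumentNullException raised by the copy, or null if it succeeds.
    public static string FailingParameter(LeafOnlyTreeSketch tree)
    {
        try
        {
            CopyThresholds(tree);
            return null;
        }
        catch (ArgumentNullException ex)
        {
            return ex.ParamName;
        }
    }
}
```

Calling ThresholdCopySketch.FailingParameter(new LeafOnlyTreeSketch()) returns "items", matching the parameter name in the reported error.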

I think the fix for this bug should be straightforward, so I will open a PR for it.
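One way such a fix could look, purely as an illustration and not necessarily what the PR does: treat a missing threshold array as empty when the tree has no internal nodes, and still fail loudly if nodes exist but their thresholds do not. SafeThresholdCopySketch.CopyOrEmpty is a hypothetical helper; an equally plausible fix would be to allocate RawThresholds in the single-leaf constructor itself.

```csharp
using System;
using System.Collections.Immutable;

public static class SafeThresholdCopySketch
{
    // Hypothetical defensive copy: a tree with no internal nodes has no thresholds,
    // so a never-allocated (null) array is returned as an empty immutable array.
    public static ImmutableArray<double> CopyOrEmpty(double[] rawThresholds, int numNodes)
    {
        if (numNodes < 0)
            throw new ArgumentOutOfRangeException(nameof(numNodes));

        if (rawThresholds == null)
        {
            // Missing thresholds are only acceptable when there is nothing to split on.
            if (numNodes != 0)
                throw new InvalidOperationException("Tree has internal nodes but no thresholds.");
            return ImmutableArray<double>.Empty;
        }

        return ImmutableArray.Create(rawThresholds, 0, numNodes);
    }
}
```

Failing loudly in the inconsistent case keeps the guard from masking a genuinely corrupted tree, while a single-leaf tree like the one LightGBM returned here converts cleanly.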

I was able to reproduce your issue using the data from your screenshot. The stack trace is below; I will look into it now.

   at System.Collections.Immutable.Requires.FailArgumentNullException(String parameterName)
   at System.Collections.Immutable.ImmutableArray.Create[T](T[] items, Int32 start, Int32 length)
   at Microsoft.ML.Trainers.FastTree.RegressionTreeBase..ctor(InternalRegressionTree tree)
   at Microsoft.ML.Trainers.FastTree.RegressionTree..ctor(InternalRegressionTree tree)
   at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.<>c.<CreateTreeEnsembleFromInternalDataStructure>b__5_0(InternalRegressionTree tree)
   at System.Linq.Enumerable.SelectListIterator`2.ToList()
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at Microsoft.ML.Trainers.FastTree.TreeEnsemble`1..ctor(IEnumerable`1 trees, IEnumerable`1 treeWeights, Double bias)
   at Microsoft.ML.Trainers.FastTree.RegressionTreeEnsemble..ctor(IEnumerable`1 trees, IEnumerable`1 treeWeights, Double bias)
   at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.CreateTreeEnsembleFromInternalDataStructure()
   at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree..ctor(IHostEnvironment env, String name, InternalTreeEnsemble trainedEnsemble, Int32 numFeatures, String innerArgs)
   at Microsoft.ML.Trainers.LightGbm.LightGbmRankingModelParameters..ctor(IHostEnvironment env, InternalTreeEnsemble trainedEnsemble, Int32 featureCount, String innerArgs)
   at Microsoft.ML.Trainers.LightGbm.LightGbmRankingTrainer.CreatePredictor()
   at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.TrainModelCore(TrainContext context)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Issue5022.Program.Main(String[] args) in C:\Users\anvelazq\source\repos\Bugs\Issue5022\Program.cs:line 55