machinelearning: LightGBM trainer exception

System information

  • OS version/distro: Windows 10
  • .NET Version (eg., dotnet --info): .NET Core 2.1

Issue

  • What did you do? Ran MML command line: execgraph "C:\Benchmarking\automl_graph.json"

Contents of automl_graph.json:

{
  "Inputs": {
    "file_train": "D:\\SplitDatasets\\ExcitementFG2_train.csv",
    "file_test": "D:\\SplitDatasets\\ExcitementFG2_valid.csv"
  },
  "Nodes": [
    {
      "Inputs": {
        "CustomSchema": "sep=, col=Label:R4:78 col=Features1:R4:0-77 col=Features2:R4:79-202 header=+",
        "InputFile": "$file_train"
      },
      "Name": "Data.CustomTextLoader",
      "Outputs": {
        "Data": "$data_train"
      }
    },
    {
      "Inputs": {
        "CustomSchema": "sep=, col=Label:R4:78 col=Features1:R4:0-77 col=Features2:R4:79-202 header=+",
        "InputFile": "$file_test"
      },
      "Name": "Data.CustomTextLoader",
      "Outputs": {
        "Data": "$data_test"
      }
    },
    {
      "Inputs": {
        "BatchSize": 3,
        "StateArguments": {
          "Name": "AutoMlState",
          "Settings": {
            "Engine": {
              "Name": "Rocket",
              "Settings": {}
            },
            "Metric": "Accuracy",
            "TerminatorArgs": {
              "Name": "IterationLimited",
              "Settings": {
                "FinalHistoryLength": 100
              }
            },
            "TrainerKind": "SignatureBinaryClassifierTrainer"
          }
        },
        "TestingData": "$data_test",
        "TrainingData": "$data_train",
        "IgnoreColumns": ["cost"]
      },
      "Name": "Models.PipelineSweeper",
      "Outputs": {
        "Results": "$output_data",
        "State": "$xyz"
      }
    }
  ],
  "Outputs": {
    "output_data": "C:\\Benchmarking\\01-ResultsOut.csv"
  }
}
  • What happened? Encountered an exception in LightGBM trainer

  • What did you expect? A run to completion, w/o exception

Source code / logs

Command line args:

dotnet MML.dll execgraph C:\Benchmarking\automl_graph.json

Exception message:

System.InvalidOperationException
  HResult=0x80131509
  Message=Categorical split features is zero length
  Source=Microsoft.ML.Core
  StackTrace:
   at Microsoft.ML.Runtime.Contracts.Check(Boolean f, String msg) in C:\MLDotNet\src\Microsoft.ML.Core\Utilities\Contracts.cs:line 497
   at Microsoft.ML.Trainers.FastTree.Internal.RegressionTree.CheckValid(Action`2 checker) in C:\MLDotNet\src\Microsoft.ML.FastTree\TreeEnsemble\RegressionTree.cs:line 469
   at Microsoft.ML.Trainers.FastTree.Internal.RegressionTree..ctor(Int32[] splitFeatures, Double[] splitGain, Double[] gainPValue, Single[] rawThresholds, Single[] defaultValueForMissing, Int32[] lteChild, Int32[] gtChild, Double[] leafValues, Int32[][] categoricalSplitFeatures, Boolean[] categoricalSplit) in C:\MLDotNet\src\Microsoft.ML.FastTree\TreeEnsemble\RegressionTree.cs:line 223
   at Microsoft.ML.Trainers.FastTree.Internal.RegressionTree.Create(Int32 numLeaves, Int32[] splitFeatures, Double[] splitGain, Single[] rawThresholds, Single[] defaultValueForMissing, Int32[] lteChild, Int32[] gtChild, Double[] leafValues, Int32[][] categoricalSplitFeatures, Boolean[] categoricalSplit) in C:\MLDotNet\src\Microsoft.ML.FastTree\TreeEnsemble\RegressionTree.cs:line 189
   at Microsoft.ML.Runtime.LightGBM.Booster.GetModel(Int32[] categoricalFeatureBoudaries) in C:\MLDotNet\src\Microsoft.ML.LightGBM\WrappedLightGbmBooster.cs:line 241
   at Microsoft.ML.Runtime.LightGBM.LightGbmTrainerBase`3.TrainCore(IChannel ch, IProgressChannel pch, Dataset dtrain, CategoricalMetaData catMetaData, Dataset dvalid) in C:\MLDotNet\src\Microsoft.ML.LightGBM\LightGbmTrainerBase.cs:line 378
   at Microsoft.ML.Runtime.LightGBM.LightGbmTrainerBase`3.TrainModelCore(TrainContext context) in C:\MLDotNet\src\Microsoft.ML.LightGBM\LightGbmTrainerBase.cs:line 126
   at Microsoft.ML.Runtime.Training.TrainerEstimatorBase`2.Train(TrainContext context) in C:\MLDotNet\src\Microsoft.ML.Data\Training\TrainerEstimatorBase.cs:line 92
   at Microsoft.ML.Runtime.Training.TrainerEstimatorBase`2.Microsoft.ML.Runtime.ITrainer.Train(TrainContext context) in C:\MLDotNet\src\Microsoft.ML.Data\Training\TrainerEstimatorBase.cs:line 158
   at Microsoft.ML.Runtime.Data.TrainUtils.TrainCore(IHostEnvironment env, IChannel ch, RoleMappedData data, ITrainer trainer, RoleMappedData validData, IComponentFactory`1 calibrator, Int32 maxCalibrationExamples, Nullable`1 cacheData, IPredictor inputPredictor) in C:\MLDotNet\src\Microsoft.ML.Data\Commands\TrainCommand.cs:line 254
   at Microsoft.ML.Runtime.Data.TrainUtils.Train(IHostEnvironment env, IChannel ch, RoleMappedData data, ITrainer trainer, IComponentFactory`1 calibrator, Int32 maxCalibrationExamples) in C:\MLDotNet\src\Microsoft.ML.Data\Commands\TrainCommand.cs:line 223
   at Microsoft.ML.Runtime.EntryPoints.LearnerEntryPointsUtils.Train[TArg,TOut](IHost host, TArg input, Func`1 createTrainer, Func`1 getLabel, Func`1 getWeight, Func`1 getGroup, Func`1 getName, Func`1 getCustom, ICalibratorTrainerFactory calibrator, Int32 maxCalibrationExamples) in C:\MLDotNet\src\Microsoft.ML.Data\EntryPoints\InputBase.cs:line 189
   at Microsoft.ML.Runtime.LightGBM.LightGbm.TrainBinary(IHostEnvironment env, LightGbmArguments input) in C:\MLDotNet\src\Microsoft.ML.LightGBM\LightGbmBinaryTrainer.cs:line 189

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 17 (14 by maintainers)

Most upvoted comments

Hey @daholste, I wasn't able to reproduce this at all, in either TLC or ML.NET. It also looks like the Models.PipelineSweeper and Rocket components in the graph (along with the execgraph command) were removed from ML.NET some time ago. In any case, there was no repro even when using LightGbm from the command line or the API: the dataset contains only numerical columns, so the "Categorical split features is zero length" error shouldn't apply, and I'm not sure why you were seeing it in the first place.
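For reference, the "only numerical columns" point can be read off the CustomSchema string in the graph above. Here is a rough Python sketch that parses it. The parse_columns helper is hypothetical and only handles the col=Name:Type:Index and col=Name:Type:Start-End forms that appear in this issue. It is not the loader's actual parser.

```python
import re

# Schema string copied from the Data.CustomTextLoader nodes in the graph.
SCHEMA = "sep=, col=Label:R4:78 col=Features1:R4:0-77 col=Features2:R4:79-202 header=+"


def parse_columns(schema):
    """Return (name, type, first_index, last_index) for each col= entry.

    Hypothetical helper: covers only the two col= forms used in this issue.
    """
    columns = []
    pattern = r"col=(\w+):(\w+):(\d+)(?:-(\d+))?"
    for name, kind, start, end in re.findall(pattern, schema):
        first = int(start)
        last = int(end) if end else first  # single index means a one-column range
        columns.append((name, kind, first, last))
    return columns


columns = parse_columns(SCHEMA)
for name, kind, first, last in columns:
    print(f"{name}: type={kind}, columns {first}..{last} ({last - first + 1} wide)")

# R4 is the single-precision float type, so this is True when nothing is categorical.
all_numeric = all(kind == "R4" for _, kind, _, _ in columns)
print("all columns numeric:", all_numeric)
```

Every declared column is R4, which matches the observation that the schema itself does not declare any categorical input.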

I have, however, reproduced the same error in #3659, and I believe the underlying cause is the same. It happens deterministically when there is only one categorical feature and UseCategoricalSplit is true in LightGbm, and it is likely a bug in the model conversion from LightGbm to FastTree. Please follow #3659 for details and updates. I am closing this issue. Please feel free to reopen if you find a repro that is distinct from the conditions described in the other issue.
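To make the failure mode concrete, here is a minimal Python sketch of the kind of invariant the stack trace points at (RegressionTree.CheckValid raising "Categorical split features is zero length"). It is an illustration under assumptions, not the ML.NET code: check_categorical_splits is a hypothetical stand-in, and the exact condition checked in ML.NET is assumed from the error message and the constructor arguments shown in the trace.

```python
def check_categorical_splits(categorical_split, categorical_split_features):
    """Raise ValueError if a node flagged as categorical has no split features.

    Hypothetical stand-in for the validation step; names mirror the
    categoricalSplit / categoricalSplitFeatures arguments in the stack trace.
    """
    for node, is_categorical in enumerate(categorical_split):
        if not is_categorical:
            continue  # numerical splits carry no categorical feature list
        features = categorical_split_features[node]
        if features is None or len(features) == 0:
            raise ValueError(f"node {node}: Categorical split features is zero length")


# Node 0 is a numerical split, node 1 a categorical split over features 3 and 4.
check_categorical_splits([False, True], [None, [3, 4]])  # passes silently

# Same tree, but the conversion step produced an empty feature list for node 1.
try:
    check_categorical_splits([False, True], [None, []])
    failed = False
    message = ""
except ValueError as err:
    failed = True
    message = str(err)

print(failed, message)
```

If the conversion from the LightGbm model ever marks a node as categorical but yields an empty feature list for it, a check of this shape would fail in the way the report shows.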

cc: @vinodshanbhag @justinormont @guolinke @vKuryshev @mayoatte @rauhs @eyvindwa

Hey, sent!

@daholste can you send me the dataset and code with which I can reproduce this issue? The same that you sent to Ivan 😃

I think @justinormont sent me a repro file some time ago, but I lost it. If someone can provide a reproducible code snippet, I would be more than happy to fix it.