machinelearning: LightGBMBinary sometimes throws "Splitter/consolidator worker encountered exception while consuming source data"
System Information (please complete the following information):
- OS & Version: Windows Server 2022 Preview
- ML.NET Version: ML.NET 1.6 (custom build, but no changes to text loading)
- .NET Version: .NET 5.0
Describe the bug
I have a dataset that I am training on with FastTree, FastForest and LightGBM through AutoML. LightGBM sometimes throws the exception below about the boolean field IsPublicHoliday (column DtIsPublicHoliday). The error claims the field holds a numeric value (e.g. 3.208), which would indicate that the dataset is incorrect. However, the exception does not occur on every LightGBM run, and it never occurs with FastForest or FastTree. (FastForest and FastTree run on another, similar server.)
Exception during AutoML iteration: System.InvalidOperationException: Splitter/consolidator worker encountered exception while consuming source data
---> System.InvalidOperationException: Could not parse value 3.20852317 in line 1786829, column DtIsPublicHoliday
To Reproduce
It may be related to the specific 120 GB dataset. I will update here if I find the cause.
Expected behavior
I expect the parse error either to occur on every run and with every algorithm, or not to occur at all.
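One way to separate "the file is wrong" from "the loader is wrong" is to scan the boolean column outside ML.NET and see whether any non-boolean value is really there. The sketch below is only an illustration in Python, not part of the original report: the tab delimiter, the header row, the accepted boolean tokens and the function name `find_bad_bool_values` are all assumptions, and the line numbers it reports are counted with the header as line 1, which may not match the numbering in the exception.

```python
import csv
import io

# Assumption: these are the only spellings the file uses for booleans.
BOOL_TOKENS = {"0", "1", "true", "false"}


def find_bad_bool_values(lines, column, delimiter="\t", limit=10):
    """Return up to `limit` (line_number, value) pairs whose `column` is not a boolean token.

    `lines` is any iterable of text lines (an open file streams fine,
    so the whole dataset never has to fit in memory).
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)            # assumption: first line holds column names
    idx = header.index(column)
    bad = []
    for row in reader:
        value = row[idx].strip() if idx < len(row) else "<missing>"
        if value.lower() not in BOOL_TOKENS:
            # reader.line_num counts lines read so far, header included
            bad.append((reader.line_num, value))
            if len(bad) >= limit:
                break
    return bad


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file.
    sample = io.StringIO(
        "Label\tDtIsPublicHoliday\n"
        "1\t0\n"
        "0\t3.20852317\n"
        "1\ttrue\n"
    )
    print(find_bad_bool_values(sample, "DtIsPublicHoliday"))
    # prints [(3, '3.20852317')]
```

For the real dataset the same function could be called with `open(path, newline="")` in place of the `StringIO` sample. An empty result would suggest the value in the exception does not come from the file as stored on disk.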
Additional context
| Trainer PosPrec PosReca Accuracy AUC AUPRC F1-score Duration MaxPosPr|
|1 LightGbmBinary 0.5340 0.5108 0.5339 0.5516 0.5400 0.5222 1731.1 0.5340|
|2 LightGbmBinary 0.5190 0.4807 0.5190 0.5343 0.5316 0.4991 2772.3 0.5340|
|3 LightGbmBinary 0.5335 0.5150 0.5336 0.5554 0.5410 0.5241 7007.1 0.5340|
|4 LightGbmBinary 0.0000 0.0000 0.5014 0.5132 0.5072 0.0000 1202.4 0.5340|
|5 LightGbmBinary 0.5278 0.4959 0.5274 0.5467 0.5368 0.5114 3865.4 0.5340|
Exception during AutoML iteration: System.InvalidOperationException: Splitter/consolidator worker encountered exception while consuming source data
---> System.InvalidOperationException: Could not parse value 3.20852317 in line 1786829, column DtIsPublicHoliday
at Microsoft.ML.Data.TextLoader.Parser.ProcessOne(FieldSet vs, ColInfo info, ColumnPipe v, Int32 irow, Int64 line) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderParser.cs:line 1463
at Microsoft.ML.Data.TextLoader.Parser.ProcessItems(RowSet rows, Int32 irow, Boolean[] active, FieldSet fields, Int32 srcLim, Int64 line) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderParser.cs:line 1381
at Microsoft.ML.Data.TextLoader.Parser.ParseRow(RowSet rows, Int32 irow, Helper helper, Boolean[] active, String path, Int64 line, String text) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderParser.cs:line 888
at Microsoft.ML.Data.TextLoader.Cursor.ParseSequential()+MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderCursor.cs:line 345
at Microsoft.ML.Data.TextLoader.Cursor.MoveNextCore() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderCursor.cs:line 298
at Microsoft.ML.Data.RootCursorBase.MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Core\Data\RootCursorBase.cs:line 72
at Microsoft.ML.Data.DataViewUtils.Splitter.<>c__DisplayClass7_1.<ConsolidateCore>b__2() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 426
--- End of inner exception stack trace ---
at Microsoft.ML.Data.DataViewUtils.Splitter.Batch.SetAll(OutPipe[] pipes) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 832
at Microsoft.ML.Data.DataViewUtils.Splitter.Cursor.MoveNextCore() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 1101
at Microsoft.ML.Data.RootCursorBase.MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Core\Data\RootCursorBase.cs:line 72
at Microsoft.ML.Trainers.TrainingCursorBase.MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Training\TrainerUtils.cs:line 549
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.LoadDataset(IChannel ch, Factory factory, Dataset dataset, Int32 numRow, Int32 batchSize, CategoricalMetaData catMetaData) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.LightGbm\LightGbmTrainerBase.cs:line 964
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.LoadTrainingData(IChannel ch, RoleMappedData trainData, CategoricalMetaData& catMetaData) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.LightGbm\LightGbmTrainerBase.cs:line 591
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.TrainModelCore(TrainContext context) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.LightGbm\LightGbmTrainerBase.cs:line 386
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Training\TrainerEstimatorBase.cs:line 158
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\EstimatorChain.cs:line 68
at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String groupId, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, IChannel logger) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.AutoML\Experiment\Runners\RunnerUtil.cs:line 52
Exception during AutoML iteration: System.InvalidOperationException: Splitter/consolidator worker encountered exception while consuming source data
---> System.InvalidOperationException: Could not parse value 17 in line 9976, column DtIsPublicHoliday
at Microsoft.ML.Data.TextLoader.Parser.ProcessOne(FieldSet vs, ColInfo info, ColumnPipe v, Int32 irow, Int64 line) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderParser.cs:line 1463
at Microsoft.ML.Data.TextLoader.Parser.ProcessItems(RowSet rows, Int32 irow, Boolean[] active, FieldSet fields, Int32 srcLim, Int64 line) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderParser.cs:line 1381
at Microsoft.ML.Data.TextLoader.Parser.ParseRow(RowSet rows, Int32 irow, Helper helper, Boolean[] active, String path, Int64 line, String text) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderParser.cs:line 888
at Microsoft.ML.Data.TextLoader.Cursor.ParseSequential()+MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderCursor.cs:line 345
at Microsoft.ML.Data.TextLoader.Cursor.MoveNextCore() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderCursor.cs:line 298
at Microsoft.ML.Data.RootCursorBase.MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Core\Data\RootCursorBase.cs:line 72
at Microsoft.ML.Data.DataViewUtils.Splitter.<>c__DisplayClass7_1.<ConsolidateCore>b__2() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 426
--- End of inner exception stack trace ---
at Microsoft.ML.Data.DataViewUtils.Splitter.Batch.SetAll(OutPipe[] pipes) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 832
at Microsoft.ML.Data.DataViewUtils.Splitter.Cursor.MoveNextCore() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 1101
at Microsoft.ML.Data.RootCursorBase.MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Core\Data\RootCursorBase.cs:line 72
at Microsoft.ML.Trainers.TrainingCursorBase.MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Training\TrainerUtils.cs:line 549
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.MoveMany(FloatLabelCursor cursor, Int64 count) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.LightGbm\LightGbmTrainerBase.cs:line 734
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.CreateDatasetFromSamplingData(IChannel ch, Factory factory, Int32 numRow, String param, Single[] labels, Single[] weights, Int32[] groups, CategoricalMetaData catMetaData, Dataset& dataset) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.LightGbm\LightGbmTrainerBase.cs:line 836
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.LoadTrainingData(IChannel ch, RoleMappedData trainData, CategoricalMetaData& catMetaData) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.LightGbm\LightGbmTrainerBase.cs:line 591
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.TrainModelCore(TrainContext context) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.LightGbm\LightGbmTrainerBase.cs:line 386
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Training\TrainerEstimatorBase.cs:line 158
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\EstimatorChain.cs:line 68
at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String groupId, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, IChannel logger) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.AutoML\Experiment\Runners\RunnerUtil.cs:line 52
|8 LightGbmBinary 0.5324 0.5304 0.5336 0.5537 0.5454 0.5314 2662.2 0.5340|
|9 LightGbmBinary 0.0000 0.0000 0.5014 0.5133 0.5082 0.0000 1494.9 0.5340|
Exception during AutoML iteration: System.InvalidOperationException: Splitter/consolidator worker encountered exception while consuming source data
---> System.InvalidOperationException: Could not parse value 1.0124658 in line 471358, column DtIsPublicHoliday
at Microsoft.ML.Data.TextLoader.Parser.ProcessOne(FieldSet vs, ColInfo info, ColumnPipe v, Int32 irow, Int64 line) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderParser.cs:line 1463
at Microsoft.ML.Data.TextLoader.Parser.ProcessItems(RowSet rows, Int32 irow, Boolean[] active, FieldSet fields, Int32 srcLim, Int64 line) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderParser.cs:line 1381
at Microsoft.ML.Data.TextLoader.Parser.ParseRow(RowSet rows, Int32 irow, Helper helper, Boolean[] active, String path, Int64 line, String text) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderParser.cs:line 888
at Microsoft.ML.Data.TextLoader.Cursor.ParseSequential()+MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderCursor.cs:line 345
at Microsoft.ML.Data.TextLoader.Cursor.MoveNextCore() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderCursor.cs:line 298
at Microsoft.ML.Data.RootCursorBase.MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Core\Data\RootCursorBase.cs:line 72
at Microsoft.ML.Data.DataViewUtils.Splitter.<>c__DisplayClass7_1.<ConsolidateCore>b__2() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 426
--- End of inner exception stack trace ---
at Microsoft.ML.Data.DataViewUtils.Splitter.Batch.SetAll(OutPipe[] pipes) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 832
at Microsoft.ML.Data.DataViewUtils.Splitter.Cursor.MoveNextCore() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 1101
at Microsoft.ML.Data.RootCursorBase.MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Core\Data\RootCursorBase.cs:line 72
at Microsoft.ML.Trainers.TrainingCursorBase.MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Training\TrainerUtils.cs:line 549
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.LoadDataset(IChannel ch, Factory factory, Dataset dataset, Int32 numRow, Int32 batchSize, CategoricalMetaData catMetaData) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.LightGbm\LightGbmTrainerBase.cs:line 964
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.LoadTrainingData(IChannel ch, RoleMappedData trainData, CategoricalMetaData& catMetaData) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.LightGbm\LightGbmTrainerBase.cs:line 591
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.TrainModelCore(TrainContext context) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.LightGbm\LightGbmTrainerBase.cs:line 386
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Training\TrainerEstimatorBase.cs:line 158
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\EstimatorChain.cs:line 68
at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String groupId, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, IChannel logger) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.AutoML\Experiment\Runners\RunnerUtil.cs:line 52
|11 LightGbmBinary 0.5285 0.4064 0.5233 0.5404 0.5293 0.4595 3290.4 0.5340|
Exception during AutoML iteration: System.InvalidOperationException: Splitter/consolidator worker encountered exception while consuming source data
---> System.InvalidOperationException: Could not parse value 0.99942851 in line 1691779, column DtIsPublicHoliday
at Microsoft.ML.Data.TextLoader.Parser.ProcessOne(FieldSet vs, ColInfo info, ColumnPipe v, Int32 irow, Int64 line) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderParser.cs:line 1463
at Microsoft.ML.Data.TextLoader.Parser.ProcessItems(RowSet rows, Int32 irow, Boolean[] active, FieldSet fields, Int32 srcLim, Int64 line) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderParser.cs:line 1381
at Microsoft.ML.Data.TextLoader.Parser.ParseRow(RowSet rows, Int32 irow, Helper helper, Boolean[] active, String path, Int64 line, String text) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderParser.cs:line 888
at Microsoft.ML.Data.TextLoader.Cursor.ParseSequential()+MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderCursor.cs:line 345
at Microsoft.ML.Data.TextLoader.Cursor.MoveNextCore() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoaderCursor.cs:line 298
at Microsoft.ML.Data.RootCursorBase.MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Core\Data\RootCursorBase.cs:line 72
at Microsoft.ML.Data.DataViewUtils.Splitter.<>c__DisplayClass7_1.<ConsolidateCore>b__2() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 426
--- End of inner exception stack trace ---
at Microsoft.ML.Data.DataViewUtils.Splitter.Batch.SetAll(OutPipe[] pipes) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 832
at Microsoft.ML.Data.DataViewUtils.Splitter.Cursor.MoveNextCore() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 1101
at Microsoft.ML.Data.RootCursorBase.MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Core\Data\RootCursorBase.cs:line 72
at Microsoft.ML.Trainers.TrainingCursorBase.MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Training\TrainerUtils.cs:line 549
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.MoveMany(FloatLabelCursor cursor, Int64 count) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.LightGbm\LightGbmTrainerBase.cs:line 734
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.CreateDatasetFromSamplingData(IChannel ch, Factory factory, Int32 numRow, String param, Single[] labels, Single[] weights, Int32[] groups, CategoricalMetaData catMetaData, Dataset& dataset) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.LightGbm\LightGbmTrainerBase.cs:line 836
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.LoadTrainingData(IChannel ch, RoleMappedData trainData, CategoricalMetaData& catMetaData) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.LightGbm\LightGbmTrainerBase.cs:line 591
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.TrainModelCore(TrainContext context) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.LightGbm\LightGbmTrainerBase.cs:line 386
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Training\TrainerEstimatorBase.cs:line 158
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\EstimatorChain.cs:line 68
at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String groupId, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, IChannel logger) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.AutoML\Experiment\Runners\RunnerUtil.cs:line 52
|13 LightGbmBinary 0.5416 0.3927 0.5315 0.5506 0.5405 0.4553 2007.7 0.5416|
|14 LightGbmBinary 0.0000 0.0000 0.5014 0.5133 0.5048 0.0000 1496.5 0.5416|
|15 LightGbmBinary 0.5472 0.4911 0.5437 0.5655 0.5480 0.5176 4147.1 0.5472|
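The four failures in the log above each report a different value and a different line number for the same column. To make that comparison easier across many AutoML runs, the parse errors can be pulled out of the console log with a small script. This is a hedged sketch in Python: the regular expression is written against the message format shown in this log only, and the function name `summarize_parse_errors` is made up for illustration.

```python
import re

# Matches the inner exception message as it appears in the log above.
PARSE_ERROR = re.compile(
    r"Could not parse value (?P<value>\S+) in line (?P<line>\d+), column (?P<column>\S+)"
)


def summarize_parse_errors(log_text):
    """Return unique (column, line, value) tuples in order of first appearance."""
    seen = []
    for match in PARSE_ERROR.finditer(log_text):
        entry = (match.group("column"), int(match.group("line")), match.group("value"))
        if entry not in seen:        # the same error may be quoted more than once
            seen.append(entry)
    return seen


if __name__ == "__main__":
    log = (
        "---> System.InvalidOperationException: Could not parse value 3.20852317 "
        "in line 1786829, column DtIsPublicHoliday\n"
        "---> System.InvalidOperationException: Could not parse value 17 "
        "in line 9976, column DtIsPublicHoliday\n"
    )
    for column, line, value in summarize_parse_errors(log):
        print(f"{column}: line {line}, value {value}")
```

If the same line number keeps recurring across runs, the file itself is the more likely suspect; line numbers that change from run to run, as in this log, point away from a single corrupt row.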
About this issue
- State: closed
- Created 3 years ago
- Comments: 28 (25 by maintainers)
Thanks.
Next week, I will:
I will ping you once I have done them.