machinelearning: AutoML Regression Experiment fails after 67 iterations

Hi,

When running a Regression Experiment, AutoML systematically fails after 67 iterations, raising the exception “All instances skipped due to missing features”. From looking at other issues, I got the idea that the SmacSweeper could be the cause. The stack trace also suggests this:

in Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl.MakeBoundariesAndCheckLabels(Int64& missingInstances, Int64& totalInstances)
   in Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl..ctor(RoleMappedData data, IHost host, Double[][] binUpperBounds, Single maxLabel, Boolean dummy, Boolean noFlocks, PredictionKind kind, Int32[] categoricalFeatureIndices, Boolean categoricalSplit)
   in Microsoft.ML.Trainers.FastTree.DataConverter.Create(RoleMappedData data, IHost host, Int32 maxBins, Single maxLabel, Boolean diskTranspose, Boolean noFlocks, Int32 minDocsPerLeaf, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeatureIndices, Boolean categoricalSplit)
   in Microsoft.ML.Trainers.FastTree.ExamplesToFastTreeBins.FindBinsAndReturnDataset(RoleMappedData data, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeaturIndices, Boolean categoricalSplit)
   in Microsoft.ML.Trainers.FastTree.FastTreeTrainerBase`3.ConvertData(RoleMappedData trainData)
   in Microsoft.ML.Trainers.FastTree.FastForestRegressionTrainer.TrainModelCore(TrainContext context)
   in Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   in Microsoft.ML.AutoML.SmacSweeper.FitModel(IEnumerable`1 previousRuns)
   in Microsoft.ML.AutoML.SmacSweeper.ProposeSweeps(Int32 maxSweeps, IEnumerable`1 previousRuns)
   in Microsoft.ML.AutoML.PipelineSuggester.SampleHyperparameters(MLContext context, SuggestedTrainer trainer, IEnumerable`1 history, Boolean isMaximizingMetric)
   in Microsoft.ML.AutoML.PipelineSuggester.GetNextInferredPipeline(MLContext context, IEnumerable`1 history, DatasetColumnInfo[] columns, TaskKind task, Boolean isMaximizingMetric, CacheBeforeTrainer cacheBeforeTrainer, IEnumerable`1 trainerWhitelist)
   in Microsoft.ML.AutoML.Experiment`2.Execute()
   in Microsoft.ML.AutoML.ExperimentBase`2.Execute(ColumnInformation columnInfo, DatasetColumnInfo[] columns, IEstimator`1 preFeaturizer, IProgress`1 progressHandler, IRunner`1 runner)
   in Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)

However, unlike in the other issues, I’m running a console application, I’m loading data from a database with no missing values, and I hopefully have the right NuGet dependencies:

Microsoft.ML.AutoML and Microsoft.ML.Recommender: 0.16.0
Microsoft.ML and all the other ML packages: 1.4.0

I understand that the problem might be caused by some of the third-party libraries ML.NET depends on, but isn’t it at least possible to ignore the exception thrown by a single trainer without compromising the whole regression experiment? I would like to be able to access the BestRun object and choose the best of the first 67 runs without having to look back at the CacheDirectory.
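To make the request concrete, here is a minimal sketch of a possible workaround, not a confirmed fix. It assumes that the progress handler passed to Execute is called for each finished run before the sweeper throws, which the stack trace suggests but which has not been verified. RunCollector and TolerantAutoML are hypothetical names introduced for illustration; the label column name and time budget are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Hypothetical helper: stores each finished run as it is reported, so the
// results are still available if Execute() throws later on.
class RunCollector : IProgress<RunDetail<RegressionMetrics>>
{
    public List<RunDetail<RegressionMetrics>> Runs { get; } =
        new List<RunDetail<RegressionMetrics>>();

    public void Report(RunDetail<RegressionMetrics> run)
    {
        // Keep only runs that trained successfully and produced metrics.
        if (run.Exception == null && run.ValidationMetrics != null)
            Runs.Add(run);
    }
}

static class TolerantAutoML
{
    // Runs the experiment and returns the best run seen so far (by R-squared),
    // or null if no run completed before the experiment aborted.
    public static RunDetail<RegressionMetrics> BestRunSoFar(
        MLContext mlContext, IDataView trainData,
        string labelColumnName, uint maxSeconds)
    {
        var collector = new RunCollector();
        var experiment = mlContext.Auto().CreateRegressionExperiment(maxSeconds);

        try
        {
            experiment.Execute(trainData,
                labelColumnName: labelColumnName,
                progressHandler: collector);
        }
        catch (Exception ex)
        {
            // Assumption: runs reported before the failure remain usable.
            Console.WriteLine($"Experiment aborted: {ex.Message}");
        }

        return collector.Runs
            .OrderByDescending(r => r.ValidationMetrics.RSquared)
            .FirstOrDefault();
    }
}
```

Implementing IProgress directly, rather than using System.Progress, keeps the callbacks synchronous, so the list is complete by the time the catch block runs. Something like this built into the experiment itself would be preferable.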

If necessary, I can generate a CSV file with all the data used for training.

Thanks

About this issue

State: closed
Created 4 years ago
Comments: 16 (9 by maintainers)

Most upvoted comments

I was able to successfully install the most recent build from today (0.17.3-29602-5), which indeed fixes the bug. Feel free to close the issue. Thanks for the support.

francescomazzurco on Dec 2, 2020

@justinormont @francescomazzurco

As part of moving to Arcade, we’ve published some NuGet packages that have a bug: they require the MlNetMklDepsCode NuGet package to work. This is a bug, and we’re working on fixing it. Those packages should be ignored for the time being.

Also, there have been some problems with publishing NuGet packages from master (which are the ones required by @francescomazzurco), so I believe no package has been published correctly from master since October 20th. I therefore don’t think any public package includes the October 30 change that Justin is referring to. This problem was on the Azure DevOps side and should be fixed now. I’ll run a manual build to publish packages from the master branch, and hopefully it will work. I’ll update this thread with info about that. Thanks.

antoniovs1029 on Nov 30, 2020