machinelearning: AutoML Regression Experiment Crash

System Information (please complete the following information):

  • OS & Version: Windows 10
  • ML.NET Version: ML.NET v2.0.1, AutoML 0.20.1
  • .NET Version: .NET 6.0

Describe the bug

The AutoML experiment crashes.

To Reproduce

I am running AutoML on data from a database; the attached file automlbug.csv contains that data. The experiment settings are:

  • Train/test split fraction: 0.01
  • Experiment time: 100 seconds
  • Target column: sales
  • preFeaturizer: double-to-single conversion
  • Optimization metric: RegressionMetric.MeanAbsoluteError
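The setup described above can be sketched roughly as follows. This is only an illustration of the reported settings, not the reporter's actual code: the file location is an assumption, the double-to-single preFeaturizer is left out for brevity, and the calls require the Microsoft.ML and Microsoft.ML.AutoML packages.

```csharp
using Microsoft.ML;
using Microsoft.ML.AutoML;

var ctx = new MLContext();

// Assumption: the attached CSV sits in the working directory under this name.
string dataPath = "automlbug.csv";

// Infer the schema with "sales" as the label, then load the file.
ColumnInferenceResults columns = ctx.Auto().InferColumns(dataPath, labelColumnName: "sales");
IDataView data = ctx.Data.CreateTextLoader(columns.TextLoaderOptions).Load(dataPath);

// Hold out 1% of the rows as the validation set.
var split = ctx.Data.TrainTestSplit(data, testFraction: 0.01);

var settings = new RegressionExperimentSettings
{
    MaxExperimentTimeInSeconds = 100,
    OptimizingMetric = RegressionMetric.MeanAbsoluteError,
};

// The report also passes a preFeaturizer here; it is omitted in this sketch.
var result = ctx.Auto()
    .CreateRegressionExperiment(settings)
    .Execute(split.TrainSet, split.TestSet, labelColumnName: "sales");
```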

Expected behavior

The experiment should not crash.

Screenshots, Code, Sample Projects

    System.AggregateException: One or more errors occurred. (Index was outside the bounds of the array.) ---> System.IndexOutOfRangeException: Index was outside the bounds of the array.

  Stack Trace: 
    PipelineProposer.ProposeSearchSpace()
    EciCostFrugalTuner.Propose(TrialSettings settings)
    AutoMLExperiment.RunAsync(CancellationToken ct)
    --- End of inner exception stack trace ---
    Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
    Task`1.GetResultCore(Boolean waitCompletionNotification)
    AutoMLExperiment.Run()
    RegressionExperiment.Execute(IDataView trainData, IDataView validationData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
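The trace shows the IndexOutOfRangeException wrapped in an AggregateException, which is what .NET does when a faulted task is awaited synchronously. A minimal sketch of how a caller could surface the underlying exception is shown below. `FailingWork` and `CaptureRootCause` are hypothetical helpers that stand in for the failing call; they are not part of ML.NET.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the failing call; this is not ML.NET code.
static int FailingWork()
{
    int[] items = new int[2];
    int index = 5;
    return items[index]; // throws IndexOutOfRangeException at run time
}

// Runs the work on a task, blocks on it, and returns the first underlying exception.
static Exception CaptureRootCause(Func<int> work)
{
    try
    {
        // Blocking on Task.Result rethrows task failures wrapped in an AggregateException.
        _ = Task.Run(work).Result;
    }
    catch (AggregateException ex)
    {
        // Flatten collapses nested AggregateExceptions into a single level.
        return ex.Flatten().InnerExceptions[0];
    }

    throw new InvalidOperationException("The work completed without throwing.");
}

Exception root = CaptureRootCause(() => FailingWork());
Console.WriteLine($"{root.GetType().Name}: {root.Message}");
```

Wrapping the call to `experiment.Execute(...)` in a similar try/catch would let the application log the inner exception instead of terminating.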

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

It was merged a few hours ago; you should be able to get it from the nightly build now.

From: superichmann
Sent: Friday, May 12, 2023 12:51:35 AM
To: dotnet/machinelearning
Cc: XiaoYun Zhang; Mention
Subject: Re: [dotnet/machinelearning] AutoML Regression Experiment Crash (Issue #6644)

GG =]


@LittleLittleCloud I have reproduced it exactly. Please follow these steps:

  1. Create a .NET 6 x64 console application.
  2. Add the latest alpha ML.NET and AutoML packages.
  3. Put the tr and te CSV files in the project folder.
  4. Put the following code in Program.cs:
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;
using static Microsoft.ML.DataOperationsCatalog;

// Initialize MLContext
MLContext ctx = new MLContext();

// Define data path
var dataPath = Path.GetFullPath(@"..\..\..\tr.csv");
var testPath = Path.GetFullPath(@"..\..\..\te.csv");

// Infer column information
ColumnInferenceResults columnInference =
    ctx.Auto().InferColumns(dataPath, labelColumnName: "sales");

// Create text loader
TextLoader loader = ctx.Data.CreateTextLoader(columnInference.TextLoaderOptions);

// Load data into IDataView
IDataView train = loader.Load(dataPath);
IDataView test = loader.Load(testPath);

SweepablePipeline pipeline =
    ctx.Auto().Featurizer(train, columnInformation: columnInference.ColumnInformation)
        .Append(ctx.Auto().Regression(labelColumnName: columnInference.ColumnInformation.LabelColumnName));

var experimentSettings = new RegressionExperimentSettings();
experimentSettings.MaxModels = 30;
experimentSettings.OptimizingMetric = RegressionMetric.MeanAbsoluteError;
experimentSettings.CacheBeforeTrainer = CacheBeforeTrainer.Off;
experimentSettings.CacheDirectoryName = null;
RegressionExperiment experiment = ctx.Auto().CreateRegressionExperiment(experimentSettings);
var preDoubleToSingle = TransformDoubleToSingle(train);
var experimentResult = experiment.Execute(train, test, "sales", preFeaturizer: preDoubleToSingle);
Console.WriteLine(experimentResult.BestRun.TrainerName);
Console.WriteLine(experimentResult.BestRun.ValidationMetrics.MeanAbsoluteError);
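
// Builds a preFeaturizer that converts every Double column to Single,
// or returns null when the data has no Double columns.
IEstimator<ITransformer> TransformDoubleToSingle(IDataView data)
{
    var mlContext = new MLContext();

    // Each converted column keeps its original name (output name == input name).
    var doubleColumns = data.Schema
        .Where(col => col.Type == NumberDataViewType.Double)
        .Select(col => new InputOutputColumnPair(col.Name, col.Name))
        .ToArray();

    if (doubleColumns.Length > 0)
        return mlContext.Transforms.Conversion.ConvertType(doubleColumns, DataKind.Single);
    else
        return null;
}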

The exception should occur after a short while.