MachineLearningNotebooks: Error: The input data is empty. Ensure data correctness and availability.

I have the following code, and I am very sure the dataset is not empty!

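from azureml.core import Workspace, Dataset

# Named ws because the compute target and experiment below refer to ws
ws = Workspace(subscription_id, resource_group, workspace_name)

dstraining_datasensor1 = Dataset.get_by_name(ws, name='sensor1')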


from azureml.automl.core.forecasting_parameters import ForecastingParameters

forecasting_parametersSensor1 = ForecastingParameters(
    time_column_name='EventEnqueuedUtcTime',
    forecast_horizon=5,
    time_series_id_column_names=["eui"],
    freq='H',
    target_lags='auto',
    target_rolling_window_size=10)


from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig
from azureml.core.compute import ComputeTarget, AmlCompute
import logging

amlcompute_cluster_name = "computecluster"
compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
experiment_name = 'iot-forecast'

experiment = Experiment(ws, experiment_name)

automl_configSensor1 = AutoMLConfig(
    task='forecasting',
    primary_metric='normalized_root_mean_squared_error',
    experiment_timeout_minutes=100,
    enable_early_stopping=True,
    training_data=dstraining_datasensor1,
    compute_target=compute_target,
    label_column_name='TempC_DS',
    n_cross_validations=5,
    enable_ensembling=False,
    verbosity=logging.INFO,
    forecasting_parameters=forecasting_parametersSensor1)

remote_run = experiment.submit(automl_configSensor1, show_output=True)

However after some minutes, in the experiment I get this:

Status Failed  Error: The input data is empty. Ensure data correctness and availability.

I checked the dataset and it's definitely not empty.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 26

Most upvoted comments

I've already used Option B successfully, pre-filtering in Ps days and outputting results back to tabular.

I’ll test option A

Tony Pines

On Mar 18, 2021, at 4:21 PM, Cesar De la Torre @.***> wrote:

@fausttiger What we’re saying is that right now, if you use the SDK in a notebook together with Filter() and no matching rows fall within the first 10K rows, you will get this error, because the dataset profile validation (even if you created the dataset profile in advance) is not used by AutoML by default. This is a bug on our side and should be fixed pretty soon.
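To make the failure mode described above concrete, here is a minimal pure-Python sketch. It is a simplified model, not AutoML's actual validation code: the 50,000-row data, the `eui` values and the helper `sample_then_filter` are all hypothetical. It only shows how a filter evaluated against the first 10K rows can report "empty" for a dataset that has plenty of matching rows further down.

```python
SAMPLE_SIZE = 10_000  # the "first 10K" rows mentioned in the comment above


def sample_then_filter(rows, predicate, sample_size=SAMPLE_SIZE):
    """Apply the filter only to the first sample_size rows (simplified model)."""
    return [row for row in rows[:sample_size] if predicate(row)]


def is_sensor_b(row):
    return row["eui"] == "B"


# Hypothetical data: sensor "B" only appears after row 20,000.
rows = [{"eui": "A" if i < 20_000 else "B", "TempC_DS": 20.0 + i % 5}
        for i in range(50_000)]

full_matches = [row for row in rows if is_sensor_b(row)]
sampled_matches = sample_then_filter(rows, is_sensor_b)

print(len(full_matches))     # 30000 rows match in the full data
print(len(sampled_matches))  # 0 rows match within the first 10K
```

In this model the dataset is clearly not empty, yet a check that only looks at the head of the data sees nothing, which matches the symptom reported in this issue.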

Hence, the workarounds you currently have are any of the following:

OPTION A. Shuffle the data beforehand so that the filters can match values within the first 10K rows.

OPTION B. Create a new dataset out of the filtered data before providing it to the AutoMLConfig class.

OPTION C. Use the UI (NOT the SDK) with the HTTP URL parameter provided above.
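As a rough illustration of why Options A and B help, the sketch below uses plain Python lists rather than the Azure ML SDK. The data, the `eui` values and the helper `validate_on_sample` are hypothetical, and the real validation logic may differ. The point is only that both workarounds put matching rows inside the first 10K rows that get inspected.

```python
import random

SAMPLE_SIZE = 10_000  # assumed size of the inspected head of the data


def validate_on_sample(rows, predicate, sample_size=SAMPLE_SIZE):
    """Count how many of the first sample_size rows pass the filter."""
    return sum(1 for row in rows[:sample_size] if predicate(row))


def is_sensor_b(row):
    return row["eui"] == "B"


def keep_all(row):
    return True


# Hypothetical data: sensor "B" only appears after row 20,000.
rows = [{"eui": "A" if i < 20_000 else "B"} for i in range(50_000)]

# No workaround: the filter finds nothing in the first 10K rows.
baseline = validate_on_sample(rows, is_sensor_b)

# Option A: shuffle first, so matching rows are spread through the data.
shuffled = rows[:]
random.Random(0).shuffle(shuffled)
option_a = validate_on_sample(shuffled, is_sensor_b)

# Option B: materialize the filtered rows as a new dataset, then pass
# that dataset on without any filter attached.
filtered = [row for row in rows if is_sensor_b(row)]
option_b = validate_on_sample(filtered, keep_all)

print(baseline, option_a > 0, option_b)
```

Option B is the more deterministic of the two, since every row in the new dataset already satisfies the filter. For time series data, any shuffling under Option A would presumably need to leave the time column intact so the series can still be ordered afterwards.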

Cesar

I’ll have to see if this is possible… this is my personal GitHub account, but this error is on one of our corporate Azure subscriptions. We’re not allowed to post to GitHub forums from our Enterprise accounts… hence I was posting via my personal.

There may be a sanitized dataset and simpler notebook I can cobble together to reproduce.

On Mar 15, 2021, at 1:38 PM, Cesar De la Torre @.***> wrote:

@fausttiger Could you send me a pointer to the sample dataset and notebook to repro the issue? Email to: cesardl at microsoft dot com
