MachineLearningNotebooks: The `run.input_datasets` dictionary is empty - even after passing into the PythonScriptStep

Pipeline.ipynb

input_dataset = Dataset.get_by_name(ws, name='super_secret_data')

cleanStep = PythonScriptStep(
    script_name = "clean.py",
    inputs = [input_dataset.as_named_input('important_dataset')],
    outputs = [output_data],
    compute_target = cpu_cluster,
    source_directory = experiment_folder
)

clean.py

run = Run.get_context()
print(run.input_datasets)

input_ds = run.input_datasets['important_dataset']
input_df = input_ds.to_pandas_dataframe()

When the pipeline is run, the log for the clean.py step shows the run.input_datasets object is an empty dict and therefore the script fails with a KeyError.
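Until the underlying cause is found, a small guarded lookup can turn the bare KeyError into an actionable message. This is a plain-Python sketch (no Azure ML dependency; the helper name is my own, not part of the SDK):

```python
def get_input_dataset(input_datasets: dict, name: str):
    """Look up a named input, failing with a diagnostic message
    instead of a bare KeyError when the dict is empty."""
    if name not in input_datasets:
        raise RuntimeError(
            f"Input dataset {name!r} not found; available: {sorted(input_datasets)}. "
            "Check that the step received inputs=[...as_named_input(name)] and that "
            "the run environment includes the dataset dependencies."
        )
    return input_datasets[name]
```

In clean.py this would be called as `get_input_dataset(run.input_datasets, 'important_dataset')`.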

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 29 (8 by maintainers)

Most upvoted comments

@MayMSFT Ok, thanks!

Perhaps it would be good to add a note to the run.input_datasets documentation saying that the input_datasets attribute can remain empty?

The first thing I tried was to use this information to register the dataset to the model.

I’m also having this issue (Azure ML SDK Version: 1.6.0)

  • No errors in 70_driver_log.txt
  • On ml.azure.com the dataset is listed
  • run.get_details()['inputDatasets'] shows the datasets that I gave as inputs
  • run.input_datasets is {}
  • run.register_model() registers the model without reference to the input datasets.

The above happens regardless of whether I run locally or on a compute instance in Azure.
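As observed above, run.get_details()['inputDatasets'] still lists the inputs even when run.input_datasets is empty, so the names can be recovered from the details dict as a fallback. A sketch of that extraction (the exact entry shape is an assumption inferred from the behavior described here, not confirmed SDK documentation):

```python
def input_dataset_names(details: dict) -> list:
    """Pull input dataset names out of a run-details dict.
    Entry shape ({'dataset': {'name': ...}}) is an assumption."""
    return [
        entry.get('dataset', {}).get('name', '<unnamed>')
        for entry in details.get('inputDatasets', [])
    ]
```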

@MayMSFT apologies if this is not the correct place to raise this issue.

thanks Anders. This was caused by a bug in our code. For dataset.as_named_input(), passing a string containing capital letters causes the error. We will fix it in the Feb 17 release. The current workaround is to use lowercase letters only.
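The workaround above can be applied mechanically before calling as_named_input(). A plain-Python helper sketch (the helper is hypothetical, and the alphanumeric/underscore constraint is an assumption about valid input names):

```python
def safe_input_name(name: str) -> str:
    """Lowercase a dataset input name to avoid the capital-letter
    bug in as_named_input() on affected SDK versions."""
    safe = name.lower()
    # Assumed constraint: only letters, digits, and underscores.
    if not safe.replace('_', '').isalnum():
        raise ValueError(f"invalid dataset input name: {name!r}")
    return safe
```

Then pass `input_dataset.as_named_input(safe_input_name('Important_Dataset'))` instead of the capitalized name.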

@ezwiefel Based on the driver log, it looks like the code that was supposed to set up input_datasets is not run. Can you please paste the code that shows how you set up the conda dependencies? I don’t see you passing in a run configuration, which is where you would specify the conda dependencies, to the PythonScriptStep.
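For reference, a hedged sketch of attaching a run configuration with conda dependencies to the step (API names assumed from azureml-core 1.x; input_dataset, output_data, cpu_cluster, and experiment_folder are reused from the snippet at the top of this issue):

```python
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies
from azureml.pipeline.steps import PythonScriptStep

# Environment for the step, including the dataset dependencies
run_config = RunConfiguration()
run_config.environment.python.conda_dependencies = CondaDependencies.create(
    pip_packages=['azureml-sdk', 'azureml-dataprep[pandas,fuse]>=1.1.14'])

cleanStep = PythonScriptStep(
    script_name="clean.py",
    inputs=[input_dataset.as_named_input('important_dataset')],
    outputs=[output_data],
    compute_target=cpu_cluster,
    source_directory=experiment_folder,
    runconfig=run_config)
```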

Hi, could you please look into the code in the comments above? I have added the dependencies mentioned in https://github.com/Azure/MachineLearningNotebooks/issues/707#issuecomment-567585408.

Hi,

I am facing the same issue. I am using a TabularDataset, and I installed the dependencies below:

env = Environment('my_env')
cd = CondaDependencies.create(pip_packages=[
    'tensorflow==1.12.0', 'keras==2.2.4', 'azureml-sdk', 'azureml-defaults',
    'matplotlib', 'scikit-learn', 'azureml-dataprep[pandas,fuse]>=1.1.14'])
env.python.conda_dependencies = cd


est = TensorFlow(source_directory=script_folder,
                 script_params=script_params,
                 compute_target=compute_target,
                 inputs=[ds.as_named_input('my_data')],
                 entry_script='keras_lstm.py',
                 environment_definition=env)

Script: dataset = run.input_datasets["my_data"]

Error:

return super().__getitem__(key)
KeyError: 'my_data'

Could someone please share a solution, if there is one?

Thanks, SJ