MachineLearningNotebooks: The `run.input_datasets` dictionary is empty - even after passing into the PythonScriptStep
The run.input_datasets dictionary is empty - even after passing into the PythonScriptStep.
Pipeline.ipynb
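from azureml.core import Dataset
from azureml.pipeline.steps import PythonScriptStep

# ws, output_data, cpu_cluster and experiment_folder are assumed to be defined earlier in the notebook
input_dataset = Dataset.get_by_name(ws, name='super_secret_data')
cleanStep = PythonScriptStep(
    script_name="clean.py",
    inputs=[input_dataset.as_named_input('important_dataset')],
    outputs=[output_data],
    compute_target=cpu_cluster,
    source_directory=experiment_folder
)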
clean.py
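from azureml.core import Run

run = Run.get_context()
print(run.input_datasets)  # prints {} in the failing run
input_ds = run.input_datasets['important_dataset']  # raises KeyError when the dict is empty
input_df = input_ds.to_pandas_dataframe()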
When the pipeline is run, the log for the clean.py step shows the run.input_datasets
object is an empty dict and therefore the script fails with a KeyError.
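One way to make this failure easier to diagnose in clean.py is to look the input up through a small helper that reports which names are actually present. This is only a sketch: get_named_input is a hypothetical helper, not part of the Azure ML SDK, and it accepts any dict-like object such as run.input_datasets.

```python
def get_named_input(input_datasets, name):
    """Return input_datasets[name], or raise a KeyError listing the available names.

    Hypothetical helper, not part of the Azure ML SDK. `input_datasets` is
    any mapping with string keys, for example the value of run.input_datasets.
    """
    if name in input_datasets:
        return input_datasets[name]
    # An empty mapping is reported explicitly so the log shows the real symptom.
    available = sorted(input_datasets) or ["<none>"]
    raise KeyError(
        f"named input {name!r} not found; available inputs: {', '.join(available)}"
    )
```

In the step script this would replace the direct lookup, for example get_named_input(run.input_datasets, 'important_dataset'). It does not fix the empty dictionary; it only makes the log message state that no inputs were exposed.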
About this issue
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 29 (8 by maintainers)
@MayMSFT Ok, thanks!
Perhaps it’s good to add a note with the run.input_datasets that the attribute input_datasets remains empty? The first thing I tried was to use this information to register the dataset to the model.
I’m also having this issue (Azure ML SDK Version: 1.6.0).
70_driver_log.txt
- run.get_details()['inputDatasets'] shows the datasets that I gave as inputs.
- run.input_datasets is {}.
- run.register_model() registers the model without reference to the input datasets.
The above happens regardless of whether the run is local or on a compute instance in Azure.
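The mismatch described in this comment can be checked with a few lines. The following is a minimal sketch, assuming only what the comment states: that the run object exposes get_details() with an 'inputDatasets' entry and an input_datasets attribute. summarize_inputs is a hypothetical helper, and the stand-in object and its placeholder entry are invented for illustration; they are not a real azureml Run or its real record format.

```python
from types import SimpleNamespace


def summarize_inputs(run):
    """Compare the two places a run reports its input datasets.

    Hypothetical helper. Works on any object that exposes get_details()
    and input_datasets, as described in the comment above.
    """
    recorded = run.get_details().get("inputDatasets", [])
    exposed = dict(run.input_datasets)
    return {
        "recorded_count": len(recorded),   # entries in the run details
        "exposed_names": sorted(exposed),  # keys available to the script
        "mismatch": len(recorded) != len(exposed),
    }


# Stand-in object that mimics the reported symptom; not a real azureml Run.
fake_run = SimpleNamespace(
    get_details=lambda: {"inputDatasets": [{"placeholder": "entry"}]},
    input_datasets={},
)
summary = summarize_inputs(fake_run)
```

In a real script the argument would be the object returned by Run.get_context(). A summary with mismatch set to True matches the behaviour reported here: the details record the inputs, but the dictionary handed to the script is empty.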
@MayMSFT apologies if this is not the correct place to raise this issue.
Thanks, Anders. This was caused by a bug in our code. For dataset.as_named_input(), passing a string that contains a capital letter causes the error. We will fix it in the Feb 17 release. The current workaround is to use lowercase letters only.
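The workaround from this reply can be applied in one place so that the pipeline definition and the step script always agree on the key. A minimal sketch, assuming the only requirement is the one stated here (no capital letters in the name); lowercase_input_name is a hypothetical helper and not part of the SDK.

```python
def lowercase_input_name(name):
    """Return `name` in lowercase, for use with as_named_input().

    Hypothetical helper implementing the workaround from the reply above:
    avoid capital letters in the named-input string.
    """
    if not isinstance(name, str) or not name:
        raise ValueError("input name must be a non-empty string")
    return name.lower()


# The same normalized key is used on both sides:
#   pipeline:  inputs=[input_dataset.as_named_input(INPUT_NAME)]
#   script:    run.input_datasets[INPUT_NAME]
INPUT_NAME = lowercase_input_name("Important_Dataset")
```

Names that are already lowercase pass through unchanged, so the helper is safe to keep in place after the fixed release.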
Hi, could you please look into the code in the comments above? I have added the dependencies mentioned in https://github.com/Azure/MachineLearningNotebooks/issues/707#issuecomment-567585408.
Hi,
I am facing the same issue. I am using a TabularDataset. I installed the dependencies below:
Script:
dataset = run.input_datasets["my_data"]
Error:
Could someone please share a solution, if there is one?
Thanks, SJ