azure-sdk-for-python: Dataset.download() hangs for a long time and, when done, some files are missing in the destination folder
- Package Name: azureml-core | azureml-dataset-runtime
- Package Version: 1.36.0.post2 | 1.36.0
- Operating System: Windows 10, macOS
- Python Version: 3.8
Describe the bug
I tried to use the Dataset.download() method to download a registered dataset (made of multiple files) to my personal computer (Windows 10, ~50 Mbps connection). For small test datasets (a few MB), it works as expected. For bigger datasets (~3 GB), the download either hangs or terminates after a long time without raising an exception or logging any error. Furthermore, some of the files are missing from the target folder. The same happens on my colleague's macOS laptop.
Everything works properly on my Azure ML virtual machine (running Linux).
To Reproduce
- Register a dataset from a datastore.
- Try to download it with
from azureml.core import Workspace, Dataset
from azureml.core.authentication import InteractiveLoginAuthentication
# Fill workspace arguments
workspace = Workspace.get(
name="",
subscription_id="",
resource_group="",
auth=InteractiveLoginAuthentication(tenant_id="")
)
dataset = Dataset.get_by_name(workspace, "<dataset-name>")
dataset.download("your/target/path")
Expected behavior
The dataset is downloaded to “your/target/path”.
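To make the "some files are missing" symptom easier to report, the check below is a minimal sketch of how the expected file list could be compared with what actually landed in the target folder. The helper `find_missing_files` is hypothetical (it is not part of azureml-core), and it assumes the expected paths are relative to the dataset root and may start with "/", which is how I understand the output of `dataset.to_path()` on a file dataset.

```python
import os
from typing import Iterable, List


def find_missing_files(expected_paths: Iterable[str], target_dir: str) -> List[str]:
    """Return the expected relative paths that are not files under target_dir.

    Hypothetical helper for illustration only; not an azureml API.
    """
    missing = []
    for rel_path in expected_paths:
        # Entries may start with "/" (assumed to_path() style), so strip it
        # and rebuild the path with the local OS separator.
        local_path = os.path.join(target_dir, *rel_path.strip("/").split("/"))
        if not os.path.isfile(local_path):
            missing.append(rel_path)
    return sorted(missing)


# Possible usage after the download call (assumes a file dataset):
# expected = dataset.to_path()
# print(find_missing_files(expected, "your/target/path"))
```

Running it after a failed download on Windows or macOS and again on the Linux VM should show whether the same files are skipped every time or whether the set varies between runs.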
About this issue
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 18 (6 by maintainers)
@janluke Sorry for the delay in getting back to you. This is definitely neither expected nor something that we have seen before. Given that you can repro this across machines and users, I think the issue is in your specific set of files and setup. To help investigate this better could you please share some additional info with me:
dataset._dataflow._steps
in your environment. Thanks!