dask-cloudprovider: AzureVMCluster constructor just hangs after creating the scheduler.

What happened:

After creating a new cluster with the AzureVMCluster constructor, the run just hangs after creating the scheduler.

In the Azure Portal, the scheduler VM is visible after it is created. It runs for a couple of minutes and is then stopped, presumably an indication that something went wrong, but the run does not fail; it just hangs.

What you expected to happen: The cluster is created with the number of workers specified.

Minimal Complete Verifiable Example: In a new Conda environment

pip install dask-cloudprovider[azure]
az login

from dask_cloudprovider.azure import AzureVMCluster
resource_group = "NGC-AML-Quick-Launch"
workspace_name = "NGC_AML_Quick_Launch_WS"
vnet="NGC-AML-Quick-Launch-vnet"
security_group="NGC-AML-Quick-Launch-nsg"
initial_node_count = 2
vm_size = "Standard_NC6s_v3"
location = "South Central US"
base_dockerfile = "rapidsai/rapidsai-core:cuda10.2-runtime-ubuntu18.04-py3.8"
base_dockerfile = "rapidsai/rapidsai-core-dev-nightly:0.18-cuda10.2-devel-ubuntu18.04-py3.8"
env_vars = {"EXTRA_CONDA_PACKAGES":"pywin32","EXTRA_PIP_PACKAGES": "dask-cloudprovider[azure] dask-cloudprovider[azure] --upgrade  gcsfs dask_xgboost azureml"}
env_vars = {"EXTRA_PIP_PACKAGES": "dask-cloudprovider[azure]"}

cluster = AzureVMCluster(
    resource_group=resource_group,
    location=location,
    vnet=vnet,
    security_group=security_group,
    n_workers=initial_node_count,
    vm_size=vm_size,
    docker_image=base_dockerfile,
    docker_args="--privileged",
    security=False,
    env_vars=env_vars,
    worker_class="dask_cuda.CUDAWorker",
)

Anything else we need to know?: (screenshot omitted)

The VM dask-7984db15-scheduler is created and can be seen in the Azure Portal. It runs for a few minutes and is then shut down, but the run never crashes; it just hangs.
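
Because the constructor blocks indefinitely instead of raising, one way to make the hang visible while debugging is to run the blocking call under a timeout. The following is a minimal sketch using only the Python standard library; `run_with_timeout` is a hypothetical helper name, not part of dask-cloudprovider, and it has not been tested against AzureVMCluster.

```python
import queue
import threading


def run_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread.

    Returns fn's result, re-raises any exception fn raised, or raises
    TimeoutError if fn has not finished within timeout_s seconds.
    Hypothetical helper for illustration only.
    """
    results = queue.Queue(maxsize=1)

    def target():
        # Capture either the return value or the exception so the
        # calling thread can surface it.
        try:
            results.put(("ok", fn(*args, **kwargs)))
        except BaseException as exc:
            results.put(("error", exc))

    # A daemon thread does not keep the interpreter alive if fn never returns.
    worker = threading.Thread(target=target, daemon=True)
    worker.start()

    try:
        status, value = results.get(timeout=timeout_s)
    except queue.Empty:
        raise TimeoutError(
            f"call did not finish within {timeout_s} seconds"
        ) from None

    if status == "error":
        raise value
    return value
```

The helper would be called as `run_with_timeout(AzureVMCluster, 900, resource_group=resource_group, ...)` with the same keyword arguments as in the example above; the 900-second limit is an arbitrary assumption. Note that a timeout only stops the caller from waiting: the background thread keeps running, and any Azure resources that were already created are not cleaned up by this helper.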

Environment:

  • Dask version: 2021.02.0
  • Python version: 3.8
  • Operating System: Windows
  • Install method (conda, pip, source): pip

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 30 (16 by maintainers)

Most upvoted comments

That file will exist on the Dask nodes, not the Jupyter Lab instance.

rapidsai/rapidsai-core-nightly:0.18-cuda10.2-runtime-ubuntu18.04-py3.8

What was your extra_pip in this case?