dask-cloudprovider: AzureVMCluster constructor just hangs after creating the scheduler.
What happened:
After creating a new cluster with the AzureVMCluster constructor, the run just hangs after creating the scheduler.
In the Azure Portal one can see the scheduler VM after it is created; it runs for a couple of minutes and is then stopped, presumably a sign that something went wrong. The run does not fail, though; it just hangs.
What you expected to happen: The cluster to be created with the specified number of workers.
Minimal Complete Verifiable Example: In a new Conda environment:
pip install dask-cloudprovider[azure]
az login
from dask_cloudprovider.azure import AzureVMCluster
resource_group = "NGC-AML-Quick-Launch"
workspace_name = "NGC_AML_Quick_Launch_WS"
vnet="NGC-AML-Quick-Launch-vnet"
security_group="NGC-AML-Quick-Launch-nsg"
initial_node_count = 2
vm_size = "Standard_NC6s_v3"
location = "South Central US"
base_dockerfile = "rapidsai/rapidsai-core:cuda10.2-runtime-ubuntu18.04-py3.8"
base_dockerfile = "rapidsai/rapidsai-core-dev-nightly:0.18-cuda10.2-devel-ubuntu18.04-py3.8"
env_vars = {"EXTRA_CONDA_PACKAGES":"pywin32","EXTRA_PIP_PACKAGES": "dask-cloudprovider[azure] dask-cloudprovider[azure] --upgrade gcsfs dask_xgboost azureml"}
env_vars = {"EXTRA_PIP_PACKAGES": "dask-cloudprovider[azure]"}
cluster = AzureVMCluster(
resource_group=resource_group,
location=location,
vnet=vnet,
security_group=security_group,
n_workers=initial_node_count,
vm_size=vm_size,
docker_image=base_dockerfile,
docker_args="--privileged",
security=False,
env_vars=env_vars,
worker_class="dask_cuda.CUDAWorker")
Anything else we need to know?:
VM dask-7984db15-scheduler is created and can be seen in the Azure Portal; it runs for a few minutes and is then stopped, but the run never crashes, it just hangs.
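One way to get more detail on why the scheduler VM shuts down is to enable debug logging before constructing the cluster (a debugging sketch; dask-cloudprovider emits its provisioning messages through the standard logging module, so the exact output depends on the installed version):

import logging

logging.basicConfig(level=logging.DEBUG)  # surface scheduler/worker provisioning logs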
Environment:
- Dask version: 2021.02.0
- Python version: 3.8
- Operating System: Windows
- Install method (conda, pip, source): pip
About this issue
- State: closed
- Created 3 years ago
- Comments: 30 (16 by maintainers)
Commits related to this issue
- Fix for #257. When an env_vars variable passed to the docker run command contains one or more space characters, we need to wrap the value in double quotes. — committed to heiqs/dask-cloudprovider by deleted user 3 years ago
- Fix for #257. (#258) When an env_vars variable passed to the docker run command contains one or more space characters, we need to wrap the value in double quotes. Co-authored-by: mo <... — committed to dask/dask-cloudprovider by heiqs 3 years ago
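The fix described in these commits amounts to quoting environment-variable values that contain spaces when the docker run command line is assembled, so that each -e KEY=VALUE is parsed as a single token. A minimal sketch of that idea (build_env_flags is an illustrative helper, not the actual function in the repository):

def build_env_flags(env_vars):
    # Wrap any value containing a space in double quotes so that
    # `docker run -e KEY=VALUE` receives the whole value as one token.
    flags = []
    for key, value in env_vars.items():
        value = str(value)
        if " " in value:
            value = f'"{value}"'
        flags.append(f"-e {key}={value}")
    return " ".join(flags)

# Example:
# build_env_flags({"EXTRA_PIP_PACKAGES": "dask-cloudprovider[azure] gcsfs"})
# -> '-e EXTRA_PIP_PACKAGES="dask-cloudprovider[azure] gcsfs"'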
That file will exist on the Dask nodes, not the Jupyter Lab instance.
What was your extra_pip in this case?