dask-cloudprovider: Not able to create Dask client using AzureMLCluster object
I have created a Dask cluster on Azure ML using the following API.
from azureml.core import Datastore
from dask_cloudprovider import AzureMLCluster

# ws is an existing azureml.core.Workspace object
amlcluster = AzureMLCluster(ws,
    vm_size="STANDARD_D1",
    datastores=[Datastore.get(ws, "my_datastore")],
    environment_definition=ws.environments['AzureML-Dask-CPU'],
    initial_node_count=2,
    scheduler_idle_timeout=600,
    vnet='some_vnet',
    subnet='subnet1',
    vnet_resource_group='some_rsrc_group',
    ct_name="my_dask_cluster"
)
Once the cluster is created, if I try to print the variable amlcluster
in Jupyter Lab, it throws the following error.
KeyError                                  Traceback (most recent call last)
/anaconda/envs/azureml_custom_py37/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
    916             method = get_real_method(obj, self.print_method)
    917             if method is not None:
--> 918                 method()
    919                 return True
    920

/anaconda/envs/azureml_custom_py37/lib/python3.7/site-packages/distributed/deploy/cluster.py in _ipython_display_(self, **kwargs)
    361             from IPython.display import display
    362
--> 363             data = {"text/plain": repr(self), "text/html": self._repr_html_()}
    364             display(data, raw=True)
    365

/anaconda/envs/azureml_custom_py37/lib/python3.7/site-packages/distributed/deploy/cluster.py in __repr__(self)
    389             self._cluster_class_name,
    390             self.scheduler_address,
--> 391             len(self.scheduler_info["workers"]),
    392             sum(w["nthreads"] for w in self.scheduler_info["workers"].values()),
    393         )

KeyError: 'workers'
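The last frame of the traceback shows the repr indexing `scheduler_info["workers"]` directly, so the error means that dictionary had no "workers" entry when the cluster object was displayed. The following is a minimal sketch of that failure mode using plain dictionaries. It is not the code from distributed; the function names and worker addresses are made up for illustration.

```python
def describe_cluster(scheduler_info):
    """Build a worker/thread summary the same way the failing repr line does."""
    workers = scheduler_info["workers"]  # raises KeyError if the key is absent
    threads = sum(w["nthreads"] for w in workers.values())
    return "workers=%d, threads=%d" % (len(workers), threads)


def describe_cluster_safely(scheduler_info):
    """Hypothetical tolerant variant: treat a missing 'workers' key as no workers."""
    workers = scheduler_info.get("workers", {})
    threads = sum(w["nthreads"] for w in workers.values())
    return "workers=%d, threads=%d" % (len(workers), threads)


# Illustrative data only: the addresses below are placeholders.
populated = {
    "workers": {
        "tcp://10.0.0.4:40001": {"nthreads": 2},
        "tcp://10.0.0.5:40001": {"nthreads": 2},
    }
}
empty = {}  # a scheduler_info that has no 'workers' entry

print(describe_cluster(populated))
print(describe_cluster_safely(empty))
try:
    describe_cluster(empty)
except KeyError as exc:
    print("KeyError:", exc)
```

This only reproduces the symptom. It does not say why the cluster object ended up with incomplete scheduler information.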
After the error, it provides just a Dashboard Link. Not sure if it is supposed to print anything else.
If I try to create a Dask Client, that also fails:
client = Client(amlcluster)
These are the library versions I have installed:
dask                2.20.0    py_0
dask-cloudprovider  0.4.1     <pip>
dask-core           2.20.0    py_0
dask-glm            0.2.0     py_1    conda-forge
dask-ml             1.6.0     py_0    conda-forge
dask-xgboost        0.1.10    <pip>
About this issue
- State: closed
- Created 4 years ago
- Comments: 16 (16 by maintainers)
Commits related to this issue
- Update requirements for distributed closes #165 #145 — committed to nickeubank/dask-cloudprovider by nickeubank 4 years ago
- Update requirements for distributed (#166) * Update requirements for distributed closes #165 #145 * update to 2.30. see discussion in #165 -- 2.15 not enough apparently. — committed to dask/dask-cloudprovider by nickeubank 4 years ago
Huh, ok! Guess the problem is more recent than 2.20! Maybe there’s another line of code added somewhere else. I’ll change the bump suggestion in #165 to 2.30.
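Since the package listing in the original post does not include distributed itself, a quick runtime check can show which version is actually imported in the notebook kernel. The sketch below assumes a minimum of 2.30, which is the value this thread settled on for the requirement bump; the exact lowest working release is not established here. The helper names are made up for illustration.

```python
import re

# Assumption: 2.30 is taken from the requirement bump discussed in this thread.
MIN_DISTRIBUTED = (2, 30)


def parse_version(text):
    """Return the leading dotted numeric part of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("no numeric version found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def is_new_enough(installed, minimum=MIN_DISTRIBUTED):
    """Return True if the installed version is at least the assumed minimum."""
    return parse_version(installed) >= minimum


try:
    import distributed
except ImportError:
    print("distributed is not installed in this environment")
else:
    status = "ok" if is_new_enough(distributed.__version__) else "older than 2.30"
    print("distributed", distributed.__version__, status)
```

Running this inside the same Jupyter kernel that creates the cluster matters, because the kernel's environment can differ from the one where packages were upgraded.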
Replicated! I downgraded distributed to 2.11, and I got this:
Opening new issue about bumping the version requirement.
@arnabbiswas1 what version of distributed do you have? Looks like my students had 2.11, and comparing 2.11 to 2.30, it looks like we might have an explanation: see the distributed/deploy/cluster.py diff here: https://github.com/dask/distributed/compare/2.11.0...2.30.1
I have three students who are getting the same thing, and we cannot for the life of us get it figured out. We’ve tried upgrading dask (to 2.30), dask-cloudprovider (0.4.1), upgrading jupyter, making sure widgets work, etc.
When the students try and print amlcluster, they get:
And if they try and pass it to Client(), they get:
Meanwhile, on Azure they seem to have three happy active nodes, so it seems like the cluster was created; there’s just a subsequent issue…
(@arnabbiswas1 you’re making me feel bad because that’s my site! And yet… I don’t know the problem! Sorry. 😦 )
It’s also extremely consistent – we spent an hour and a half trying to figure it out, and over and over we got this.
@arnabbiswas1 are you on the same vnet as the cluster trying to access it?
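One rough way to answer the vnet question from the client machine is to test whether a plain TCP connection to the scheduler can be opened at all. This is a minimal sketch using only the standard library. The host and port in the commented usage line are placeholders, not values from this issue; the real address would come from the cluster object or the Azure portal.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable addresses.
        return False


# Hypothetical usage; replace the placeholders with the scheduler's address:
# print(can_reach("10.0.0.4", 8786))
```

A False result only shows that the port is not reachable from where the code runs. It does not distinguish between a vnet mismatch, a firewall rule, and a scheduler that is not listening.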
@quasiben @drabastomek fyi I set up a test which runs every 2 hours here: https://github.com/Azure/azureml-examples/actions?query=workflow%3Arun-tutorial-ud - it is mostly green, and the majority of the failures have been my fault