dask-cuda: [BUG] sync Client constructor hangs when connecting to an async LocalCUDACluster
Describe the bug
Connecting a synchronous client, Client(asynchronous=False), to an asynchronous cluster, LocalCUDACluster(asynchronous=True), hangs. As a result, an async LocalCUDACluster cannot be used with RAPIDS libraries that expect a sync client, e.g., BlazingSQL (cc @felipeblazing @kkraus14).
Demo:
import asyncio, cudf, dask_cudf
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

async def main():
    async with await LocalCUDACluster(asynchronous=True, dashboard_address=None) as cluster_async:
        print('making sync client..')  ### last message to get printed
        with Client(address=cluster_async, asynchronous=False) as client_sync:
            print('exiting client..')
        print('exiting cluster..')
    print('cleaned up.')

asyncio.run(main())
=>
making sync client..
-------------------------
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7f72a2bccf90>>, <Task finished coro=<Worker.heartbeat() done, defined at /conda/envs/rapids/lib/python3.7/site-packages/distributed/worker.py:929> exception=OSError('Timed out during handshake while connecting to tcp://127.0.0.1:33281 after 10 s')>)
Traceback (most recent call last):
  File "/conda/envs/rapids/lib/python3.7/site-packages/distributed/comm/core.py", line 319, in connect
    handshake = await asyncio.wait_for(comm.read(), time_left())
  File "/conda/envs/rapids/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
    raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
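One plausible explanation for the hang, offered as an assumption rather than a confirmed diagnosis: the synchronous Client constructor blocks its calling thread until the handshake with the scheduler completes, but in the demo that thread is the one running the asyncio event loop that hosts the LocalCUDACluster scheduler. The scheduler therefore never gets to reply, and connections eventually fail with a handshake timeout like the one in the traceback above. The stdlib-only sketch below reproduces that pattern without Dask or GPUs. `handle` is a hypothetical stand-in for the scheduler, `blocking_connect` is a hypothetical stand-in for the sync constructor, and `main` runs the blocking call either directly on the loop thread or offloaded with `asyncio.to_thread` (Python 3.9+).

```python
import asyncio
import socket


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Hypothetical stand-in for the scheduler: reply to a one-line "handshake".
    await reader.readline()
    writer.write(b"hello\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()


def blocking_connect(port: int, timeout: float) -> bytes:
    # Hypothetical stand-in for the sync Client constructor: a plain blocking socket call.
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as sock, \
            sock.makefile("rb") as stream:
        sock.sendall(b"hi\n")
        # Blocks until the server answers or the socket timeout fires.
        return stream.readline()


async def main(offload: bool, timeout: float = 1.0) -> str:
    # The server lives on this event loop, like the cluster's scheduler in the demo.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        try:
            if offload:
                # Blocking call runs in a worker thread, so the loop keeps serving.
                reply = await asyncio.to_thread(blocking_connect, port, timeout)
            else:
                # Blocking call runs on the loop thread, so the handler can never run.
                reply = blocking_connect(port, timeout)
        except socket.timeout:
            return "timed out"
    return reply.decode().strip()


if __name__ == "__main__":
    print(asyncio.run(main(offload=False)))  # the "hang", bounded here by a 1 s timeout
    print(asyncio.run(main(offload=True)))
```

With `offload=False` the read times out because the event loop never gets a chance to run the server handler; with `offload=True` the loop stays free and the reply arrives. Whether offloading the sync Client constructor to a thread is a safe workaround in dask-cuda itself is not verified by this sketch.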
Steps/Code to reproduce bug
See above
Expected behavior
The code should terminate without exceptions.
Environment overview
ubuntu w/ 10.2 -> docker ubuntu 18 -> conda rapids=17
Additional context
This issue is specific to LocalCUDACluster. When the cluster is started separately (dask-scheduler plus dask-cuda-worker, with clients connecting by address), initial testing suggests that mixing sync and async clients works, at least when the clients run in separate processes.
About this issue
- State: open
- Created 3 years ago
- Comments: 15 (3 by maintainers)
The context statement is incorrectly stated:
Should be