distributed: dask.distributed client hangs in VSCode

Requesting assistance from anyone using VSCode (see https://github.com/microsoft/vscode-python/issues/7845):

  • dask: 2.6.0
  • vscode: 1.39.2 (latest, but the issue is version independent)
  • python extension: 2019.10.41019 (the issue is tied to this version, but the VSCode team cannot find it without diving into Dask)

Steps to reproduce

from dask.distributed import Client
client = Client()
  • Start the Python Interactive window and paste the code above into the input terminal; you get a never-ending stream of
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Restarting worker
...
  • Restart the notebook server (from the Interactive window) and run the same code; it runs fine.

If you run this code in plain Python, IPython, or a Jupyter notebook, there are no problems.

For anyone using VSCode this is very easy to reproduce. If you can find a possible cause for the error, I can open a new issue and hopefully point the team in the right direction.

Thank you

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 4
  • Comments: 26 (11 by maintainers)

Most upvoted comments

@harrisliuwk I have experienced the same; it is tied to multiprocessing.

I agree with @mrocklin that the way forward is to reproduce this using just the multiprocessing and asyncio modules and to track down the (timing) issues mentioned in https://github.com/microsoft/vscode-python/issues/7845#issuecomment-543990049.
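For reference, a minimal Dask-free sketch of that kind of repro might look like the following (my own sketch, not code from this issue; the function name, timeout, and start method are assumptions). The idea is to start a child process from inside a running asyncio event loop, roughly what distributed's Nanny does, and see whether the child ever reports back. Run it once as a script and once pasted into the Interactive Window (there, call "await main()" directly instead of asyncio.run, since the kernel already runs an event loop) and compare:

import asyncio
import multiprocessing


def child(queue):
    # In the real Nanny the child would start a Worker; here it just reports back.
    queue.put("child started")


async def main():
    ctx = multiprocessing.get_context("spawn")  # the start method used on Windows
    queue = ctx.Queue()
    proc = ctx.Process(target=child, args=(queue,))
    proc.start()
    # Wait for the child without blocking the event loop, much like the Nanny's
    # monitor callback would; if process startup is broken, this get() times out.
    loop = asyncio.get_running_loop()
    print(await loop.run_in_executor(None, queue.get, True, 30))
    proc.join()


if __name__ == "__main__":
    asyncio.run(main())

If the script version prints the message but the Interactive Window version hangs or times out, that would point at how the Interactive Window spawns child processes rather than at Dask itself.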

I have since been using dask-kubernetes on Azure AKS and have stopped trying to track down the issue.

EDIT: just ran a quick test in VSCode

from dask.distributed import Client
client = Client()

generates an endless stream of errors:

tornado.application - ERROR - Exception in callback <bound method Nanny.memory_monitor of <Nanny: None, threads: 2>>
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/test/lib/python3.8/site-packages/tornado/ioloop.py", line 907, in _run
    return self.callback()
  File "/home/user/miniconda3/envs/test/lib/python3.8/site-packages/distributed/nanny.py", line 414, in memory_monitor
    process = self.process.process
AttributeError: 'NoneType' object has no attribute 'process'

versions:
  • dask: 2.20.0
  • distributed: 2.20.0
  • ms-python extension: 2020.7.94776
  • python: 3.8.3
  • os: linux

Just an update on identifying the problem.

  • platform: Windows 10 Version 1809
  • Dask: 2.20.0
  • VSCode: 1.47.2

It seems that the problem is a compatibility issue between Dask and VSCode’s Python Interactive Window.

When you do

from distributed import Client
client = Client()

and

from distributed import LocalCluster
cluster = LocalCluster()

both work well in PowerShell and in VSCode’s Jupyter notebook environment. Note that the default for both Client and LocalCluster is processes=True.

But if you run the examples above in the Python Interactive Window, they just hang forever, with no error message printed. If instead you do

from distributed import Client
client = Client(processes=False)

and

from distributed import LocalCluster
cluster = LocalCluster(processes=False)

in the Interactive Window, the code runs as expected.
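As a quick sanity check (my own suggestion, not something from this thread), you can confirm which mode you actually ended up in: with processes=False the LocalCluster holds in-process Worker objects, whereas the default process-backed cluster manages its workers through Nanny processes.

from distributed import Client

client = Client(processes=False)              # threaded workers, no child processes involved
print(type(client.cluster).__name__)          # LocalCluster
print(list(client.cluster.workers.values()))  # Worker objects rather than Nanny-managed processes
client.close()                                # clean up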

Or if anyone has the time to dive into what’s happening between VSCode and Dask, that would be great. As I mentioned earlier in this thread, I think the thing to do here is to try to replicate this problem with the multiprocessing module alone, without Dask, to see what the actual culprit is.

It also happens with LocalCluster(); with LocalCluster(processes=False) there are no problems, so it’s tied to process-based workers.