distributed: Worker failed to start

```python
import distributed
print(distributed.__version__)
# 1.21.2
```

```python
import tornado
print(tornado.version)
# 4.5.3
```

```python
from dask.distributed import Client, LocalCluster

client = Client()
```

```
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
  File "C:\Users\brahm\Anaconda3\lib\site-packages\tornado\gen.py", line 1069, in run
    yielded = self.gen.send(value)
  File "C:\Users\brahm\Anaconda3\lib\site-packages\distributed\deploy\local.py", line 196, in _start_worker
    raise gen.TimeoutError("Worker failed to start")
tornado.gen.TimeoutError: Worker failed to start

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\brahm\Anaconda3\lib\site-packages\tornado\gen.py", line 828, in callback
    result_list.append(f.result())
  File "C:\Users\brahm\Anaconda3\lib\site-packages\tornado\concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Users\brahm\Anaconda3\lib\site-packages\tornado\gen.py", line 1069, in run
    yielded = self.gen.send(value)
  File "C:\Users\brahm\Anaconda3\lib\site-packages\distributed\deploy\local.py", line 196, in _start_worker
    raise gen.TimeoutError("Worker failed to start")
tornado.gen.TimeoutError: Worker failed to start
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
  File "C:\Users\brahm\Anaconda3\lib\site-packages\tornado\gen.py", line 1069, in run
    yielded = self.gen.send(value)
  File "C:\Users\brahm\Anaconda3\lib\site-packages\distributed\deploy\local.py", line 196, in _start_worker
    raise gen.TimeoutError("Worker failed to start")
tornado.gen.TimeoutError: Worker failed to start

During handling of the above exception, another exception occurred:
…
```

About this issue

  • State: open
  • Created 6 years ago
  • Comments: 54 (36 by maintainers)

Most upvoted comments

If I pass `processes=False`, it works just fine. If I run `dask-scheduler` and `dask-worker` on the command line and connect to them via `Client`, it also works just fine, in both Python 2 and Python 3.
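For reference, a minimal sketch of that first workaround (a threaded, single-process `LocalCluster`):

```python
from dask.distributed import Client

# processes=False backs the local cluster with threads, so no
# worker process ever has to fork.
client = Client(processes=False)
print(client)
```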

I believe this has to do with how processes work on macOS. If a process uses the libdispatch library for asynchronous work, the OS marks it as a multi-threaded process, complete with an Objective-C runtime. A process with an Objective-C runtime under the hood can NOT safely be forked (i.e. the child crashes).

So my theory is that if a Python process uses any threading (implemented under the hood with libdispatch) prior to forking, the fork will crash.

Starting in Python 3, you can specify the multiprocessing start method: "spawn" launches a fresh Python process, which circumvents the issue, whereas "forkserver" (the default case here) does not.

So to reiterate:

```
python3 + "spawn"      + LocalCluster()                 => success
python3 + "forkserver" + LocalCluster()                 => fail
python2 +                LocalCluster()                 => fail

python3 + "forkserver" + LocalCluster(processes=False)  => success
python3 + "spawn"      + LocalCluster(processes=False)  => success
python2 +                LocalCluster(processes=False)  => success
```
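For concreteness, a minimal sketch of selecting the start method with the standard library (Python 3 only). The mechanism here is an assumption: distributed has also carried its own multiprocessing-method configuration in some versions, so the global setting may or may not be what `LocalCluster` actually uses.

```python
import multiprocessing

from dask.distributed import Client, LocalCluster

if __name__ == "__main__":
    # Must run once, before any worker process is created;
    # a second call raises RuntimeError unless force=True is passed.
    multiprocessing.set_start_method("spawn")
    cluster = LocalCluster()
    client = Client(cluster)
    print(client)
```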

Given that my workload is CPU-bound and runs on Python 2, using a thread pool instead of a process pool won’t give me the speedup I’m looking for, since the GIL keeps CPU-bound Python threads from running in parallel.

On a related note: this issue can be pretty subtle. I first ran into it while using the requests library. One of the comments on https://stackoverflow.com/questions/28521535/requests-how-to-disable-bypass-proxy explains that requests checks whether the system has any proxies configured, which requires the Python process to communicate with cfprefsd, and that in turn marks the process as a multi-threaded environment. If you then fork the Python process, it crashes.
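As a hedged sketch of one way around that particular trigger: `Session.trust_env = False` tells requests to skip environment and system proxy detection. Whether that is sufficient to avoid the fork crash on a given macOS setup is an assumption.

```python
import requests

session = requests.Session()
session.trust_env = False  # skip environment/system proxy lookup,
                           # intended to avoid the cfprefsd round-trip
response = session.get("https://example.com")
print(response.status_code)
```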

Can I ask you to try the following?

  1. Avoid processes with `client = Client(processes=False)`
  2. See whether you can set things up locally from the command line (http://dask.pydata.org/en/latest/setup/cli.html); a sketch of the commands follows below.
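For reference, something like the following, assuming the scheduler runs at its default address of tcp://127.0.0.1:8786:

```python
# Run these in two separate terminals first:
#   $ dask-scheduler
#   $ dask-worker tcp://127.0.0.1:8786

from dask.distributed import Client

# Connect to the externally started scheduler instead of letting
# Client() spawn a LocalCluster of its own.
client = Client("tcp://127.0.0.1:8786")
print(client)
```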

I found a solution to this problem: if the client is created inside the main guard (`if __name__ == "__main__":`), this works fine.
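A minimal sketch of that fix, assuming "inside main" means the standard main guard that multiprocessing’s spawn mode requires:

```python
from dask.distributed import Client

def main():
    client = Client()
    print(client)

if __name__ == "__main__":
    # Spawned worker processes re-import this module; the guard keeps
    # them from recursively creating clusters of their own.
    main()
```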

@Athlete369 can you provide more information? What version of distributed? What’s the traceback?