distributed: Can't start worker on macOS

On current master, distributed-2.9.1+13.g7d2ed43c (with tornado==6.0.3), I get the following error on macOS but not in Docker:

# already running dask scheduler
(.venv) ➜  model git:(nyc) ✗ PYTHONPATH=. dask-worker 'localhost:8786' --nthreads 1 --memory-limit 6GB --local-directory /tmp/ --nprocs 1
distributed.nanny - INFO -         Start Nanny at: 'tcp://127.0.0.1:55985'
objc[39479]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called.
objc[39479]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
distributed.nanny - INFO - Worker process 39479 was killed by signal 6
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x118220490>>, <Task finished coro=<Nanny._on_exit() done, defined at /Users/brett/model/.venv/lib/python3.7/site-packages/distributed/nanny.py:396> exception=TypeError('addresses should be strings or tuples, got None')>)
Traceback (most recent call last):
  File "/Users/brett/model/.venv/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/Users/brett/model/.venv/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/Users/brett/model/.venv/lib/python3.7/site-packages/distributed/nanny.py", line 399, in _on_exit
    await self.scheduler.unregister(address=self.worker_address)
  File "/Users/brett/model/.venv/lib/python3.7/site-packages/distributed/core.py", line 757, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "/Users/brett/model/.venv/lib/python3.7/site-packages/distributed/core.py", line 556, in send_recv
    raise exc.with_traceback(tb)
  File "/Users/brett/model/.venv/lib/python3.7/site-packages/distributed/core.py", line 408, in handle_comm
    result = handler(comm, **msg)
  File "/Users/brett/model/.venv/lib/python3.7/site-packages/distributed/scheduler.py", line 2122, in remove_worker
    address = self.coerce_address(address)
  File "/Users/brett/model/.venv/lib/python3.7/site-packages/distributed/scheduler.py", line 4831, in coerce_address
    raise TypeError("addresses should be strings or tuples, got %r" % (addr,))
TypeError: addresses should be strings or tuples, got None
distributed.nanny - INFO - Closing Nanny at 'tcp://127.0.0.1:55985'
distributed.dask_worker - INFO - End worker

No issues in the latest release (2.9.1). Possibly related to #3356 but I didn’t see any similar error messages so I figured I’d keep it separate for now.

Most upvoted comments

I am now able to reproduce. I am on Mojave; I used brew to build a venv and pip-installed the same packages as @jrbourbeau. I am also getting a lovely crash-report pop-up with the following details:

Termination Reason: Namespace OBJC, Code 0x1

Application Specific Information: objc[74025]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called. crashed on child side of fork pre-exec

Thread 0 Crashed:
0   libsystem_kernel.dylib   0x00007fff77fee016 __abort_with_payload + 10
1   libsystem_kernel.dylib   0x00007fff77fe95db abort_with_payload_wrapper_internal + 82
2   libsystem_kernel.dylib   0x00007fff77fe9589 abort_with_reason + 22
3   libobjc.A.dylib          0x00007fff766cf8dd _objc_fatalv(unsigned long long, unsigned long long, char const*, __va_list_tag*) + 108
4   libobjc.A.dylib          0x00007fff766cf78f _objc_fatal(char const*, …) + 135
5   libobjc.A.dylib          0x00007fff766d060f performForkChildInitialize(objc_class*, objc_class*) + 341
6   libobjc.A.dylib          0x00007fff766d162f initializeAndMaybeRelock(objc_class*, objc_object*, mutex_tt<false>&, bool) + 187
7   libobjc.A.dylib          0x00007fff766c0690 lookUpImpOrForward + 228
8   libobjc.A.dylib          0x00007fff766c0114 _objc_msgSend_uncached + 68
9   libobjc.A.dylib          0x00007fff766c369b +[NSObject new] + 86
10  com.apple.Foundation     0x00007fff4e18f478 -[NSThread init] + 61
11  com.apple.Foundation     0x00007fff4e18f3e2 ____NSThreads_block_invoke + 64
12  libdispatch.dylib        0x00007fff77e4e63d _dispatch_client_callout + 8
13  libdispatch.dylib        0x00007fff77e4fd4b _dispatch_once_callout + 20
14  com.apple.Foundation     0x00007fff4e18f39d _NSThreadGet0 + 325
15  com.apple.Foundation     0x00007fff4e18ec31 _NSInitializePlatform + 407
16  libobjc.A.dylib          0x00007fff766c1d51 call_load_methods + 233
17  libobjc.A.dylib          0x00007fff766bf405 load_images + 117
18  dyld                     0x000000011bdcb46a dyld::notifySingle(dyld_image_states, ImageLoader const*,

--no-nanny also resolves the issue. Additionally, using spawn as the multiprocessing-method in ~/.config/dask/distributed.yaml resolves the problem, as shown below:

distributed:
  worker:
    multiprocessing-method: spawn
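
For anyone starting workers from Python rather than the CLI, the same setting can be applied through dask's config API. This is a minimal sketch assuming the config key shown in the YAML above; it only affects workers launched from that same process (e.g. via LocalCluster), not a separately started dask-worker:

import dask

# Same effect as the YAML above: have the nanny create worker processes
# with "spawn" instead of "fork". Assumes the
# distributed.worker.multiprocessing-method key from the snippet above.
dask.config.set({"distributed.worker.multiprocessing-method": "spawn"})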

Digging a little more into brew and multiprocessing, it appears that this has been an issue in the past: https://bugs.python.org/issue33725
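
For context, that CPython issue concerns the same macOS fork-safety behaviour that the workarounds above sidestep. A minimal, dask-independent sketch of the fork-vs-spawn distinction (illustrative only, not code from this thread):

import multiprocessing


def child():
    print("child started cleanly")


if __name__ == "__main__":
    # With the "fork" start method the child inherits the parent's state,
    # including any Objective-C runtime state already initialized on macOS,
    # which can abort the child (signal 6, as in the logs above).
    # "spawn" starts a fresh interpreter instead, avoiding that state.
    ctx = multiprocessing.get_context("spawn")
    p = ctx.Process(target=child)
    p.start()
    p.join()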