distributed: Unable to start a Client after update to latest dask

@mrocklin suggested that I file an issue here.

In a restarted notebook I run:

import distributed
client = distributed.Client()

And get hundreds of these errors:

tornado.application - ERROR - Exception in Future <tornado.concurrent.Future object at 0x7fc162990be0> after timeout
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 910, in error_callback
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/nanny.py", line 300, in start
    yield self._wait_until_running()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1069, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/nanny.py", line 386, in _wait_until_running
    raise ValueError("Worker not started")
ValueError: Worker not started
tornado.application - ERROR - Exception in Future <tornado.concurrent.Future object at 0x7fc1629b82b0> after timeout
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 910, in error_callback
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/nanny.py", line 300, in start
    yield self._wait_until_running()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1069, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/nanny.py", line 386, in _wait_until_running
    raise ValueError("Worker not started")
ValueError: Worker not started

I can reproduce this with:

docker run -it --rm quantumtinkerer/jupyter-research:latest bash

# create a new env or just use the current one where `distributed` is already installed
conda create --yes -n dask python=3.6 dask distributed
source activate dask

python
import distributed
c = distributed.Client()

The Docker image is based on jupyter/docker-stacks/base-notebook.

The weird thing is that it only happens on our server, which runs JupyterHub. When I try it on a different machine, there doesn’t seem to be an issue.

Any ideas on how I can debug this?

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 18 (11 by maintainers)

Most upvoted comments

My first guess would be some networking issue. You might try the following:

  1. Use Client(processes=False), which runs the workers as threads in the current process and so avoids networking issues entirely
  2. Try setting up dask-scheduler and dask-worker processes manually to see if that produces more fine-grained error messages
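
Before going through either step, it may be worth ruling out the most basic networking failure: whether plain TCP over the loopback interface works at all inside the container. The sketch below uses only the Python standard library and is not part of distributed; the function name loopback_tcp_works and the 5-second timeout are made up for illustration. It only tests a single echo round trip on 127.0.0.1, so a passing result does not prove that the nanny and worker processes can reach each other, but a failing one would point strongly at the environment rather than at dask.

```python
import socket
import threading


def loopback_tcp_works(host="127.0.0.1", timeout=5.0):
    """Return True if one TCP echo round trip over `host` succeeds.

    Hypothetical helper for illustration; not a distributed API.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(timeout)
    try:
        server.bind((host, 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]

        def echo_once():
            # Accept a single connection and echo back what it sends.
            try:
                conn, _ = server.accept()
                with conn:
                    conn.sendall(conn.recv(1024))
            except OSError:
                pass  # timeouts and resets are reported via the return value

        thread = threading.Thread(target=echo_once, daemon=True)
        thread.start()

        with socket.create_connection((host, port), timeout=timeout) as client:
            client.sendall(b"ping")
            reply = client.recv(1024)
        thread.join(timeout)
        return reply == b"ping"
    except OSError:
        # Covers bind failures, refused connections, and socket timeouts.
        return False
    finally:
        server.close()


if __name__ == "__main__":
    print("loopback TCP ok:", loopback_tcp_works())
```

If this prints False on the JupyterHub server but True on the machine where the Client starts fine, that would suggest looking at the container's network configuration first.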

@basnijholt thank you for access to your system. I tried updating to master with !pip install git+https://github.com/dask/distributed.git --upgrade and things seem to work now.

I should have a bit of time to look at this starting tomorrow.