distributed: Leaking semaphores tracking issue.

The purpose of this issue is to track cases of semaphore leaks, with a view to finding the root cause and fixing it (assuming it is a problem!). It will also hold any useful notes gathered while narrowing down the problem.

Things that would be really helpful:

  • A concrete reproducer of a semaphore leak coming from dask/distributed.
  • If the leak is not (easily) reproducible, anecdotal evidence about what was running at the time, including things like:
    • The setup of dask-scheduler and dask-workers.
    • Roughly what the application code was doing at the time.
    • Whether any signals were sent to the process (e.g. SIGINT).
    • The reported message.

Many thanks for your help.

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 8
  • Comments: 25 (14 by maintainers)

Most upvoted comments

@ijstokes, please wrap your code in if __name__ == '__main__' and retry.

@2gotgrossman because of how multiprocessing works under the hood. This is explained here: https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods
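
To make the reason for the guard concrete, here is a minimal sketch that uses only the standard library. It relies on what the traceback further down in this thread shows: with the spawn and forkserver start methods, the child re-executes the main script through runpy.run_path with run_name="__mp_main__". The script text, the file name guarded_script.py, and the names SCRIPT, run_as and started are hypothetical stand-ins for illustration; no real processes or Dask clusters are started.

```python
import os
import runpy
import tempfile
import textwrap

# Hypothetical stand-in for a user script. The "started" list stands in for
# side effects such as creating a Client(), which launches worker processes.
SCRIPT = textwrap.dedent(
    """
    started = []

    def main():
        started.append("cluster")

    if __name__ == "__main__":
        main()
    """
)


def run_as(run_name):
    """Execute SCRIPT from a temporary file under the given module name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "guarded_script.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        # run_path returns the globals of the executed script.
        module_globals = runpy.run_path(path, run_name=run_name)
    return module_globals["started"]


parent_view = run_as("__main__")    # how `python script.py` runs the file
child_view = run_as("__mp_main__")  # how a spawn/forkserver child re-runs it

print(parent_view, child_view)  # ['cluster'] []
```

Without the guard, the body would also run in the second case, which is the situation the RuntimeError below complains about. For the reproducer in this thread, that would mean moving client = Client() and sys.exit() under the if __name__ == '__main__' block.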

Happening to me. This program will trigger it:

import sys
from distributed import Client

client = Client()

sys.exit()

I have the latest versions of dask and distributed installed (see below for the exact versions of all packages in the conda environment). Nothing related to Python, Dask, or Bokeh is running (the following command returns nothing):

$ ps -ef | grep -i -e bokeh -e python -e dask

Here is the output that I get before the program “hangs” (in the midst of the Client() object creation):

Traceback (most recent call last):
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/multiprocessing/forkserver.py", line 178, in main
    _serve_one(s, listener, alive_r, handler)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/multiprocessing/forkserver.py", line 212, in _serve_one
    code = spawn._main(child_r)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/multiprocessing/spawn.py", line 114, in _main
    prepare(preparation_data)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/multiprocessing/spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/multiprocessing/spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/ijstokes/code/anaconda-download-data/semaphore_tracker_problem.py", line 4, in <module>
    client = Client()
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/distributed/client.py", line 400, in __init__
    self.start(timeout=timeout)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/distributed/client.py", line 435, in start
    sync(self.loop, self._start, **kwargs)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/distributed/utils.py", line 223, in sync
    six.reraise(*error[0])
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/six.py", line 686, in reraise
    raise value
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/distributed/utils.py", line 212, in f
    result[0] = yield make_coro()
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/distributed/client.py", line 478, in _start
    yield self.cluster._start()
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/distributed/deploy/local.py", line 149, in _start
    services=self.worker_services, **self.worker_kwargs)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/distributed/deploy/local.py", line 155, in _start_all_workers
    yield [self._start_worker(**kwargs) for i in range(n_workers)]
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 828, in callback
    result_list.append(f.result())
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/distributed/deploy/local.py", line 183, in _start_worker
    yield w._start()
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/distributed/nanny.py", line 104, in _start
    response = yield self.instantiate()
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 307, in wrapper
    yielded = next(result)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/distributed/nanny.py", line 217, in instantiate
    self.process.start()
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/multiprocessing/context.py", line 291, in _Popen
    return Popen(process_obj)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/multiprocessing/popen_forkserver.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/multiprocessing/spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/multiprocessing/spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

After I press CTRL-C I get this output:

^CTraceback (most recent call last):
  File "semaphore_tracker_problem.py", line 4, in <module>
    client = Client()
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/distributed/client.py", line 400, in __init__
    self.start(timeout=timeout)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/distributed/client.py", line 435, in start
    sync(self.loop, self._start, **kwargs)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/site-packages/distributed/utils.py", line 221, in sync
    e.wait(1000000)
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/threading.py", line 551, in wait
    signaled = self._cond.wait(timeout)
Traceback (most recent call last):
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/threading.py", line 299, in wait
  File "<string>", line 1, in <module>
    gotit = waiter.acquire(True, timeout)
KeyboardInterrupt
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/multiprocessing/forkserver.py", line 164, in main
    rfds = [key.fileobj for (key, events) in selector.select()]
  File "/Users/ijstokes/anaconda/envs/test/lib/python3.6/selectors.py", line 577, in select
    kev_list = self._kqueue.control(None, max_ev, timeout)
KeyboardInterrupt
/Users/ijstokes/anaconda/envs/test/lib/python3.6/multiprocessing/semaphore_tracker.py:129: UserWarning: semaphore_tracker: There appear to be 192 leaked semaphores to clean up at shutdown
  len(cache))

Here is my conda environment:

$ conda list --explicit
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-64
@EXPLICIT
https://repo.continuum.io/pkgs/free/osx-64/bokeh-0.12.5-py36_1.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/click-6.7-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/cloudpickle-0.2.2-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/dask-0.14.3-py36_1.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/distributed-1.16.3-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/heapdict-1.0.0-py36_1.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/jinja2-2.9.6-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/locket-0.2.0-py36_1.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/markupsafe-0.23-py36_2.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/mkl-2017.0.1-0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/msgpack-python-0.4.8-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/numpy-1.13.0-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/openssl-1.0.2l-0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/pandas-0.20.2-np113py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/partd-0.3.8-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/pip-9.0.1-py36_1.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/psutil-5.2.2-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/python-3.6.1-2.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/python-dateutil-2.6.0-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/pytz-2017.2-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/pyyaml-3.12-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/readline-6.2-2.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/requests-2.14.2-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/setuptools-27.2.0-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/six-1.10.0-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/sortedcollections-0.5.3-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/sortedcontainers-1.5.7-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/sqlite-3.13.0-0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/tblib-1.3.2-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/tk-8.5.18-0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/toolz-0.8.2-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/tornado-4.5.1-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/wheel-0.29.0-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/xz-5.2.2-1.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/yaml-0.1.6-0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/zict-0.1.2-py36_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/zlib-1.2.8-3.tar.bz2

And in case you want to try reproducing it:

$ conda list -e
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-64
bokeh=0.12.5=py36_1
click=6.7=py36_0
cloudpickle=0.2.2=py36_0
dask=0.14.3=py36_1
distributed=1.16.3=py36_0
heapdict=1.0.0=py36_1
jinja2=2.9.6=py36_0
locket=0.2.0=py36_1
markupsafe=0.23=py36_2
mkl=2017.0.1=0
msgpack-python=0.4.8=py36_0
numpy=1.13.0=py36_0
openssl=1.0.2l=0
pandas=0.20.2=np113py36_0
partd=0.3.8=py36_0
pip=9.0.1=py36_1
psutil=5.2.2=py36_0
python=3.6.1=2
python-dateutil=2.6.0=py36_0
pytz=2017.2=py36_0
pyyaml=3.12=py36_0
readline=6.2=2
requests=2.14.2=py36_0
setuptools=27.2.0=py36_0
six=1.10.0=py36_0
sortedcollections=0.5.3=py36_0
sortedcontainers=1.5.7=py36_0
sqlite=3.13.0=0
tblib=1.3.2=py36_0
tk=8.5.18=0
toolz=0.8.2=py36_0
tornado=4.5.1=py36_0
wheel=0.29.0=py36_0
xz=5.2.2=1
yaml=0.1.6=0
zict=0.1.2=py36_0
zlib=1.2.8=3