distributed: Leaking semaphores tracking issue.
The purpose of this issue is to track cases of semaphore leaks, with a view to hopefully getting to the root cause and fixing it (assuming it is a problem!). This issue will also hold any useful notes whilst narrowing down the problem.
Things that would be really helpful:
- Anyone being able to supply a concrete reproducer of a semaphore leak coming from dask/distributed.
- If the leak is not (easily) reproducible, anecdotal evidence about what was running at the time, including things like:
  - The setup of `dask-scheduler` and `dask-worker`s.
  - Roughly what the application code was doing at the time.
  - Whether any signals were sent to the code (e.g. `SIGINT`).
  - The reported message.
Many thanks for your help.
About this issue
- State: closed
- Created 7 years ago
- Reactions: 8
- Comments: 25 (14 by maintainers)
@ijstokes, please wrap your code in `if __name__ == '__main__'` and retry.

@2gotgrossman, because of how multiprocessing works under the hood. This is explained here: https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods
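For reference, a minimal sketch of the guarded pattern being suggested (the function name and the toy `sum` task are illustrative, not taken from the original report):

```python
from dask.distributed import Client

def main():
    # Creating the Client only under the __main__ guard means that worker
    # subprocesses spawned by multiprocessing do not re-execute the cluster
    # setup when they import this module.
    client = Client()  # starts a local cluster with worker processes
    result = client.submit(sum, [1, 2, 3]).result()
    print(result)
    client.close()

if __name__ == "__main__":
    main()
```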
Happening to me. This program will trigger it:
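(The original snippet is not preserved in this excerpt. The sketch below is only a guess at the general shape of such a program, assuming a module-level `Client()` call with no `__main__` guard, which is the situation the maintainers describe above; it is not the reporter's actual code.)

```python
# Hypothetical illustration, not the reporter's program: constructing the
# Client at import time means spawned worker processes re-import this module,
# which can hang at startup and can leave multiprocessing semaphores behind
# if the process is interrupted.
from dask.distributed import Client

client = Client()  # module-level, no `if __name__ == '__main__':` guard
print(client)
```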
I have the latest versions of dask and distributed installed (see below for exact versions of all packages in the conda environment). Nothing related to Python, Dask, or Bokeh is running (the check returns nothing):
Here is the output that I get before the program “hangs” (in the midst of the `Client()` object creation):

After I press `CTRL-C`, I get this output:

Here is my conda environment:
And in case you want to try reproducing it: