dask-jobqueue: AssertionError (assert count > 0) in SLURMCluster._adapt
When I run a SLURMCluster with adapt(), I sometimes get the following crash. It is not deterministic, and I have not found a way to trigger it more reliably.
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f59359aed90>, <Future finished exception=AssertionError()>)
Traceback (most recent call last):
  File "/scratch/ogrisel/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 758, in _run_callback
    ret = callback()
  File "/scratch/ogrisel/miniconda3/lib/python3.7/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/scratch/ogrisel/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 779, in _discard_future_result
    future.result()
  File "/scratch/ogrisel/miniconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "/scratch/ogrisel/miniconda3/lib/python3.7/site-packages/distributed/deploy/adaptive.py", line 334, in _adapt
    workers = yield self._retire_workers(workers=recommendations['workers'])
  File "/scratch/ogrisel/miniconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/scratch/ogrisel/miniconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "/scratch/ogrisel/miniconda3/lib/python3.7/site-packages/distributed/deploy/adaptive.py", line 242, in _retire_workers
    close_workers=True)
  File "/scratch/ogrisel/miniconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/scratch/ogrisel/miniconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "/scratch/ogrisel/miniconda3/lib/python3.7/site-packages/distributed/scheduler.py", line 2800, in retire_workers
    n=1, delete=False)
  File "/scratch/ogrisel/miniconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/scratch/ogrisel/miniconda3/lib/python3.7/site-packages/tornado/gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "/scratch/ogrisel/miniconda3/lib/python3.7/site-packages/distributed/scheduler.py", line 2613, in replicate
    assert count > 0
AssertionError
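The last frame fails on `assert count > 0` inside `Scheduler.replicate`, which `retire_workers` calls with `n=1, delete=False`. Below is a simplified, hypothetical model of that check. It assumes the per-key count is computed as `min(n_missing, branching_factor * len(who_has))`, as in distributed releases from around that time; the function name `replication_count` is made up for illustration and this is not the actual scheduler code. Under that assumption, the assertion can only fail when a key still needs copies but has no remaining holders, which would fit a key losing its last holder while adaptive retirement is in progress (not confirmed here).

```python
# Hypothetical, simplified model of the replica-count check in
# distributed's Scheduler.replicate. Assumption: count is
# min(n_missing, branching_factor * len(holders)); not the real source.


def replication_count(n_wanted: int, holders: set, workers: set,
                      branching_factor: int = 2) -> int:
    """Return how many new copies of one key to request in this round."""
    # Copies still missing among the target (non-retiring) workers.
    n_missing = n_wanted - len(holders & workers)
    if n_missing <= 0:
        return 0  # already replicated enough; the key would be skipped
    # Each current holder can seed at most `branching_factor` copies per round,
    # so with no holders left the count drops to 0 and the assertion fires.
    count = min(n_missing, branching_factor * len(holders))
    assert count > 0, "key still needs copies but has no holders"
    return count


if __name__ == "__main__":
    kept = {"worker-a", "worker-b"}
    print(replication_count(1, {"worker-x"}, kept))  # held by a retiring worker
    print(replication_count(1, {"worker-a"}, kept))  # already on a kept worker
    replication_count(1, set(), kept)                # no holders: AssertionError
```

With `n=1`, a key held only by a retiring worker yields a count of 1 and a key already on a kept worker yields 0, while a key with an empty holder set trips the same assertion seen in the traceback.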
About this issue
- State: closed
- Created 5 years ago
- Comments: 33 (18 by maintainers)
@ogrisel do you still encounter this bug?