distributed: Nanny error: Worker process was killed by unknown signal
distributed.nanny - WARNING - Worker process 13375 was killed by unknown signal
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 13377 was killed by unknown signal
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 13372 was killed by unknown signal
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 13383 was killed by unknown signal
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 13373 was killed by unknown signal
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 13384 was killed by unknown signal
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 13380 was killed by unknown signal
distributed.nanny - WARNING - Restarting worker
This happens without fail when using read_parquet with fastparquet. Switching to pyarrow avoids it in some runs, but it still happens x% of the time (x depends on how n_workers, n_clients, and memory_limit are set up in the client, but I would say it is always greater than 25%).
My machine runs Fedora 27, and I was able to work around the problem by setting the distributed.worker.multiprocessing-method config option to spawn, thanks to help from @mrocklin.
(While debugging this with @mrocklin, we were never able to get any more information about the root cause.)
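A minimal sketch of the spawn workaround, assuming a dask/distributed version where the option lives under the distributed.worker.multiprocessing-method config key. Recent versions of distributed appear to read this key once, when distributed is first imported, so the sketch sets it before importing dask.distributed. The make_client helper and its default arguments are hypothetical placeholders, not values taken from this report.

```python
import dask

# Assumption: distributed reads this key when it is first imported, so it is
# set before importing dask.distributed. Accepted values: fork, spawn, forkserver.
dask.config.set({"distributed.worker.multiprocessing-method": "spawn"})

from dask.distributed import Client, LocalCluster  # noqa: E402


def make_client(n_workers=4, memory_limit="2GB"):
    """Hypothetical helper: start a local cluster whose Nanny processes start
    worker processes with the configured method (spawn instead of fork).

    The defaults are placeholders, not recommended values.
    """
    cluster = LocalCluster(
        n_workers=n_workers,
        threads_per_worker=1,
        memory_limit=memory_limit,
    )
    return Client(cluster)
```

Call make_client() from inside an if __name__ == "__main__": guard, because with spawn Python re-imports the main module in each child process. The same setting can also be supplied without code changes through the environment variable DASK_DISTRIBUTED__WORKER__MULTIPROCESSING_METHOD=spawn, set before Python starts.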
About this issue
- State: closed
- Created 6 years ago
- Comments: 15 (8 by maintainers)
Another (self-contained, though less minimal) example:
So far so good. Then
df2.mean().compute() results in a traceback.

Similar code snippets which execute as expected:
- with df['a_1'] = (df['a_1'] * 10000).astype(int)
- with np.random.rand(3_000_000, 20) changed to np.random.rand(2_000_000, 20)

This issue would benefit from a minimal reproducible example.