dask-jobqueue: tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result ...>)

I am trying to run a data analysis on 9,900 Parquet files that total 100 GB.
I get about 70K garbage collection warnings like this one: distributed.utils_perf - WARNING - full garbage collections took 60% CPU time recently (threshold: 10%)

Then my job is killed with the following error.

distributed.utils_perf - WARNING - full garbage collections took 60% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 59% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 56% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 56% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 60% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 62% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 61% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 56% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 59% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 61% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 56% CPU time recently (threshold: 10%)
distributed.worker - INFO - Connection to scheduler broken.  Reconnecting...
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x2ab42ba925d0>>, <Task finished coro=<Worker.heartbeat() done, defined at /galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py:883> exception=CommClosedError('in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer')>)
Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 188, in read
    n_frames = await stream.read_bytes(8)
tornado.iostream.StreamClosedError: Stream is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py", line 920, in heartbeat
    raise e
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py", line 893, in heartbeat
    metrics=await self.get_metrics(),
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/utils_comm.py", line 391, in retry_operation
    operation=operation,
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/utils_comm.py", line 379, in retry
    return await coro()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/core.py", line 757, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/core.py", line 540, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 208, in read
    convert_stream_closed_error(self, e)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 121, in convert_stream_closed_error
    raise CommClosedError("in %s: %s: %s" % (obj, exc.__class__.__name__, exc))
distributed.comm.core.CommClosedError: in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer
distributed.worker - INFO - Connection to scheduler broken.  Reconnecting...
distributed.worker - INFO - Connection to scheduler broken.  Reconnecting...
distributed.worker - INFO - Connection to scheduler broken.  Reconnecting...
distributed.worker - INFO - Connection to scheduler broken.  Reconnecting...
distributed.worker - INFO - Connection to scheduler broken.  Reconnecting...
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x2b3465022590>>, <Task finished coro=<Worker.heartbeat() done, defined at /galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py:883> exception=CommClosedError('in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer')>)
Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 188, in read
    n_frames = await stream.read_bytes(8)
tornado.iostream.StreamClosedError: Stream is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py", line 920, in heartbeat
    raise e
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py", line 893, in heartbeat
    metrics=await self.get_metrics(),
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/utils_comm.py", line 391, in retry_operation
    operation=operation,
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/utils_comm.py", line 379, in retry
    return await coro()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/core.py", line 757, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/core.py", line 540, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 208, in read
    convert_stream_closed_error(self, e)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 121, in convert_stream_closed_error
    raise CommClosedError("in %s: %s: %s" % (obj, exc.__class__.__name__, exc))
distributed.comm.core.CommClosedError: in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x2adcf6fabb50>>, <Task finished coro=<Worker.heartbeat() done, defined at /galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py:883> exception=CommClosedError('in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer')>)
Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 188, in read
    n_frames = await stream.read_bytes(8)
tornado.iostream.StreamClosedError: Stream is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py", line 920, in heartbeat
    raise e
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py", line 893, in heartbeat
    metrics=await self.get_metrics(),
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/utils_comm.py", line 391, in retry_operation
    operation=operation,
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/utils_comm.py", line 379, in retry
    return await coro()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/core.py", line 757, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/core.py", line 540, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 208, in read
    convert_stream_closed_error(self, e)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 121, in convert_stream_closed_error
    raise CommClosedError("in %s: %s: %s" % (obj, exc.__class__.__name__, exc))
distributed.comm.core.CommClosedError: in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x2ba64a584990>>, <Task finished coro=<Worker.heartbeat() done, defined at /galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py:883> exception=CommClosedError('in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer')>)
Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 188, in read
    n_frames = await stream.read_bytes(8)
tornado.iostream.StreamClosedError: Stream is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py", line 920, in heartbeat
    raise e
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py", line 893, in heartbeat
    metrics=await self.get_metrics(),
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/utils_comm.py", line 391, in retry_operation
    operation=operation,
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/utils_comm.py", line 379, in retry
    return await coro()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/core.py", line 757, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/core.py", line 540, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 208, in read
    convert_stream_closed_error(self, e)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 121, in convert_stream_closed_error
    raise CommClosedError("in %s: %s: %s" % (obj, exc.__class__.__name__, exc))
distributed.comm.core.CommClosedError: in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x2ac978e74f90>>, <Task finished coro=<Worker.heartbeat() done, defined at /galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py:883> exception=CommClosedError('in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer')>)
Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 188, in read
    n_frames = await stream.read_bytes(8)
tornado.iostream.StreamClosedError: Stream is closed

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 1
  • Comments: 16 (6 by maintainers)

Most upvoted comments

Did you find the solution?

Hi @vishnuk94, are you getting this exception with the same code as @MSKazemi?

I finally took some time to try @MSKazemi's code. I'm still at the data generation part: it first generates the whole dataset in memory using Pandas, which takes a long time. It would be better to generate the dataset directly with Dask DataFrame.