wandb: [CLI]: BrokenPipeError: [Errno 32] Broken pipe

Bug description

Training and logging run fine; however, at the end of the process, wandb prints the error message below.

wandb: Waiting for W&B process to finish... (success).
wandb: \ 0.014 MB of 0.014 MB uploaded (0.000 MB deduped)
wandb: Run history:
wandb:               epoch ▁▁▁▁▁▁▁▁▃▃▃▃▃▃▃▃▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆████████
wandb:          train_loss █▄▃▂▂▃▄▃▁▂▂▄▄▂▃▂▁▂▁▁▁▁▁▁▃▁▁▁▁▂▁▁▁▁▁▅▁▁▁▃
wandb: trainer/global_step ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb:          val_Mac-F1 ▁▄▆▅█
wandb:          val_Mic-F1 ▁▄▇▇█
wandb:          val_Wei-F1 ▁▅▇▇█
wandb:            val_loss █▂▁▄▆
wandb: 
wandb: Run summary:
wandb:               epoch 4
wandb:          train_loss 0.47728
wandb: trainer/global_step 5534
wandb:          val_Mac-F1 0.70413
wandb:          val_Mic-F1 0.88889
wandb:          val_Wei-F1 0.93459
wandb:            val_loss 0.46428
wandb: 
wandb: 🚀 View run BERT_WEBKB_0_exp at: https://wandb.ai/celsofranca/lightning_logs/runs/1qq5guxx
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: /tmp/wandb/run-20231012_134638-1qq5guxx/logs
Exception in thread IntMsgThr:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 299, in check_internal_messages
    self._loop_check_status(
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 223, in _loop_check_status
    local_handle = request()
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 743, in deliver_internal_messages
    return self._deliver_internal_messages(internal_message)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/interface/interface_shared.py", line 481, in _deliver_internal_messages
    return self._deliver_record(record)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/interface/interface_shared.py", line 428, in _deliver_record
    handle = mailbox._deliver_record(record, interface=self)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/mailbox.py", line 455, in _deliver_record
    interface._publish(record)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/interface/interface_sock.py", line 51, in _publish
    self._sock_client.send_record_publish(record)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 221, in send_record_publish
    self.send_server_request(server_req)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 155, in send_server_request
    self._send_message(msg)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
BrokenPipeError: [Errno 32] Broken pipe

Exception in thread NetStatThr:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 267, in check_network_status
    self._loop_check_status(
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 223, in _loop_check_status
    local_handle = request()
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 735, in deliver_network_status
    return self._deliver_network_status(status)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/interface/interface_shared.py", line 475, in _deliver_network_status
    return self._deliver_record(record)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/interface/interface_shared.py", line 428, in _deliver_record
    handle = mailbox._deliver_record(record, interface=self)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/mailbox.py", line 455, in _deliver_record
    interface._publish(record)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/interface/interface_sock.py", line 51, in _publish
    self._sock_client.send_record_publish(record)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 221, in send_record_publish
    self.send_server_request(server_req)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 155, in send_server_request
    self._send_message(msg)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
BrokenPipeError: [Errno 32] Broken pipe
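In both tracebacks, the last frame is `self._sock.send(data)` in wandb's `sock_client.py`, raised inside the background threads `IntMsgThr` and `NetStatThr` after the run summary had already been printed. One plausible reading, not confirmed in this thread, is that these threads were still polling over a local socket whose other end had already closed during shutdown. The stdlib-only sketch below reproduces that general failure mode with `socket.socketpair()`. It does not use wandb, and `send_after_peer_closed` is a hypothetical helper.

```python
import errno
import socket


def send_after_peer_closed() -> int:
    """Return the errno raised when sending on a socket whose peer has closed."""
    # socketpair() returns two connected stream sockets (AF_UNIX on POSIX).
    client, server = socket.socketpair()
    # Simulate the receiving side shutting down first.
    server.close()
    try:
        # Python ignores SIGPIPE by default, so the failed write surfaces
        # as a BrokenPipeError instead of killing the process.
        client.send(b"status request")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        client.close()
    return 0


if __name__ == "__main__":
    print(send_after_peer_closed() == errno.EPIPE)  # True on Linux
```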



Additional Files

No response

Environment

  • WandB version: 0.15.12
  • OS: Ubuntu 20.04
  • Python version: Python 3.8.10
  • Versions of relevant libraries: pytorch-lightning==2.0.9
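The environment details above can also be collected programmatically when filing a report like this. Below is a minimal sketch that uses only the standard library's `importlib.metadata` (Python 3.8+). The package names are the ones mentioned in this issue, and `collect_versions` is a hypothetical helper. Note that `platform.platform()` reports the kernel/platform string, not the distribution name such as "Ubuntu 20.04".

```python
import platform
from importlib import metadata
from typing import Dict, Iterable


def collect_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    report = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in collect_versions(["wandb", "pytorch-lightning"]).items():
        print(f"{key}: {value}")
```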

Additional Context

No response

About this issue

  • Original URL
  • State: open
  • Created 9 months ago
  • Reactions: 1
  • Comments: 18

Most upvoted comments

Hi @celsofranssa,

I’ll be happy to assist you with this inquiry. We have received it, will investigate, and will get back to you with updates.

Regards, Carlo Argel

Any updates @Carlo-Argel? This issue is killing the joy of using wandb, and it is bizarre that it is taking so long to fix.

Similar issue at the end of the process, but it does not affect anything else.

I also get this error randomly during training; I assume it is related to networking issues. It would be great if internal W&B errors did not crash the run.

{'loss': 55842.3375, 'learning_rate': 0.00019748020497041964, 'epoch': 1.32}
{'loss': 55757.2188, 'learning_rate': 0.0001974556426587668, 'epoch': 1.33}
  9%|██████████▋                                                                                                             | 2587/29100 [1:22:29<526:27:27, 71.48s/it]
Exception in thread NetStatThr:
Traceback (most recent call last):
  File "/home/user/mambaforge/envs/slt/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/home/user/mambaforge/envs/slt/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/mambaforge/envs/slt/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 268, in check_network_status
    self._loop_check_status(
  File "/home/user/mambaforge/envs/slt/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 224, in _loop_check_status
    local_handle = request()
                   ^^^^^^^^^
  File "/home/user/mambaforge/envs/slt/lib/python3.11/site-packages/wandb/sdk/interface/interface.py", line 792, in deliver_network_status
    return self._deliver_network_status(status)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/slt/lib/python3.11/site-packages/wandb/sdk/interface/interface_shared.py", line 500, in _deliver_network_status
    return self._deliver_record(record)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/slt/lib/python3.11/site-packages/wandb/sdk/interface/interface_shared.py", line 449, in _deliver_record
    handle = mailbox._deliver_record(record, interface=self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/slt/lib/python3.11/site-packages/wandb/sdk/lib/mailbox.py", line 455, in _deliver_record
    interface._publish(record)
  File "/home/user/mambaforge/envs/slt/lib/python3.11/site-packages/wandb/sdk/interface/interface_sock.py", line 51, in _publish
    self._sock_client.send_record_publish(record)
  File "/home/user/mambaforge/envs/slt/lib/python3.11/site-packages/wandb/sdk/lib/sock_client.py", line 221, in send_record_publish
    self.send_server_request(server_req)
  File "/home/user/mambaforge/envs/slt/lib/python3.11/site-packages/wandb/sdk/lib/sock_client.py", line 155, in send_server_request
    self._send_message(msg)
  File "/home/user/mambaforge/envs/slt/lib/python3.11/site-packages/wandb/sdk/lib/sock_client.py", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "/home/user/mambaforge/envs/slt/lib/python3.11/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
           ^^^^^^^^^^^^^^^^^^^^^
BrokenPipeError: [Errno 32] Broken pipe
Killed
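The wish that internal W&B errors should not bring down the run can be illustrated with a generic pattern: a background poller that treats a closed connection as a reason to stop quietly instead of raising. This is a hypothetical sketch of that pattern, not wandb's actual `check_network_status` implementation. `StatusPoller` and its `send` callable are made up for illustration. In CPython, an uncaught exception in a background thread does not by itself terminate the process, and `Killed` is what a shell typically prints when a process receives SIGKILL. The final line of the log above may therefore have a separate cause, for example the kernel's out-of-memory killer.

```python
import threading
from typing import Callable, List


class StatusPoller:
    """Hypothetical background poller that stops quietly on a broken pipe."""

    def __init__(self, send: Callable[[], object], interval: float = 0.01) -> None:
        self._send = send              # callable that talks to the backend
        self._interval = interval      # seconds to wait between polls
        self._stop = threading.Event()
        self.errors: List[BaseException] = []
        self._thread = threading.Thread(
            target=self._loop, name="StatusPoller", daemon=True
        )

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        # Signal the loop to exit and wait for the thread to finish.
        self._stop.set()
        self._thread.join()

    def _loop(self) -> None:
        while not self._stop.is_set():
            try:
                self._send()
            except (BrokenPipeError, ConnectionResetError) as exc:
                # The other end went away (e.g. during shutdown):
                # record the error and leave the loop instead of raising.
                self.errors.append(exc)
                return
            self._stop.wait(self._interval)
```

Typical usage would be `poller = StatusPoller(send=...)`, then `poller.start()`, then `poller.stop()` at shutdown. Inspecting `poller.errors` afterwards shows whether the connection was lost.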

Because of this error, nearly 3 days of training progress stopped midway, and now I have to start again. Is there a workaround or handler for this, or should I just save progress locally?
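On the question of saving progress locally: checkpointing to disk independently of the logger means a failure on the logging side does not cost days of work. With PyTorch Lightning this is usually done with the `ModelCheckpoint` callback and by resuming via `trainer.fit(..., ckpt_path=...)`. The framework-agnostic sketch below shows the underlying idea with the standard library only. It writes a JSON progress file atomically (write to a temporary file, then `os.replace`), so an interruption mid-write cannot corrupt the previous checkpoint. The file name and the `save_progress`/`load_progress` helpers are hypothetical, and a real training loop would also save model and optimizer state.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


def save_progress(path: str, state: Dict[str, Any]) -> None:
    """Atomically write a JSON checkpoint to `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Use a temp file in the same directory so os.replace is an atomic rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_progress(path: str) -> Optional[Dict[str, Any]]:
    """Return the saved state, or None if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "progress.json")  # hypothetical location
        state = load_progress(ckpt) or {"step": 0}
        for step in range(state["step"], 10):
            # ... one training step would go here ...
            if (step + 1) % 5 == 0:
                save_progress(ckpt, {"step": step + 1})
        print(load_progress(ckpt))  # {'step': 10}
```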

I am hitting the same problem; waiting for a fix.