wandb: [CLI]: BrokenPipeError: [Errno 32] Broken pipe
Bug description
Training and logging run fine; however, at the end of the process, wandb outputs the error message below.
wandb: Waiting for W&B process to finish... (success).
wandb: \ 0.014 MB of 0.014 MB uploaded (0.000 MB deduped)
wandb: Run history:
wandb: epoch ▁▁▁▁▁▁▁▁▃▃▃▃▃▃▃▃▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆████████
wandb: train_loss █▄▃▂▂▃▄▃▁▂▂▄▄▂▃▂▁▂▁▁▁▁▁▁▃▁▁▁▁▂▁▁▁▁▁▅▁▁▁▃
wandb: trainer/global_step ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb: val_Mac-F1 ▁▄▆▅█
wandb: val_Mic-F1 ▁▄▇▇█
wandb: val_Wei-F1 ▁▅▇▇█
wandb: val_loss █▂▁▄▆
wandb:
wandb: Run summary:
wandb: epoch 4
wandb: train_loss 0.47728
wandb: trainer/global_step 5534
wandb: val_Mac-F1 0.70413
wandb: val_Mic-F1 0.88889
wandb: val_Wei-F1 0.93459
wandb: val_loss 0.46428
wandb:
wandb: 🚀 View run BERT_WEBKB_0_exp at: https://wandb.ai/celsofranca/lightning_logs/runs/1qq5guxx
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: /tmp/wandb/run-20231012_134638-1qq5guxx/logs
Exception in thread Exception in thread IntMsgThrNetStatThr:
:
Traceback (most recent call last):
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 299, in check_internal_messages
self._target(*self._args, **self._kwargs)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 267, in check_network_status
self._loop_check_status(
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 223, in _loop_check_status
self._loop_check_status(
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 223, in _loop_check_status
local_handle = request()
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 743, in deliver_internal_messages
return self._deliver_internal_messages(internal_message)
local_handle = request()
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/interface/interface_shared.py", line 481, in _deliver_internal_messages
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 735, in deliver_network_status
return self._deliver_record(record)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/interface/interface_shared.py", line 428, in _deliver_record
return self._deliver_network_status(status)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/interface/interface_shared.py", line 475, in _deliver_network_status
handle = mailbox._deliver_record(record, interface=self)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/mailbox.py", line 455, in _deliver_record
return self._deliver_record(record)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/interface/interface_shared.py", line 428, in _deliver_record
interface._publish(record)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/interface/interface_sock.py", line 51, in _publish
handle = mailbox._deliver_record(record, interface=self)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/mailbox.py", line 455, in _deliver_record
self._sock_client.send_record_publish(record)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 221, in send_record_publish
interface._publish(record)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/interface/interface_sock.py", line 51, in _publish
self.send_server_request(server_req)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 155, in send_server_request
self._sock_client.send_record_publish(record)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 221, in send_record_publish
self._send_message(msg)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 152, in _send_message
self.send_server_request(server_req)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 155, in send_server_request
self._sendall_with_error_handle(header + data)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle
self._send_message(msg)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 152, in _send_message
sent = self._sock.send(data)
BrokenPipeError: [Errno 32] Broken pipe
self._sendall_with_error_handle(header + data)
File "/home/celso/projects/venvs/LightningPrototype/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle
sent = self._sock.send(data)
BrokenPipeError: [Errno 32] Broken pipe
Additional Files
No response
Environment
- WandB version: 0.15.12
- OS: Ubuntu 20.04
- Python version: Python 3.8.10
- Versions of relevant libraries: pytorch-lightning==2.0.9
Additional Context
No response
About this issue
- State: open
- Created 9 months ago
- Reactions: 1
- Comments: 18
Hi @celsofranssa,
I’ll be happy to assist you with this inquiry. We have received your report, will investigate it, and will get back to you with updates.
Regards, Carlo Argel
Any updates, @Carlo-Argel? This issue is killing the joy of wandb, and it is bizarre that it is taking so long to fix.
I see a similar issue at the end of the process, but it does not affect anything else.
I randomly get this error every now and then during training too; I assume it is related to networking issues. It would be great if internal W&B issues did not result in the run crashing.
Because of this error, my progress of nearly 3 days stopped partway through. Now I have to start again. Is there an alternative or a handler for this, or should I just store the progress locally?
I have the same problem and am waiting for a fix.
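On the question of a handler or storing progress locally: a minimal sketch, assuming the failure surfaces as an exception from the logging call itself (the traceback in the original report is raised in background threads, which a wrapper like this cannot intercept). `LocalFallbackLogger` and the file name `metrics_backup.jsonl` are hypothetical; `remote_log` can be any callable that takes a metrics dict, such as `wandb.log`.

```python
import json
import time
from pathlib import Path


class LocalFallbackLogger:
    """Write each metrics dict to a local JSONL file, then try the remote logger."""

    def __init__(self, remote_log, backup_path="metrics_backup.jsonl"):
        self.remote_log = remote_log          # e.g. wandb.log
        self.backup_path = Path(backup_path)  # hypothetical file name
        self.remote_failures = 0

    def log(self, metrics):
        # Persist locally first, so the record survives a remote failure.
        record = {"time": time.time(), **metrics}
        with self.backup_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        try:
            self.remote_log(metrics)
        except Exception as exc:  # BrokenPipeError is an OSError subclass
            self.remote_failures += 1
            print(f"remote logging failed ({exc!r}); metrics kept in {self.backup_path}")
```

Usage would look like `logger = LocalFallbackLogger(wandb.log)` followed by `logger.log({"train_loss": loss})` inside the training loop. This only protects the logged metrics; resuming the training itself would additionally need model checkpoints, which this sketch does not cover.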