wandb: Network error (TransientException), entering retry loop.

wandb, version 0.8.22 Python 3.6.8 Ubuntu 18

Description

W&B was working fine until one day I started getting these error messages. I’m using W&B normally as far as I can tell. I’m not using sweeps or anything fancy. I “init” and then “log”:

wandb: Tracking run with wandb version 0.8.22
wandb: Run data is saved locally in XXX/wandb/run-20200128_181440-yz2o7uiw
wandb: Syncing run A002
wandb: ⭐ View project at https://app.wandb.ai/XXX
wandb: 🚀 View run at https://app.wandb.ai/XXX
wandb: Run `wandb off` to turn off syncing.

wandb: Network error (TransientException), entering retry loop. See /home/XXX/wandb/debug.log for full traceback.
wandb: ERROR Error uploading "___batch_archive_1.tgz": CommError, None
[ batch loss: 0.000208 | batch RMSE: 3.7270] :  27%|███████████▋                               | 36/132 [00:28<01:17,  1.24it/sBus error (core dumped) 2.6732 | val-loss: 0.000312| val_rmse: 4.0230:  74%|█████████████▎    | 37/50 [1:07:44<23:51, 110.11s/it]
(sia-env) XXX:~/projects/orofacial$ wandb: Program ended successfully.        | 14/132 [00:10<01:34,  1.25it/s]
wandb: Run summary:
wandb:                       _step 73
wandb:                  _timestamp 1580239346.2581983
wandb:                    _runtime 4074.0023016929626
wandb:                        Loss 0.00013416090676909802
wandb:               learning rate 1.25e-06
wandb: Syncing 8 W&B file(s) and 0 media file(s)
 (%(failed_batches)d failed uploads)wandb: Network error (TransientException), entering retry loop. See /home/siarez/projects/orofacial/artifacts/train.py/2020-01-28-13-14-32_0/wandb/debug.log for full traceback.
 (%(failed_batches)d failed uploads)wandb: ERROR Error uploading "config.yaml": CommError, None
 (%(failed_batches)d failed uploads)wandb: ERROR Error uploading "wandb-summary.json": CommError, None
wandb: ERROR Error uploading "wandb-metadata.json": CommError, None
 (%(failed_batches)d failed uploads)wandb: ERROR Error uploading "output.log": CommError, None
 (%(failed_batches)d failed uploads)wandb:                                                                                
wandb: Synced A002: https://app.wandb.ai/siarez/orofacial/runs/yz2o7uiw

What I Did

Nothing

The debug.log was too large to fit in “pastebin”. But here are the first 140 lines: https://hastebin.com/ixonizoyal.sql

Here are the last 170 lines of debug.log: https://hastebin.com/alevotuduv.sql

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 16 (6 by maintainers)

Most upvoted comments

Hey @laurelrr, the instance you’re running this on either can’t communicate with the internet or is behind a proxy that is making the SSL handshake fail. You can run the script in offline mode and later sync the data to a wandb server from a machine that has a working internet connection. Just set the WANDB_MODE=offline environment variable or specify mode="offline" in your call to wandb.init.
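As a rough illustration of the advice above, here is a minimal sketch that checks whether a TLS handshake with the W&B API host succeeds and picks a mode accordingly. It uses only the Python standard library. The host name api.wandb.ai, the 5-second timeout, and the helper names can_complete_tls_handshake and choose_wandb_mode are assumptions made for this example, not part of wandb.

```python
import socket
import ssl


def can_complete_tls_handshake(host, port=443, timeout=5.0):
    """Return True if a TCP connection and a TLS handshake to host:port succeed."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the handshake; certificate or proxy
            # problems surface here as ssl.SSLError (a subclass of OSError).
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # Covers refused connections, DNS failures, timeouts and TLS errors.
        return False


def choose_wandb_mode(host="api.wandb.ai", port=443, timeout=5.0):
    """Pick "online" when the handshake works, otherwise "offline".

    The default host is an assumption; a self-hosted server would use
    a different address.
    """
    if can_complete_tls_handshake(host, port, timeout):
        return "online"
    return "offline"
```

One possible way to use it is to set os.environ["WANDB_MODE"] = choose_wandb_mode() before importing wandb, or to pass the result as the mode argument of wandb.init. Runs recorded offline can be uploaded later with wandb sync from a machine that can reach the server.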

I’m using wandb local with python=3.6.8 and wandb=0.9.2. The log file is large, so here is part of it:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/anaconda3/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/home/user/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 725, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/user/anaconda3/lib/python3.6/site-packages/urllib3/util/retry.py", line 403, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/user/anaconda3/lib/python3.6/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/home/user/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/home/user/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/home/user/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    httplib_response = conn.getresponse()
  File "/home/user/anaconda3/lib/python3.6/http/client.py", line 1331, in getresponse
    response.begin()
  File "/home/user/anaconda3/lib/python3.6/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/home/user/anaconda3/lib/python3.6/http/client.py", line 266, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

wandb sync gives this error:

wandb: Network error (ConnectionError), entering retry loop. See /home/user/Documents/syoya/Competes/Alaska2/wandb/debug.log for full traceback.
wandb: Network error (ConnectionError), entering retry loop. See /home/user/Documents/syoya/Competes/Alaska2/wandb/debug.log for full traceback.

I’ve also tested on my local Mac, and the same problem occurs.