wandb: wandb: Network error (TransientException), entering retry loop.
wandb, version 0.8.22 Python 3.6.8 Ubuntu 18
Description
W&B was working fine until one day I started getting these error messages. I’m using W&B normally as far as I can tell. I’m not using sweeps or anything fancy. I “init” and then “log”:
wandb: Tracking run with wandb version 0.8.22
wandb: Run data is saved locally in XXX/wandb/run-20200128_181440-yz2o7uiw
wandb: Syncing run A002
wandb: ⭐ View project at https://app.wandb.ai/XXX
wandb: 🚀 View run at https://app.wandb.ai/XXX
wandb: Run `wandb off` to turn off syncing.
wandb: Network error (TransientException), entering retry loop. See /home/XXX/wandb/debug.log for full traceback.
wandb: ERROR Error uploading "___batch_archive_1.tgz": CommError, None
[ batch loss: 0.000208 | batch RMSE: 3.7270] : 27%|███████████▋ | 36/132 [00:28<01:17, 1.24it/sBus error (core dumped) 2.6732 | val-loss: 0.000312| val_rmse: 4.0230: 74%|█████████████▎ | 37/50 [1:07:44<23:51, 110.11s/it]
(sia-env) XXX:~/projects/orofacial$ wandb: Program ended successfully. | 14/132 [00:10<01:34, 1.25it/s]
wandb: Run summary:
wandb: _step 73
wandb: _timestamp 1580239346.2581983
wandb: _runtime 4074.0023016929626
wandb: Loss 0.00013416090676909802
wandb: learning rate 1.25e-06
wandb: Syncing 8 W&B file(s) and 0 media file(s)
(%(failed_batches)d failed uploads)wandb: Network error (TransientException), entering retry loop. See /home/siarez/projects/orofacial/artifacts/train.py/2020-01-28-13-14-32_0/wandb/debug.log for full traceback.
(%(failed_batches)d failed uploads)wandb: ERROR Error uploading "config.yaml": CommError, None
(%(failed_batches)d failed uploads)wandb: ERROR Error uploading "wandb-summary.json": CommError, None
wandb: ERROR Error uploading "wandb-metadata.json": CommError, None
(%(failed_batches)d failed uploads)wandb: ERROR Error uploading "output.log": CommError, None
(%(failed_batches)d failed uploads)wandb:
wandb: Synced A002: https://app.wandb.ai/siarez/orofacial/runs/yz2o7uiw
What I Did
Nothing
The debug.log was too large to fit in “pastebin”. But here are the first 140 lines: https://hastebin.com/ixonizoyal.sql
Here are the last 170 lines of debug.log: https://hastebin.com/alevotuduv.sql
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 16 (6 by maintainers)
Hey @laurelrr the instance your running this on either can’t communicate with the internet, or is behind a proxy that making the SSL handshake fail. You can run the script in offline mode and later sync the data to a wandb server from a machine that has functioning internet. Just set the
WANDB_MODE=offlineenvironment variable or specifymode="offline"in your call towandb.initI’m using
wandb localwith python=3.6.8 wandb=0.9.2. The log file is large and here is part of it:wandb syncgives this error:I’ve also tested on my local Mac and same problem occurs.