wandb: [CLI] Network error (ReadTimeout), entering retry loop.
Description
My program just hangs after wandb cannot connect and log data. The error is Network error (ReadTimeout), entering retry loop. See wandb/debug-internal.log for full traceback.
Wandb features wandb.log()
Environment
- OS: Ubuntu 18.04
- Environment: terminal on local machine
- Python Version: 3.7.10
- WandB: 0.10.24
Here is part of the debug-internal.log:
2021-04-06 04:11:40,754 DEBUG SenderThread:14386 [sender.py:send():160] send: stats
2021-04-06 04:11:49,658 DEBUG HandlerThread:14386 [handler.py:handle_request():120] handle_request: status
2021-04-06 04:11:49,659 DEBUG SenderThread:14386 [sender.py:send():160] send: request
2021-04-06 04:11:49,659 DEBUG SenderThread:14386 [sender.py:send_request():169] send_request: status
2021-04-06 04:12:10,483 WARNING Thread-5 :14386 [util.py:request_with_retry():768] requests_with_retry encountered retryable exception: 502 Server Error: Bad Gateway for url: https://api.wandb.ai/files/prashant-pand3y/unet-us-datasets/ubc-none-Dice-linear-8-weights-20210405-210012/file_stream. func: <bound method Session.post of <requests.sessions.Session object at 0x7fe833fc7bd0>>, response: b'\n<html><head>\n<meta http-equiv="content-type" content="text/html;charset=utf-8">\n<title>502 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n', args: ('https://api.wandb.ai/files/prashant-pand3y/unet-us-datasets/ubc-none-Dice-linear-8-weights-20210405-210012/file_stream',), kwargs: {'json': {'complete': False, 'failed': False}}
2021-04-06 04:12:15,821 WARNING Thread-5 :14386 [util.py:request_with_retry():768] requests_with_retry encountered retryable exception: 500 Server Error: Internal Server Error for url: https://api.wandb.ai/files/prashant-pand3y/unet-us-datasets/ubc-none-Dice-linear-8-weights-20210405-210012/file_stream. func: <bound method Session.post of <requests.sessions.Session object at 0x7fe833fc7bd0>>, response: b'{"error":"Error 1040: Too many connections"}\n', args: ('https://api.wandb.ai/files/prashant-pand3y/unet-us-datasets/ubc-none-Dice-linear-8-weights-20210405-210012/file_stream',), kwargs: {'json': {'complete': False, 'failed': False}}
2021-04-06 04:12:23,298 ERROR SenderThread:14386 [retry.py:__call__():111] Retry attempt failed:
Traceback (most recent call last):
File "/home/prashant/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 384, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "/home/prashant/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 380, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib/python3.7/http/client.py", line 1369, in getresponse
response.begin()
File "/usr/lib/python3.7/http/client.py", line 310, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.7/http/client.py", line 271, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/usr/lib/python3.7/ssl.py", line 1071, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib/python3.7/ssl.py", line 929, in read
return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/prashant/py37/lib/python3.7/site-packages/requests/adapters.py", line 445, in send
timeout=timeout
File "/home/prashant/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/home/prashant/py37/lib/python3.7/site-packages/urllib3/util/retry.py", line 367, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/prashant/py37/lib/python3.7/site-packages/urllib3/packages/six.py", line 686, in reraise
raise value
File "/home/prashant/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/home/prashant/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 386, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/home/prashant/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 306, in _raise_timeout
raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Read timed out. (read timeout=10)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/prashant/py37/lib/python3.7/site-packages/wandb/old/retry.py", line 96, in __call__
result = self._call_fn(*args, **kwargs)
File "/home/prashant/py37/lib/python3.7/site-packages/wandb/sdk/internal/internal_api.py", line 123, in execute
return self.client.execute(*args, **kwargs)
File "/home/prashant/py37/lib/python3.7/site-packages/wandb/vendor/gql-0.2.0/gql/client.py", line 52, in execute
result = self._get_result(document, *args, **kwargs)
File "/home/prashant/py37/lib/python3.7/site-packages/wandb/vendor/gql-0.2.0/gql/client.py", line 60, in _get_result
return self.transport.execute(document, *args, **kwargs)
File "/home/prashant/py37/lib/python3.7/site-packages/wandb/vendor/gql-0.2.0/gql/transport/requests.py", line 38, in execute
request = requests.post(self.url, **post_args)
File "/home/prashant/py37/lib/python3.7/site-packages/requests/api.py", line 112, in post
return request('post', url, data=data, json=json, **kwargs)
File "/home/prashant/py37/lib/python3.7/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/home/prashant/py37/lib/python3.7/site-packages/requests/sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "/home/prashant/py37/lib/python3.7/site-packages/requests/sessions.py", line 622, in send
r = adapter.send(request, **kwargs)
File "/home/prashant/py37/lib/python3.7/site-packages/requests/adapters.py", line 526, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='api.wandb.ai', port=443): Read timed out. (read timeout=10)
2021-04-06 04:12:28,406 DEBUG SenderThread:14386 [sender.py:send():160] send: stats
2021-04-06 04:12:41,545 DEBUG SenderThread:14386 [sender.py:send():160] send: stats
2021-04-06 04:12:43,407 DEBUG HandlerThread:14386 [handler.py:handle_request():120] handle_request: status
About this issue
- Original URL
- State: open
- Created 3 years ago
- Comments: 32 (5 by maintainers)
I have same issue in latest wandb version
Hi, same issue here. After a few hours of training I get,
wandb: Network error (ReadTimeout), entering retry loop.Using wandb 0.12.14Hey @ramit-wandb ,
Just wanted to let you know that upgrading to
wandb 0.12.9solved my problem š Thank you again for your time and reply šAlso having this issue again.
OS: Linux-3.10.0-1160.36.2.el7.x86_64-x86_64-with-centos-7.9.2009-Core Python Version: 3.7.9 WandB: 0.10.3
Iām having the same issue as well with previously working code. The issue started a couple of hours ago. OS: Linux Python Version: 3.7.11 WandB: 0.12.6
The same problem starts hours ago. My codes work well previously. But with the same code, the network error occurs. OS: Linux Python: 3.7 wandb: 0.10.32
Working fine now. It seems that it was on their end.