wandb: Read timed out when total combination of tunable parameters exceed about 15 million
wandb --version && python --version && uname
wandb, version 0.8.21 Python 3.6.9 Linux
What I Did
wandb sweep sweep.yaml
method: grid
metric:
name: val_acc
goal: minimize
parameters:
setting:
distribution: categorical
values:
- stack_ffn
- act_pkm
- stack_encdec_ffn
q_linear:
distribution: categorical
values:
- true
- false
k_linear:
distribution: categorical
values:
- true
- false
v_linear:
distribution: categorical
values:
- true
- false
o_linear:
distribution: categorical
values:
- true
- false
q_norm:
distribution: categorical
values:
- true
- false
k_norm:
distribution: categorical
values:
- true
- false
v_norm:
distribution: categorical
values:
- true
- false
inner_norm:
distribution: categorical
values:
- true
- false
norm_way:
distribution: categorical
values:
- C
- CL
q_activ:
distribution: categorical
values:
- no
- softmax
- sparsemax
k_activ:
distribution: categorical
values:
- no
- softmax
- sparsemax
v_activ:
distribution: categorical
values:
- no
- softmax
- sparsemax
inner_activ:
distribution: categorical
values:
- no
- softmax
- sparsemax
proj_share:
distribution: categorical
values:
- qk
- qv
- kv
- qkv
- no
proj_way:
distribution: categorical
values:
- ->head
- head->
- head->_share
relative:
distribution: categorical
values:
- true
- false
q_downscale:
distribution: categorical
values:
- true
- false
k_downscale:
distribution: categorical
values:
- true
- false
v_downscale:
distribution: categorical
values:
- true
- false
inner_downscale:
distribution: categorical
values:
- true
- false
inner_mul:
distribution: categorical
values:
- QK
- KV
and get timeout
Network error (ReadTimeout), entering retry loop. See /home/shulie8518/Workspace/Review_Attention/wandb/debug.log for full traceback.
debug.log
2020-01-19 16:01:03,860 DEBUG MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:01:15,154 DEBUG MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:01:27,885 DEBUG MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:01:38,038 ERROR MainThread:31362 [retry.py:__call__():108] Retry attempt failed:
Traceback (most recent call last):
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 383, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib/python3.7/http/client.py", line 1344, in getresponse
response.begin()
File "/usr/lib/python3.7/http/client.py", line 306, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.7/http/client.py", line 267, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/usr/lib/python3.7/ssl.py", line 1071, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib/python3.7/ssl.py", line 929, in read
return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/adapters.py", line 440, in send
timeout=timeout
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 639, in urlopen
_stacktrace=sys.exc_info()[2])
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/util/retry.py", line 357, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/packages/six.py", line 686, in reraise
raise value
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 389, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 309, in _raise_timeout
raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Read timed out. (read timeout=10)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/wandb/retry.py", line 95, in __call__
result = self._call_fn(*args, **kwargs)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/wandb/apis/internal.py", line 110, in execute
return self.client.execute(*args, **kwargs)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/gql/client.py", line 52, in execute
result = self._get_result(document, *args, **kwargs)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/gql/client.py", line 60, in _get_result
return self.transport.execute(document, *args, **kwargs)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/gql/transport/requests.py", line 38, in execute
request = requests.post(self.url, **post_args)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/api.py", line 112, in post
return request('post', url, data=data, json=json, **kwargs)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/adapters.py", line 521, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='api.wandb.ai', port=443): Read timed out. (read timeout=10)
2020-01-19 16:01:42,484 DEBUG MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:02:00,755 DEBUG MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:02:27,699 DEBUG MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
About this issue
- Original URL
- State: open
- Created 4 years ago
- Comments: 34 (11 by maintainers)
The issue still persists on my side
Thanks for the report. We will look into this and figure out ifwe can handle this size of combinations or if we have to set some limits.