wandb: Read timed out when the total number of tunable parameter combinations exceeds about 15 million

wandb --version && python --version && uname

wandb, version 0.8.21
Python 3.6.9
Linux

What I Did

wandb sweep sweep.yaml

method: grid
metric:
  name: val_acc
  goal: minimize
parameters:
  setting:
    distribution: categorical
    values:
      - stack_ffn
      - act_pkm
      - stack_encdec_ffn
  q_linear:
    distribution: categorical
    values:
      - true
      - false
  k_linear:
    distribution: categorical
    values:
      - true
      - false
  v_linear:
    distribution: categorical
    values:
      - true
      - false
  o_linear:
    distribution: categorical
    values:
      - true
      - false
  q_norm:
    distribution: categorical
    values:
      - true
      - false
  k_norm:
    distribution: categorical
    values:
      - true
      - false
  v_norm:
    distribution: categorical
    values:
      - true
      - false
  inner_norm:
    distribution: categorical
    values:
      - true
      - false
  norm_way:
    distribution: categorical
    values:
      - C
      - CL
  q_activ:
    distribution: categorical
    values:
      - no
      - softmax
      - sparsemax
  k_activ:
    distribution: categorical
    values:
      - no
      - softmax
      - sparsemax
  v_activ:
    distribution: categorical
    values:
      - no
      - softmax
      - sparsemax
  inner_activ:
    distribution: categorical
    values:
      - no
      - softmax
      - sparsemax
  proj_share:
    distribution: categorical
    values:
      - qk
      - qv
      - kv
      - qkv
      - no
  proj_way:
    distribution: categorical
    values:
      - ->head
      - head->
      - head->_share
  relative:
    distribution: categorical
    values:
      - true
      - false
  q_downscale:
    distribution: categorical
    values:
      - true
      - false
  k_downscale:
    distribution: categorical
    values:
      - true
      - false
  v_downscale:
    distribution: categorical
    values:
      - true
      - false
  inner_downscale:
    distribution: categorical
    values:
      - true
      - false
  inner_mul:
    distribution: categorical
    values:
      - QK
      - KV

and got a timeout:

Network error (ReadTimeout), entering retry loop. See /home/shulie8518/Workspace/Review_Attention/wandb/debug.log for full traceback.
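
For scale, the size of the grid implied by the sweep.yaml above can be worked out by multiplying the number of values listed for each parameter. The sketch below only does that arithmetic; it does not call wandb, and the `value_counts` dict and `grid_size` helper are names made up for illustration.

```python
from functools import reduce
from operator import mul

# Number of values listed for each parameter in the sweep.yaml above.
value_counts = {
    "setting": 3,
    "q_linear": 2, "k_linear": 2, "v_linear": 2, "o_linear": 2,
    "q_norm": 2, "k_norm": 2, "v_norm": 2, "inner_norm": 2,
    "norm_way": 2,
    "q_activ": 3, "k_activ": 3, "v_activ": 3, "inner_activ": 3,
    "proj_share": 5,
    "proj_way": 3,
    "relative": 2,
    "q_downscale": 2, "k_downscale": 2, "v_downscale": 2, "inner_downscale": 2,
    "inner_mul": 2,
}


def grid_size(counts):
    """Return the number of points in a full grid over the given value counts."""
    return reduce(mul, counts.values(), 1)


total = grid_size(value_counts)
print(total)  # 119439360
```

That is 2^15 * 3^6 * 5 = 119,439,360 combinations across 22 parameters, well above the roughly 15 million mentioned in the title.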

debug.log

2020-01-19 16:01:03,860 DEBUG   MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:01:15,154 DEBUG   MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:01:27,885 DEBUG   MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:01:38,038 ERROR   MainThread:31362 [retry.py:__call__():108] Retry attempt failed:
Traceback (most recent call last):
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.7/http/client.py", line 1344, in getresponse
    response.begin()
  File "/usr/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.7/http/client.py", line 267, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/util/retry.py", line 357, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/packages/six.py", line 686, in reraise
    raise value
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 389, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 309, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/wandb/retry.py", line 95, in __call__
    result = self._call_fn(*args, **kwargs)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/wandb/apis/internal.py", line 110, in execute
    return self.client.execute(*args, **kwargs)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/gql/client.py", line 52, in execute
    result = self._get_result(document, *args, **kwargs)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/gql/client.py", line 60, in _get_result
    return self.transport.execute(document, *args, **kwargs)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/gql/transport/requests.py", line 38, in execute
    request = requests.post(self.url, **post_args)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/adapters.py", line 521, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='api.wandb.ai', port=443): Read timed out. (read timeout=10)
2020-01-19 16:01:42,484 DEBUG   MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:02:00,755 DEBUG   MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:02:27,699 DEBUG   MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai

About this issue

  • State: open
  • Created 4 years ago
  • Comments: 34 (11 by maintainers)

Most upvoted comments

The issue still persists on my side

Thanks for the report. We will look into this and figure out if we can handle this number of combinations or if we have to set some limits.
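
As a rough illustration of what a limit could look like on the client side, the sketch below walks a grid lazily with `itertools.product` and stops after a fixed number of points instead of building the whole grid. This is not how wandb handles sweeps; `MAX_RUNS` and `capped_grid` are hypothetical names, and only a small subset of the parameters from the report is used.

```python
from itertools import islice, product

# Hypothetical cap chosen for illustration; not a wandb setting.
MAX_RUNS = 1000


def capped_grid(parameters, limit):
    """Lazily yield at most `limit` grid points, each as a dict of name -> value."""
    names = list(parameters)
    combos = product(*(parameters[name] for name in names))
    for values in islice(combos, limit):
        yield dict(zip(names, values))


# A small subset of the parameters from the report.
parameters = {
    "setting": ["stack_ffn", "act_pkm", "stack_encdec_ffn"],
    "q_linear": [True, False],
    "norm_way": ["C", "CL"],
}

runs = list(capped_grid(parameters, limit=5))
print(len(runs))  # 5 (the full grid here has 3 * 2 * 2 = 12 points)
print(runs[0])
```

Because the combinations are generated one at a time, memory use stays flat no matter how large the full grid is; only the points actually consumed are ever created.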