urllib3: After upgrade to 1.26.16 connection pool blocks at max size during CI tests
Subject
This is a hard issue to reproduce and I am not sure if it’s an issue on urllib3 side or a change in urllib3 uncovering something we at https://github.com/rotki/rotki do wrong in our test suite.
We upgraded from 1.26.14 to 1.26.16 recently and since then our CI job running the API tests frequently got stuck and was killed by the CI timeout. Managed to even reproduce this locally not by running specific tests, but by running the API tests part of the suite and having it get stuck some times and never in the same place.
After lots of dumb experimentation:
- Downgrading to 1.26.14 and see the failure not happen there after 7 runs
- Upgrading to 1.26.15 and see the failure not happen there after 7 runs
- Upgrading to 1.26.16 and see the failure happen at the 2nd run.
I suspect some change in 1.26.16 causes this problem for us.
Trying to run the CI suite locally sometimes you can see it gets stuck with 1.26.16. And when you drop in the debugger you see something like this:
Trace 1
# not a valid xpub string
xpub = 'foo'
json_data = {'xpub': xpub}
response = requests.put(api_url_for(
rotkehlchen_api_server,
'btcxpubresource',
blockchain='BTC',
), json=json_data)
assert_error_response(
response=response,
contained_in_msg='"xpub": ["Failed to initialize an xpub due to Given XPUB foo is too small"', # noqa: E501
status_code=HTTPStatus.BAD_REQUEST,
)
# tags empty list
xpub = 'xpub68V4ZQQ62mea7ZUKn2urQu47Bdn2Wr7SxrBxBDDwE3kjytj361YBGSKDT4WoBrE5htrSB8eAMe59NPnKrcAbiv2veN5GQUmfdjRddD1Hxrk' # noqa: E501
json_data = {
'xpub': xpub,
'tags': [],
}
> response = requests.put(api_url_for(
rotkehlchen_api_server,
'btcxpubresource',
blockchain='BTC',
), json=json_data)
rotkehlchen/tests/api/test_bitcoin.py:615:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/api.py:130: in put
return request("put", url, data=data, **kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/api.py:59: in request
return session.request(method=method, url=url, **kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/sessions.py:589: in request
resp = self.send(prep, **send_kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/sessions.py:703: in send
r = adapter.send(request, **kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/adapters.py:486: in send
resp = conn.urlopen(
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/urllib3/connectionpool.py:703: in urlopen
conn = self._get_conn(timeout=pool_timeout)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/urllib3/connectionpool.py:275: in _get_conn
conn = self.pool.get(block=self.block, timeout=timeout)
/usr/lib/python3.9/queue.py:182: in get
self.not_full.notify()
/usr/lib/python3.9/threading.py:361: in notify
if not self._is_owned():
/usr/lib/python3.9/threading.py:274: in _is_owned
if self._lock.acquire(False):
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/gevent/thread.py:141: in acquire
sleep()
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/gevent/hub.py:159: in sleep
waiter.get()
src/gevent/_waiter.py:143: in gevent._gevent_c_waiter.Waiter.get
???
src/gevent/_waiter.py:154: in gevent._gevent_c_waiter.Waiter.get
???
src/gevent/_greenlet_primitives.py:61: in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
???
src/gevent/_greenlet_primitives.py:61: in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
???
src/gevent/_greenlet_primitives.py:65: in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E KeyboardInterrupt
src/gevent/_gevent_c_greenlet_primitives.pxd:35: KeyboardInterrupt
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[27] > /home/lefteris/w/rotkehlchen/src/gevent/_gevent_c_greenlet_primitives.pxd(35)gevent._gevent_c_greenlet_primitives._greenlet_switch()
6 frames hidden (try 'help hidden_frames')
And the reason seems to be that the urllib connection pool has no more connections left (?)
<urllib3.util.queue.LifoQueue object at 0x7f61482b3b20>
(Pdb++) pp self.pool.size
*** AttributeError: 'LifoQueue' object has no attribute 'size'
(Pdb++) pp self.pool.__dict__
{'all_tasks_done': <Condition(<LockType at 0x7f61482ac580 counter=1 _links[0]>, 0)>,
'maxsize': 10,
'mutex': <gevent.thread.LockType object at 0x7f61482ac580>,
'not_empty': <Condition(<LockType at 0x7f61482ac580 counter=1 _links[0]>, 0)>,
'not_full': <Condition(<LockType at 0x7f61482ac580 counter=1 _links[0]>, 0)>,
'queue': deque([None, None, None, None, None, None, None, None, None, None]),
'unfinished_tasks': 11}
(Pdb++)
Trace 2
And another trace very similar where tests get stuck below:
> response = requests.get(api_url_for(
rotkehlchen_api_server_with_exchanges,
'blockchainbalancesresource',
), json={'async_query': True, 'blockchain': 'BTC'})
rotkehlchen/tests/api/test_balances.py:609:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3.9/unittest/mock.py:1092: in __call__
return self._mock_call(*args, **kwargs)
/usr/lib/python3.9/unittest/mock.py:1096: in _mock_call
return self._execute_mock_call(*args, **kwargs)
/usr/lib/python3.9/unittest/mock.py:1166: in _execute_mock_call
return self._mock_wraps(*args, **kwargs)
rotkehlchen/tests/utils/blockchain.py:535: in mock_requests_get
return original_requests_get(url, *args, **kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/api.py:73: in get
return request("get", url, params=params, **kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/api.py:59: in request
return session.request(method=method, url=url, **kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/sessions.py:589: in request
resp = self.send(prep, **send_kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/sessions.py:703: in send
r = adapter.send(request, **kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/adapters.py:454: in send
conn = self.get_connection(request.url, proxies)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/adapters.py:356: in get_connection
conn = self.poolmanager.connection_from_url(url)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/urllib3/poolmanager.py:299: in connection_from_url
return self.connection_from_host(
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/urllib3/poolmanager.py:246: in connection_from_host
return self.connection_from_context(request_context)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/urllib3/poolmanager.py:261: in connection_from_context
return self.connection_from_pool_key(pool_key, request_context=request_context)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/urllib3/poolmanager.py:282: in connection_from_pool_key
pool = self._new_pool(scheme, host, port, request_context=request_context)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/urllib3/poolmanager.py:214: in _new_pool
return pool_cls(host, port, **request_context)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/urllib3/connectionpool.py:214: in __init__
self.pool.put(None)
/usr/lib/python3.9/queue.py:152: in put
self.not_empty.notify()
/usr/lib/python3.9/threading.py:361: in notify
if not self._is_owned():
/usr/lib/python3.9/threading.py:274: in _is_owned
if self._lock.acquire(False):
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/gevent/thread.py:141: in acquire
sleep()
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/gevent/hub.py:159: in sleep
waiter.get()
src/gevent/_waiter.py:143: in gevent._gevent_c_waiter.Waiter.get
???
src/gevent/_waiter.py:154: in gevent._gevent_c_waiter.Waiter.get
???
src/gevent/_greenlet_primitives.py:61: in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
???
src/gevent/_greenlet_primitives.py:61: in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
???
src/gevent/_greenlet_primitives.py:65: in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E KeyboardInterrupt
src/gevent/_gevent_c_greenlet_primitives.pxd:35: KeyboardInterrupt
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[36] > /home/lefteris/w/rotkehlchen/src/gevent/_gevent_c_greenlet_primitives.pxd(35)gevent._gevent_c_greenlet_primitives._greenlet_switch()
with this time being stuck at put() but again with the pool at maxsize
(Pdb++) pp self.pool
<urllib3.util.queue.LifoQueue object at 0x7f795ce54c70>
(Pdb++) pp self.pool.__dict__
{'all_tasks_done': <Condition(<LockType at 0x7f795ce5a350 counter=1 _links[0]>, 0)>,
'maxsize': 10,
'mutex': <gevent.thread.LockType object at 0x7f795ce5a350>,
'not_empty': <Condition(<LockType at 0x7f795ce5a350 counter=1 _links[0]>, 0)>,
'not_full': <Condition(<LockType at 0x7f795ce5a350 counter=1 _links[0]>, 0)>,
'queue': deque([None]),
'unfinished_tasks': 1}
(Pdb++)
Environment
Describe your environment. At least, paste here the output of:
>>> print("OS", platform.platform())
OS Linux-6.2.13-arch1-1-x86_64-with-glibc2.37
>>> print("Python", platform.python_version())
Python 3.9.10
>>> print(ssl.OPENSSL_VERSION)
OpenSSL 1.1.1t 7 Feb 2023
>>> print("urllib3", urllib3.__version__)
urllib3 1.26.16
>>>
gevent==22.10.2
Steps to Reproduce
The difficult thing as I explained is that it’s hard to reproduce the error. But if you run this branch of our code (which is basically just our develop branch) along with urllib3 1.26.16 and run the API tests locally with the following command:
CI=true CASSETTES_DIR=/home/lefteris/.local/share/rotki/test-caching python pytestgeventwrapper.py -xs --pdb --durations=150 rotkehlchen/tests/api
Then at some point they get stuck.
Expected Behavior
Test should run until the end, or at least fail if something happens.
Actual Behavior
They fail with what I explained above.
Important note
Looking at the diff between 1.26.15 and 1.26.16 is:
git log --pretty=oneline 1.26.15..1.26.16 main
d94029b7e2193ff47b627906a70e06377a09aae8 (tag: 1.26.16) Release 1.26.16
18e92145e9cddbabdf51c98f54202aa37fd5d4c8 Use trusted publishing for PyPI
d25cf83bbae850a290fe34ed1610ae55c0558b36 [1.26] Fix invalid test_ssl_failure_midway_through_conn
25cca389496b86ee809c21e5b641aeaa74809263 [1.26] Fix test_ssl_object_attributes
bbea166e946ff5bf1411c1827e1d92566be59cb1 [1.26] Fix pypy2.7 CI
710114d7810558fd7e224054a566b53bb8601494 [1.26] Don't close ConnectionPools from PoolManager until unreferenced
6b77c3975719e403a1d28ddf0d3ff8377814ab14 [1.26] pin virtualenv to <20.22.0 for python-version==2.7
cdb8a7d6902418d214e67b3a8b0d854e460e2c49 [1.26] Fix compatible Python versions in v2-roadmap.rst (#2979)
0dfbb254cc4f9a8b1d0912f6b08e8ef03f2ca4f0 [1.26] Bump Ubuntu version in integration tests
f50c0d03c2b8cc20b0be7fa2fb7a01e15b7b0313 [1.26] Fix isort pre-commit installation
f54eaf2ae8d38d1ab29b96e2daa06f40fd392c67 [1.26] Use the default attestation name
The most relevant commit to what we are seeing is this: https://github.com/urllib3/urllib3/commit/710114d7810558fd7e224054a566b53bb8601494
In fact this seems really relevant as to why the connection pool would be out of connections.
Running a locally modified version of urrlib with
# Do not pass 'self' as callback to 'finalize'.
# Then the 'finalize' would keep an endless living (leak) to self.
# By just passing a reference to the pool allows the garbage collector
# to free self if nobody else has a reference to it.
pool = self.pool
# Close all the HTTPConnections in the pool before the
# HTTPConnectionPool object is garbage collected.
weakref_finalize(self, _close_pool_connections, pool)
removed then the tests are no longer getting stuck.
Now I am not familiar enough with urllib3 code to understand why this may be but I would like to figure this out. Either if it’s some weird bug in urllib3 or if it has uncovered something we do wrong ourselves. Any help is appreciated
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 5
- Comments: 20 (4 by maintainers)
This is a known, documented issue resulting from
multiprocessingbeing implemented in C.Also facing the issue with hanging
pool.get().python==3.11.5urllib3==1.26.18gevent==23.9.1Not complete traceback:
@LefterisJP I also saw that you are involved in discussion with @jamadden. Can you please share what is the progress on this one? Is there anyway we can help to investigate and fix the issue?