urllib3: After upgrade to 1.26.16 connection pool blocks at max size during CI tests

Subject

This is a hard issue to reproduce and I am not sure if it’s an issue on urllib3 side or a change in urllib3 uncovering something we at https://github.com/rotki/rotki do wrong in our test suite.

We upgraded from 1.26.14 to 1.26.16 recently and since then our CI job running the API tests frequently got stuck and was killed by the CI timeout. Managed to even reproduce this locally not by running specific tests, but by running the API tests part of the suite and having it get stuck some times and never in the same place.

After lots of dumb experimentation:

I suspect some change in 1.26.16 causes this problem for us.

Trying to run the CI suite locally sometimes you can see it gets stuck with 1.26.16. And when you drop in the debugger you see something like this:

Trace 1

        # not a valid xpub string
        xpub = 'foo'
        json_data = {'xpub': xpub}
        response = requests.put(api_url_for(
            rotkehlchen_api_server,
            'btcxpubresource',
            blockchain='BTC',
        ), json=json_data)
        assert_error_response(
            response=response,
            contained_in_msg='"xpub": ["Failed to initialize an xpub due to Given XPUB foo is too small"',  # noqa: E501
            status_code=HTTPStatus.BAD_REQUEST,
        )
    
        # tags empty list
        xpub = 'xpub68V4ZQQ62mea7ZUKn2urQu47Bdn2Wr7SxrBxBDDwE3kjytj361YBGSKDT4WoBrE5htrSB8eAMe59NPnKrcAbiv2veN5GQUmfdjRddD1Hxrk'  # noqa: E501
        json_data = {
            'xpub': xpub,
            'tags': [],
        }
>       response = requests.put(api_url_for(
            rotkehlchen_api_server,
            'btcxpubresource',
            blockchain='BTC',
        ), json=json_data)

rotkehlchen/tests/api/test_bitcoin.py:615: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/api.py:130: in put
    return request("put", url, data=data, **kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/api.py:59: in request
    return session.request(method=method, url=url, **kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/sessions.py:589: in request
    resp = self.send(prep, **send_kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/sessions.py:703: in send
    r = adapter.send(request, **kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/adapters.py:486: in send
    resp = conn.urlopen(
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/urllib3/connectionpool.py:703: in urlopen
    conn = self._get_conn(timeout=pool_timeout)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/urllib3/connectionpool.py:275: in _get_conn
    conn = self.pool.get(block=self.block, timeout=timeout)
/usr/lib/python3.9/queue.py:182: in get
    self.not_full.notify()
/usr/lib/python3.9/threading.py:361: in notify
    if not self._is_owned():
/usr/lib/python3.9/threading.py:274: in _is_owned
    if self._lock.acquire(False):
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/gevent/thread.py:141: in acquire
    sleep()
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/gevent/hub.py:159: in sleep
    waiter.get()
src/gevent/_waiter.py:143: in gevent._gevent_c_waiter.Waiter.get
    ???
src/gevent/_waiter.py:154: in gevent._gevent_c_waiter.Waiter.get
    ???
src/gevent/_greenlet_primitives.py:61: in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
    ???
src/gevent/_greenlet_primitives.py:61: in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
    ???
src/gevent/_greenlet_primitives.py:65: in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   KeyboardInterrupt

src/gevent/_gevent_c_greenlet_primitives.pxd:35: KeyboardInterrupt
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[27] > /home/lefteris/w/rotkehlchen/src/gevent/_gevent_c_greenlet_primitives.pxd(35)gevent._gevent_c_greenlet_primitives._greenlet_switch()
   6 frames hidden (try 'help hidden_frames')

And the reason seems to be that the urllib connection pool has no more connections left (?)

<urllib3.util.queue.LifoQueue object at 0x7f61482b3b20>
(Pdb++) pp self.pool.size
*** AttributeError: 'LifoQueue' object has no attribute 'size'
(Pdb++) pp self.pool.__dict__
{'all_tasks_done': <Condition(<LockType at 0x7f61482ac580 counter=1 _links[0]>, 0)>,
 'maxsize': 10,
 'mutex': <gevent.thread.LockType object at 0x7f61482ac580>,
 'not_empty': <Condition(<LockType at 0x7f61482ac580 counter=1 _links[0]>, 0)>,
 'not_full': <Condition(<LockType at 0x7f61482ac580 counter=1 _links[0]>, 0)>,
 'queue': deque([None, None, None, None, None, None, None, None, None, None]),
 'unfinished_tasks': 11}
(Pdb++) 

Trace 2

And another trace very similar where tests get stuck below:

>               response = requests.get(api_url_for(
                    rotkehlchen_api_server_with_exchanges,
                    'blockchainbalancesresource',
                ), json={'async_query': True, 'blockchain': 'BTC'})

rotkehlchen/tests/api/test_balances.py:609: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3.9/unittest/mock.py:1092: in __call__
    return self._mock_call(*args, **kwargs)
/usr/lib/python3.9/unittest/mock.py:1096: in _mock_call
    return self._execute_mock_call(*args, **kwargs)
/usr/lib/python3.9/unittest/mock.py:1166: in _execute_mock_call
    return self._mock_wraps(*args, **kwargs)
rotkehlchen/tests/utils/blockchain.py:535: in mock_requests_get
    return original_requests_get(url, *args, **kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/api.py:73: in get
    return request("get", url, params=params, **kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/api.py:59: in request
    return session.request(method=method, url=url, **kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/sessions.py:589: in request
    resp = self.send(prep, **send_kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/sessions.py:703: in send
    r = adapter.send(request, **kwargs)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/adapters.py:454: in send
    conn = self.get_connection(request.url, proxies)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/requests/adapters.py:356: in get_connection
    conn = self.poolmanager.connection_from_url(url)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/urllib3/poolmanager.py:299: in connection_from_url
    return self.connection_from_host(
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/urllib3/poolmanager.py:246: in connection_from_host
    return self.connection_from_context(request_context)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/urllib3/poolmanager.py:261: in connection_from_context
    return self.connection_from_pool_key(pool_key, request_context=request_context)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/urllib3/poolmanager.py:282: in connection_from_pool_key
    pool = self._new_pool(scheme, host, port, request_context=request_context)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/urllib3/poolmanager.py:214: in _new_pool
    return pool_cls(host, port, **request_context)
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/urllib3/connectionpool.py:214: in __init__
    self.pool.put(None)
/usr/lib/python3.9/queue.py:152: in put
    self.not_empty.notify()
/usr/lib/python3.9/threading.py:361: in notify
    if not self._is_owned():
/usr/lib/python3.9/threading.py:274: in _is_owned
    if self._lock.acquire(False):
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/gevent/thread.py:141: in acquire
    sleep()
../../.virtualenvs/rotkipy39/lib/python3.9/site-packages/gevent/hub.py:159: in sleep
    waiter.get()
src/gevent/_waiter.py:143: in gevent._gevent_c_waiter.Waiter.get
    ???
src/gevent/_waiter.py:154: in gevent._gevent_c_waiter.Waiter.get
    ???
src/gevent/_greenlet_primitives.py:61: in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
    ???
src/gevent/_greenlet_primitives.py:61: in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
    ???
src/gevent/_greenlet_primitives.py:65: in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   KeyboardInterrupt

src/gevent/_gevent_c_greenlet_primitives.pxd:35: KeyboardInterrupt
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[36] > /home/lefteris/w/rotkehlchen/src/gevent/_gevent_c_greenlet_primitives.pxd(35)gevent._gevent_c_greenlet_primitives._greenlet_switch()

with this time being stuck at put() but again with the pool at maxsize

(Pdb++) pp self.pool
<urllib3.util.queue.LifoQueue object at 0x7f795ce54c70>
(Pdb++) pp self.pool.__dict__
{'all_tasks_done': <Condition(<LockType at 0x7f795ce5a350 counter=1 _links[0]>, 0)>,
 'maxsize': 10,
 'mutex': <gevent.thread.LockType object at 0x7f795ce5a350>,
 'not_empty': <Condition(<LockType at 0x7f795ce5a350 counter=1 _links[0]>, 0)>,
 'not_full': <Condition(<LockType at 0x7f795ce5a350 counter=1 _links[0]>, 0)>,
 'queue': deque([None]),
 'unfinished_tasks': 1}
(Pdb++) 

Environment

Describe your environment. At least, paste here the output of:

>>> print("OS", platform.platform())
OS Linux-6.2.13-arch1-1-x86_64-with-glibc2.37
>>> print("Python", platform.python_version())
Python 3.9.10
>>> print(ssl.OPENSSL_VERSION)
OpenSSL 1.1.1t  7 Feb 2023
>>> print("urllib3", urllib3.__version__)
urllib3 1.26.16
>>> 
gevent==22.10.2

Steps to Reproduce

The difficult thing as I explained is that it’s hard to reproduce the error. But if you run this branch of our code (which is basically just our develop branch) along with urllib3 1.26.16 and run the API tests locally with the following command:

CI=true CASSETTES_DIR=/home/lefteris/.local/share/rotki/test-caching python pytestgeventwrapper.py -xs --pdb  --durations=150 rotkehlchen/tests/api 

Then at some point they get stuck.

Expected Behavior

Test should run until the end, or at least fail if something happens.

Actual Behavior

They fail with what I explained above.

Important note

Looking at the diff between 1.26.15 and 1.26.16 is:

git log --pretty=oneline 1.26.15..1.26.16                                                                                                                                                                main 
d94029b7e2193ff47b627906a70e06377a09aae8 (tag: 1.26.16) Release 1.26.16
18e92145e9cddbabdf51c98f54202aa37fd5d4c8 Use trusted publishing for PyPI
d25cf83bbae850a290fe34ed1610ae55c0558b36 [1.26] Fix invalid test_ssl_failure_midway_through_conn
25cca389496b86ee809c21e5b641aeaa74809263 [1.26] Fix test_ssl_object_attributes
bbea166e946ff5bf1411c1827e1d92566be59cb1 [1.26] Fix pypy2.7 CI
710114d7810558fd7e224054a566b53bb8601494 [1.26] Don't close ConnectionPools from PoolManager until unreferenced
6b77c3975719e403a1d28ddf0d3ff8377814ab14 [1.26] pin virtualenv to <20.22.0 for python-version==2.7
cdb8a7d6902418d214e67b3a8b0d854e460e2c49 [1.26] Fix compatible Python versions in v2-roadmap.rst (#2979)
0dfbb254cc4f9a8b1d0912f6b08e8ef03f2ca4f0 [1.26] Bump Ubuntu version in integration tests
f50c0d03c2b8cc20b0be7fa2fb7a01e15b7b0313 [1.26] Fix isort pre-commit installation
f54eaf2ae8d38d1ab29b96e2daa06f40fd392c67 [1.26] Use the default attestation name

The most relevant commit to what we are seeing is this: https://github.com/urllib3/urllib3/commit/710114d7810558fd7e224054a566b53bb8601494

In fact this seems really relevant as to why the connection pool would be out of connections.

Running a locally modified version of urrlib with

        # Do not pass 'self' as callback to 'finalize'.
        # Then the 'finalize' would keep an endless living (leak) to self.
        # By just passing a reference to the pool allows the garbage collector
        # to free self if nobody else has a reference to it.
        pool = self.pool

        # Close all the HTTPConnections in the pool before the
        # HTTPConnectionPool object is garbage collected.
        weakref_finalize(self, _close_pool_connections, pool)

removed then the tests are no longer getting stuck.

Now I am not familiar enough with urllib3 code to understand why this may be but I would like to figure this out. Either if it’s some weird bug in urllib3 or if it has uncovered something we do wrong ourselves. Any help is appreciated

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 5
  • Comments: 20 (4 by maintainers)

Most upvoted comments

We also had this deadlock issue. We found it was related to gevent.monkey that we were using globally. Our deadlocks occurred in multiprocessing.pool.ThreadPool

This is a known, documented issue resulting from multiprocessing being implemented in C.

Also facing the issue with hanging pool.get().

python==3.11.5 urllib3==1.26.18 gevent==23.9.1

Not complete traceback:

  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 649, in put
    return self.request("PUT", url, data=data, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ddtrace/contrib/requests/connection.py", line 65, in _wrap_request
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ddtrace/contrib/requests/connection.py", line 121, in _wrap_send
    response = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 704, in urlopen
    conn = self._get_conn(timeout=pool_timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 276, in _get_conn
    conn = self.pool.get(block=self.block, timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/queue.py", line 182, in get
    self.not_full.notify()
  File "/usr/local/lib/python3.11/threading.py", line 376, in notify
    if not self._is_owned():
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/threading.py", line 289, in _is_owned
    if self._lock.acquire(False):
       ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gevent/thread.py", line 132, in acquire
    sleep()
  File "/usr/local/lib/python3.11/site-packages/gevent/hub.py", line 159, in sleep
    waiter.get()
  File "src/gevent/_waiter.py", line 143, in gevent._gevent_c_waiter.Waiter.get
  File "src/gevent/_waiter.py", line 154, in gevent._gevent_c_waiter.Waiter.get
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
greenlet.GreenletExit

@LefterisJP I also saw that you are involved in discussion with @jamadden. Can you please share what is the progress on this one? Is there anyway we can help to investigate and fix the issue?