redis-py: 4.6.0 - ConnectionError max number of clients reached

Version: 4.6.0

Platform: Python 3.10 / Docker or Linux

Server: Redis 7.0.11

Description:

We have had an application using Redis in an async context for months now. With the upgrade of redis-py from 4.5.x to 4.6.0, our app on the staging environment started failing with ConnectionError: max number of clients reached.

I first ran a review of the code to add the await redis_conn.close() statements that were missing, but the issue still occurs.

It seems that we reach the 10,000 maxclients setting in approximately one hour, whereas we never did before. Unfortunately, I don’t track this metric on this environment.

On staging last week, I used a workaround of setting a client timeout (config set timeout 60) to get past the one-hour issue.

Doing it again a few minutes ago, I could drop connections from 8k to 160, and it then stays stable around 160.
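For reference, the same workaround can be applied from redis-py itself. A minimal sketch, assuming a reachable instance at the default local URL:

import asyncio
from redis.asyncio import Redis


async def apply_timeout_workaround():
    redis = Redis.from_url("redis://localhost:6379/0")
    try:
        # equivalent to `config set timeout 60`: drop clients idle for 60 seconds
        await redis.config_set("timeout", 60)
        # check how many clients are currently connected
        info = await redis.info("clients")
        print(info["connected_clients"])
    finally:
        await redis.close()


asyncio.run(apply_timeout_workaround())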

On the production environment (Python 3.9, redis-py 4.5.5), we have had 25 connected clients on average over the last 6 months, with activity similar to our staging environment. And there is no timeout configuration set, so it’s disabled (default configuration).

On staging, after reverting to redis-py 4.5.5 with the timeout disabled (i.e. set to 0), the count is stable around 20 connections after a few minutes, whereas it climbed to 1000+ in the same amount of time with 4.6.0.

My code was basically:

from redis.asyncio import Redis
from app import settings

async def get_redis_connection():
    redis = Redis.from_url(settings.REDIS_URL)
    return redis

async def list_queued_job_ids():
    redis_conn = await get_redis_connection()
    # ... do something with redis data and related code ...
    await redis_conn.close()
    return something

So I don’t know if it’s maybe a regression in 4.6.0 with the AbstractConnection class, or something to add/change in my code, but I didn’t find any documentation that could help me with the changes to be made.

Let me know if you need further details.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 37 (18 by maintainers)

Most upvoted comments

The reason was that the async version was doing all kinds of nasty stuff, including creating a new event loop, in order to perform a disconnect. __del__ is a synchronous method, whereas disconnect() is async.

The reasoning was also that if there is a __del__ call, there is no need to disconnect cleanly (i.e. a socket disconnect will happen when the socket object gets collected). What I failed to take into account was that in the case of connection pooling, this is not a disconnect but a return to the pool. Connections need to return to the pool, not disconnect. We can do this more cleanly. I will create a PR which achieves this without having to invoke all the async stuff.
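To illustrate the distinction, here is a minimal sketch against the async redis-py 4.x pool API (the command name passed to get_connection is just a label):

import asyncio
from redis.asyncio import ConnectionPool


async def demo():
    pool = ConnectionPool.from_url("redis://localhost:6379/0")
    conn = await pool.get_connection("PING")  # check a connection out of the pool
    try:
        await conn.send_command("PING")
        print(await conn.read_response())
    finally:
        await pool.release(conn)  # return it to the pool; the socket stays open for reuse
    await pool.disconnect()  # this is what actually closes the sockets


asyncio.run(demo())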

As several related bugs were fixed with this issue, I suggest that the next one starts a new issue 😄

thanks @kristjanvalur for all your fixes

aaand, it got merged 😃

I haven’t tried 5.0 in our production application since I had issues with 4.6, which capped our Redis instance connections, but I also saw this issue on 4.6 with a FastAPI application. Our pattern was essentially

Right, @Harry-Lees, I’d rewrite that as:

    connection = redis.from_url(config.REDIS_URI, decode_responses=True)
    try:
        yield connection
    finally:
        await connection.close()

or

    async with redis.from_url(config.REDIS_URI, decode_responses=True) as connection:
        yield connection

This ensures that the connection is closed even if there is an error while using it.

@kristjanvalur it seems I found the last remaining culprits where the Redis connection was not correctly closed.

Thanks for your detailed explanation about context managers. It helped me fix a few of them this afternoon.

On my local machine, it seems quite stable so far. I’ll run it all night long to see if it changes, but it looks good this time 😄

It’s a FastAPI app with arq workers as well, so extracting the code for easy reproduction is not that easy 😢

I haven’t tried 5.0 in our production application since I had issues with 4.6, which capped our Redis instance connections, but I also saw this issue on 4.6 with a FastAPI application. Our pattern was essentially:

async def get_redis():
    """
    Acquire a Redis connection.
    """
    if config.REDIS_URI is None:
        raise ValueError("REDIS_URI must be set in the environment")
    connection = redis.from_url(config.REDIS_URI, decode_responses=True)
    yield connection
    # note: if the endpoint raises, execution never resumes here, so close() is skipped;
    # hence the try/finally rewrite above
    await connection.close()

which was then called in endpoints using FastAPI’s dependency injection.

from fastapi import Depends

async def foo(conn = Depends(get_redis)):
    ...

Interestingly, in async python, the canonical method to use is aclose(). I’ll probably create a PR to add those, so that stdlib context managers and other tools play well with redis.
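For example, once such an aclose() method exists, stdlib helpers work out of the box. A sketch assuming a redis-py version that ships Redis.aclose() (added after this thread) and Python 3.10+ for contextlib.aclosing:

import asyncio
from contextlib import aclosing
from redis.asyncio import Redis


async def demo():
    # aclosing() calls aclose() on exit, even if the body raises
    async with aclosing(Redis.from_url("redis://localhost:6379/0")) as redis:
        await redis.ping()


asyncio.run(demo())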

The proper pattern, with the small example you gave me, is:

async def do_something():
    async with Redis.from_url("redis://localhost:6379/0") as redis:
        await do_something_with_the_redis(redis)

Or for a worker:

async def do_something():
    redis = Redis.from_url("redis://localhost:6379/0")
    try:
        start_worker(work_handler, redis)
    except Exception:
        await redis.close()
        raise

async def work_handler(redis):
    async with redis:
        await do_something_with_the_redis(redis)

Basically, make sure that whenever you have created a Redis connection pool (implicit inside the Redis object), it is appropriately closed afterwards.

If you have a more complex pattern, such as a separately managed ConnectionPool (which is wise), then you similarly need to close that when you have stopped using it (curiously, using pool.disconnect()). It should have an aclose() method too, for consistency.
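A minimal sketch of that pattern, assuming one shared pool for the whole application (the names here are illustrative):

import asyncio
from redis.asyncio import ConnectionPool, Redis

# one shared pool for the whole application
pool = ConnectionPool.from_url("redis://localhost:6379/0")


async def use_redis():
    redis = Redis(connection_pool=pool)
    try:
        await redis.incr("counter")
    finally:
        await redis.close()  # safe here: with an external pool, close() does not disconnect the shared pool


async def main():
    await use_redis()
    await pool.disconnect()  # close the sockets on application shutdown


asyncio.run(main())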

In async python, doing any sort of IO-related cleanup during garbage collection is frowned upon. Instead, it is the responsibility of the programmer to ensure that resources are freed, typically using a context manager pattern as above. This is because __del__ handlers can be called at arbitrary places and are often very limited in what they can do.

Synchronous python is more forgiving, in that blocking IO operations can generally be done anywhere.

Standard library python generally does not do any cleanup from __del__. Synchronous redis-py did put safety belts in the sync code. The async library also had safety belts in place, but they were nasty: they would create a new event loop and do things that generally should not be done in __del__ methods.

As it’s a rainy day, I just tested it and it works as expected.

❯ poetry add git+https://github.com/redis/redis-py.git#master

Updating dependencies
Resolving dependencies... (0.5s)

Package operations: 0 installs, 1 update, 0 removals

  • Updating redis (4.6.0 -> 5.0.0rc2 3e50d28)

Writing lock file

Then, run the sample code below:

from redis.asyncio import Redis
import asyncio


async def get_redis_connection():
    redis = Redis.from_url("redis://localhost:6379/0")
    # redis = Redis(host="localhost", port=6379, db=0)
    return redis


async def do_something():
    redis_conn = await get_redis_connection()
    result = await redis_conn.incr("counter")
    await asyncio.sleep(1)
    await redis_conn.close()
    return result


async def main():
    while True:
        res = await do_something()
        print(res)


asyncio.run(main())

Both versions of the redis variable definition work as expected, with the same behaviour:

    redis = Redis.from_url("redis://localhost:6379/0")
    # or
    redis = Redis(host="localhost", port=6379, db=0)

So from my point of view, all the problems identified in this issue are fixed.

Thanks for your support, fixes and follow-up @kristjanvalur @woutdenolf @dvora-h !

So, no, I fixed this use case in the PR.

Interesting, I can have a look.

Update: the difference in behaviour is that when Redis.from_url() is called, Redis.auto_close_connection_pool == False, whereas it is True otherwise. I think this is a bug, since a private connection pool is indeed created and should be closed.

When it is False, the pool is simply garbage collected, along with the connections in it. For some reason, however, without an explicit disconnect, the connection stays open. I need to investigate how that happens.
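A quick probe of that flag, as a sketch against redis-py 4.6 (the attribute is assumed to be auto_close_connection_pool):

from redis.asyncio import Redis

r1 = Redis.from_url("redis://localhost:6379/0")
r2 = Redis(host="localhost", port=6379, db=0)

# per the observation above: False for the from_url() instance, True for the direct one,
# so close() on r1 leaves its private pool (and its connections) open
print(r1.auto_close_connection_pool, r2.auto_close_connection_pool)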

When I check out v4.5.5, the connection does have a __del__.

It was removed in https://github.com/redis/redis-py/pull/2755

@kristjanvalur Can you provide some insight into why __del__ was removed? The use case above needs it.

Nice. I can reproduce it. I compared the behavior with the synchronous version and I noticed this:

If you make a sync equivalent of the code in https://github.com/redis/redis-py/issues/2831#issuecomment-1641925032, the clients are disconnected when the connections are garbage collected:

# redis/connection.py

class AbstractConnection:

    def __del__(self):
        try:
            self.disconnect()
        except Exception:
            pass

There is no __del__ for the async version, so disconnect() never gets called. This means all the connections you make stay connected.

Could you create a full example that reproduces the issue?

You can hit maxclients like this, but that’s to be expected:

import asyncio
from redis.asyncio import Redis


async def get_redis_connection():
    redis = Redis.from_url("redis://localhost:6379/0")
    return redis


async def list_queued_job_ids():
    redis_conn = await get_redis_connection()
    result = await redis_conn.incr("counter")
    await asyncio.sleep(1)
    await redis_conn.close()
    return result


async def main(fail=False):
    # with fail=True, 10001 simultaneous clients exceed the default maxclients (10000)
    aws = (list_queued_job_ids() for _ in range(10000 + int(fail)))
    print(max(await asyncio.gather(*aws)))


asyncio.run(main(fail=False))
asyncio.run(main(fail=True))