redis-py: Sporadic "Connection not ready" exceptions with BlockingConnectionPool since 3.2.0
Version: 3.2.0
Platform: Python 3.6.7 | packaged by conda-forge | (default, Nov 21 2018, 02:32:25) [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux CentOS Linux release 7.5.1804 (Core) on docker
The redis server is using the official docker image. Redis server v=4.0.11 sha=00000000:0 malloc=jemalloc-4.0.3 bits=64 build=74253224a862200c
Description: Since upgrading to 3.2.0, we started getting sporadic errors when getting connections from the pool.
The code that is running looks like this:
from redis import BlockingConnectionPool, StrictRedis
pool = BlockingConnectionPool(max_connections=config.REDIS_CONNECTIONS_PER_WORKER, host=config.REDIS_HOST, port=config.REDIS_PORT, db=0, timeout=config.REDIS_TIMEOUT)
redis = StrictRedis(connection_pool=pool)
redis.setex(...) / redis.get(...)
Additional information: This code is running inside an eventlet gunicorn worker. The server is very “network heavy” and opens lots of sockets (for example, for DNS queries).
127.0.0.1:6379> config get maxclients
1) "maxclients"
2) "10000"
127.0.0.1:6379> info clients
# Clients
connected_clients:104
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
About this issue
- State: closed
- Created 5 years ago
- Reactions: 3
- Comments: 22 (9 by maintainers)
It’s in line with what I was skimming in the eventlet docs. So that means that in your environment redis-py is likely using select.select to validate the health of a connection. select.select has a number of issues, most notably only being able to poll file descriptors with file numbers < ~1024. If you’re running over that limit, the current implementation will simply return that the connection isn’t ready, which would explain why you’re seeing the error. See here: https://github.com/andymccurdy/redis-py/blob/master/redis/selector.py#L47
If you can correlate these errors around traffic spikes, that might further suggest that we’re on the right track.
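To make the failure mode concrete, here is a minimal, standard-library-only sketch of this kind of readiness check. It is not redis-py's selector code: the function name is_idle_and_usable and its prefer_poll parameter are hypothetical, and the "report not ready when select.select rejects the descriptor" behavior is an assumption modeled on the description above.

```python
import select
import socket


def is_idle_and_usable(sock, timeout=0.0, prefer_poll=True):
    """Return True if sock is writable and has no unread data pending.

    Hypothetical helper for illustration only; not part of redis-py.
    """
    fd = sock.fileno()

    # poll() has no upper bound on the descriptor number, so use it when
    # the platform (and any monkey patching in effect) still provides it.
    if prefer_poll and hasattr(select, "poll"):
        poller = select.poll()
        poller.register(fd, select.POLLIN | select.POLLOUT)
        # poll() takes its timeout in milliseconds.
        events = dict(poller.poll(timeout * 1000)).get(fd, 0)
        readable = bool(events & select.POLLIN)
        writable = bool(events & select.POLLOUT)
        return writable and not readable

    # Fallback: select.select() raises ValueError once the descriptor
    # number reaches FD_SETSIZE (commonly 1024).
    try:
        readable, writable, _ = select.select([fd], [fd], [], timeout)
    except ValueError:
        # Assumption: treat an out-of-range descriptor as "not ready",
        # which is what would surface as a "Connection not ready" error.
        return False
    return bool(writable) and not readable


if __name__ == "__main__":
    a, b = socket.socketpair()
    print(is_idle_and_usable(a))   # idle socket with nothing to read
    b.sendall(b"unexpected")
    print(is_idle_and_usable(a))   # unread data is now pending
    a.close()
    b.close()
```

In the fallback branch, a perfectly healthy socket is reported as not ready purely because its descriptor number is too high, which matches the pattern of errors appearing only once the process has many sockets open.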
One thing you could try is to reduce the max_connections in the pool. Fewer connections means fewer file descriptors, which should reduce the chance of hitting the select.select issue.
We could also make the selector more pluggable such that you could inject your own logic or turn off the health checks for your environment.
You could actually do this now with an (albeit ugly) monkey patch like so:
Another possible solution is a simple flag on the connection pool to turn off the health checks. Enabling such a flag would cause the pool to behave the same as it did in 3.1.0.
Great, thanks for testing @NirBenor. I’ll get this merged in a few days.
Appears to work fine 👍