redis-py: Random ConnectionErrors (104, Connection reset by peer)
Version: redis-py 3.2.1, Redis server 5.0.5
Platform: Debian
Description:
I have a connection to the server that sits idle for a few hours, and when I then try to perform an operation I get redis.exceptions.ConnectionError: Error while reading from socket: (104, 'Connection reset by peer'). This doesn't happen every time, but often enough to be annoying. I can wrap the call in a try-except and simply reconnect, and it works, but I don't want to wrap every Redis command like this.
Some background: the program gets tasks from a queue implemented in Redis, computes for a few hours, gets the next task, and so on.
The question is: is this a bug or expected behaviour? Is there an option to prevent it? I don't have any timeouts set on the client or the server (everything is at its defaults). The current workaround would be to emulate a pre-ping by sending a PING wrapped in a try-except that reconnects if needed (sketched below), but it's not very pretty.
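A minimal sketch of that pre-ping workaround, assuming a default local Redis and an illustrative "task_queue" key:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def pre_ping(client):
    # Ping before the real command; if the idle connection was reset,
    # drop the stale pooled connections so the next command reconnects.
    try:
        client.ping()
    except redis.exceptions.ConnectionError:
        client.connection_pool.disconnect()

pre_ping(r)
task = r.lpop("task_queue")  # "task_queue" is an illustrative key name
```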
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 28
- Comments: 36 (11 by maintainers)
Hi, sorry I haven't responded to this issue earlier. Upgrading to version 3.3.x should improve your experience. 3.3.x added a new option, health_check_interval, which you can use as in the sketch below. The value of health_check_interval specifies the time in seconds a connection can be idle before its health needs to be checked. With a value of 30, if the connection is idle for more than 30 seconds, a round-trip PING/PONG will be attempted just before your next command. If the PING/PONG fails, the connection will be reestablished behind the scenes before your intended command is executed.
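A minimal sketch of that option (host, port, and the example command are placeholders; the 30-second interval matches the explanation above):

```python
import redis

# With health_check_interval set, redis-py (3.3+) sends a PING before the next
# command whenever the connection has been idle longer than the given number
# of seconds, and reconnects transparently if the PING fails.
r = redis.Redis(host="localhost", port=6379, health_check_interval=30)

r.set("foo", "bar")  # illustrative command; the health check happens implicitly
```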
@hyeongguen-song I have found an example that can replicate this issue. After about 14k connections the iterations slow down, and after about 28k I get "Connection reset by peer". The interesting thing is that a small sleep after every 1k connections prevents the issue. Maybe there is a problem under high load? Replicated on 2 different machines.
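The reproduction script and its output are not shown here; a rough sketch of the kind of loop described, assuming a fresh client (and TCP connection) per iteration with an optional sleep every 1,000 iterations, could look like:

```python
import time
import redis

# Hypothetical reconstruction of the reported loop: a new client (and a new TCP
# connection) on every iteration. The reporter saw iterations slow down around
# ~14k and "Connection reset by peer" around ~28k unless a short sleep was
# added periodically.
for i in range(30000):
    client = redis.Redis(host="localhost", port=6379)
    client.ping()
    client.connection_pool.disconnect()
    if i % 1000 == 0:
        print(i)
        # time.sleep(1)  # reportedly enough to avoid the resets
```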
Hello guys, after debugging the code with tcpdump and ipdb inside Python, I finally discovered why the solutions provided here were not working, or were working but not always.
The next part of the discussion applies to people who are using django-redis as their CACHE backend configuration rather than redis-py directly. The problem was in the configuration of the Redis client: the settings have to go into CONNECTION_POOL_KWARGS and not into REDIS_CLIENT_KWARGS. With a configuration like the one sketched below, the health check with PING/PONG commands works correctly. Now everything is working and is retried 5 times in case of ConnectionError or TimeoutError; if you need to handle additional exceptions, you can specify them with a dedicated kwarg.
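A sketch of the kind of django-redis settings being described, with illustrative values; the point the comment makes is that the pool options belong under CONNECTION_POOL_KWARGS, not REDIS_CLIENT_KWARGS:

```python
# Django settings.py (sketch) -- django-redis as the CACHE backend.
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/0",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            # These kwargs are forwarded to redis.ConnectionPool, which is
            # where health_check_interval and the keepalive flags must go.
            "CONNECTION_POOL_KWARGS": {
                "health_check_interval": 30,
                "socket_keepalive": True,
            },
        },
    }
}
```

The retry-on-ConnectionError/TimeoutError behaviour mentioned above is configured with additional arguments whose names depend on the redis-py version, so they are left out of this sketch.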
We are currently testing GCP MemoryStore as a replacement for memcache. Using GAE Python 2.7, we get random "ConnectionError: Error while reading from socket: (104, 'Connection reset by peer')" errors. Typically the error occurs after about 120 seconds of waiting for some Redis command to complete (delete, set, get), so it is hard to handle with a backoff mechanism.
We use version 3.3.8. I've been testing with health_check_interval=30 and with values lower than 30 (down to 2, currently). This seems to have made the errors less frequent, but they still occur often enough to be of concern. Perhaps this is purely a MemoryStore/Redis server issue, however.
I've thrown the kitchen sink at this problem in an application I'm working on. It has not yet seen enough action to conclude one way or the other, but I am posting it here on the off chance it may help someone. The big point is configuring keepalive options beyond the boolean socket_keepalive. In particular, the default value for TCP_KEEPIDLE is 7200 seconds (my understanding is that there is an RFC specifying 2 hours as the minimum acceptable default for TCP).
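A sketch of what tuning the keepalive options beyond the boolean could look like with redis-py's socket_keepalive_options on Linux (the timing values are illustrative, not the commenter's):

```python
import socket
import redis

# Override the kernel defaults: start probing after 60 s of idleness instead of
# TCP_KEEPIDLE's 7200 s default, probe every 15 s, and drop the connection
# after 3 failed probes.
keepalive_options = {
    socket.TCP_KEEPIDLE: 60,
    socket.TCP_KEEPINTVL: 15,
    socket.TCP_KEEPCNT: 3,
}

r = redis.Redis(
    host="localhost",
    port=6379,
    socket_keepalive=True,
    socket_keepalive_options=keepalive_options,
)
```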
For all of you who had issues with GCP: are you still facing this problem? We have health check intervals set up and are on Redis 5, and we're not sure what else to try. We keep getting hundreds of these errors each day.
It's easier than that. If you get a TimeoutError or ConnectionError, you can simply call .disconnect() followed by .connect() on the connection that raised the error. You could do this by subclassing Redis.execute_command and putting your retry logic there.
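A sketch of that subclassing approach, simplified so that it resets the whole connection pool instead of only the connection that raised the error (class name and single retry are illustrative):

```python
import redis

class RetryingRedis(redis.Redis):
    """Retry a command once after dropping stale connections."""

    def execute_command(self, *args, **kwargs):
        try:
            return super().execute_command(*args, **kwargs)
        except (redis.exceptions.ConnectionError, redis.exceptions.TimeoutError):
            # Throw away the pooled connections; the retried command picks up
            # a freshly established one.
            self.connection_pool.disconnect()
            return super().execute_command(*args, **kwargs)

r = RetryingRedis(host="localhost", port=6379)
r.ping()
```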
I haven't had any more issues. I'm running Redis v5 and have it running in the same region as my app.
Hey guys, is there still no fix for this issue?
I am also on GCP and also having random issues. I am using django_cache, which uses redis-py under the hood. I am providing socket_keepalive and health_check_interval but still hit the mentioned issue from time to time, at least once a day.
@tastypackets Yes, this error appears every few minutes; the config changes suggested here don't do the trick for me.
Regarding the GCP issues: the problem appears to have gone away after I changed to Redis version 5 inside MemoryStore. I have only been testing version 5 for a day, but so far I haven't seen this error again, whereas I used to see it all the time.