redis-py: Random ConnectionErrors (104, Connection reset by peer)

Version: redis-py 3.2.1, Redis server 5.0.5

Platform: Debian

Description: I have a connection to the server that sits idle for a few hours, and when I then try to perform an operation I get redis.exceptions.ConnectionError: Error while reading from socket: (104, 'Connection reset by peer'). This doesn’t happen every time, but often enough to be annoying. I can wrap the call in a try-except, reconnect, and it works, however I don’t want to wrap every Redis command like this.

Some Background: The program is getting tasks from a Queue implemented in Redis, computing for a few hours, getting the next task, and so on.

The question is: is this a bug or expected behavior? Is there an option to prevent it? I don’t have any timeouts set on the client or the server (everything is default). The current workaround would be to emulate a pre-ping by sending a PING wrapped in a try-except that reconnects if needed, but it’s not very pretty.
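
For illustration, a minimal sketch of that pre-ping workaround could look like this (the helper name and queue key are made up for the example, not part of redis-py):

import redis

client = redis.Redis(host='localhost', port=6379)

def ensure_connection(r):
    # Pre-ping: if the idle connection was reset by the server or a firewall,
    # the PING raises ConnectionError; dropping the pooled connections forces
    # a fresh connection on the next command.
    try:
        r.ping()
    except redis.exceptions.ConnectionError:
        r.connection_pool.disconnect()

ensure_connection(client)
task = client.blpop('task_queue')  # illustrative queue key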

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 28
  • Comments: 36 (11 by maintainers)

Most upvoted comments

Hi, sorry I haven’t responded to this issue earlier. Upgrading to version 3.3.x should improve your experience. 3.3.x added a new option health_check_interval. You can use health checks like:

client = redis.Redis(..., health_check_interval=30)

The value of health_check_interval specifies the time in seconds a connection can be idle before its health must be checked. In the above case, if the connection has been idle for more than 30 seconds, a PING/PONG round trip is attempted just before your next command. If the PING/PONG fails, the connection will be reestablished behind the scenes before your intended command is executed.
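
Applied to a queue-worker loop like the one described in the issue, a rough sketch (the BLPOP call, queue key, and process_task are assumptions for the example, not from this thread):

import redis

client = redis.Redis(host='localhost', port=6379, health_check_interval=30)

while True:
    # After hours of computation the connection has been idle far longer than
    # 30 seconds, so a PING/PONG runs just before BLPOP is sent, and the
    # connection is reestablished transparently if the server has dropped it.
    _, task = client.blpop('task_queue')
    process_task(task)  # placeholder for the long-running computation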

@hyeongguen-song I have found an example that replicates this issue. After 14k connections the iterations slow down, and after 28k there is a Connection reset by peer. Interestingly, a small sleep after each 1k connections prevents the issue. Maybe there is a problem under high load?

Replicated on 2 different machines with this setup:

  • python3.7
  • redis==4.3.4
  • redis docker image: redis/redis-stack:latest

import time
import redis

test_key = 'test_key'

def insert_dummy_data():
    conn = redis.Redis()
    conn.hset(test_key, mapping={'a': 1})
    conn.close()
    print('data inserted')


def run_dummy_get():
    i = 0
    while True:
        t = time.time()
        for k in range(1000):
            conn = redis.Redis()
            conn.hgetall(test_key)
            conn.close()

        i += 1
        print(f'{i:02d} - 1k hgetall done in {time.time() - t:.2f}s')
        # time.sleep(0.5)  # sleep here will prevent the issue 


if __name__ == '__main__':
    insert_dummy_data()
    run_dummy_get()

Output


data inserted
01 - 1k hgetall done in 0.52s
02 - 1k hgetall done in 0.59s
03 - 1k hgetall done in 1.19s
04 - 1k hgetall done in 0.52s
05 - 1k hgetall done in 0.46s
06 - 1k hgetall done in 0.48s
07 - 1k hgetall done in 0.47s
08 - 1k hgetall done in 0.44s
09 - 1k hgetall done in 0.45s
10 - 1k hgetall done in 0.43s
11 - 1k hgetall done in 0.45s
12 - 1k hgetall done in 0.45s
13 - 1k hgetall done in 0.45s
14 - 1k hgetall done in 0.45s
15 - 1k hgetall done in 2.95s
16 - 1k hgetall done in 3.46s
17 - 1k hgetall done in 3.44s
18 - 1k hgetall done in 3.38s
19 - 1k hgetall done in 3.92s
20 - 1k hgetall done in 4.16s
21 - 1k hgetall done in 3.75s
22 - 1k hgetall done in 3.51s
23 - 1k hgetall done in 3.40s
24 - 1k hgetall done in 3.63s
25 - 1k hgetall done in 3.33s
26 - 1k hgetall done in 3.31s
27 - 1k hgetall done in 3.32s
28 - 1k hgetall done in 3.34s
Traceback (most recent call last):
  File "/python3.7/site-packages/redis/connection.py", line 824, in read_response
    response = self._parser.read_response(disable_decoding=disable_decoding)
  File "/python3.7/site-packages/redis/connection.py", line 467, in read_response
    self.read_from_socket()
  File "/python3.7/site-packages/redis/connection.py", line 421, in read_from_socket
    bufflen = self._sock.recv_into(self._buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "redis_example.py", line 31, in <module>
    run_dummy_get()
  File "redis_example.py", line 21, in run_dummy_get
    conn.hgetall(test_key)
  File "/python3.7/site-packages/redis/commands/core.py", line 4776, in hgetall
    return self.execute_command("HGETALL", name)
  File "/python3.7/site-packages/redis/client.py", line 1242, in execute_command
    lambda error: self._disconnect_raise(conn, error),
  File "/python3.7/site-packages/redis/retry.py", line 49, in call_with_retry
    fail(error)
  File "/python3.7/site-packages/redis/client.py", line 1242, in <lambda>
    lambda error: self._disconnect_raise(conn, error),
  File "/python3.7/site-packages/redis/client.py", line 1228, in _disconnect_raise
    raise error
  File "/python3.7/site-packages/redis/retry.py", line 46, in call_with_retry
    return do()
  File "/python3.7/site-packages/redis/client.py", line 1240, in <lambda>
    conn, command_name, *args, **options
  File "/python3.7/site-packages/redis/client.py", line 1215, in _send_command_parse_response
    return self.parse_response(conn, command_name, **options)
  File "/python3.7/site-packages/redis/client.py", line 1254, in parse_response
    response = connection.read_response()
  File "/python3.7/site-packages/redis/connection.py", line 830, in read_response
    raise ConnectionError(f"Error while reading from {hosterr}" f" : {e.args}")
redis.exceptions.ConnectionError: Error while reading from localhost:6379 : (104, 'Connection reset by peer')

Process finished with exit code 1

Hello guys, after debugging the code with tcpdump and ipdb inside Python, I finally discovered why the solutions provided here were not working, or only working some of the time.

The following applies to people who use django-redis as their CACHE backend, rather than redis-py directly.

The problem was in the configuration of the Redis client: these options must be set in CONNECTION_POOL_KWARGS, not in REDIS_CLIENT_KWARGS.

With the following configuration, the PING/PONG health check now works correctly.

from redis.backoff import FullJitterBackoff
from redis.retry import Retry

CACHES = {
    'default': {
        ...
        'OPTIONS': {
            'CLIENT_CLASS': 'django_redis.client.DefaultClient',
            'SOCKET_CONNECT_TIMEOUT': 10,
            'SOCKET_TIMEOUT': 60,
            'CONNECTION_POOL_KWARGS': {  # <-- note that these are the kwargs for the connection pool
                'socket_keepalive': True,
                'health_check_interval': 30,  # <-- health-check the connection if the last command ran more than 30 seconds ago
                'retry_on_timeout': True,  # <-- perform a retry on timeout (just in case)
                # <-- retry logic to apply; if you do not specify it you get 0 retries,
                #     so a failed health-check PING/PONG fails the command instantly
                'retry': Retry(FullJitterBackoff(cap=5, base=1), 5),
            },
        },
    }
}

Now everything works and is retried up to 5 times in case of ConnectionError or TimeoutError. Also, if you need to handle additional exception types, you can specify them with a dedicated kwarg (for example the Retry class’s supported_errors argument).
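
For comparison, if you use redis-py directly rather than through django-redis, roughly the same options can be passed to the client. A sketch assuming redis-py 4.x (host/port are placeholders):

from redis import Redis
from redis.backoff import FullJitterBackoff
from redis.retry import Retry

client = Redis(
    host='localhost',
    port=6379,
    socket_keepalive=True,
    health_check_interval=30,
    retry_on_timeout=True,
    # retry policy; without it there are 0 retries, so a failed health-check
    # PING fails the command immediately
    retry=Retry(FullJitterBackoff(cap=5, base=1), 5),
)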

We are currently testing GCP MemoryStore as a replacement for memcache.

Using GAE Python 2.7, we get random “ConnectionError: Error while reading from socket: (104, ‘Connection reset by peer’)” errors. Typically the error occurs after about 120 seconds of waiting for some Redis command to complete (delete, set, get), so it can be hard to handle with a backoff mechanism.

We use version 3.3.8. I’ve been testing with health_check_interval=30 and lower values (currently down to 2). This seems to have made the errors less frequent, but they still occur often enough to be of concern.

Perhaps this is a purely MemoryStore/redis server issue, however.

I’ve thrown the kitchen sink at this problem in an application I’m working on. It has not yet seen enough action to conclude one way or the other, but I am posting it here on the off chance it may help someone. The big point is configuring keepalive options beyond the boolean socket_keepalive. In particular, the default value for TCP_KEEPIDLE is 7200 seconds (my understanding is that there is an RFC specifying 2 hours as the minimum acceptable default for TCP keepalive).

import socket
from sys import platform

import redis

if platform == "linux":
    ka_options = {
        socket.TCP_KEEPIDLE: 10,   # start probing after 10s of idle time (default is 7200s)
        socket.TCP_KEEPINTVL: 5,   # probe every 5s
        socket.TCP_KEEPCNT: 5      # give up after 5 failed probes
    }
elif platform == "darwin":
    # Python 3.10 will support socket.TCP_KEEPALIVE on macOS, which
    # is the direct analog of TCP_KEEPIDLE on Linux
    ka_options = {socket.TCP_KEEPINTVL: 5, socket.TCP_KEEPCNT: 5}
else:
    ka_options = {}

connection = redis.StrictRedis(host=Config.REDIS_HOST,
                               port=Config.REDIS_PORT,
                               health_check_interval=15,
                               socket_keepalive=True,
                               socket_keepalive_options=ka_options)

For all of you who had issues with GCP, are you guys still facing this problem? We have health check intervals set up and are on Redis 5 - not sure what else to try. We keep getting hundreds of these errors each day.

It’s easier than that. If you get a TimeoutError or ConnectionError, you can simply call .disconnect() followed by .connect() on the connection that raised the error. You could also do this by subclassing Redis, overriding execute_command, and putting your retry logic there.
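
A rough sketch of that subclassing approach (a single retry; the class name is illustrative, and the exact reconnect behavior differs between redis-py versions):

import redis

class RetryingRedis(redis.Redis):
    def execute_command(self, *args, **options):
        try:
            return super().execute_command(*args, **options)
        except (redis.exceptions.ConnectionError, redis.exceptions.TimeoutError):
            # redis-py drops the failed connection before re-raising, so the
            # second attempt checks a fresh connection out of the pool.
            return super().execute_command(*args, **options)

client = RetryingRedis(host='localhost', port=6379)
client.set('foo', 'bar')  # retried once if the pooled connection has gone stale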

For all of you who had issues with GCP, are you guys still facing this problem? We have health check intervals set up and are on Redis 5 - not sure what else to try. We keep getting hundreds of these errors each day.

I haven’t had any more issues, I’m running redis v5 and have it running in the same region as my app.

Hey guys, is there no fix for this issue?

I am also on GCP and also having random issues. I am using django_cache, which uses redis-py under the hood. I am providing socket_keepalive and health_check_interval but still hit the mentioned issue from time to time, at least once a day.

@tastypackets Yes, this error appears every few minutes; the config changes suggested here don’t do the trick for me.

Related to the GCP issues: the error appears to have gone away after I changed to Redis version 5 inside MemoryStore. I have only been testing version 5 for a day, but so far I haven’t seen this error again, whereas I used to see it all the time.