redis-py: client-side hang; is there a stale-connection or timeout reconnect parameter?

I can reproduce this from a plain Python shell: open a Redis connection, let it idle for a while (a few hours; I don't know the actual server-side timeout), then call keys() or any other command. The call takes 930 seconds (about 15 minutes) to return. This is a small test server with only a few small keys, and listing all keys normally takes about 0.003 seconds. So I suppose the idle period caused one side to close the connection, but why can't the client detect that sooner and raise an exception or reconnect? Using a plain connection or a ConnectionPool makes no difference:

In [18]: r = redis.Redis(connection_pool=redis.ConnectionPool(host='redis-server-...', port=6379, db='0'))

## idle for some hours

In [37]: start = datetime.now(); print r.keys(); end = datetime.now(); print "{:.3f} seconds".format((end - start).total_seconds())
# printed keys...
930.306 seconds

I've read the documentation at https://pypi.python.org/pypi/redis but don't see a stale-connection option or a client-side timeout option. Is there one?

I searched https://github.com/andymccurdy/redis-py/issues?q=hang and found many hang reports, but none that looks like a duplicate of this one.

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 18
  • Comments: 39 (13 by maintainers)

Most upvoted comments

Right; that's easy to confirm: if I run a Python shell on the redis-server host itself, the same code never hangs for 15 minutes. But I need to understand what the network is doing, so I ran ss to show the TCP internals and tcpdump to listen on port 6379:

$ ss -n4ti -o state established '( dport == 6379 )'
State      Recv-Q Send-Q                         Local Address:Port                                        Peer Address:Port              
ESTAB      0      14                               10.0.50.227:38620                                          10.5.8.30:6379                
timer:(on,42sec,13)
         ts sack cubic wscale:7,7 rto:120000 backoff:13 rtt:2.34/0.62 ato:40 mss:1448 cwnd:1 ssthresh:7 send 5.0Mbps lastsnd:77239 lastrcv:2863224 lastack:2863224 pacing_rate 99.0Mbps unacked:1 retrans:1/14 lost:1 rcv_space:29200

## while strace is still running in background:
sendto(7, "*1\r\n$4\r\nPING\r\n", 14, 0, NULL, 0) = 14
recvfrom(7, "+PONG\r\n", 65536, 0, NULL, NULL) = 7        <= hanging here for 15 minutes to get data

## tcpdump -i eth0 -vvXX tcp port 6379

    10.0.50.227.38620 > 10.5.8.30.6379: Flags [P.], cksum 0x4f3a (incorrect -> 0x49ea), seq 28:42, ack 15, win 229, options [nop,nop,TS val 1665157632 ecr 137547475], length 14
        0x0000:  6487 8810 1090 b8ae ed78 cd34 0800 4500  d........x.4..E.
        0x0010:  0042 d90f 4000 4006 12a1 0a00 32e3 0a05  .B..@.@.....2...
        0x0020:  081e 96dc 18eb 5487 171f 7372 b597 8018  ......T...sr....
        0x0030:  00e5 4f3a 0000 0101 080a 6340 4a00 0832  ..O:......c@J..2
        0x0040:  ced3 2a31 0d0a 2434 0d0a 5049 4e47 0d0a  ..*1..$4..PING..


## A SYN packet starts a new TCP connection, with a new source port 38740 => redis-server:6379
02:07:01.639262 b8:ae:ed:78:cd:34 > 64:87:88:10:10:90, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 55272, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.50.227.38740 > 10.5.8.30.6379: Flags [S], cksum 0x4f34 (incorrect -> 0x8fc0), seq 3417765868, win 29200, options [mss 1460,sackOK,TS val 1665277956 ecr 0,nop,wscale 7], length 0
        0x0000:  6487 8810 1090 b8ae ed78 cd34 0800 4500  d........x.4..E.
        0x0010:  003c d7e8 4000 4006 13ce 0a00 32e3 0a05  .<..@.@.....2...
        0x0020:  081e 9754 18eb cbb6 f7ec 0000 0000 a002  ...T............
        0x0030:  7210 4f34 0000 0204 05b4 0402 080a 6342  r.O4..........cB
        0x0040:  2004 0000 0000 0103 0307                 ..........


# sysctl -a |grep tcp_retries
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15

From the tcpdump output, it looks like the Linux TCP stack was retransmitting: it retried at intervals of 200 ms, 200 ms, 400 ms, 800 ms, 1600 ms, and so on, doubling the interval until it was capped at 2 minutes, roughly 18 retries in total. Summing the intervals gives 200 + 200 + 400 + 800 + 1600 + 3200 + 6400 + 12800 + 25600 + 51200 + 102400 + 120000 * 6 ms, or about 925 seconds, which accounts for the ~15-minute (930-second) hang. After that the stack gave up, and the client made a new TCP connection to server port 6379.
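The arithmetic above can be checked with a few lines of Python. This is only a sketch that sums the interval series exactly as quoted in the comment; the variable names are illustrative, and the series itself is the commenter's reading of the tcpdump output, not something derived from the kernel source.

```python
# Retransmission intervals in milliseconds, copied from the series quoted above.
doubling_phase = [200, 200, 400, 800, 1600, 3200, 6400, 12800, 25600, 51200, 102400]
capped_phase = [120000] * 6  # retries spaced at the 2-minute cap

intervals_ms = doubling_phase + capped_phase
total_seconds = sum(intervals_ms) / 1000

print(len(intervals_ms), "intervals,", total_seconds, "seconds")
```

The listed series adds up to a little under 925 seconds, which is close to the 930 seconds measured in the Python shell.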

@Andrew-Chen-Wang TCP Keep Alive is separate from socket_timeout.

TCP Keep Alive is a technique where each side of a TCP connection periodically sends small probe packets while the connection is otherwise idle, so that network devices between the two sides see regular activity. redis-py supports this if you enable the socket_keepalive (boolean) and socket_keepalive_options (platform-dependent) values.
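As a rough illustration of what "platform dependent" means here, the sketch below builds a keepalive options dict from whichever TCP constants the local Python `socket` module exposes, and applies it to a plain socket. The helper names (`build_keepalive_options`, `enable_keepalive`) and the timing values are assumptions made for this example. A dict of this shape is what one would pass as `socket_keepalive_options` together with `socket_keepalive=True`.

```python
import socket

def build_keepalive_options(idle=60, interval=10, count=3):
    """Return {TCP option constant: value} for the constants this platform exposes."""
    wanted = {"TCP_KEEPIDLE": idle, "TCP_KEEPINTVL": interval, "TCP_KEEPCNT": count}
    return {
        getattr(socket, name): value
        for name, value in wanted.items()
        if hasattr(socket, name)  # not every OS defines all three constants
    }

def enable_keepalive(sock, options):
    """Turn on SO_KEEPALIVE and apply the per-platform TCP tuning options."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for opt, value in options.items():
        sock.setsockopt(socket.IPPROTO_TCP, opt, value)

options = build_keepalive_options()
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock, options)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
print(keepalive_on, options)
```

Short idle and interval values make the kernel notice a dead peer sooner, at the cost of a little extra traffic on idle connections.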

socket_timeout instructs the client side of the connection to only block up to socket_timeout seconds on any blocking socket operation.
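The effect of a per-operation timeout can be seen with the standard library alone. In this minimal sketch, a local listener that never replies stands in for a dead peer, and `settimeout(0.5)` plays the role that socket_timeout plays in redis-py; the 0.5-second value and the loopback setup are assumptions chosen only to keep the demonstration short.

```python
import socket
import time

# A local listener that lets the kernel complete the TCP handshake but never
# sends a reply, standing in for a peer that has silently gone away.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.create_connection(server.getsockname(), timeout=5)
client.settimeout(0.5)  # upper bound on each blocking socket operation
client.sendall(b"*1\r\n$4\r\nPING\r\n")

start = time.monotonic()
try:
    client.recv(65536)  # would block indefinitely without the timeout
    timed_out = False
except socket.timeout:
    timed_out = True
elapsed = time.monotonic() - start

client.close()
server.close()
print(timed_out, round(elapsed, 1))
```

Without the `settimeout` call, the `recv` would wait on the operating system's own retransmission limits, which is the 15-minute hang described in this issue.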

If a connection goes dead, the connection pool should detect that the next time a connection is retrieved from the pool. You can use the health_check_interval option to make sure the connection is tested at least every health_check_interval seconds. If you experience a lot of random disconnects in your environment, this option can greatly help alleviate that.

You can use the socket_timeout=num_seconds option when creating a client instance to control how long to wait for a response before raising a TimeoutError.

Most issues of this type are network-related. Routers or other appliances often shut down idle TCP streams. You can also try using the socket_keepalive=True option to turn on standard TCP keepalive.

fwiw, ran into this issue for deployments on kubernetes in GKE / GCP (Google Cloud Platform, Kubernetes Engine) and setting socket_keepalive=True didn’t help, although adding socket_timeout=300 did.

self.redis = StrictRedis.from_url(self.url, socket_keepalive=True, socket_timeout=300)

From looking at the code, setting health_check_interval triggers PING commands; wouldn't those also be affected by the connection timeout? https://github.com/redis/redis-py/blob/bea72995fd39b01e2f0a1682b16b6c7690933f36/redis/connection.py#L755-L759

@ss75710541 socket_timeout applies to all blocking socket operations. socket_connect_timeout applies only to blocking during the initial TCP handshake.
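Pulling the options from this thread together, a client configuration might look like the sketch below. The option names are the ones mentioned in the comments above. The numeric values and the `make_client` helper are hypothetical, chosen only for illustration, and would need tuning for a real environment.

```python
# Keyword arguments named in this thread; the values are illustrative
# assumptions, not recommendations.
client_kwargs = {
    "socket_connect_timeout": 5,   # bounds only the initial TCP handshake
    "socket_timeout": 30,          # bounds every blocking socket operation
    "socket_keepalive": True,      # turn on standard TCP keepalive
    "health_check_interval": 30,   # test the connection at least this often (seconds)
}

def make_client(url):
    """Hypothetical helper: build a redis-py client from a URL with the options above."""
    import redis  # imported lazily so the dict can be inspected without redis-py installed
    return redis.Redis.from_url(url, **client_kwargs)
```

Keeping the connect timeout shorter than the general socket timeout lets an unreachable server fail fast while still leaving room for slower commands.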