redis-py: redis-py client side hang; is there a stale or timeout re-connect parameter?
I can reproduce this from a simple Python shell: make a Redis connection and let it idle for some time (some hours; I don’t know what the actual server-side timeout is), then try a get-all-keys call, or any other call. It takes 930 seconds, about 15 minutes, to return. This is a simple test server with only a few small keys, and getting all keys normally takes just 0.003 seconds. So I suppose the idling has caused one side to close the connection, but why can’t the client side detect it earlier and raise an exception, or do something sooner? I tried both a normal connection and a ConnectionPool; neither helps.
In [18]: r = redis.Redis(connection_pool=redis.ConnectionPool(host='redis-server-...', port=6379, db='0'))
## idle for some hours
In [37]: start = datetime.now(); print r.keys(); end = datetime.now(); print "{:.3f} seconds".format((end - start).total_seconds())
# printed keys...
930.306 seconds
I’m reading the documentation at https://pypi.python.org/pypi/redis and don’t see a stale option or a client-side timeout option?
I have searched https://github.com/andymccurdy/redis-py/issues?q=hang and there are many hang problems; however, I am not seeing a duplicate.
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Reactions: 18
- Comments: 39 (13 by maintainers)
Right; that’s easy to prove: if I run a Python shell on the redis-server itself, the same code never times out or hangs for 15 minutes. But I needed to understand this network situation, so I used `ss` to show TCP internal information and `tcpdump` listening on port 6379. From tcpdump’s output, it seems to have triggered Linux TCP stack retransmission: it retries at intervals of 200ms, 200ms, 400ms, 800ms, 1600ms, doubling the timeout interval until it reaches 2 minutes, for 18 retries in total. Adding up the intervals (in ms):
200 + 200 + 400 + 800 + 1600 + 3200 + 6400 + 12800 + 25600 + 51200 + 102400 + 120000 * 6 ~= 930 seconds
This is the ~15 minute hang; after that it gave up and made a new TCP connection to server port 6379.

@Andrew-Chen-Wang TCP Keep Alive is separate from `socket_timeout`. TCP Keep Alive is a technique where the two sides of a TCP connection periodically exchange small probe packets so that network devices between them see regular activity. redis-py supports this if you enable the `socket_keepalive` (boolean) and `socket_keepalive_options` (platform dependent) values. `socket_timeout` instructs the client side of the connection to only block up to `socket_timeout` seconds on any blocking socket operation.

If a connection goes dead, the connection pool should detect that the next time a connection is retrieved from the pool. You can use the `health_check_interval` option to make sure the connection is tested at least every `health_check_interval` seconds. If you experience a lot of random disconnects in your environment, this option can greatly help alleviate that.

You can use the `socket_timeout=num_seconds` option when creating a client instance to control how long to wait for a response before raising a TimeoutError. Most of these types of issues are related to network issues. Routers or other appliances often shut down idle TCP streams. You can also try using the `socket_keepalive=True` option to turn on standard TCP keepalive.

fwiw, ran into this issue for deployments on Kubernetes in GKE / GCP (Google Cloud Platform, Kubernetes Engine), and setting `socket_keepalive=True` didn’t help, although adding `socket_timeout=300` did.
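
For reference, the options mentioned in the comments above could be combined roughly as follows. This is only a sketch: `build_redis_kwargs` and `make_client` are hypothetical helper names, the numeric values (5, 30, 60, 10, 3) are illustrative assumptions rather than recommendations from the thread, and the keepalive tuning constants are only used when the platform's `socket` module defines them.

```python
import socket


def build_redis_kwargs(host, port=6379, db=0):
    """Return keyword arguments for redis.Redis (values are illustrative)."""
    kwargs = {
        "host": host,
        "port": port,
        "db": db,
        # Block at most 5 s on any socket operation instead of waiting
        # for the kernel's TCP retransmission to give up.
        "socket_timeout": 5,
        # Turn on standard TCP keepalive for the connection.
        "socket_keepalive": True,
        # Have the connection tested at least every 30 s.
        "health_check_interval": 30,
    }
    # Keepalive tuning is platform dependent; these constants exist on Linux.
    names = ("TCP_KEEPIDLE", "TCP_KEEPINTVL", "TCP_KEEPCNT")
    if all(hasattr(socket, name) for name in names):
        kwargs["socket_keepalive_options"] = {
            socket.TCP_KEEPIDLE: 60,   # idle seconds before the first probe
            socket.TCP_KEEPINTVL: 10,  # seconds between probes
            socket.TCP_KEEPCNT: 3,     # failed probes before giving up
        }
    return kwargs


def make_client(host):
    """Create a client; needs the third-party redis package installed."""
    import redis

    return redis.Redis(**build_redis_kwargs(host))
```

Creating the client does not open a connection by itself, so a bad setting only shows up on the first command. With `socket_timeout` set, a dead connection should surface as an exception after a few seconds rather than after the ~15 minute retransmission window.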
From looking at the code, setting a `health_check_interval` triggers `PING` commands, which would also be affected by the connection timeout? https://github.com/redis/redis-py/blob/bea72995fd39b01e2f0a1682b16b6c7690933f36/redis/connection.py#L755-L759

@ss75710541 `socket_timeout` applies to all blocking socket operations. `socket_connect_timeout` applies only to blocking during the initial TCP handshake.
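
The distinction between the two timeouts can be illustrated with plain standard-library sockets. This is not redis-py code, only an analogy under stated assumptions: `read_with_timeout` is a hypothetical helper, and a local listener that never answers stands in for a peer that has silently gone away. The timeout given to `socket.create_connection` bounds the connect step, and the socket's timeout then bounds each later blocking call.

```python
import socket
import time


def read_with_timeout(timeout_seconds):
    """Read from a silent local peer; return (timed_out, elapsed_seconds)."""
    # A listening socket that never accepts or replies plays the dead peer.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    host, port = server.getsockname()

    # This timeout bounds the connect (TCP handshake) step.
    client = socket.create_connection((host, port), timeout=timeout_seconds)
    # The socket timeout bounds every later blocking send/recv.
    client.settimeout(timeout_seconds)

    start = time.monotonic()
    try:
        client.sendall(b"PING\r\n")
        client.recv(1024)  # no reply ever arrives, so this blocks
        timed_out = False
    except socket.timeout:
        timed_out = True
    finally:
        elapsed = time.monotonic() - start
        client.close()
        server.close()
    return timed_out, elapsed
```

Calling `read_with_timeout(0.2)` should report a timeout after roughly 0.2 seconds. Without a timeout on the socket, the same `recv` would block indefinitely, which is the behaviour the original report ran into.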