ioredis: Client waiting endlessly for info from unresponsive server
It appears that in some instances the connection to the server is established but the server will not respond (we are running AWS Lambda functions against ElastiCache, which is not really a good marriage).
In this scenario the client is stuck waiting for a response and queuing commands but never making progress or giving up and retrying the connection. Things just halt forever, since no new traffic is sent to detect that the connection is lost.
It can be reproduced by opening a port with netcat and just ignoring incoming traffic (nc -l <port>).
Could a socket timeout be set up to handle this while waiting for responses?
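For reference, Node's own net.Socket already has an inactivity timer (setTimeout plus the 'timeout' event), so a rough sketch of the idea outside ioredis could look like the following. The function name connectWithIdleTimeout and the error message are made up for illustration, and this is not a claim about how ioredis is wired internally. Node only emits 'timeout'; closing the socket is left to the handler.

```typescript
import * as net from "node:net";

/**
 * Hypothetical helper: open a TCP connection that destroys itself when
 * nothing has been read or written for `idleMs` milliseconds.
 */
export function connectWithIdleTimeout(host: string, port: number, idleMs: number): net.Socket {
  const socket = net.connect({ host, port });

  // Arms Node's inactivity timer; any read or write on the socket resets it.
  socket.setTimeout(idleMs);

  // 'timeout' is only a notification, so the connection is torn down by hand.
  // destroy(err) emits 'error' followed by 'close', so callers need an
  // 'error' listener and can hang their reconnect logic on 'close'.
  socket.on("timeout", () => {
    socket.destroy(new Error(`no activity on socket for ${idleMs} ms`));
  });

  return socket;
}
```

One caveat: the timer measures inactivity in both directions, so it also fires on a healthy connection that simply has nothing to send. A client would probably want to arm it only while replies are outstanding.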
About this issue
- State: open
- Created 6 years ago
- Comments: 22
Adding timeout features to ioredis seems to be a difficult (or unwanted) task, although it is needed in some cases. I solved this by using promise-timeout and wrapping each and every ioredis call. This is not very elegant (and may also lead to internal reference leaks when queries do not terminate), but at least I can handle timeouts in HTTP requests this way.
Added a pull request trying to address this issue: https://github.com/luin/ioredis/pull/658. What are your thoughts?
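For anyone wanting to try the wrapping workaround mentioned above without pulling in a dependency, here is a minimal sketch of the same idea. It is not the promise-timeout package's API; withTimeout and TimeoutError are names invented for this example.

```typescript
/** Hypothetical error type so callers can tell a timeout from a command failure. */
export class TimeoutError extends Error {
  constructor(ms: number) {
    super(`operation timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

/**
 * Settles with the result of `operation`, or rejects with TimeoutError if
 * `operation` has not settled within `ms` milliseconds.
 */
export function withTimeout<T>(operation: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
    operation.then(
      (value) => {
        clearTimeout(timer); // do not leave a timer behind on success
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Usage with an ioredis client would look roughly like:
//   const value = await withTimeout(redis.get("some-key"), 500);
```

This only stops the caller from waiting. The underlying command stays queued inside the client, which is the reference leak mentioned above, and the stuck connection is not repaired.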
I think part of the problem is that RESP (the Redis protocol) is really basic and most of the commands are blocking. I don’t believe that there is any kind of sequence/transaction/request ID used. So if the client timed out and issued another command, the response to the previous command could be received first. I’m not sure how that would be handled; probably not well if there is no transaction ID. It might be safer, if a timeout occurs, to simply terminate the connection and reconnect.
It would be cleaner if ioredis set a timeout on every command issued and if the timeout expired, return an error (if callback used) or reject (if promise used), terminate the connection and automatically attempt to reconnect.
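To make the proposal above concrete, here is a rough sketch of a per-command timeout that tears the connection down when it fires. Everything in it (Transport, TimedCommandQueue, handleReply) is hypothetical and only models the behaviour; it is not ioredis code. Replies are matched to commands purely by arrival order, which is why a timeout drops the whole connection instead of just the one command: a late reply would otherwise be handed to the next caller.

```typescript
/** Minimal stand-in for a connection; a real client would wrap a TCP socket. */
export interface Transport {
  write(data: string): void;
  destroy(): void;
}

interface Pending {
  resolve: (reply: string) => void;
  reject: (error: Error) => void;
  timer: ReturnType<typeof setTimeout>;
}

export class TimedCommandQueue {
  private pending: Pending[] = [];
  private transport: Transport;
  private readonly connect: () => Transport;
  private readonly timeoutMs: number;

  constructor(connect: () => Transport, timeoutMs: number) {
    this.connect = connect;
    this.timeoutMs = timeoutMs;
    this.transport = connect();
  }

  /** Sends a command and rejects if no reply arrives within the timeout. */
  send(command: string): Promise<string> {
    return new Promise<string>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.reset(new Error(`"${command}" timed out after ${this.timeoutMs} ms`));
      }, this.timeoutMs);
      this.pending.push({ resolve, reject, timer });
      this.transport.write(command);
    });
  }

  /** Feed each reply read from the transport here, in arrival order. */
  handleReply(reply: string): void {
    const head = this.pending.shift(); // no request IDs: oldest command wins
    if (head === undefined) return; // unsolicited reply, ignored in this sketch
    clearTimeout(head.timer);
    head.resolve(reply);
  }

  /** Fails every outstanding command, drops the connection, and reconnects. */
  private reset(reason: Error): void {
    const failed = this.pending;
    this.pending = [];
    for (const entry of failed) {
      clearTimeout(entry.timer);
      entry.reject(reason);
    }
    this.transport.destroy();
    this.transport = this.connect();
  }
}
```

Rejecting every queued command on a single timeout is a deliberate simplification. Whether those commands should instead be retried on the new connection is a separate policy question that this sketch does not try to answer.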
maxRetriesPerRequest does not help with this. @luin This should not be closed yet.
The failure happens when the socket is connected but the server is not responding. The OS will probably time out at some point, but it appears that Linux takes about 15 minutes by default to give up (https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html)