redisson: Possibility of broken connections on the pool
Expected behavior
When using a fixed connection pool of min=64 and max=64 to every node (masters and slaves) in the cluster config, Redisson should be able to open a healthy pool of 64 connections to each node.
Actual behavior
It seems that in some cases, when our app container (a dockerized app) starts, network/warm-up issues (we also see a few CLUSTER_NODES and CLUSTER_INFO timeouts during startup) leave some of the connections in the pool broken. No issue is easily observed at low traffic, but after increasing the load a bit, some requests to those instances fail with timeouts (we deployed 20 instances and 2 ended up like this). This does not seem to happen to instances that started up and opened all their connections properly; the instances with the faulty startup remain in a broken state and do not recover or re-create the broken connections.
In any case, as noted, this only happens during startup, so that is our primary hypothesis. Is there any logic that can be put in place to deal with potentially broken connections in the pool, any monitoring of the pool we can enable, or any configuration we should do differently? Please advise. A rough idea of the kind of startup check we have in mind is sketched after the example timeout below.
example timeout:
at rapid.shaded.org.redisson.command.CommandBatchService$3.run(CommandBatchService.java:675)
at rapid.shaded.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:682)
at rapid.shaded.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:757)
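In the meantime we are considering a connectivity probe of our own right after the client is created. A minimal sketch, assuming the NodesGroup / pingAll API available in this Redisson version; the verifyNodes helper and the fail-fast behaviour are our own idea, not something Redisson provides:

import org.redisson.api.Node;
import org.redisson.api.NodesGroup;
import org.redisson.api.RedissonClient;

public class StartupConnectivityCheck {

    // Hypothetical helper: probe every known node once right after the client is
    // created, so a container that came up with flaky networking fails fast instead
    // of serving traffic with broken pooled connections.
    public static void verifyNodes(RedissonClient redisson) {
        NodesGroup<Node> nodes = redisson.getNodesGroup();

        // pingAll() sends PING to every known node and returns false if any node
        // fails to answer.
        if (!nodes.pingAll()) {
            for (Node node : nodes.getNodes()) {
                if (!node.ping()) {
                    System.err.println("Node did not answer PING: " + node.getAddr());
                }
            }
            throw new IllegalStateException("Redis cluster not fully reachable at startup");
        }
    }
}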
Steps to reproduce or test case
Trying to isolate an easy way to reproduce it at the moment.
Redis version
3.2.8
Redisson version
3.10.4
Redisson configuration
"connectTimeout": 10000,
"timeout": 100,
"retryInterval": 50,
"retryAttempts": 4,
"masterConnectionMinimumIdleSize": 64,
"masterConnectionPoolSize": 64,
"slaveConnectionMinimumIdleSize": 64,
"slaveConnectionPoolSize": 64,
"keepAlive": true,
"tcpNoDelay": true,
"readMode": "MASTER_SLAVE",
"nodeAddresses": [
"redis://redis001.prod.local:6329",
"redis://redis001.prod.local:6339",
"redis://redis001.prod.local:6349",
"redis://redis002.prod.local:6329",
"redis://redis002.prod.local:6339",
"redis://redis002.prod.local:6349",
"redis://redis003.prod.local:6329",
"redis://redis003.prod.local:6339",
"redis://redis003.prod.local:6349",
"redis://redis004.prod.local:6329",
"redis://redis004.prod.local:6339",
"redis://redis004.prod.local:6349",
"redis://redis005.prod.local:6329",
"redis://redis005.prod.local:6339",
"redis://redis005.prod.local:6349",
"redis://redis006.prod.local:6329",
"redis://redis006.prod.local:6339",
"redis://redis006.prod.local:6349"
]
},
"useLinuxNativeEpoll": true
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 15 (7 by maintainers)
Did you try setting the pingConnectionInterval option? This helps avoid broken connections by using the Redis PING command: a broken connection gets reconnected if Redis fails to respond.
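For anyone hitting the same issue, a sketch of what enabling that option could look like in a programmatic cluster config; the 30000 ms interval and the single node address are example values, not recommendations:

import org.redisson.Redisson;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class PingConnectionIntervalExample {
    public static RedissonClient create() {
        Config config = new Config();
        config.useClusterServers()
              // Send PING over each pooled connection every 30 seconds (example value);
              // a connection that does not get a reply is closed and re-created.
              .setPingConnectionInterval(30000)
              .addNodeAddress("redis://redis001.prod.local:6329");
        return Redisson.create(config);
    }
}

In the JSON config above, the equivalent would be a "pingConnectionInterval" field inside the clusterServersConfig section.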