redisson: Possibility of broken connections on the pool
Expected behavior
When using a fixed connection pool of min=64 and max=64 to every node (masters and slaves) in the cluster config, Redisson should be able to open a healthy pool of 64 connections to each node.
Actual behavior
It seems that in some cases, when our app container (a dockerized app) starts, network/warm-up issues (we also see a few CLUSTER_NODES and CLUSTER_INFO timeouts during startup) leave some of the connections in the pool broken. No issue is easily observed at low traffic, but after increasing the load a bit, some requests to those instances fail with timeouts (we deployed 20 instances and 2 ended up like this). This does not seem to happen to instances that started up and opened all their connections properly; the instances with the faulty startup remain in a broken state and do not recover or re-create the broken connections.
In any case, as noted, this only happens during startup, so that is our primary hypothesis. Is there any logic that can be put in place to deal with potentially broken connections in the pool, any monitoring of the pool we can enable, or any configuration we should do differently? Please advise. A rough idea of the kind of startup check we have in mind is sketched after the example timeout below.
example timeout:
at rapid.shaded.org.redisson.command.CommandBatchService$3.run(CommandBatchService.java:675)
at rapid.shaded.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:682)
at rapid.shaded.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:757)
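In the meantime we are considering a connectivity probe of our own right after the client is created. A minimal sketch, assuming the NodesGroup / pingAll API available in this Redisson version; the verifyNodes helper and the fail-fast behaviour are our own idea, not something Redisson provides:

import org.redisson.api.Node;
import org.redisson.api.NodesGroup;
import org.redisson.api.RedissonClient;

public class StartupConnectivityCheck {

    // Hypothetical helper: probe every known node once right after the client is
    // created, so a container that came up with flaky networking fails fast instead
    // of serving traffic with broken pooled connections.
    public static void verifyNodes(RedissonClient redisson) {
        NodesGroup<Node> nodes = redisson.getNodesGroup();

        // pingAll() sends PING to every known node and returns false if any node
        // fails to answer.
        if (!nodes.pingAll()) {
            for (Node node : nodes.getNodes()) {
                if (!node.ping()) {
                    System.err.println("Node did not answer PING: " + node.getAddr());
                }
            }
            throw new IllegalStateException("Redis cluster not fully reachable at startup");
        }
    }
}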
Steps to reproduce or test case
Trying to isolate an easy way to reproduce it at the moment.
Redis version
3.2.8
Redisson version
3.10.4
Redisson configuration
"connectTimeout": 10000,
"timeout": 100,
"retryInterval": 50,
"retryAttempts": 4,
"masterConnectionMinimumIdleSize": 64,
"masterConnectionPoolSize": 64,
"slaveConnectionMinimumIdleSize": 64,
"slaveConnectionPoolSize": 64,
"keepAlive": true,
"tcpNoDelay": true,
"readMode": "MASTER_SLAVE",
"nodeAddresses": [
"redis://redis001.prod.local:6329",
"redis://redis001.prod.local:6339",
"redis://redis001.prod.local:6349",
"redis://redis002.prod.local:6329",
"redis://redis002.prod.local:6339",
"redis://redis002.prod.local:6349",
"redis://redis003.prod.local:6329",
"redis://redis003.prod.local:6339",
"redis://redis003.prod.local:6349",
"redis://redis004.prod.local:6329",
"redis://redis004.prod.local:6339",
"redis://redis004.prod.local:6349",
"redis://redis005.prod.local:6329",
"redis://redis005.prod.local:6339",
"redis://redis005.prod.local:6349",
"redis://redis006.prod.local:6329",
"redis://redis006.prod.local:6339",
"redis://redis006.prod.local:6349"
]
},
"useLinuxNativeEpoll": true
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 15 (7 by maintainers)
Did you try setting the pingConnectionInterval option? This helps avoid broken connections by using the Redis PING command: a broken connection gets reconnected if Redis fails to respond.
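For anyone hitting the same issue, a sketch of what enabling that option could look like in a programmatic cluster config; the 30000 ms interval and the single node address are example values, not recommendations:

import org.redisson.Redisson;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class PingConnectionIntervalExample {
    public static RedissonClient create() {
        Config config = new Config();
        config.useClusterServers()
              // Send PING over each pooled connection every 30 seconds (example value);
              // a connection that does not get a reply is closed and re-created.
              .setPingConnectionInterval(30000)
              .addNodeAddress("redis://redis001.prod.local:6329");
        return Redisson.create(config);
    }
}

In the JSON config above, the equivalent would be a "pingConnectionInterval" field inside the clusterServersConfig section.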