redisson: Unable to unfreeze entry when redis goes down in kubernetes (bitnami/redis)

Expected behavior

If the Redis sentinel master goes down for a short amount of time and comes back up in less than down-after-milliseconds (i.e. a new master is not elected), Redisson should continue to use the same master.

Actual behavior

The pod continuously streams the following output:

2022-10-06 16:14:29.868 ERROR 1 --- [isson-netty-2-3] o.r.c.balancer.LoadBalancerManager       : Unable to unfreeze entry: [freeSubscribeConnectionsAmount=0, freeSubscribeConnectionsCounter=value:87:queue:0, freeConnectionsAmount=0, freeConnectionsCounter=value:138:queue:0, freezeReason=SYSTEM, client=[addr=redis://10.0.3.173:6379], nodeType=SLAVE, firstFail=1665072869858]

java.util.concurrent.CompletionException: org.redisson.client.RedisConnectionException: Unable to connect to Redis server: 10.0.3.173/10.0.3.173:6379
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
	at java.base/java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1423)
	at java.base/java.util.concurrent.CompletableFuture$CoCompletion.tryFire(CompletableFuture.java:1144)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
	at org.redisson.connection.pool.ConnectionPool.lambda$createConnection$1(ConnectionPool.java:154)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
	at org.redisson.connection.pool.ConnectionPool.promiseFailure(ConnectionPool.java:317)
	at org.redisson.connection.pool.ConnectionPool.lambda$createConnection$7(ConnectionPool.java:276)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
	at org.redisson.client.RedisClient$2$2.run(RedisClient.java:307)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.redisson.client.RedisConnectionException: Unable to connect to Redis server: 10.0.3.173/10.0.3.173:6379
	at org.redisson.connection.pool.ConnectionPool.lambda$createConnection$1(ConnectionPool.java:153)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
	at org.redisson.connection.pool.ConnectionPool.promiseFailure(ConnectionPool.java:317)
	at org.redisson.connection.pool.ConnectionPool.lambda$createConnection$7(ConnectionPool.java:276)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
	at org.redisson.client.RedisClient$1$2.run(RedisClient.java:250)
	... 8 common frames omitted
Caused by: java.util.concurrent.CompletionException: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.0.3.173/10.0.3.173:6379
	at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
	at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
	at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019)
	... 11 common frames omitted
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.0.3.173/10.0.3.173:6379
Caused by: java.net.NoRouteToHostException: Host is unreachable
	at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)

And:

Caused by: org.redisson.client.RedisConnectionException: Unable to connect to Redis server: 10.0.3.173/10.0.3.173:6379
	at org.redisson.connection.pool.ConnectionPool.lambda$createConnection$1(ConnectionPool.java:153)
	... 19 common frames omitted
Caused by: java.util.concurrent.CompletionException: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.0.3.173/10.0.3.173:6379
	at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
	at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
	at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019)
	... 11 common frames omitted
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.0.3.173/10.0.3.173:6379
Caused by: java.net.NoRouteToHostException: Host is unreachable
	at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)

Steps to reproduce or test case

  1. Install Redis in a sentinel configuration using the latest bitnami Helm chart:
helm install "${APP_NAME}-redis" bitnami/redis \
             --version "17.3.1" \
             --set sentinel.enabled=true \
             --set auth.password="${REDIS_PASSWORD}" \
             --set replica.replicaCount=3 \
             --set global.redis.password="${REDIS_PASSWORD}"
  2. Now you have 3 Redis nodes running:
kubectl get pods | grep redis
test2021-redis-node-0                   2/2     Running     0             14m
test2021-redis-node-1                   2/2     Running     0             14m
test2021-redis-node-2                   2/2     Running     0             13m
  3. Some information about the sentinel master:
kubectl exec -it test2021-redis-node-1 bash -c sentinel
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
I have no name!@test2021-redis-node-1:/$ redis-cli -a $REDIS_PASSWORD -p 26379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:26379> sentinel master mymaster
 1) "name"
 2) "mymaster"
 3) "ip"
 4) "test2021-redis-node-0.test2021-redis-headless.default.svc.cluster.local"
 ...
21) "down-after-milliseconds"
22) "60000"
 ...
32) "failover-timeout"
33) "180000"
 ...
  4. As you can see, node 0 is the master, so let’s kill it:
kubectl delete pods/test2021-redis-node-0 --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "test2021-redis-node-0" force deleted
  5. Node 0 stays the master, since it was only down for ~10 seconds, less than the down-after-milliseconds value of 60s, so the output of step 3 is still the same.

  6. The application server continuously streams the exceptions shown above and keeps streaming them long after the master came back up (I monitored the logs for ~5 minutes). The pod is not usable until it restarts (via a failed healthcheck or manually; at that point it doesn’t matter). See the probe sketch after this list for an easy way to observe this.
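
To make this easy to observe from the application side, here is a minimal probe sketch (assuming a RedissonClient built from the configuration below; the key name and interval are arbitrary). While the master pod is down, and long after it is reachable again, the writes keep failing with the exceptions above:

import java.util.concurrent.TimeUnit;

import org.redisson.api.RBucket;
import org.redisson.api.RedissonClient;

public class RedisProbe {

    // Periodically touches a key so that connection failures surface in the application log.
    public static void probe(RedissonClient redisson) throws InterruptedException {
        RBucket<String> bucket = redisson.getBucket("probe:last-write"); // arbitrary key
        while (true) {
            try {
                bucket.set(Long.toString(System.currentTimeMillis()), 60, TimeUnit.SECONDS);
                System.out.println("write ok");
            } catch (Exception e) {
                // Fails while the node is unreachable and, per this report, long after it is back.
                System.out.println("write failed: " + e.getMessage());
            }
            Thread.sleep(1000);
        }
    }
}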

Redis version 7.0.5

Redisson version 3.17.7

Redisson configuration

---
sentinelServersConfig:
  idleConnectionTimeout: 10000
  connectTimeout: 10000
  timeout: 5000
  retryAttempts: 3
  retryInterval: 1500
  failedSlaveReconnectionInterval: 3000
  failedSlaveCheckInterval: 60000
  subscriptionsPerConnection: 5
  clientName: null
  loadBalancer: !<org.redisson.connection.balancer.RoundRobinLoadBalancer> {}
  subscriptionConnectionMinimumIdleSize: 1
  subscriptionConnectionPoolSize: 50
  slaveConnectionMinimumIdleSize: 24
  slaveConnectionPoolSize: 64
  masterConnectionMinimumIdleSize: 24
  masterConnectionPoolSize: 64
  readMode: "SLAVE"
  subscriptionMode: "SLAVE"
  password: "whatever"
  sentinelAddresses:
  - "redis://test2021-redis-headless:26379"
  masterName: "mymaster"
  keepAlive: true
  database: 0
  checkSentinelsList: false
eventLoopGroup: null
threads: 16
nettyThreads: 32
codec: !<org.redisson.codec.SnappyCodec> {}
transportMode: "NIO"

#https://github.com/redisson/redisson/issues/4356
addressResolverGroupFactory: !<org.company.redis.CustomDnsAddressResolverGroupFactory> {}
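
For completeness, a minimal sketch of how a configuration like the one above can be loaded (assuming it is saved as redisson.yaml; the file name is illustrative):

import java.io.File;
import java.io.IOException;

import org.redisson.Redisson;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonBootstrap {

    // Loads the sentinel YAML configuration shown above and builds the client.
    public static RedissonClient create() throws IOException {
        Config config = Config.fromYAML(new File("redisson.yaml")); // assumed file name
        return Redisson.create(config);
    }
}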

And the addressResolver:

import io.netty.channel.socket.DatagramChannel;
import io.netty.channel.socket.nio.NioDatagramChannel;
import io.netty.resolver.dns.DnsAddressResolverGroup;
import io.netty.resolver.dns.DnsNameResolverBuilder;
import io.netty.resolver.dns.DnsServerAddressStreamProvider;
import io.netty.resolver.dns.DnsServerAddressStreamProviders;
import io.netty.resolver.dns.NoopAuthoritativeDnsServerCache;

import org.redisson.connection.AddressResolverGroupFactory;

/**
 * Required to fix https://github.com/redisson/redisson/issues/4356
 * @author lbornov2
 */
public class CustomDnsAddressResolverGroupFactory implements AddressResolverGroupFactory {

    @Override
    public DnsAddressResolverGroup create(Class<? extends DatagramChannel> channelType,
            DnsServerAddressStreamProvider nameServerProvider) {
        // Resolver backed by the platform-default name servers, with the
        // authoritative DNS server cache disabled (see the issue linked above).
        final DnsAddressResolverGroup group = new DnsAddressResolverGroup(new DnsNameResolverBuilder()
                .channelType(NioDatagramChannel.class)
                .authoritativeDnsServerCache(NoopAuthoritativeDnsServerCache.INSTANCE)
                .nameServerProvider(DnsServerAddressStreamProviders.platformDefault()));
        return group;
    }

}
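
For reference, the programmatic equivalent of the addressResolverGroupFactory YAML entry would look roughly like this (a sketch, not how the service is actually wired):

import org.company.redis.CustomDnsAddressResolverGroupFactory;
import org.redisson.config.Config;

public class ResolverWiring {

    // Registers the custom resolver factory on an existing Config instance,
    // mirroring the addressResolverGroupFactory entry in the YAML above.
    static void applyCustomResolver(Config config) {
        config.setAddressResolverGroupFactory(new CustomDnsAddressResolverGroupFactory());
    }
}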

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 35 (17 by maintainers)

Most upvoted comments

Thanks for the logs. Please try a new build below.

redisson-3.17.8-SNAPSHOT.jar.zip

We use the 3.18.0 release and the initial problem has been fixed: Redisson manages to reconnect. However, we recently saw that one of our services using the Redisson client was polluting the log with the following exceptions after the reconnect:

ERROR 1 [] [redisson-timer-4-1] org.redisson.connection.balancer.LoadBalancerManager : Unable to unfreeze entry: [freeSubscribeConnectionsAmount=0, freeSubscribeConnectionsCounter=value:85:queue:0, freeConnectionsAmount=0, freeConnectionsCounter=value:134:queue:0, freezeReason=SYSTEM, client=[addr=redis://10.83.19.20:6379], nodeType=SLAVE, firstFail=1669808100722]

 java.util.concurrent.CompletionException: org.redisson.client.RedisConnectionException: Unable to init enough connections amount! Only 0 of 4 were initialized. Server: 10.83.19.20/10.83.19.20:6379
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
	at java.base/java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1452)
	at java.base/java.util.concurrent.CompletableFuture.allOf(CompletableFuture.java:2337)
	at org.redisson.connection.balancer.LoadBalancerManager.unfreeze(LoadBalancerManager.java:160)
	at org.redisson.connection.balancer.LoadBalancerManager.lambda$unfreeze$2(LoadBalancerManager.java:166)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:669)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:744)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:469)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.redisson.client.RedisConnectionException: Unable to init enough connections amount! Only 0 of 4 were initialized. Server: 10.83.19.20/10.83.19.20:6379
	at org.redisson.connection.pool.ConnectionPool.createConnection(ConnectionPool.java:104)
	at org.redisson.connection.pool.ConnectionPool.initConnections(ConnectionPool.java:92)
	at org.redisson.connection.pool.ConnectionPool.initConnections(ConnectionPool.java:77)
	at org.redisson.connection.pool.SlaveConnectionPool.initConnections(SlaveConnectionPool.java:30)
	at org.redisson.connection.balancer.LoadBalancerManager.unfreeze(LoadBalancerManager.java:157)
	... 6 common frames omitted

10.83.19.20 is the old Redis IP, which no longer exists after Redis was restarted by Kubernetes. Unfortunately, I have no exact steps to reproduce, as this happened only once and in only one of our services (all the others reconnected as well and did not throw these exceptions).