redisson: Unable to unfreeze entry when redis goes down in kubernetes (bitnami/redis)
Expected behavior
If the redis sentinel master goes down for a short time and comes back up in less than down-after-milliseconds
(i.e. a new master is not elected), redisson should continue to use the same master.
Actual behavior
The pod continuously streams the following output:
2022-10-06 16:14:29.868 ERROR 1 --- [isson-netty-2-3] o.r.c.balancer.LoadBalancerManager : Unable to unfreeze entry: [freeSubscribeConnectionsAmount=0, freeSubscribeConnectionsCounter=value:87:queue:0, freeConnectionsAmount=0, freeConnectionsCounter=value:138:queue:0, freezeReason=SYSTEM, client=[addr=redis://10.0.3.173:6379], nodeType=SLAVE, firstFail=1665072869858]
java.util.concurrent.CompletionException: org.redisson.client.RedisConnectionException: Unable to connect to Redis server: 10.0.3.173/10.0.3.173:6379
at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
at java.base/java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1423)
at java.base/java.util.concurrent.CompletableFuture$CoCompletion.tryFire(CompletableFuture.java:1144)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at org.redisson.connection.pool.ConnectionPool.lambda$createConnection$1(ConnectionPool.java:154)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at org.redisson.connection.pool.ConnectionPool.promiseFailure(ConnectionPool.java:317)
at org.redisson.connection.pool.ConnectionPool.lambda$createConnection$7(ConnectionPool.java:276)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at org.redisson.client.RedisClient$2$2.run(RedisClient.java:307)
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.redisson.client.RedisConnectionException: Unable to connect to Redis server: 10.0.3.173/10.0.3.173:6379
at org.redisson.connection.pool.ConnectionPool.lambda$createConnection$1(ConnectionPool.java:153)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at org.redisson.connection.pool.ConnectionPool.promiseFailure(ConnectionPool.java:317)
at org.redisson.connection.pool.ConnectionPool.lambda$createConnection$7(ConnectionPool.java:276)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at org.redisson.client.RedisClient$1$2.run(RedisClient.java:250)
... 8 common frames omitted
Caused by: java.util.concurrent.CompletionException: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.0.3.173/10.0.3.173:6379
at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019)
... 11 common frames omitted
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.0.3.173/10.0.3.173:6379
Caused by: java.net.NoRouteToHostException: Host is unreachable
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
And:
Caused by: org.redisson.client.RedisConnectionException: Unable to connect to Redis server: 10.0.3.173/10.0.3.173:6379
at org.redisson.connection.pool.ConnectionPool.lambda$createConnection$1(ConnectionPool.java:153)
... 19 common frames omitted
Caused by: java.util.concurrent.CompletionException: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.0.3.173/10.0.3.173:6379
at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019)
... 11 common frames omitted
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.0.3.173/10.0.3.173:6379
Caused by: java.net.NoRouteToHostException: Host is unreachable
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
Steps to reproduce or test case
- Install redis in a sentinel configuration using the latest bitnami helm chart:
helm install "${APP_NAME}-redis" bitnami/redis \
  --version "17.3.1" \
  --set sentinel.enabled=true \
  --set auth.password="${REDIS_PASSWORD}" \
  --set replica.replicaCount=3 \
  --set global.redis.password="${REDIS_PASSWORD}"
- Now you have 3 redis nodes running:
kubectl get pods | grep redis
test2021-redis-node-0 2/2 Running 0 14m
test2021-redis-node-1 2/2 Running 0 14m
test2021-redis-node-2 2/2 Running 0 13m
- Some information about the sentinel master:
kubectl exec -it test2021-redis-node-1 bash -c sentinel
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
I have no name!@test2021-redis-node-1:/$ redis-cli -a $REDIS_PASSWORD -p 26379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:26379> sentinel master mymaster
1) "name"
2) "mymaster"
3) "ip"
4) "test2021-redis-node-0.test2021-redis-headless.default.svc.cluster.local"
...
21) "down-after-milliseconds"
22) "60000"
....
32) "failover-timeout"
33) 180000
....
- As you can see, node 0 is the master, so let’s kill it:
kubectl delete pods/test2021-redis-node-0 --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "test2021-redis-node-0" force deleted
- Node 0 stays the master, since it was down for only ~10 seconds, which is less than the value of down-after-milliseconds (60s). So the output of step 3 is still the same.
- The application server continuously streams the exceptions shown above, and keeps streaming them long after the master came back up (I monitored the logs for ~5 minutes). The pod is not usable until it restarts (via a failed healthcheck or manually; at that point it doesn't matter). A minimal sketch of how this was observed follows below.
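For reference, the application side was observed with roughly the following minimal probe. This is a sketch only; the class name, YAML file name, key and interval are placeholders rather than our real service code, and the client is built from the Redisson configuration shown further below:

import java.io.File;
import org.redisson.Redisson;
import org.redisson.api.RBucket;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedisProbe {
    public static void main(String[] args) throws Exception {
        // Build the client from the YAML in the "Redisson configuration" section below.
        Config config = Config.fromYAML(new File("redisson-sentinel.yaml"));
        RedissonClient client = Redisson.create(config);
        RBucket<String> bucket = client.getBucket("probe-key");
        while (true) {
            try {
                bucket.set("ping");   // write goes to the master
                bucket.get();         // readMode is SLAVE, so the read goes to a replica
                System.out.println("probe OK");
            } catch (Exception e) {
                // During the outage (and, per the report above, long after the master is
                // back) Redisson keeps logging "Unable to unfreeze entry" errors, and
                // operations like these can fail.
                System.out.println("probe FAILED: " + e.getMessage());
            }
            Thread.sleep(1000L);
        }
    }
}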
Redis version 7.0.5
Redisson version 3.17.7
Redisson configuration
---
sentinelServersConfig:
  idleConnectionTimeout: 10000
  connectTimeout: 10000
  timeout: 5000
  retryAttempts: 3
  retryInterval: 1500
  failedSlaveReconnectionInterval: 3000
  failedSlaveCheckInterval: 60000
  subscriptionsPerConnection: 5
  clientName: null
  loadBalancer: !<org.redisson.connection.balancer.RoundRobinLoadBalancer> {}
  subscriptionConnectionMinimumIdleSize: 1
  subscriptionConnectionPoolSize: 50
  slaveConnectionMinimumIdleSize: 24
  slaveConnectionPoolSize: 64
  masterConnectionMinimumIdleSize: 24
  masterConnectionPoolSize: 64
  readMode: "SLAVE"
  subscriptionMode: "SLAVE"
  password: "whatever"
  sentinelAddresses:
  - "redis://test2021-redis-headless:26379"
  masterName: "mymaster"
  keepAlive: true
  database: 0
  checkSentinelsList: false
eventLoopGroup: null
threads: 16
nettyThreads: 32
codec: !<org.redisson.codec.SnappyCodec> {}
transportMode: "NIO"
# https://github.com/redisson/redisson/issues/4356
addressResolverGroupFactory: !<org.company.redis.CustomDnsAddressResolverGroupFactory> {}
And the custom addressResolverGroupFactory:
package org.company.redis;

import io.netty.channel.socket.DatagramChannel;
import io.netty.channel.socket.nio.NioDatagramChannel;
import io.netty.resolver.dns.DnsAddressResolverGroup;
import io.netty.resolver.dns.DnsNameResolverBuilder;
import io.netty.resolver.dns.DnsServerAddressStreamProvider;
import io.netty.resolver.dns.DnsServerAddressStreamProviders;
import io.netty.resolver.dns.NoopAuthoritativeDnsServerCache;
import org.redisson.connection.AddressResolverGroupFactory;

/**
 * Required to fix https://github.com/redisson/redisson/issues/4356
 * @author lbornov2
 */
public class CustomDnsAddressResolverGroupFactory implements AddressResolverGroupFactory {
    @Override
    public DnsAddressResolverGroup create(Class<? extends DatagramChannel> channelType,
                                          DnsServerAddressStreamProvider nameServerProvider) {
        // Resolve through the platform-default name servers (the cluster DNS) and
        // disable the authoritative DNS server cache.
        final DnsAddressResolverGroup group = new DnsAddressResolverGroup(new DnsNameResolverBuilder()
                .channelType(NioDatagramChannel.class)
                .authoritativeDnsServerCache(NoopAuthoritativeDnsServerCache.INSTANCE)
                .nameServerProvider(DnsServerAddressStreamProviders.platformDefault()));
        return group;
    }
}
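For completeness, the same factory can also be registered programmatically instead of through the YAML tag above. This is a minimal sketch, assuming the remaining sentinel settings mirror the YAML; the values and class names are copied from that configuration, not from our real code:

import org.company.redis.CustomDnsAddressResolverGroupFactory;
import org.redisson.Redisson;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonClientFactory {
    public static RedissonClient create() {
        Config config = new Config();
        // Same workaround as in the YAML above: plug in the custom DNS resolver factory.
        config.setAddressResolverGroupFactory(new CustomDnsAddressResolverGroupFactory());
        config.useSentinelServers()
              .setMasterName("mymaster")
              .addSentinelAddress("redis://test2021-redis-headless:26379")
              .setPassword("whatever")
              .setCheckSentinelsList(false);
        return Redisson.create(config);
    }
}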
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 35 (17 by maintainers)
Commits related to this issue
- Fixed - continues "Unable to unfreeze entry" in sentinel mode. #4584 — committed to redisson/redisson by deleted user 2 years ago
Thanks for the logs. Please try a new build below.
redisson-3.17.8-SNAPSHOT.jar.zip
We use the 3.18.0 release and the initial problem has been fixed: Redisson manages to reconnect. But recently we saw that one of our services which uses the Redisson client was polluting the log after the reconnect with the following exceptions:
10.83.19.20 is the old Redis IP, which no longer exists after Redis was restarted by Kubernetes. Unfortunately, I have no exact steps to reproduce, as it happened only once and in only one of our services (all the others reconnected as well and didn't throw these exceptions).