quarkus: Redis cache: Failed to connect to all nodes of the cluster

Describe the bug

The Redis cache implementation fails to work in cluster mode. Under load, it throws an error: Failed to connect to all nodes of the cluster. The error originates in the Vert.x Redis client here: https://github.com/vert-x3/vertx-redis-client/blob/4.4.6/src/main/java/io/vertx/redis/client/impl/RedisClusterClient.java#L202

Expected behavior

Cluster mode works.

Actual behavior

Cluster mode throws an exception without a stack trace:

ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (vert.x-eventloop-thread-2) HTTP Request to /cat-fact failed, error id: 1cd9b164-ba80-4118-b675-ff5cfdfd93eb-869: io.vertx.core.impl.NoStackTraceThrowable: Failed to connect to all nodes of the cluster

Setting quarkus.redis.max-pool-size to a higher value seems to postpone the error, but the application still fails eventually.

How to Reproduce?

Clone repo: https://github.com/bartm-dvb/quarkus-redis-bug
Start Redis in cluster mode with docker-compose up
./mvnw quarkus:dev
Start the load test with JMeter; the configuration file is in the repository.

After about 30 seconds of load testing, you should see

ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (vert.x-eventloop-thread-2) HTTP Request to /cat-fact failed, error id: 1cd9b164-ba80-4118-b675-ff5cfdfd93eb-869: io.vertx.core.impl.NoStackTraceThrowable: Failed to connect to all nodes of the cluster

Output of `uname -a` or `ver`

No response

Output of `java -version`

openjdk version "17.0.7" 2023-04-18
OpenJDK Runtime Environment Temurin-17.0.7+7 (build 17.0.7+7)
OpenJDK 64-Bit Server VM Temurin-17.0.7+7 (build 17.0.7+7, mixed mode, sharing)

Quarkus version or git rev

3.5.1

Build tool (ie. output of `mvnw --version` or `gradlew --version`)

Apache Maven 3.9.3 (21122926829f1ead511c958d89bd2f672198ae9f)

Additional information

No response

About this issue

State: closed
Created 8 months ago
Reactions: 1
Comments: 19 (15 by maintainers)

Most upvoted comments

Yes, the Vert.x Redis client intentionally doesn’t implement reconnect on error; see https://vertx.io/docs/vertx-redis-client/java/#_implementing_reconnect_on_error. We should probably implement something like that in Quarkus. Please file a feature request.

Ladicek on Nov 27, 2023
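
As an illustration of the reconnect-on-error idea from the comment above, here is a minimal sketch of a retry loop with exponential backoff. It is plain Java and deliberately does not use the Vert.x Redis API: the class name ReconnectBackoff, the attempt limit, the delay values, and the tryConnect/pause callbacks are all assumptions made for this example, not what Quarkus or Vert.x actually implement.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

// Hypothetical helper, for illustration only; not part of Quarkus or Vert.x.
final class ReconnectBackoff {
    // Assumed limit on reconnect attempts before giving up.
    static final int MAX_ATTEMPTS = 16;

    // Exponential backoff: 10 ms, 20 ms, 40 ms, ... capped at 2^10 * 10 = 10240 ms.
    static long delayMillis(int retry) {
        return (1L << Math.min(retry, 10)) * 10;
    }

    // Calls tryConnect until it returns true or MAX_ATTEMPTS is reached.
    // 'pause' receives the delay to wait before the next attempt.
    // Returns the number of attempts made, or -1 if every attempt failed.
    static int connectWithRetry(BooleanSupplier tryConnect, LongConsumer pause) {
        for (int retry = 0; retry < MAX_ATTEMPTS; retry++) {
            if (tryConnect.getAsBoolean()) {
                return retry + 1;
            }
            pause.accept(delayMillis(retry));
        }
        return -1;
    }
}
```

On a Vert.x event loop the pause must not block the thread; the retry would be scheduled with a timer (for example vertx.setTimer) instead of sleeping. The pause callback is injected here so that the waiting strategy stays outside the retry logic.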

I think there is an issue with reconnects here; let me know if I should file a new issue:

Manual steps that consistently reproduce the failure to reconnect:

1. Start Quarkus. Start Redis (docker compose up, using the docker-compose file provided in this issue). Access http://localhost:8080/cat-fact
2. a) Stop Redis (docker compose down)
   b) Access cat-fact again -> see the following error in the Quarkus log: 2023-11-17 22:27:36,551 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (vert.x-eventloop-thread-1) HTTP Request to /cat-fact failed, error id: 736e1d37-7d84-43fb-8e37-2d4d34f4eda6-11: io.vertx.core.impl.NoStackTraceThrowable: Cannot connect to any of the provided endpoints
3. Start Redis. Access cat-fact: the above error repeats over and over until Quarkus is restarted.

Note: if 2(a) and (3) are done without 2(b), the error never occurs.

My theory is that when all endpoints of the cluster are down, the slots / endpoints being saved are incorrect and getSlots is always called with index >= endpoints.size.

sfali16 on Nov 25, 2023

PRs to Vert.x:

PR to Quarkus:

https://github.com/quarkusio/quarkus/pull/37267

I can’t see anything else we could do here.

Ladicek on Nov 22, 2023

We got a report for something similar when there is a DNS issue. It looks like the connections are not released after an error. I believe the issue is not in Quarkus but in the Vert.x Redis client. @Ladicek should know more, as he recently looked at this code.

I think that investigating what the Vert.x Redis client code does when a failure occurs would be a great first step. There should be exceptionHandlers, and I suspect that they are not releasing the connection.

The Vert.x redis client code is in https://github.com/vert-x3/vertx-redis-client/tree/4.4. Select the 4.4 branch - it’s the one used in Quarkus (a forward port should be possible once we find the issue). Build it using mvn clean install -DskipTests. Then override the version in your project, just add the dependency:

<dependency>
  <groupId>io.vertx</groupId>
  <artifactId>vertx-redis-client</artifactId>
  <version>4.4.6-SNAPSHOT</version> <!-- verify it's what you built -->
</dependency>

cescoffier on Nov 16, 2023