lettuce-core: Randomly getting "RedisCommandTimeoutException: Command timed out"
Bug Report
Current Behavior
We’re using Redis 3.2.5 with Spring Data Redis as a client. As part of the upgrade to Spring Boot 2, we switched from Jedis to Lettuce 5.0.4.RELEASE. Ever since the move we’ve been getting the following exceptions quite often, but at random times:
org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out
at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70) ~[spring-data-redis-2.0.7.RELEASE.jar!/:2.0.7.RELEASE]
at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41) ~[spring-data-redis-2.0.7.RELEASE.jar!/:2.0.7.RELEASE]
at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44) ~[spring-data-redis-2.0.7.RELEASE.jar!/:2.0.7.RELEASE]
at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42) ~[spring-data-redis-2.0.7.RELEASE.jar!/:2.0.7.RELEASE]
at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:257) ~[spring-data-redis-2.0.7.RELEASE.jar!/:2.0.7.RELEASE]
at org.springframework.data.redis.connection.lettuce.LettuceHashCommands.convertLettuceAccessException(LettuceHashCommands.java:445) ~[spring-data-redis-2.0.7.RELEASE.jar!/:2.0.7.RELEASE]
at org.springframework.data.redis.connection.lettuce.LettuceHashCommands.hMGet(LettuceHashCommands.java:321) ~[spring-data-redis-2.0.7.RELEASE.jar!/:2.0.7.RELEASE]
at org.springframework.data.redis.connection.DefaultedRedisConnection.hMGet(DefaultedRedisConnection.java:895) ~[spring-data-redis-2.0.7.RELEASE.jar!/:2.0.7.RELEASE]
at org.springframework.data.redis.connection.DefaultStringRedisConnection.hMGet(DefaultStringRedisConnection.java:483) ~[spring-data-redis-2.0.7.RELEASE.jar!/:2.0.7.RELEASE]
at org.springframework.data.redis.core.DefaultHashOperations.lambda$multiGet$7(DefaultHashOperations.java:161) ~[spring-data-redis-2.0.7.RELEASE.jar!/:2.0.7.RELEASE]
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:224) ~[spring-data-redis-2.0.7.RELEASE.jar!/:2.0.7.RELEASE]
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:184) ~[spring-data-redis-2.0.7.RELEASE.jar!/:2.0.7.RELEASE]
at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:95) ~[spring-data-redis-2.0.7.RELEASE.jar!/:2.0.7.RELEASE]
...
Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out
at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:114) ~[lettuce-core-5.0.4.RELEASE.jar!/:na]
at io.lettuce.core.FutureSyncInvocationHandler.handleInvocation(FutureSyncInvocationHandler.java:62) ~[lettuce-core-5.0.4.RELEASE.jar!/:na]
at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:80) ~[lettuce-core-5.0.4.RELEASE.jar!/:na]
at com.sun.proxy.$Proxy152.hmget(Unknown Source) ~[na:na]
at org.springframework.data.redis.connection.lettuce.LettuceHashCommands.hMGet(LettuceHashCommands.java:319) ~[spring-data-redis-2.0.7.RELEASE.jar!/:2.0.7.RELEASE]
Everything can work perfectly for a few days, and then we see this exception. I should note that the application's workload on Redis is very low, to the point where we might not make a single call to Redis for a few hours.
We tried reproducing this issue in several ways but failed. We verified that there are no network issues connecting to Redis, and that Redis is always up. We also looked for similar issues - the only one which seemed relevant is https://github.com/lettuce-io/lettuce-core/issues/296, but that was resolved a long time ago.
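For reference, the timeout behind this exception is the Lettuce command timeout, which defaults to 60 seconds, so a command has to go unanswered for a full minute before the exception surfaces. Below is a minimal sketch of how that timeout can be set explicitly through Spring Data Redis; the class name, host/port and the 10-second value are placeholders for illustration, not our actual settings:

```java
import java.time.Duration;

import org.springframework.data.redis.connection.RedisStandaloneConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

class RedisConnectionExample {

    // Connection factory with an explicit command timeout; "redis-host", 6379
    // and the 10-second value are illustrative placeholders.
    static LettuceConnectionFactory connectionFactory() {
        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
                .commandTimeout(Duration.ofSeconds(10)) // Lettuce default is 60 seconds
                .build();

        LettuceConnectionFactory factory = new LettuceConnectionFactory(
                new RedisStandaloneConfiguration("redis-host", 6379), clientConfig);
        factory.afterPropertiesSet(); // initializes the underlying Lettuce client
        return factory;
    }
}
```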
Expected behavior/code
No timeouts should occur in our setting.
Environment
- Lettuce version(s): 5.0.4.RELEASE
- Redis version: 3.2.5
- Spring Boot version: 2.0.3.RELEASE
- Spring Data Redis version: 2.0.7.RELEASE
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 4
- Comments: 25 (9 by maintainers)
To be clear here: I do not think there is any issue with Lettuce itself. If the application, for one reason or another, causes massive CPU load and grinds to a halt, Lettuce is not to blame for being unable to keep up.
@mp911de, I have no motivation to open a new ticket. I have no questions on the matter. For me this is the most logical place to add my experience/solutions to the issue, as I imagine others end up here after googling the exception message, and maybe my comments will be of value to them. If my logic is wrong here, please explain a bit more why opening a new ticket is preferred. Thanks!
@Arch-vile, thanks for the detailed info! We tried everything to reproduce this issue but couldn't - it happened only in production. It's great that you found out how to reproduce it. We ended up going back to Jedis because of this issue. @mp911de, @Arch-vile's answer here is valuable and in context. I expected you to thank him for his research and reopen the issue, as there's clearly a real problem here.
I was facing similar issues to @MatanRubin's and ended up here looking for a solution. I was unable to find one, so now that I have finally (somewhat) figured it out, I thought I would document it here for others facing this issue, possibly for the same reason I had.
The latest spring-boot versions changed how passwords are handled. We used BCrypt to hash the passwords (the recommended approach?), which is by design a CPU-heavy operation. Our application was a REST API and we did basic authentication on every request, so CPU load increased significantly even with moderate API traffic.
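To give a feel for how expensive a single verification is, here is a tiny stand-alone sketch that times one BCrypt match (the strength of 12 and the sample password are arbitrary choices for illustration):

```java
import org.springframework.security.crypto.bcrypt.BCryptPasswordEncoder;

public class BCryptCostDemo {

    public static void main(String[] args) {
        // Strength 12 is arbitrary here; each +1 roughly doubles the work.
        BCryptPasswordEncoder encoder = new BCryptPasswordEncoder(12);
        String hash = encoder.encode("sample-password");

        long start = System.nanoTime();
        boolean ok = encoder.matches("sample-password", hash); // one CPU-bound verification
        long millis = (System.nanoTime() - start) / 1_000_000;

        System.out.printf("match=%s, single BCrypt verification took ~%d ms%n", ok, millis);
    }
}
```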
I was able to reproduce the issue by running the application locally and limiting the available CPU with cpulimit. Trying different combinations of generated load, limited CPU and simultaneous CPU-heavy calculation threads (I randomly spawned new threads that did endless loops of calculations) eventually led to a combination that semi-reliably reproduced the issue. I assume the cause of the Command timed out exceptions in this case was that there simply weren't enough CPU resources available to send the command and receive the response within the given time.
So to "fix" this issue, we need to either decrease CPU consumption or add more CPU. As a first remedy we used a lookup table for already-matched password hashes, implemented by overriding BCryptPasswordEncoder's matches(rawPassword, encodedPassword) method (a sketch follows below).
Related links:
Spring-boot version: 2.1.4 (with spring-boot-starter-data-redis-reactive dependency).
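A rough sketch of what that override could look like; the class name and the SHA-256 fingerprinting of the (raw password, hash) pair are my own illustration choices, not necessarily the exact code we shipped:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.security.crypto.bcrypt.BCryptPasswordEncoder;

/**
 * BCryptPasswordEncoder that remembers credential/hash pairs it has already
 * verified, so repeated basic-auth requests with the same credentials skip
 * the expensive BCrypt computation.
 */
public class CachingBCryptPasswordEncoder extends BCryptPasswordEncoder {

    // Cheap fingerprints of (rawPassword, bcryptHash) pairs that already matched.
    private final Map<String, Boolean> verified = new ConcurrentHashMap<>();

    @Override
    public boolean matches(CharSequence rawPassword, String encodedPassword) {
        String key = fingerprint(rawPassword, encodedPassword);
        if (verified.containsKey(key)) {
            return true; // already verified once, skip the BCrypt work
        }
        boolean ok = super.matches(rawPassword, encodedPassword);
        if (ok) {
            verified.put(key, Boolean.TRUE);
        }
        return ok;
    }

    private static String fingerprint(CharSequence rawPassword, String encodedPassword) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            sha256.update(rawPassword.toString().getBytes(StandardCharsets.UTF_8));
            sha256.update(String.valueOf(encodedPassword).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(sha256.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Caching successful matches like this trades a little memory for CPU; it only pays off when the same credentials are presented repeatedly, as with basic authentication on every request.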
Has anyone solved the problem?
Getting the same issue:
Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 minute(s)
at io.lettuce.core.ExceptionFactory.createTimeoutException(ExceptionFactory.java:51)
at io.lettuce.core.protocol.CommandExpiryWriter.lambda$potentiallyExpire$0(CommandExpiryWriter.java:167)
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:66)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Disconnected from the target VM, address: '127.0.0.1:55714', transport: 'socket'
io.lettuce.core.protocol debug logs:
[2020-09-21 21:12:13,542] [INFO] [main] [] [] [] [] [i.l.c.KqueueProvider] 70 | Starting without optional kqueue library
[2020-09-21 21:12:13,684] [DEBUG] [lettuce-nioEventLoop-4-1] [] [] [] [] [i.l.c.p.CommandHandler] 162 | [channel=0x3d304c0d, [id: 0xe0ae678d] (inactive), chid=0x1] channelRegistered()
[2020-09-21 21:12:13,696] [DEBUG] [lettuce-nioEventLoop-4-1] [] [] [] [] [i.l.c.p.CommandHandler] 271 | [channel=0x3d304c0d, /127.0.0.1:55791 -> localhost/127.0.0.1:6389, chid=0x1] channelActive()
[2020-09-21 21:12:13,696] [DEBUG] [lettuce-nioEventLoop-4-1] [] [] [] [] [i.l.c.p.DefaultEndpoint] 371 | [channel=0x3d304c0d, /127.0.0.1:55791 -> localhost/127.0.0.1:6389, epid=0x1] activateEndpointAndExecuteBufferedCommands 0 command(s) buffered
[2020-09-21 21:12:13,696] [DEBUG] [lettuce-nioEventLoop-4-1] [] [] [] [] [i.l.c.p.DefaultEndpoint] 376 | [channel=0x3d304c0d, /127.0.0.1:55791 -> localhost/127.0.0.1:6389, epid=0x1] activating endpoint
[2020-09-21 21:12:13,696] [DEBUG] [lettuce-nioEventLoop-4-1] [] [] [] [] [i.l.c.p.DefaultEndpoint] 453 | [channel=0x3d304c0d, /127.0.0.1:55791 -> localhost/127.0.0.1:6389, epid=0x1] flushCommands()
[2020-09-21 21:12:13,697] [DEBUG] [lettuce-nioEventLoop-4-1] [] [] [] [] [i.l.c.p.DefaultEndpoint] 468 | [channel=0x3d304c0d, /127.0.0.1:55791 -> localhost/127.0.0.1:6389, epid=0x1] flushCommands() Flushing 0 commands
[2020-09-21 21:12:13,697] [DEBUG] [lettuce-nioEventLoop-4-1] [] [] [] [] [i.l.c.p.ConnectionWatchdog] 166 | [channel=0x3d304c0d, /127.0.0.1:55791 -> localhost/127.0.0.1:6389, last known addr=localhost/127.0.0.1:6389] channelActive()
[2020-09-21 21:12:13,697] [DEBUG] [lettuce-nioEventLoop-4-1] [] [] [] [] [i.l.c.p.CommandHandler] 285 | [channel=0x3d304c0d, /127.0.0.1:55791 -> localhost/127.0.0.1:6389, chid=0x1] channelActive() done
[2020-09-21 21:12:16,005] [DEBUG] [lettuce-nioEventLoop-4-1] [] [] [] [] [i.l.c.p.ConnectionWatchdog] 134 | [channel=0x3d304c0d, /127.0.0.1:55791 -> localhost/127.0.0.1:6389, last known addr=localhost/127.0.0.1:6389] userEventTriggered(ctx, io.lettuce.core.ConnectionEvents$Activated@16e828a7)
[2020-09-21 21:12:16,016] [DEBUG] [main] [] [] [] [] [i.l.c.p.DefaultEndpoint] 339 | [channel=0x3d304c0d, /127.0.0.1:55791 -> localhost/127.0.0.1:6389, epid=0x1] write() writeAndFlush command TracedCommand [type=CONFIG, output=MapOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.AsyncCommand]
[2020-09-21 21:12:16,022] [DEBUG] [lettuce-nioEventLoop-4-1] [] [] [] [] [i.l.c.p.CommandHandler] 349 | [channel=0x3d304c0d, /127.0.0.1:55791 -> localhost/127.0.0.1:6389, chid=0x1] write(ctx, TracedCommand [type=CONFIG, output=MapOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.AsyncCommand], promise)
[2020-09-21 21:12:16,023] [DEBUG] [main] [] [] [] [] [i.l.c.p.DefaultEndpoint] 153 | [channel=0x3d304c0d, /127.0.0.1:55791 -> localhost/127.0.0.1:6389, epid=0x1] write() done