lettuce-core: Getting java.lang.OutOfMemoryError: Direct buffer memory

Lettuce version: 5.0.2.RELEASE
Reproducible on: Linux (Kubernetes), Windows (my local machine), likely everywhere

I’ve started testing a Redis cluster in Kubernetes. So far, not bad – all failover scenarios worked fine – but there was one big problem: memory leaks. It was not evident to me at first (because it was a direct memory leak and I was looking at heap charts), but I think I have pinned down two cases.

The first one is topology refresh. I have a single-node Redis cluster (redis-cluster) in Docker Compose for local testing, with these options:

ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
    .enablePeriodicRefresh(Duration.ofSeconds(2)) // anything will do, but a small value triggers the exception faster
    .enableAllAdaptiveRefreshTriggers()
    .build();

Combined with a small direct memory size, e.g. -XX:MaxDirectMemorySize=100M or 200M, I can get the OOM exception within 1-2 minutes. The exception looks like this (a sketch of the full client wiring follows after the stack trace):

2018-02-17 13:33:17.243 [WARN] [lettuce-eventExecutorLoop-63-8] [i.l.c.c.t.ClusterTopologyRefresh] - Cannot retrieve partition view from RedisURI [host='redis-cluster', port=7000], error: java.util.concurrent.ExecutionException: java.lang.NullPointerException
2018-02-17 13:33:17.243 [WARN] [lettuce-nioEventLoop-65-3] [i.l.c.p.CommandHandler] - null Unexpected exception during request: java.lang.NullPointerException
java.lang.NullPointerException: null
	at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:500) ~[lettuce-core-5.0.2.RELEASE.jar:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1414) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:945) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:141) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886) [netty-common-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.21.Final.jar:4.1.21.Final]
	at java.lang.Thread.run(Thread.java:844) [?:?]

It looks like Netty runs out of direct memory.
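For completeness, here is a minimal sketch of how I wire these refresh options into the cluster client. The redis-cluster host and port 7000 come from my Docker Compose setup; the rest is standard Lettuce 5.x API:

import java.time.Duration;

import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.RedisClusterClient;

// Run the JVM with e.g. -XX:MaxDirectMemorySize=100M to hit the OOM quickly.
ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
        .enablePeriodicRefresh(Duration.ofSeconds(2))
        .enableAllAdaptiveRefreshTriggers()
        .build();

RedisClusterClient clusterClient = RedisClusterClient.create(RedisURI.create("redis-cluster", 7000));
clusterClient.setOptions(ClusterClientOptions.builder()
        .topologyRefreshOptions(refreshOptions)
        .build());

// Opening a connection starts the periodic refresh; direct memory then climbs steadily.
clusterClient.connect();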

I’m less sure about the second case, since I did not do extensive testing, but I think the two are connected. I have a 7-node Redis cluster in our Kubernetes environment. We killed one master to see if it would fail over. It did: the topology refreshed and everything seemed OK. But in the background Lettuce kept pinging and trying to reconnect to the dead node (only visible with Lettuce debug logging enabled), and direct memory quickly dried up until the node died.
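For anyone who wants to watch the direct memory drain while reproducing this, here is a small snippet of my own (standard JDK BufferPoolMXBean API, nothing Lettuce-specific; run it periodically, e.g. from a scheduled task):

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Prints the JVM's direct buffer pool usage, so the leak is visible without a profiler.
for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
    if ("direct".equals(pool.getName())) {
        System.out.printf("direct buffers: count=%d, used=%d bytes%n",
                pool.getCount(), pool.getMemoryUsed());
    }
}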

Any thoughts?

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 42 (13 by maintainers)

Most upvoted comments

I think we found the cause. The OOME looks related to a connection leak reported in #721.

@mp911de Tried my test from https://github.com/vleushin/lettuce-oom. It worked without problems.

Tried with 5.0.2: indeed, the connection count to Redis kept growing. With 5.0.3 it was stable.
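For anyone checking their own setup, a rough sketch of counting connections on a single node (host and port are placeholders; clientList() wraps the Redis CLIENT LIST command):

import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;

// Counts connected clients on one node; a count that keeps growing across
// topology refreshes points at the connection leak fixed in 5.0.3.
RedisClient client = RedisClient.create("redis://redis-cluster:7000");
try (StatefulRedisConnection<String, String> connection = client.connect()) {
    int clients = connection.sync().clientList().split("\n").length;
    System.out.println("connected clients: " + clients);
}
client.shutdown();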