netty: OOM because SSL key material is cached for every new connection when using OpenSslCachingX509KeyManagerFactory

Expected behavior

We recently hit an OOM issue after switching from the JDK SSL provider to OpenSSL in Netty.

We use OpenSslCachingX509KeyManagerFactory explicitly, so Netty uses OpenSslCachingKeyMaterialProvider to cache the key material and avoid the overhead of re-parsing the chain and the key each time the material is generated.

We expected a performance improvement, not an OOM.

Actual behavior

In a stress test of TLS connections, however, memory grew linearly until the process ran out of memory.

After debugging the Netty and OpenJDK SSL code, we found the cause. On every new connection, the handshake certificate-selection callback OpenSslClientCertificateCallback is called and tries to find the key material for the alias in the cache. If it is not there, it tries to find a matching alias from the server cert chain, which creates a new alias in the format seq_id.builderIndex.keyStoreAlias, for example 924450.0.key. It then parses the chain and key and puts the result into the cache under the new alias. Each cached entry retains the refCnt of the key material, which prevents the native memory from being destroyed, so memory keeps growing until we eventually hit the OOM.
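The growth pattern described above can be illustrated with a small model in plain Java. This is only a sketch under stated assumptions: `AliasCacheGrowthDemo`, `Material`, and the helper methods are hypothetical names, and the map stands in for the cache that Netty keeps, not for its actual implementation. It shows that a cache keyed by alias stays at one entry when the alias is stable, but gains one entry per handshake when every lookup produces a fresh `seq_id.builderIndex.keyStoreAlias` value.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Simplified model of an alias-keyed key-material cache; not Netty's actual code. */
class AliasCacheGrowthDemo {
    /** Hypothetical stand-in for the cached native key material. */
    static final class Material {
        final String alias;
        Material(String alias) { this.alias = alias; }
    }

    private final Map<String, Material> cache = new ConcurrentHashMap<>();
    private final AtomicLong sequence = new AtomicLong();

    /** Mimics a key manager that returns the same alias for every handshake. */
    String stableAlias() {
        return "key";
    }

    /** Mimics the seq_id.builderIndex.keyStoreAlias format, e.g. 924450.0.key. */
    String unstableAlias() {
        return sequence.incrementAndGet() + ".0.key";
    }

    /** A cache miss "parses" the material and retains it under the given alias. */
    Material chooseKeyMaterial(String alias) {
        return cache.computeIfAbsent(alias, Material::new);
    }

    int size() {
        return cache.size();
    }

    /** Runs the given number of simulated handshakes and returns the cache size. */
    static int entriesAfter(int handshakes, boolean stable) {
        AliasCacheGrowthDemo demo = new AliasCacheGrowthDemo();
        for (int i = 0; i < handshakes; i++) {
            demo.chooseKeyMaterial(stable ? demo.stableAlias() : demo.unstableAlias());
        }
        return demo.size();
    }

    public static void main(String[] args) {
        System.out.println("stable aliases:   " + entriesAfter(10_000, true));  // prints 1
        System.out.println("unstable aliases: " + entriesAfter(10_000, false)); // prints 10000
    }
}
```

In the real case each retained entry also pins native memory, so the effect is worse than the entry count alone suggests.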

Changing to use OpenSslX509KeyManagerFactory solved this problem.

Steps to reproduce

Use OpenSslCachingX509KeyManagerFactory to set up the SSLContext, then keep issuing a large number of TLS connection requests.

Minimal yet complete reproducer code (or URL to code)

Netty version

We’re using 4.1.36.Final.

JVM version (e.g. java -version)

java version "10.0.1" 2018-04-17
Java(TM) SE Runtime Environment 18.3 (build 10.0.1+10)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.1+10, mixed mode)

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 23 (13 by maintainers)

Most upvoted comments

@lvfangmin after some debugging I think I also know why people usually do not see this problem.

By default we use “SunX509” as the algorithm when creating the KeyManagerFactory. In that case the JDK uses SunX509KeyManagerImpl, which uses “stable” aliases, so the caching works as expected. You specify another algorithm, so it ends up using X509KeyManagerImpl, which does not provide stable aliases.
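To check which key manager implementation a given algorithm name selects on a particular JDK, a probe along these lines may help. It is a sketch, not part of Netty: `KeyManagerImplProbe` and `implementationFor` are hypothetical names, the key store is an empty in-memory one, and “PKIX” is used here only as an example of an algorithm other than “SunX509”. On OpenJDK the two names are expected to resolve to the two classes mentioned above, but the result depends on the installed security providers.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManager;
import javax.net.ssl.KeyManagerFactory;

/** Prints which KeyManager implementation backs a KeyManagerFactory algorithm. */
class KeyManagerImplProbe {
    /** Returns the class name of the first KeyManager created for the algorithm. */
    static String implementationFor(String algorithm) {
        try {
            // An empty in-memory key store is enough to instantiate the key manager.
            KeyStore emptyStore = KeyStore.getInstance(KeyStore.getDefaultType());
            emptyStore.load(null, null);

            KeyManagerFactory factory = KeyManagerFactory.getInstance(algorithm);
            factory.init(emptyStore, new char[0]);

            KeyManager[] managers = factory.getKeyManagers();
            return managers[0].getClass().getName();
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("Cannot create key manager for " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        for (String algorithm : new String[] {"SunX509", "PKIX"}) {
            System.out.println(algorithm + " -> " + implementationFor(algorithm));
        }
    }
}
```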

So to fix this I think we should do two things:

  • If X509KeyManagerImpl is used, we should not cache unless explicitly told to
  • Ensure the cache cannot grow without bounds…

WDYT ?
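One way the second proposal (a cache that cannot grow without bounds) might look is sketched below in plain Java. This is an illustration under assumptions, not the change that was made in Netty: `BoundedMaterialCache`, `maxEntries`, and the `loader` callback are hypothetical. Once the limit is reached, new material is still created and returned but is no longer retained, so unstable aliases can no longer grow the cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded cache: stops retaining new entries once maxEntries is reached. */
class BoundedMaterialCache<V> {
    private final Map<String, V> cache = new HashMap<>();
    private final int maxEntries;

    BoundedMaterialCache(int maxEntries) {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        this.maxEntries = maxEntries;
    }

    /** Returns the cached value for alias, or creates one and caches it if there is room. */
    synchronized V get(String alias, Function<String, V> loader) {
        V cached = cache.get(alias);
        if (cached != null) {
            return cached;
        }
        V created = loader.apply(alias);
        if (cache.size() < maxEntries) {
            cache.put(alias, created);
        }
        // When the cache is full the value is returned uncached. With real native
        // material the caller would then own it and have to release it after use.
        return created;
    }

    synchronized int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        BoundedMaterialCache<Object> cache = new BoundedMaterialCache<>(2);
        for (int i = 0; i < 100; i++) {
            cache.get(i + ".0.key", alias -> new Object()); // a new alias on every call
        }
        System.out.println("entries retained: " + cache.size()); // prints 2
    }
}
```

The trade-off of this design is that lookups past the limit pay the full generation cost every time, which matches the uncached behaviour rather than failing.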

@lvfangmin thanks will have a look

@lvfangmin no worries… thanks for reporting 😃

@lvfangmin please check https://github.com/netty/netty/pull/9762

Yes, OpenSslCachingX509KeyManagerFactory is a bit faster because it can cache the generated “native” key material, whereas with OpenSslX509KeyManagerFactory we need to re-generate it on each invocation.