netty: OOM because SSL key material is cached on every new connection when using OpenSslCachingX509KeyManagerFactory
Expected behavior
Recently, we hit an OOM issue when switching from the JDK SSL implementation to OpenSSL in Netty.
We’re using OpenSslCachingX509KeyManagerFactory explicitly, so Netty uses OpenSslCachingKeyMaterialProvider to cache the key material and avoid the overhead of parsing the chain and the key each time the material is generated.
We expect to see a performance improvement, not an OOM.
Actual behavior
Under a stress test of TLS connections, we saw memory increase linearly and eventually hit an OOM.
After debugging the Netty and OpenJDK SSL code, we found the cause. Every time there is a new connection, the handshake certificate selection callback OpenSslClientCertificateCallback is called and tries to find the key material for the alias in the cache. If it doesn’t exist, it looks for a matching alias from the server cert chain, which creates a new alias in the format seq_id.builderIndex.keyStoreAlias, like 924450.0.key. It then parses the chain and key and puts the result into the cache under the new alias. This retains the refCnt of the key material and prevents the native memory from being freed, which is why we eventually saw the OOM.
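The growth described above can be modelled without Netty at all. The following is a toy sketch, not Netty's implementation: `AliasCacheDemo`, `cachedEntriesAfter`, `stableAliases` and `unstableAliases` are hypothetical names, and a `byte[]` stands in for the native key material. It only shows that a cache keyed by alias stays at one entry when the alias is stable and gains one entry per connection when each lookup produces a fresh alias of the form seq_id.builderIndex.keyStoreAlias.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Toy model only: none of these types are Netty or JDK classes.
class AliasCacheDemo {

    // Simulates `connections` handshakes against a fresh alias -> material cache
    // and returns how many entries the cache holds afterwards.
    static int cachedEntriesAfter(int connections, Supplier<String> aliasSource) {
        Map<String, byte[]> cache = new ConcurrentHashMap<>();
        for (int i = 0; i < connections; i++) {
            String alias = aliasSource.get();
            // Placeholder for "parse chain and key, keep the material referenced".
            cache.computeIfAbsent(alias, a -> new byte[16]);
        }
        return cache.size();
    }

    // A key manager that always answers with the same alias.
    static Supplier<String> stableAliases() {
        return () -> "key";
    }

    // A key manager that prefixes the alias with an increasing sequence id,
    // e.g. "1.0.key", "2.0.key", ...
    static Supplier<String> unstableAliases() {
        AtomicLong seq = new AtomicLong();
        return () -> seq.incrementAndGet() + ".0.key";
    }

    public static void main(String[] args) {
        System.out.println("stable:   " + cachedEntriesAfter(10_000, stableAliases()));   // prints "stable:   1"
        System.out.println("unstable: " + cachedEntriesAfter(10_000, unstableAliases())); // prints "unstable: 10000"
    }
}
```

In the real case each cached entry also pins native memory, so the unbounded map translates directly into the linear memory growth seen in the stress test.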
Changing to use OpenSslX509KeyManagerFactory solved this problem.
Steps to reproduce
Use OpenSslCachingX509KeyManagerFactory to set up the SSLContext, then keep issuing lots of TLS connection requests.
Minimal yet complete reproducer code (or URL to code)
Netty version
We’re using 4.1.36.Final.
JVM version (e.g. java -version)
java version “10.0.1” 2018-04-17
Java™ SE Runtime Environment 18.3 (build 10.0.1+10)
Java HotSpot™ 64-Bit Server VM 18.3 (build 10.0.1+10, mixed mode)
About this issue
- State: closed
- Created 5 years ago
- Comments: 23 (13 by maintainers)
Commits related to this issue
- At the moment the cache provided by OpenSslCachingKeyMaterialProvider is not bound. We should support an upper limit Motivation: At the moment te cache is not bound and so lead to huge memory consum... — committed to netty/netty by normanmaurer 5 years ago
- At the moment the cache provided by OpenSslCachingKeyMaterialProvider… (#9759) Motivation: At the moment te cache is not bound and so lead to huge memory consumpation. We should ensure its bound... — committed to netty/netty by normanmaurer 5 years ago
- Don't cache key material if sun.security.ssl.X509KeyManagerImpl is used Motivation: sun.security.ssl.X509KeyManagerImpl will not use "stable" aliases and so aliases may be changed during invocations... — committed to netty/netty by normanmaurer 5 years ago
- Don't cache key material if sun.security.ssl.X509KeyManagerImpl is used (#9762) Motivation: sun.security.ssl.X509KeyManagerImpl will not use "stable" aliases and so aliases may be changed during i... — committed to netty/netty by normanmaurer 5 years ago
- At the moment the cache provided by OpenSslCachingKeyMaterialProvider… (#9759) Motivation: At the moment te cache is not bound and so lead to huge memory consumpation. We should ensure its bound by ... — committed to netty/netty by normanmaurer 5 years ago
@lvfangmin after some debugging I think I also know why people usually do not see this problem.
By default we use “SunX509” as the algorithm when creating the KeyManagerFactory. When this is used the JDK uses SunX509KeyManagerImpl. This one uses “stable” aliases, so the caching works as expected. You specify another algorithm, so it ends up using X509KeyManagerImpl, which does not provide stable aliases.
So to fix this I think we should do two things:
- Bound the cache with an upper limit
- If X509KeyManagerImpl is used, we should not cache unless explicitly told to
WDYT?
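To see which key manager implementation a given algorithm name produces on your own JVM, a small JDK-only check like the one below may help. It is a sketch under assumptions: `KeyManagerImplCheck`, `implFor` and `looksSafeToCache` are hypothetical names, "NewSunX509" is used as an example of a non-default algorithm, and the class-name comparison only mirrors the idea in the comment above rather than reproducing Netty's actual check. The factory is initialised with no key store, which is enough to inspect the implementation class.

```java
import javax.net.ssl.KeyManager;
import javax.net.ssl.KeyManagerFactory;

// Hypothetical helper, not part of Netty or the JDK.
class KeyManagerImplCheck {

    // Returns the class name of the first KeyManager created for `algorithm`.
    static String implFor(String algorithm) {
        try {
            KeyManagerFactory kmf = KeyManagerFactory.getInstance(algorithm);
            kmf.init(null, null); // no key store: we only want the implementation class
            return kmf.getKeyManagers()[0].getClass().getName();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot create KeyManagerFactory for " + algorithm, e);
        }
    }

    // Assumption taken from the discussion: X509KeyManagerImpl hands out
    // aliases that change between invocations, so caching by alias is unsafe.
    static boolean looksSafeToCache(KeyManager km) {
        return !"sun.security.ssl.X509KeyManagerImpl".equals(km.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println("SunX509    -> " + implFor("SunX509"));
        System.out.println("NewSunX509 -> " + implFor("NewSunX509"));
    }
}
```

Per the explanation above, an OpenJDK build is expected to report SunX509KeyManagerImpl for “SunX509” and X509KeyManagerImpl for the other algorithm; other JVM vendors may differ.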
@lvfangmin thanks will have a look
@lvfangmin no worries… thanks for reporting 😃
@lvfangmin please check https://github.com/netty/netty/pull/9762…
Yes, OpenSslCachingX509KeyManagerFactory is a bit faster as it can cache the generated “native” key material, while for OpenSslX509KeyManagerFactory we will need to re-generate it on each invocation.