netty: JNI classloader deadlock with latest JDK version

Expected behavior

No classloader deadlocks

Actual behavior

I am still investigating but there appears to be a possible class loading concurrency deadlock when using the epoll transport - it started when upgrading from OpenJDK 11.0.10 to 11.0.11.

It looks to be related to JNI reentrancy where the netty_epoll_native_JNI_OnLoad func, which happens in the context of static initialization of the Epoll/Native classes, calls back into java to initialize additional classes. The deadlock happens between this and another (non-native) thread in the process of loading other classes.

See stacks of deadlocked threads below.

Seems likely it’s a new JDK bug. I noticed this other recent netty issue/fix which is kind of related, but this appears to be more problematic because the second thread is loading an unrelated class: https://github.com/netty/netty/issues/10187

I was able to workaround by forcing the native classes to load upfront.

Steps to reproduce

Load Epoll at the same time as other classes are loading in another thread (I guess it might similarly apply to the other native modules e.g. OpenSsl).

Minimal yet complete reproducer code (or URL to code)

Don’t have reproducer but it happens pretty reliably (my testsuite failed every time).

Netty version

4.1.63.Final

JVM version (e.g. java -version)

openjdk version "11.0.11" 2021-04-20 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.11+9-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.11+9-LTS, mixed mode, sharing)

OS version (e.g. uname -a)

Linux b7992f1e6991 4.19.121-linuxkit #1 SMP Tue Dec 1 17:50:32 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Stacktraces

[INFO] "main":
[INFO]  at java.util.zip.ZipFile.getEntry(java.base@11.0.11/ZipFile.java:346)
[INFO]  - waiting to lock <0x00000000e119d0a8> (a java.util.jar.JarFile)
[INFO]  at java.util.zip.ZipFile$1.getEntry(java.base@11.0.11/ZipFile.java:1126)
[INFO]  at java.util.jar.JarFile.getEntry0(java.base@11.0.11/JarFile.java:578)
[INFO]  at java.util.jar.JarFile.getEntry(java.base@11.0.11/JarFile.java:508)
[INFO]  at java.util.jar.JarFile.getJarEntry(java.base@11.0.11/JarFile.java:470)
[INFO]  at jdk.internal.loader.URLClassPath$JarLoader.getResource(java.base@11.0.11/URLClassPath.java:929)
[INFO]  at jdk.internal.loader.URLClassPath.getResource(java.base@11.0.11/URLClassPath.java:314)
[INFO]  at jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(java.base@11.0.11/BuiltinClassLoader.java:695)
[INFO]  at jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(java.base@11.0.11/BuiltinClassLoader.java:621)
[INFO]  - locked <0x00000000e2006a18> (a java.lang.Object)
[INFO]  at jdk.internal.loader.BuiltinClassLoader.loadClass(java.base@11.0.11/BuiltinClassLoader.java:579)
[INFO]  at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(java.base@11.0.11/ClassLoaders.java:178)
[INFO]  at java.lang.ClassLoader.loadClass(java.base@11.0.11/ClassLoader.java:522)
[INFO]  at java.lang.ClassLoader$NativeLibrary.load0(java.base@11.0.11/Native Method)
[INFO]  at java.lang.ClassLoader$NativeLibrary.load(java.base@11.0.11/ClassLoader.java:2442)
[INFO]  at java.lang.ClassLoader$NativeLibrary.loadLibrary(java.base@11.0.11/ClassLoader.java:2498)
[INFO]  - locked <0x00000000e11d6ad0> (a java.util.HashSet)
[INFO]  at java.lang.ClassLoader.loadLibrary0(java.base@11.0.11/ClassLoader.java:2694)
[INFO]  at java.lang.ClassLoader.loadLibrary(java.base@11.0.11/ClassLoader.java:2627)
[INFO]  at java.lang.Runtime.load0(java.base@11.0.11/Runtime.java:768)
[INFO]  at java.lang.System.load(java.base@11.0.11/System.java:1837)
[INFO]  at io.netty.util.internal.NativeLibraryUtil.loadLibrary(NativeLibraryUtil.java:36)
[INFO]  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.11/Native Method)
[INFO]  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.11/NativeMethodAccessorImpl.java:62)
[INFO]  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.11/DelegatingMethodAccessorImpl.java:43)
[INFO]  at java.lang.reflect.Method.invoke(java.base@11.0.11/Method.java:566)
[INFO]  at io.netty.util.internal.NativeLibraryLoader$1.run(NativeLibraryLoader.java:378)
[INFO]  at java.security.AccessController.doPrivileged(java.base@11.0.11/Native Method)
[INFO]  at io.netty.util.internal.NativeLibraryLoader.loadLibraryByHelper(NativeLibraryLoader.java:370)
[INFO]  at io.netty.util.internal.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:336)
[INFO]  at io.netty.util.internal.NativeLibraryLoader.load(NativeLibraryLoader.java:200)
[INFO]  at io.netty.channel.epoll.Native.loadNativeLibrary(Native.java:290)
[INFO]  at io.netty.channel.epoll.Native.<clinit>(Native.java:70)
[INFO]  at io.netty.channel.epoll.Epoll.<clinit>(Epoll.java:39)
[INFO]  at java.lang.Class.forName0(java.base@11.0.11/Native Method)
[INFO]  at java.lang.Class.forName(java.base@11.0.11/Class.java:315)
[INFO]  at io.grpc.netty.Utils.isEpollAvailable(Utils.java:291)
[INFO]  at io.grpc.netty.Utils.<clinit>(Utils.java:109)
[INFO]  at io.grpc.netty.NettyServerBuilder.<clinit>(NettyServerBuilder.java:83)
[INFO]  at c.i.w.m.e.ApplicationCode.applicationMethod(ApplicationCode.java:112)
[INFO]  at c.i.w.m.e.ApplicationCode.<init>(ApplicationCode.java:108)
[INFO]  at c.i.w.m.e.ApplicationCode.<init>(ApplicationCode.java:101)
[INFO]  at c.i.w.m.AppUnitTest.initialize(AppUnitTest.java:90)
[INFO]  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.11/Native Method)
[INFO]  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.11/NativeMethodAccessorImpl.java:62)
[INFO]  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.11/DelegatingMethodAccessorImpl.java:43)
[INFO]  at java.lang.reflect.Method.invoke(java.base@11.0.11/Method.java:566)
[INFO]  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
[INFO]  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
[INFO]  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
[INFO]  at org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
[INFO]  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
[INFO]  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
[INFO]  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
[INFO]  at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
[INFO]  at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:364)
[INFO]  at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
[INFO]  at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:237)
[INFO]  at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:158)
[INFO]  at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:428)
[INFO]  at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
[INFO]  at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:562)
[INFO]  at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:548)
[INFO] "other-thread":
[INFO]  at java.lang.ClassLoader$NativeLibrary.loadLibrary(java.base@11.0.11/ClassLoader.java:2462)
[INFO]  - waiting to lock <0x00000000e11d6ad0> (a java.util.HashSet)
[INFO]  at java.lang.ClassLoader.loadLibrary0(java.base@11.0.11/ClassLoader.java:2694)
[INFO]  at java.lang.ClassLoader.loadLibrary(java.base@11.0.11/ClassLoader.java:2648)
[INFO]  at java.lang.Runtime.loadLibrary0(java.base@11.0.11/Runtime.java:830)
[INFO]  at java.lang.System.loadLibrary(java.base@11.0.11/System.java:1873)
[INFO]  at sun.security.ec.SunEC$1.run(jdk.crypto.ec@11.0.11/SunEC.java:63)
[INFO]  at sun.security.ec.SunEC$1.run(jdk.crypto.ec@11.0.11/SunEC.java:61)
[INFO]  at java.security.AccessController.doPrivileged(java.base@11.0.11/Native Method)
[INFO]  at sun.security.ec.SunEC.<clinit>(jdk.crypto.ec@11.0.11/SunEC.java:61)
[INFO]  at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@11.0.11/Native Method)
[INFO]  at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@11.0.11/NativeConstructorAccessorImpl.java:62)
[INFO]  at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@11.0.11/DelegatingConstructorAccessorImpl.java:45)
[INFO]  at java.lang.reflect.Constructor.newInstance(java.base@11.0.11/Constructor.java:490)
[INFO]  at java.util.ServiceLoader$ProviderImpl.newInstance(java.base@11.0.11/ServiceLoader.java:780)
[INFO]  at java.util.ServiceLoader$ProviderImpl.get(java.base@11.0.11/ServiceLoader.java:722)
[INFO]  at java.util.ServiceLoader$3.next(java.base@11.0.11/ServiceLoader.java:1395)
[INFO]  at sun.security.jca.ProviderConfig$ProviderLoader.load(java.base@11.0.11/ProviderConfig.java:340)
[INFO]  at sun.security.jca.ProviderConfig$3.run(java.base@11.0.11/ProviderConfig.java:248)
[INFO]  at sun.security.jca.ProviderConfig$3.run(java.base@11.0.11/ProviderConfig.java:242)
[INFO]  at java.security.AccessController.doPrivileged(java.base@11.0.11/Native Method)
[INFO]  at sun.security.jca.ProviderConfig.doLoadProvider(java.base@11.0.11/ProviderConfig.java:242)
[INFO]  at sun.security.jca.ProviderConfig.getProvider(java.base@11.0.11/ProviderConfig.java:222)
[INFO]  - locked <0x00000000e20c9f58> (a sun.security.jca.ProviderConfig)
[INFO]  at sun.security.jca.ProviderList.loadAll(java.base@11.0.11/ProviderList.java:315)
[INFO]  at sun.security.jca.ProviderList.removeInvalid(java.base@11.0.11/ProviderList.java:332)
[INFO]  at sun.security.jca.Providers.getFullProviderList(java.base@11.0.11/Providers.java:165)
[INFO]  - locked <0x00000000e20bc668> (a java.lang.Class for sun.security.jca.Providers)
[INFO]  at java.security.Security.getProviders(java.base@11.0.11/Security.java:483)
[INFO]  at sun.security.x509.AlgorithmId.computeOidTable(java.base@11.0.11/AlgorithmId.java:637)
[INFO]  at sun.security.x509.AlgorithmId.oidTable(java.base@11.0.11/AlgorithmId.java:627)
[INFO]  - locked <0x00000000e2bdd3d0> (a java.lang.Class for sun.security.x509.AlgorithmId)
[INFO]  at sun.security.x509.AlgorithmId.algOID(java.base@11.0.11/AlgorithmId.java:609)
[INFO]  at sun.security.x509.AlgorithmId.get(java.base@11.0.11/AlgorithmId.java:441)
[INFO]  at sun.security.pkcs.SignerInfo.verify(java.base@11.0.11/SignerInfo.java:380)
[INFO]  at sun.security.pkcs.PKCS7.verify(java.base@11.0.11/PKCS7.java:578)
[INFO]  at sun.security.pkcs.PKCS7.verify(java.base@11.0.11/PKCS7.java:595)
[INFO]  at sun.security.pkcs.SignerInfo.getTimestamp(java.base@11.0.11/SignerInfo.java:545)
[INFO]  at sun.security.pkcs.SignerInfo.verify(java.base@11.0.11/SignerInfo.java:318)
[INFO]  at sun.security.pkcs.PKCS7.verify(java.base@11.0.11/PKCS7.java:578)
[INFO]  at sun.security.pkcs.PKCS7.verify(java.base@11.0.11/PKCS7.java:595)
[INFO]  at sun.security.util.SignatureFileVerifier.processImpl(java.base@11.0.11/SignatureFileVerifier.java:283)
[INFO]  at sun.security.util.SignatureFileVerifier.process(java.base@11.0.11/SignatureFileVerifier.java:259)
[INFO]  at java.util.jar.JarVerifier.processEntry(java.base@11.0.11/JarVerifier.java:316)
[INFO]  at java.util.jar.JarVerifier.update(java.base@11.0.11/JarVerifier.java:230)
[INFO]  at java.util.jar.JarFile.initializeVerifier(java.base@11.0.11/JarFile.java:759)
[INFO]  at java.util.jar.JarFile.ensureInitialization(java.base@11.0.11/JarFile.java:1038)
[INFO]  - locked <0x00000000e119d0a8> (a java.util.jar.JarFile)
[INFO]  at java.util.jar.JavaUtilJarAccessImpl.ensureInitialization(java.base@11.0.11/JavaUtilJarAccessImpl.java:69)
[INFO]  at jdk.internal.loader.URLClassPath$JarLoader$2.getManifest(java.base@11.0.11/URLClassPath.java:870)
[INFO]  at jdk.internal.loader.BuiltinClassLoader.defineClass(java.base@11.0.11/BuiltinClassLoader.java:786)
[INFO]  at jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(java.base@11.0.11/BuiltinClassLoader.java:698)
[INFO]  at jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(java.base@11.0.11/BuiltinClassLoader.java:621)
[INFO]  - locked <0x00000000e204d228> (a java.lang.Object)
[INFO]  at jdk.internal.loader.BuiltinClassLoader.loadClass(java.base@11.0.11/BuiltinClassLoader.java:579)
[INFO]  at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(java.base@11.0.11/ClassLoaders.java:178)
[INFO]  at java.lang.ClassLoader.loadClass(java.base@11.0.11/ClassLoader.java:522)
[INFO]  at c.i.w.m.OtherAppCode.<init>(OtherAppCode.java:371)
[INFO]  at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@11.0.11/Native Method)
[INFO]  at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@11.0.11/NativeConstructorAccessorImpl.java:62)
[INFO]  at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@11.0.11/DelegatingConstructorAccessorImpl.java:45)
[INFO]  at java.lang.reflect.Constructor.newInstance(java.base@11.0.11/Constructor.java:490)
[INFO]  at java.lang.Class.newInstance(java.base@11.0.11/Class.java:584)
[INFO]  at c.i.w.l.server.DefaultServer.lambda$doStart$0(DefaultServer.java:347)
[INFO]  at c.i.w.l.server.DefaultServer/0x00000008401c4840.run(Unknown Source)
[INFO]  at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.11/ThreadPoolExecutor.java:1128)
[INFO]  at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.11/ThreadPoolExecutor.java:628)
[INFO]  at java.lang.Thread.run(java.base@11.0.11/Thread.java:829)
[INFO]  at c.i.w.l.ThreadPoolHelper$3$1.run(ThreadPoolHelper.java:94)

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 28 (19 by maintainers)

Commits related to this issue

Most upvoted comments

Workaround was pulled in… Thanks a lot all for your help 😃

… So it seems very likely that you’ve managed to reproduce the same deadlock (thanks!). But I noticed that in my case the locked objects are a JarFile and a HashSet rather than a JarFile and a Runtime in your case.

@njhill The difference we are observing here (HashSet versus Runtime) does not materially affect how to proceed with this issue from a Netty perspective (Norman’s workaround is the best solution we have for now). The difference is easily understood if you look at 8231584 [1], which has been backported to OpenJDK 11u, but not Oracle JDK 11u (which I used for my testing).

[1] https://bugs.openjdk.java.net/browse/JDK-8231584

@ChrisHegarty yeah seems like it is at least “easier” to reproduce with more recent JDKs. Thanks for “confirm” my workaround idea! 👍

FYI - This kind of issue is not unique to Netty, e.g. https://github.com/openjdk/jdk/commit/991f7c13

I suspect that the race (potential issue) is possible with prior versions of 11u. Maybe Netty is just getting lucky! 😉