netty: JNI classloader deadlock with latest JDK version
Expected behavior
No classloader deadlocks
Actual behavior
I am still investigating but there appears to be a possible class loading concurrency deadlock when using the epoll transport - it started when upgrading from OpenJDK 11.0.10 to 11.0.11.
It looks to be related to JNI reentrancy where the netty_epoll_native_JNI_OnLoad func, which happens in the context of static initialization of the Epoll/Native classes, calls back into java to initialize additional classes. The deadlock happens between this and another (non-native) thread in the process of loading other classes.
See stacks of deadlocked threads below.
Seems likely it’s a new JDK bug. I noticed this other recent netty issue/fix which is kind of related, but this appears to be more problematic because the second thread is loading an unrelated class: https://github.com/netty/netty/issues/10187
I was able to workaround by forcing the native classes to load upfront.
Steps to reproduce
Load Epoll at the same time as other classes are loading in another thread (I guess it might similarly apply to the other native modules e.g. OpenSsl).
Minimal yet complete reproducer code (or URL to code)
Don’t have reproducer but it happens pretty reliably (my testsuite failed every time).
Netty version
4.1.63.Final
JVM version (e.g. java -version)
openjdk version "11.0.11" 2021-04-20 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.11+9-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.11+9-LTS, mixed mode, sharing)
OS version (e.g. uname -a)
Linux b7992f1e6991 4.19.121-linuxkit #1 SMP Tue Dec 1 17:50:32 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Stacktraces
[INFO] "main":
[INFO] at java.util.zip.ZipFile.getEntry(java.base@11.0.11/ZipFile.java:346)
[INFO] - waiting to lock <0x00000000e119d0a8> (a java.util.jar.JarFile)
[INFO] at java.util.zip.ZipFile$1.getEntry(java.base@11.0.11/ZipFile.java:1126)
[INFO] at java.util.jar.JarFile.getEntry0(java.base@11.0.11/JarFile.java:578)
[INFO] at java.util.jar.JarFile.getEntry(java.base@11.0.11/JarFile.java:508)
[INFO] at java.util.jar.JarFile.getJarEntry(java.base@11.0.11/JarFile.java:470)
[INFO] at jdk.internal.loader.URLClassPath$JarLoader.getResource(java.base@11.0.11/URLClassPath.java:929)
[INFO] at jdk.internal.loader.URLClassPath.getResource(java.base@11.0.11/URLClassPath.java:314)
[INFO] at jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(java.base@11.0.11/BuiltinClassLoader.java:695)
[INFO] at jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(java.base@11.0.11/BuiltinClassLoader.java:621)
[INFO] - locked <0x00000000e2006a18> (a java.lang.Object)
[INFO] at jdk.internal.loader.BuiltinClassLoader.loadClass(java.base@11.0.11/BuiltinClassLoader.java:579)
[INFO] at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(java.base@11.0.11/ClassLoaders.java:178)
[INFO] at java.lang.ClassLoader.loadClass(java.base@11.0.11/ClassLoader.java:522)
[INFO] at java.lang.ClassLoader$NativeLibrary.load0(java.base@11.0.11/Native Method)
[INFO] at java.lang.ClassLoader$NativeLibrary.load(java.base@11.0.11/ClassLoader.java:2442)
[INFO] at java.lang.ClassLoader$NativeLibrary.loadLibrary(java.base@11.0.11/ClassLoader.java:2498)
[INFO] - locked <0x00000000e11d6ad0> (a java.util.HashSet)
[INFO] at java.lang.ClassLoader.loadLibrary0(java.base@11.0.11/ClassLoader.java:2694)
[INFO] at java.lang.ClassLoader.loadLibrary(java.base@11.0.11/ClassLoader.java:2627)
[INFO] at java.lang.Runtime.load0(java.base@11.0.11/Runtime.java:768)
[INFO] at java.lang.System.load(java.base@11.0.11/System.java:1837)
[INFO] at io.netty.util.internal.NativeLibraryUtil.loadLibrary(NativeLibraryUtil.java:36)
[INFO] at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.11/Native Method)
[INFO] at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.11/NativeMethodAccessorImpl.java:62)
[INFO] at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.11/DelegatingMethodAccessorImpl.java:43)
[INFO] at java.lang.reflect.Method.invoke(java.base@11.0.11/Method.java:566)
[INFO] at io.netty.util.internal.NativeLibraryLoader$1.run(NativeLibraryLoader.java:378)
[INFO] at java.security.AccessController.doPrivileged(java.base@11.0.11/Native Method)
[INFO] at io.netty.util.internal.NativeLibraryLoader.loadLibraryByHelper(NativeLibraryLoader.java:370)
[INFO] at io.netty.util.internal.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:336)
[INFO] at io.netty.util.internal.NativeLibraryLoader.load(NativeLibraryLoader.java:200)
[INFO] at io.netty.channel.epoll.Native.loadNativeLibrary(Native.java:290)
[INFO] at io.netty.channel.epoll.Native.<clinit>(Native.java:70)
[INFO] at io.netty.channel.epoll.Epoll.<clinit>(Epoll.java:39)
[INFO] at java.lang.Class.forName0(java.base@11.0.11/Native Method)
[INFO] at java.lang.Class.forName(java.base@11.0.11/Class.java:315)
[INFO] at io.grpc.netty.Utils.isEpollAvailable(Utils.java:291)
[INFO] at io.grpc.netty.Utils.<clinit>(Utils.java:109)
[INFO] at io.grpc.netty.NettyServerBuilder.<clinit>(NettyServerBuilder.java:83)
[INFO] at c.i.w.m.e.ApplicationCode.applicationMethod(ApplicationCode.java:112)
[INFO] at c.i.w.m.e.ApplicationCode.<init>(ApplicationCode.java:108)
[INFO] at c.i.w.m.e.ApplicationCode.<init>(ApplicationCode.java:101)
[INFO] at c.i.w.m.AppUnitTest.initialize(AppUnitTest.java:90)
[INFO] at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.11/Native Method)
[INFO] at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.11/NativeMethodAccessorImpl.java:62)
[INFO] at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.11/DelegatingMethodAccessorImpl.java:43)
[INFO] at java.lang.reflect.Method.invoke(java.base@11.0.11/Method.java:566)
[INFO] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
[INFO] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
[INFO] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
[INFO] at org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
[INFO] at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
[INFO] at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
[INFO] at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
[INFO] at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
[INFO] at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:364)
[INFO] at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
[INFO] at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:237)
[INFO] at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:158)
[INFO] at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:428)
[INFO] at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
[INFO] at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:562)
[INFO] at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:548)
[INFO] "other-thread":
[INFO] at java.lang.ClassLoader$NativeLibrary.loadLibrary(java.base@11.0.11/ClassLoader.java:2462)
[INFO] - waiting to lock <0x00000000e11d6ad0> (a java.util.HashSet)
[INFO] at java.lang.ClassLoader.loadLibrary0(java.base@11.0.11/ClassLoader.java:2694)
[INFO] at java.lang.ClassLoader.loadLibrary(java.base@11.0.11/ClassLoader.java:2648)
[INFO] at java.lang.Runtime.loadLibrary0(java.base@11.0.11/Runtime.java:830)
[INFO] at java.lang.System.loadLibrary(java.base@11.0.11/System.java:1873)
[INFO] at sun.security.ec.SunEC$1.run(jdk.crypto.ec@11.0.11/SunEC.java:63)
[INFO] at sun.security.ec.SunEC$1.run(jdk.crypto.ec@11.0.11/SunEC.java:61)
[INFO] at java.security.AccessController.doPrivileged(java.base@11.0.11/Native Method)
[INFO] at sun.security.ec.SunEC.<clinit>(jdk.crypto.ec@11.0.11/SunEC.java:61)
[INFO] at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@11.0.11/Native Method)
[INFO] at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@11.0.11/NativeConstructorAccessorImpl.java:62)
[INFO] at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@11.0.11/DelegatingConstructorAccessorImpl.java:45)
[INFO] at java.lang.reflect.Constructor.newInstance(java.base@11.0.11/Constructor.java:490)
[INFO] at java.util.ServiceLoader$ProviderImpl.newInstance(java.base@11.0.11/ServiceLoader.java:780)
[INFO] at java.util.ServiceLoader$ProviderImpl.get(java.base@11.0.11/ServiceLoader.java:722)
[INFO] at java.util.ServiceLoader$3.next(java.base@11.0.11/ServiceLoader.java:1395)
[INFO] at sun.security.jca.ProviderConfig$ProviderLoader.load(java.base@11.0.11/ProviderConfig.java:340)
[INFO] at sun.security.jca.ProviderConfig$3.run(java.base@11.0.11/ProviderConfig.java:248)
[INFO] at sun.security.jca.ProviderConfig$3.run(java.base@11.0.11/ProviderConfig.java:242)
[INFO] at java.security.AccessController.doPrivileged(java.base@11.0.11/Native Method)
[INFO] at sun.security.jca.ProviderConfig.doLoadProvider(java.base@11.0.11/ProviderConfig.java:242)
[INFO] at sun.security.jca.ProviderConfig.getProvider(java.base@11.0.11/ProviderConfig.java:222)
[INFO] - locked <0x00000000e20c9f58> (a sun.security.jca.ProviderConfig)
[INFO] at sun.security.jca.ProviderList.loadAll(java.base@11.0.11/ProviderList.java:315)
[INFO] at sun.security.jca.ProviderList.removeInvalid(java.base@11.0.11/ProviderList.java:332)
[INFO] at sun.security.jca.Providers.getFullProviderList(java.base@11.0.11/Providers.java:165)
[INFO] - locked <0x00000000e20bc668> (a java.lang.Class for sun.security.jca.Providers)
[INFO] at java.security.Security.getProviders(java.base@11.0.11/Security.java:483)
[INFO] at sun.security.x509.AlgorithmId.computeOidTable(java.base@11.0.11/AlgorithmId.java:637)
[INFO] at sun.security.x509.AlgorithmId.oidTable(java.base@11.0.11/AlgorithmId.java:627)
[INFO] - locked <0x00000000e2bdd3d0> (a java.lang.Class for sun.security.x509.AlgorithmId)
[INFO] at sun.security.x509.AlgorithmId.algOID(java.base@11.0.11/AlgorithmId.java:609)
[INFO] at sun.security.x509.AlgorithmId.get(java.base@11.0.11/AlgorithmId.java:441)
[INFO] at sun.security.pkcs.SignerInfo.verify(java.base@11.0.11/SignerInfo.java:380)
[INFO] at sun.security.pkcs.PKCS7.verify(java.base@11.0.11/PKCS7.java:578)
[INFO] at sun.security.pkcs.PKCS7.verify(java.base@11.0.11/PKCS7.java:595)
[INFO] at sun.security.pkcs.SignerInfo.getTimestamp(java.base@11.0.11/SignerInfo.java:545)
[INFO] at sun.security.pkcs.SignerInfo.verify(java.base@11.0.11/SignerInfo.java:318)
[INFO] at sun.security.pkcs.PKCS7.verify(java.base@11.0.11/PKCS7.java:578)
[INFO] at sun.security.pkcs.PKCS7.verify(java.base@11.0.11/PKCS7.java:595)
[INFO] at sun.security.util.SignatureFileVerifier.processImpl(java.base@11.0.11/SignatureFileVerifier.java:283)
[INFO] at sun.security.util.SignatureFileVerifier.process(java.base@11.0.11/SignatureFileVerifier.java:259)
[INFO] at java.util.jar.JarVerifier.processEntry(java.base@11.0.11/JarVerifier.java:316)
[INFO] at java.util.jar.JarVerifier.update(java.base@11.0.11/JarVerifier.java:230)
[INFO] at java.util.jar.JarFile.initializeVerifier(java.base@11.0.11/JarFile.java:759)
[INFO] at java.util.jar.JarFile.ensureInitialization(java.base@11.0.11/JarFile.java:1038)
[INFO] - locked <0x00000000e119d0a8> (a java.util.jar.JarFile)
[INFO] at java.util.jar.JavaUtilJarAccessImpl.ensureInitialization(java.base@11.0.11/JavaUtilJarAccessImpl.java:69)
[INFO] at jdk.internal.loader.URLClassPath$JarLoader$2.getManifest(java.base@11.0.11/URLClassPath.java:870)
[INFO] at jdk.internal.loader.BuiltinClassLoader.defineClass(java.base@11.0.11/BuiltinClassLoader.java:786)
[INFO] at jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(java.base@11.0.11/BuiltinClassLoader.java:698)
[INFO] at jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(java.base@11.0.11/BuiltinClassLoader.java:621)
[INFO] - locked <0x00000000e204d228> (a java.lang.Object)
[INFO] at jdk.internal.loader.BuiltinClassLoader.loadClass(java.base@11.0.11/BuiltinClassLoader.java:579)
[INFO] at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(java.base@11.0.11/ClassLoaders.java:178)
[INFO] at java.lang.ClassLoader.loadClass(java.base@11.0.11/ClassLoader.java:522)
[INFO] at c.i.w.m.OtherAppCode.<init>(OtherAppCode.java:371)
[INFO] at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@11.0.11/Native Method)
[INFO] at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@11.0.11/NativeConstructorAccessorImpl.java:62)
[INFO] at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@11.0.11/DelegatingConstructorAccessorImpl.java:45)
[INFO] at java.lang.reflect.Constructor.newInstance(java.base@11.0.11/Constructor.java:490)
[INFO] at java.lang.Class.newInstance(java.base@11.0.11/Class.java:584)
[INFO] at c.i.w.l.server.DefaultServer.lambda$doStart$0(DefaultServer.java:347)
[INFO] at c.i.w.l.server.DefaultServer/0x00000008401c4840.run(Unknown Source)
[INFO] at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.11/ThreadPoolExecutor.java:1128)
[INFO] at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.11/ThreadPoolExecutor.java:628)
[INFO] at java.lang.Thread.run(java.base@11.0.11/Thread.java:829)
[INFO] at c.i.w.l.ThreadPoolHelper$3$1.run(ThreadPoolHelper.java:94)
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 28 (19 by maintainers)
Commits related to this issue
- Preload classes before calling native OnLoad function to prevent classloader deadlock Motivation: It turns out it is quite easy to cause a classloader deadlock in more recent java updates if you cau... — committed to netty/netty by normanmaurer 3 years ago
- Preload classes so we can't see a classloader deadlock Motivation: It turns out it is quite easy to cause a classloader deadlock in more recent java updates if you cause classloading while you are i... — committed to netty/netty-tcnative by normanmaurer 3 years ago
- Preload classes before calling native OnLoad function to prevent clas… (#11215) Motivation: It turns out it is quite easy to cause a classloader deadlock in more recent java updates if you cause c... — committed to netty/netty by normanmaurer 3 years ago
- Preload classes before calling native OnLoad function to prevent clas… (#11215) Motivation: It turns out it is quite easy to cause a classloader deadlock in more recent java updates if you cause c... — committed to netty/netty by normanmaurer 3 years ago
- Preload classes before calling native OnLoad function to prevent clas… (#11215) Motivation: It turns out it is quite easy to cause a classloader deadlock in more recent java updates if you cause cla... — committed to netty/netty-incubator-transport-io_uring by normanmaurer 3 years ago
- Preload classes before calling native OnLoad function to prevent clas… (#11215) Motivation: It turns out it is quite easy to cause a classloader deadlock in more recent java updates if you cause cla... — committed to netty/netty-incubator-transport-io_uring by normanmaurer 3 years ago
- [SPARK-36729][BUILD] Upgrade Netty from 4.1.63 to 4.1.68 ### What changes were proposed in this pull request? This PR upgrades Netty from `4.1.63` to `4.1.68`. All the changes from `4.1.64` to `4.1... — committed to apache/spark by sarutak 3 years ago
- [SPARK-36729][BUILD] Upgrade Netty from 4.1.63 to 4.1.68 ### What changes were proposed in this pull request? This PR upgrades Netty from `4.1.63` to `4.1.68`. All the changes from `4.1.64` to `4.1... — committed to apache/spark by sarutak 3 years ago
- Preload classes before calling native OnLoad function to prevent clas… (#11215) Motivation: It turns out it is quite easy to cause a classloader deadlock in more recent java updates if you cause c... — committed to raidyue/netty by normanmaurer 3 years ago
Workaround was pulled in… Thanks a lot all for your help 😃
@njhill The difference we are observing here (
HashSetversusRuntime) does not materially affect how to proceed with this issue from a Netty perspective (Norman’s workaround is the best solution we have for now). The difference is easily understood if you look at 8231584 [1], which has been backported to OpenJDK 11u, but not Oracle JDK 11u (which I used for my testing).[1] https://bugs.openjdk.java.net/browse/JDK-8231584
@ChrisHegarty yeah seems like it is at least “easier” to reproduce with more recent JDKs. Thanks for “confirm” my workaround idea! 👍
FYI - This kind of issue is not unique to Netty, e.g. https://github.com/openjdk/jdk/commit/991f7c13
I suspect that the race (potential issue) is possible with prior versions of 11u. Maybe Netty is just getting lucky! 😉