grpc-java: JVM crash with grpc-java 1.42.x and Alpine Docker image
An attempt to upgrade from grpc-java 1.41.1 to 1.42.x ends with a JVM crash.
The problem appears to be specific to Alpine Linux. It reproduces on openjdk:15-jdk-alpine and openjdk:8-alpine, and goes away after switching to the openjdk:X-slim (Debian) images.
It may also be affected by the fact that the openjdk:X-alpine images are no longer maintained and therefore receive no new JDK updates.
The first grpc-java version with the problem is 1.42.0; earlier versions work fine.
It may be related to https://github.com/grpc/grpc/issues/27995
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000000000003efe, pid=372, tid=0x00007fbffbc9bb10
#
# JRE version: OpenJDK Runtime Environment (8.0_212-b04) (build 1.8.0_212-b04)
# Java VM: OpenJDK 64-Bit Server VM (25.212-b04 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea 3.12.0
# Distribution: Custom build (Sat May 4 17:33:35 UTC 2019)
# Problematic frame:
# C 0x0000000000003efe
#
# Core dump written. Default location: /temporal-java-client/temporal-kotlin/core or core.372
#
# An error report file with more information is saved as:
# /temporal-java-client/temporal-kotlin/hs_err_pid372.log
#
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
# https://icedtea.classpath.org/bugzilla
#
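The crash shows up only on the musl-based Alpine images. Later in this thread, passing -Dio.grpc.netty.shaded.io.netty.transport.noNative=true is reported to avoid the segfault. With that flag, gRPC presumably falls back to Netty's non-native transport.

The hypothetical helper below applies that flag programmatically, and only when the process looks like it is running on musl. It rests on two assumptions:
- Detecting musl by checking for /lib/ld-musl-x86_64.so.1 is a heuristic that covers x86_64 only.
- System.setProperty is equivalent to the -D flag, provided it runs before any gRPC channel or server is created.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of grpc-java or Netty.
final class NettyNativeGuard {
    // Shaded property name reported in this thread to avoid the segfault.
    static final String NO_NATIVE = "io.grpc.netty.shaded.io.netty.transport.noNative";

    private NettyNativeGuard() {}

    /** Heuristic (assumption): an x86_64 musl system ships its loader at lib/ld-musl-x86_64.so.1. */
    static boolean looksLikeMusl(Path root) {
        return Files.exists(root.resolve("lib/ld-musl-x86_64.so.1"));
    }

    /** Disable only on musl, and never override a value the user already passed with -D. */
    static boolean shouldDisable(String currentValue, boolean musl) {
        return currentValue == null && musl;
    }

    /**
     * Sets the property when shouldDisable allows it and returns whether it did.
     * Assumption: this must run before gRPC first touches the shaded Netty epoll classes,
     * i.e. before any channel or server is built.
     */
    static boolean disableNativeOnMusl(Path root) {
        if (shouldDisable(System.getProperty(NO_NATIVE), looksLikeMusl(root))) {
            System.setProperty(NO_NATIVE, "true");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        boolean changed = disableNativeOnMusl(Paths.get("/"));
        System.out.println(NO_NATIVE + "=" + System.getProperty(NO_NATIVE)
                + " (set by this guard: " + changed + ")");
        // Build gRPC channels and servers only after this point.
    }
}
```

Passing the flag on the command line remains the simpler option. The programmatic form only helps when the same artifact also runs on glibc images where the native transport is wanted.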
About this issue
- State: closed
- Created 3 years ago
- Reactions: 8
- Comments: 19 (8 by maintainers)
Commits related to this issue
- Update transport-native-epoll compile flags - Move libraries to LIBS where they should be, avoiding need for -Wl,--no-as-needed. - Use -O2 instead of -O3; there are no tight loops so -O3 just incre... — committed to Hello71/netty by Hello71 2 years ago
- Update transport-native-epoll compile flags (#12272) Motivation: Loading of the native epoll transport might segfault the JVM if musl is used and no glibc-compat is installed: - Move librarie... — committed to netty/netty by Hello71 2 years ago
- Update transport-native-epoll compile flags (#12272) Motivation: Loading of the native epoll transport might segfault the JVM if musl is used and no glibc-compat is installed: Modifications: - Mov... — committed to netty/netty by Hello71 2 years ago
- Update transport-native-epoll compile flags (#12272) (#12313) Motivation: Loading of the native epoll transport might segfault the JVM if musl is used and no glibc-compat is installed: Modifica... — committed to netty/netty by normanmaurer 2 years ago
- COLLECTOR-1092. Include gcompat in the Docker image build There is an issue with the latest alpine versions and io.grpc used by google libs (check https://github.com/grpc/grpc-java/issues/8751) Chan... — committed to streamsets/datacollector-docker by jbodiaz 2 years ago
- Update transport-native-epoll compile flags (#12272) (#12313) Motivation: Loading of the native epoll transport might segfault the JVM if musl is used and no glibc-compat is installed: Modifica... — committed to raidyue/netty by normanmaurer 2 years ago
I had the same issue with grpc libraries on the classpath, like:
and alpine:3.15.0 with JDK
Thanks to @ejona86, here is what I did:
Or the Dockerfile should simply include:
TL;DR: Alpine doesn’t have compatibility for the __strndup symbol. I don’t know why the behavior is k8s-dependent, though. And it’ll take some more research to determine appropriate next steps.
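On Alpine, one way to see which symbols musl cannot resolve for a native library is to run ldd on it. musl's loader reports each failure as a line of the form "Error relocating <file>: <symbol>: symbol not found". The line format is musl's. The helper below is hypothetical and simply pulls the symbol names out of such output, for example to flag __strndup in a CI check. Note that later in this thread ldd output is judged not fully accurate when gcompat supplies the loader, so treat an empty result as a hint, not proof.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that parses musl ldd output for unresolved symbols.
final class MuslLddParser {
    // musl's loader message: "Error relocating <file>: <symbol>: symbol not found"
    private static final Pattern UNRESOLVED =
            Pattern.compile("^Error relocating (.+): (\\S+): symbol not found$");

    private MuslLddParser() {}

    /** Returns the unresolved symbol names in the order they appear. */
    static List<String> unresolvedSymbols(String lddOutput) {
        List<String> symbols = new ArrayList<>();
        for (String line : lddOutput.split("\\R")) {
            Matcher m = UNRESOLVED.matcher(line.trim());
            if (m.matches()) {
                symbols.add(m.group(2));
            }
        }
        return symbols;
    }

    public static void main(String[] args) {
        // Illustrative input, not output captured from this issue.
        String sample = String.join("\n",
                "\t/lib/ld-musl-x86_64.so.1 (0x7f0000000000)",
                "Error relocating /tmp/libexample.so: __strndup: symbol not found");
        System.out.println(unresolvedSymbols(sample)); // prints [__strndup]
    }
}
```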
Looking at the objdump output, it looks like the problem is happening in parsePackagePrefix, at libio_grpc_netty_shaded_netty_transport_native_epoll_x86_642526953976876250345.so+0xb487:
But I don’t see any obvious place where the stack could get corrupted in parsePackagePrefix, and there’s no callout to a passed function. I’m suspicious, though, that the linker is broken. This is in parsePackagePrefix:
The address displayed is relative to the .so, so it isn’t the problem itself. It jumps to the PLT:
And the indirect jump goes to the GOT, which should be filled with the adjusted address of 3efe. But maybe something is broken in the linker and it didn’t get adjusted?
Well, there we go… Older versions of epoll linked against strndup, not __strndup. This difference may have been caused by a glibc upgrade when compiling.
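That finding suggests a quick way to compare two grpc-netty-shaded jars: check whether the bundled epoll library references __strndup. The hypothetical sketch below does this, under some stated assumptions and limits:
- The library is assumed to be stored in the jar as META-INF/native/libio_grpc_netty_shaded_netty_transport_native_epoll_x86_64.so. This matches the temp-file name above, but verify it with unzip -l.
- The check only searches raw bytes for the NUL-delimited name, so it is a crude heuristic.
- Running readelf --dyn-syms or objdump -T on the extracted library is the authoritative check.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic; the jar entry name below is an assumption.
final class StrndupReferenceCheck {
    static final String EPOLL_ENTRY =
            "META-INF/native/libio_grpc_netty_shaded_netty_transport_native_epoll_x86_64.so";

    private StrndupReferenceCheck() {}

    /** Crude check: does NUL + symbol + NUL occur anywhere in the bytes? Ignores ELF structure. */
    static boolean containsSymbolName(byte[] data, String symbol) {
        byte[] needle = ("\0" + symbol + "\0").getBytes(StandardCharsets.US_ASCII);
        outer:
        for (int i = 0; i + needle.length <= data.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (data[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    /** Reads one jar entry fully into memory (Java 8 compatible). */
    static byte[] readEntry(ZipFile zip, String name) throws IOException {
        ZipEntry entry = zip.getEntry(name);
        if (entry == null) {
            throw new IOException("entry not found: " + name);
        }
        try (InputStream in = zip.getInputStream(entry)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StrndupReferenceCheck grpc-netty-shaded-<version>.jar");
            return;
        }
        try (ZipFile zip = new ZipFile(args[0])) {
            byte[] so = readEntry(zip, EPOLL_ENTRY);
            System.out.println("references __strndup: " + containsSymbolName(so, "__strndup"));
        }
    }
}
```

Running it against a 1.41.x and a 1.42.x jar would be expected to show the difference described above. That expectation has not been verified here.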
Thank you, @artemptushkin! Your solution works. 😃 But there’s a typo in the path. It should be /lib/libgcompat.so.0, as you used in the previous command.
Relying on glibc-compat is not a safe or wise thing to do… and I can confirm that using -Dio.grpc.netty.shaded.io.netty.transport.noNative=true avoids the segfault. E.g.:
Adding the grpc-java package did not seem to help.
TL;DR: Try setting the LD_PRELOAD=/lib/libgcompat.so.0 environment variable on ~old~ Alpine versions, so that gcompat is actually used. Hopefully that won’t break any of the musl binaries. ~Best approach is probably “upgrade Alpine to 3.13 or later” though.~ (Edit: Newer versions exhibit similar behavior, because this flow avoids the libc6-compat glue.)
openjdk:8-alpine is based on alpine 3.9.4, and using alpine:3.9.4 directly produces similar ldd results. Interestingly, alpine:3.15.0 (latest) has more unresolved symbols:
But I think that isn’t the full story. Looking at the older Alpine:
lib64/ld-linux-x86-64.so.2 is used for x86_64, not lib/ld-linux-x86-64.so.2. So I suspect gcompat is useless on this older version. Instead, libc6-compat provides the linker, which just forwards to the musl linker and is missing the symbol. So I think ldd on this older Alpine is accurate.
Here, though, gcompat provides the linker, which loads lib/libgcompat.so.0. That means I don’t think the ldd output is accurate. I see that gcompat 0.3.0 (Alpine 3.9) and 1.0.0 (Alpine 3.15) both have __strndup, so I think the wrong linker on older Alpine versions is the trouble. Trying the gcompat linker approach “manually” on the old Alpine seems to work:
Given what I saw with grpc/grpc#27995, I expect that if the binary you execute is musl-based, then gcompat wouldn’t be used automatically. That’s just a deficiency of how the gcompat linker works. libc6-compat could have provided the symbols, but not with its symlink-to-musl approach. So I guess LD_PRELOAD is with us long-term.
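Because LD_PRELOAD has to be in the environment before the JVM starts, a check from inside Java can only report a misconfiguration, not fix it. This hypothetical startup check warns when LD_PRELOAD does not list /lib/libgcompat.so.0, the path used in this thread. It splits the variable on whitespace and colons, the separators ld.so(8) documents for glibc. That the musl loader accepts the same separators is an assumption here.

```java
import java.util.Arrays;

// Hypothetical startup check; LD_PRELOAD itself must be set before `java` is launched.
final class GcompatPreloadCheck {
    // Path used in this thread for Alpine's gcompat library.
    static final String GCOMPAT = "/lib/libgcompat.so.0";

    private GcompatPreloadCheck() {}

    /** True if ldPreload (may be null) lists GCOMPAT; entries are split on whitespace and colons. */
    static boolean preloadsGcompat(String ldPreload) {
        if (ldPreload == null) {
            return false;
        }
        return Arrays.asList(ldPreload.trim().split("[\\s:]+")).contains(GCOMPAT);
    }

    public static void main(String[] args) {
        if (preloadsGcompat(System.getenv("LD_PRELOAD"))) {
            System.out.println("LD_PRELOAD includes " + GCOMPAT);
        } else {
            // Reporting only: changing the variable now would not affect this already-running JVM.
            System.err.println("warning: LD_PRELOAD does not include " + GCOMPAT
                    + "; on Alpine images the JVM may crash when the shaded Netty epoll library loads");
        }
    }
}
```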
@alexfeigin, we hope for the one after 3.15.
Not likely. We have this issue on an Alpine image with glibc installed. Also, the image works with grpc-java (,1.42.0), i.e. versions below 1.42.0, and doesn’t work with [1.42.0,), i.e. 1.42.0 and later. So the problem doesn’t look like just a missing glibc that Netty fails to handle gracefully.
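The version ranges above use Maven range notation. "(,1.42.0)" means every version strictly below 1.42.0, and "[1.42.0,)" means 1.42.0 and later. Below is a minimal sketch of that boundary check for plain numeric versions. The class and method names are hypothetical, and qualifiers such as -SNAPSHOT are not handled (Integer.parseInt would reject them).

```java
// Hypothetical helper illustrating the [lower,) range check for dotted numeric versions.
final class VersionRangeCheck {
    private VersionRangeCheck() {}

    /** Compares dotted numeric versions such as "1.41.1" and "1.42.0"; missing parts count as 0. */
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** Maven "[lower,)": true when version >= lower. Its complement "(,lower)" is version < lower. */
    static boolean inRangeFrom(String version, String inclusiveLowerBound) {
        return compare(version, inclusiveLowerBound) >= 0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.41.1", "1.42.0", "1.42.1"}) {
            System.out.println(v + " in [1.42.0,): " + inRangeFrom(v, "1.42.0"));
        }
    }
}
```

Components are compared numerically rather than as strings so that, for example, 1.9.0 sorts before 1.10.0.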