dd-trace-java: dd-trace-java v1.11.0 crashes the JVM

Our server automatically downloaded the latest trace agent, v1.11.0. After a few minutes, our servers started to crash and reboot in a loop. After some investigation, it seems that the JVM was crashing. So maybe it’s more of a JVM issue, but it seems to be your recent changes that introduced this behaviour.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f171904b992, pid=23811, tid=24534
#
# JRE version: OpenJDK Runtime Environment Corretto-17.0.6.10.1 (17.0.6+10) (build 17.0.6+10-LTS)
# Java VM: OpenJDK 64-Bit Server VM Corretto-17.0.6.10.1 (17.0.6+10-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libjavaProfiler16018171302964888844.so+0x7992]  Buffer::putVar64(unsigned long long)+0x102
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# JFR recording file will be written. Location: /usr/share/tomcat/hs_err_pid23811.jfr
#
# If you would like to submit a bug report, please visit:
#   https://github.com/corretto/corretto-17/issues/
#

---------------  S U M M A R Y ------------

---------------  T H R E A D  ---------------

Current thread (0x00007f1755ab1900):  JavaThread "dd-trace-processor" daemon [_thread_in_Java, id=24534, stack(0x00007f16fc328000,0x00007f16fc429000)]

Stack: [0x00007f16fc328000,0x00007f16fc429000],  sp=0x00007f16fc4266d8,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libjavaProfiler16018171302964888844.so+0x7992]  Buffer::putVar64(unsigned long long)+0x102
C  [libjavaProfiler16018171302964888844.so+0x1b956]  Profiler::recordSample(void*, unsigned long long, int, int, Event*)+0x256
C  [libjavaProfiler16018171302964888844.so+0x1c2c6]  PerfEvents::signalHandler(int, siginfo_t*, void*)+0x116


siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0x00007f17081c3673

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 17
  • Comments: 18 (7 by maintainers)

Most upvoted comments

Fixed by #4981

Seeing the same issue with Temurin base image and Java 11.0.18.

We managed to reproduce the issue and verified a fix for it, released in 1.11.2. If the crash persists after upgrading to 1.11.2, please report back with the backtrace from the hs_err file, the JDK version, and the Docker base image being used (or the Linux and libc versions otherwise).
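For anyone collecting the details requested above, here is a minimal sketch that pulls the relevant lines out of an hs_err file. It assumes the report uses the same header layout as the logs pasted in this thread; `hs_err_summary` is a hypothetical helper name, not part of the JDK or of dd-trace-java.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of an hs_err report that identify
# the JDK build and the crashing frame.
# Usage: hs_err_summary /path/to/hs_err_pidNNNN.log
hs_err_summary() {
    file="$1"
    # JDK build and VM flavour from the report header
    grep -E '^# (JRE version|Java VM):' "$file"
    # The "Problematic frame" marker plus the frame line that follows it
    grep -A1 '^# Problematic frame:' "$file"
    # Signal details, if the report contains them
    grep '^siginfo:' "$file" || true
}
```

The full "Native frames" section is still worth attaching by hand, since its length varies. On many distributions, `cat /etc/os-release` and `ldd --version` report the Linux and libc versions, though the output differs between glibc and musl images.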

We acknowledge that this crash is still possible, hence reopening the issue, and are working on getting 1.11.2 out with full mitigation.

Just in case people haven’t seen the workaround Jaroslav posted on Slack:

As a quick remedy, you can add -Ddd.profiling.ddprof.enabled=false to disable the native profiler library. This will disable code hotspots, but if you need to use 1.11.0 for other fixes/features, this would be the fastest way to get you on track.
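One way to apply that flag, sketched for a Tomcat deployment like the ones in the logs in this thread. The assumption here is that the JVM is launched by Tomcat's standard scripts, which read `CATALINA_OPTS` (for example from `bin/setenv.sh`); `DDPROF_OFF` is just an illustrative variable name.

```shell
# Flag taken from the workaround above; it disables the native profiler library.
DDPROF_OFF="-Ddd.profiling.ddprof.enabled=false"

# Assumption: Tomcat's startup scripts pass CATALINA_OPTS to the JVM.
# Append to any existing value rather than overwriting it.
CATALINA_OPTS="${CATALINA_OPTS:-} ${DDPROF_OFF}"
export CATALINA_OPTS
```

For services that are not started through Tomcat, the same `-D` flag can go wherever the other JVM options are set, next to the `-javaagent` argument.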

1.11.2 has resolved this for me. Thank you.

1.11.1 didn’t solve the issue for us

29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_BASE:         /usr/local/tomcat
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Vendor:            Amazon.com Inc.
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Version:           1.8.0_362-b08
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Java Home:             /usr/lib/jvm/java-1.8.0-amazon-corretto/jre
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Architecture:          amd64
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Version:            5.4.228-131.415.amzn2.x86_64
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Name:               Linux
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version number: 8.5.87.0
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server built:          Feb 27 2023 19:32:33 UTC
29-Mar-2023 16:13:57.108 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version name:   Apache Tomcat/8.5.87
    Using VM: OpenJDK 64-Bit Server VM
    Ergonomics Machine Class: server
    Max. Heap Size (Estimated): 2.88G
VM settings:
[dd.trace 2023-03-29 16:13:56:456 -0700] [dd-task-scheduler] INFO datadog.trace.agent.core.StatusLogger - DATADOG TRACER CONFIGURATION {"version":"1.11.1~514d7ebfe1","os_name":"Linux","os_version":"5.4.228-131.415.amzn2.x86_64","architecture":"amd64","lang":"jvm","lang_version":"1.8.0_362","jvm_vendor":"Amazon.com Inc.","jvm_version":"25.362-b08","java_class_version":"52.0","http_nonProxyHosts":"null","http_proxyHost":"null","enabled":true,"service":"warapps-myasu","agent_url":"http://10.120.43.39:8126","agent_error":false,"debug":false,"trace_propagation_style_extract":["datadog"],"trace_propagation_style_inject":["datadog"],"analytics_enabled":false,"sampling_rules":[{},{}],"priority_sampling_enabled":true,"logs_correlation_enabled":true,"profiling_enabled":true,"remote_config_enabled":true,"debugger_enabled":false,"appsec_enabled":"ENABLED_INACTIVE","telemetry_enabled":true,"dd_version":"ci-1680128715949","health_checks_enabled":true,"configuration_file":"no config file present","runtime_id":"e50c8cdd-1856-45cd-a69e-eb0bf3eebd6e","logging_settings":{"levelInBrackets":false,"dateTimeFormat":"'[dd.trace 'yyyy-MM-dd HH:mm:ss:SSS Z']'","logFile":"System.err","configurationFile":"simplelogger.properties","showShortLogName":false,"showDateTime":true,"showLogName":true,"showThreadName":true,"defaultLogLevel":"INFO","warnLevelString":"WARN","embedException":false},"cws_enabled":false,"cws_tls_refresh":5000,"datadog_profiler_enabled":true,"datadog_profiler_safe":true}
[dd.trace 2023-03-29 16:13:56:302 -0700] [main] INFO com.datadog.appsec.AppSecSystem - AppSec is ENABLED_INACTIVE with powerwaf(libddwaf: 1.8.2) no rules loaded
#
#   https://github.com/corretto/corretto-8/issues/
# If you would like to submit a bug report, please visit:
#
# /usr/local/tomcat/hs_err_pid7.log
# An error report file with more information is saved as:
#
# Core dump written. Default location: /usr/local/tomcat/core or core.7
#
# C  [libjavaProfiler3165790072185085033.so+0x78b2]  Buffer::putVar64(unsigned long long)+0x102
# Problematic frame:
# Java VM: OpenJDK 64-Bit Server VM (25.362-b08 mixed mode linux-amd64 compressed oops)
# JRE version: OpenJDK Runtime Environment (8.0_362-b08) (build 1.8.0_362-b08)
#
#  SIGSEGV (0xb) at pc=0x00007f82c658c8b2, pid=7, tid=0x00007f825d9ea700
#
# A fatal error has been detected by the Java Runtime Environment:
#
[error occurred during error reporting (null), id 0xb]

We are validating the fix now and will do a patch release once we are sure the root cause is fixed. Will keep you posted.

Thank you, but this does not seem to be working yet. We are on Java 8, and it still crashes with the latest agent.

SIGSEGV (0xb) at pc=0x00007fc0f07a28b2, pid=45, tid=0x00007fc078b5f700

+1 to the above. We ended up pinning to the latest stable version we observed, 1.10.1:

RUN wget -O dd-java-agent.jar 'https://github.com/DataDog/dd-trace-java/releases/download/v1.10.1/dd-java-agent.jar'

instead of

RUN wget -O dd-java-agent.jar 'https://dtdg.co/latest-java-tracer'

We are seeing something similar with Java 11 + CentOS. Identical application instances with version 1.10.0~c545cdc5a3 do not experience this issue.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f9d35f84992, pid=6088, tid=14890
#
# JRE version: OpenJDK Runtime Environment (Red_Hat-11.0.18.0.10-1.el7_9) (11.0.18+10) (build 11.0.18+10-LTS)
# Java VM: OpenJDK 64-Bit Server VM (Red_Hat-11.0.18.0.10-1.el7_9) (11.0.18+10-LTS, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  [libjavaProfiler15813639711709130809.so+0x7992]  Buffer::putVar64(unsigned long long)+0x102
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%207&component=java-11-openjdk
#

---------------  S U M M A R Y ------------
Command Line: -D[Standalone] -Xms2048m -Xmx10240m -XX:MaxPermSize=1024m .....
Host: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, 4 cores, 15G, CentOS Linux release 7.9.2009 (Core)

---------------  T H R E A D  ---------------

Current thread (0x0000555d8a4d4800):  JavaThread "default task-16" [_thread_in_vm, id=14890, stack(0x00007f9d31a39000,0x00007f9d31b3a000)]

Stack: [0x00007f9d31a39000,0x00007f9d31b3a000],  sp=0x00007f9d31b35ad8,  free space=1010k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libjavaProfiler15813639711709130809.so+0x7992]  Buffer::putVar64(unsigned long long)+0x102
C  [libjavaProfiler15813639711709130809.so+0x1b956]  Profiler::recordSample(void*, unsigned long long, int, int, Event*)+0x256
C  [libjavaProfiler15813639711709130809.so+0x1c2c6]  PerfEvents::signalHandler(int, siginfo_t*, void*)+0x116
C  [libpthread.so.0+0xf630]
V  [libjvm.so+0xd411cc]  SharedRuntime::montgomery_square(int*, int*, int, long, int*)+0x15c


siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000555dfaa25f48