async-profiler: Async Profiler is blaming the wrong method

That could be rephrased too, into: “the JVM doesn’t report the right method?” - don’t know yet 😛

image

By running https://github.com/franz1981/quarkus-reactive-beer eg from script just run ./benchmark -a and using

openjdk version "19.0.2" 2023-01-17
OpenJDK Runtime Environment Temurin-19.0.2+7 (build 19.0.2+7)
OpenJDK 64-Bit Server VM Temurin-19.0.2+7 (build 19.0.2+7, mixed mode, sharin

The blamed method is likely wrong, see https://github.com/openjdk/jdk19/blob/967a28c3d85fdde6d5eb48aa0edd8f7597772469/src/java.base/share/classes/java/lang/reflect/Method.java#L609

It could be a case of skidding, but -e cycles report the same, and increasing the frequency still won’t help too.

Changing JDK version instead seems to help, and switching to

openjdk version "17.0.7" 2023-04-18
OpenJDK Runtime Environment Temurin-17.0.7+7 (build 17.0.7+7)
OpenJDK 64-Bit Server VM Temurin-17.0.7+7 (build 17.0.7+7, mixed mode, sharing)

image

is not even showing it

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 16 (7 by maintainers)

Most upvoted comments

but any idea why?

I bet it’s again a compiler issue related to lost or corrupted during transformation DebugInfo. Given that we can’t fix it in JDK 19 anyway, I see no value in spending more time on this.

so what JDK versions we can fully trust at this point?

Unfortunately, none. Mapping PC addresses back to bytecode indices has always been problematic. This is an area of JIT compiler’s responsibility, Java profilers can’t do much about this. What can be relied on so far - is the first non-inlined Java method. The top physical frame can be determined reliably, but all virtual (inlined) frames are reconstructed from DebugInfo provided by the JVM.

I see two possible approaches to improving profiling experience:

  1. continuously invest in providing more complete and accurate DebugInfo on the JVM side, including C2 fixes, additional tests, etc.
  2. add async-profiler feature to collect detailed instruction-level profile.

Reflection is something that caused problems before, see https://bugs.openjdk.org/browse/JDK-8302320. I’m surprised how often ASGCT is right, considering the lack of tests. This is one of the reasons for having a properly specified new API.

My point is…there are no tests on async profiler or on the Jdk to detect that the collected trace is not what’s expected? And…they are worthy?

There is a single test, but even this test did only test the topmost method before I fixed it. The testing situation is dire.

https://github.com/parttimenerd/trace_validation/ and https://github.com/parttimenerd/trace_tester should improve this in the future.

If we are talking about the future this may be a solution https://openjdk.org/jeps/435, but for now it’s just a candidate.

I’m currently doing my test to get it in and then backport it.

Yep, JB and some DataDog folks raised the point about the jmethodID and concurrent class unloading in some recent Jdk mailing list too

This is the reason why the JEP 435 doesn’t use jmethodIDs anymore.

@franz1981 I believe @parttimenerd can give some input here how AsyncGetCallTrace is tested in Hotspot. I believe that async-profiler can do nothing here, it’s just a proxy of lie from JVM to end user.

I know that those bugs looks scary and make our work much harder, but that’s the world we are living now. Both perf and eBPF used with perf-map-agent are also vulnerable to those bugs, so there is no alternative right now (instrumentation and safepoint bias “vintage” sampling are even worse IMO). Sorry, all profilers are lying, you just need get used to it 😃

From what I know all Hotspot’s ports are vulnerable to these issues. One friend told me that Azul Prime is not, I don’t know if it’s suitable choice to you, I believe you don’t have to buy license for benchmarking (but you should check by yourself).

JDK 19 is no longer supported. I tried JDK 20.0.1 and the profile looks sane to me. Please confirm if you see the issue with the latest JDK.