quarkus: OOM in Quarkus 3.0.0.Beta1 caused by okio via OpenTelemetry

Describe the bug

okio starts a thread via the class okio.AsyncTimeout.Watchdog once “it has to deal with a timeout”. The thread is configured as a daemon thread, seems to have some shutdown logic, and also aborts when it’s interrupted.

This behavior isn’t an issue in production use cases, but it is an issue when running tests.

okio.AsyncTimeout.Watchdog is loaded by a Quarkus class loader for every test, so each watchdog thread implicitly keeps a reference to its class loader, and transitively to all the resources that class loader holds, simply because the thread is still running.
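
The retention mechanism can be illustrated with a small standalone sketch. It is not okio or Quarkus code: the class and thread names are hypothetical, and as a simplification the thread pins the loader through its context class loader rather than through a thread class loaded by that loader. The point is only that a live thread is a GC root, so anything it references stays on the heap.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; none of these names come from okio or Quarkus.
public class LoaderPinDemo {

    // Starts a daemon thread that blocks until 'stop' is counted down.
    // The thread holds the only strong reference to a fresh class loader
    // (a stand-in for a per-test Quarkus class loader).
    public static WeakReference<ClassLoader> startPinningThread(CountDownLatch stop) {
        ClassLoader perTestLoader = new URLClassLoader(new URL[0], null);
        Thread watchdog = new Thread(() -> {
            try {
                stop.await(); // parks like an idle watchdog would
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demo-watchdog");
        watchdog.setDaemon(true);
        // Simplification: pin via the context class loader field.
        watchdog.setContextClassLoader(perTestLoader);
        watchdog.start();
        return new WeakReference<>(perTestLoader);
    }

    // Requests a GC and reports whether the loader is still strongly reachable.
    public static boolean loaderStillReachable(WeakReference<ClassLoader> ref) {
        System.gc();
        return ref.get() != null;
    }

    public static void main(String[] args) {
        CountDownLatch stop = new CountDownLatch(1);
        WeakReference<ClassLoader> ref = startPinningThread(stop);
        System.out.println("reachable while thread runs: " + loaderStillReachable(ref));
        stop.countDown(); // lets the thread finish so the loader can become collectable
    }
}
```

With one such thread left behind per test class loader, every test adds another loader that cannot be collected, which matches the heap growth described here.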

Setting the following parameters disables timeouts, so the watchdog thread is never started and the OOM doesn’t happen. It may be a legitimate workaround for tests until the issue is fixed.

quarkus.otel.exporter.otlp.timeout=0
quarkus.otel.exporter.otlp.traces.timeout=0

This behavior seems to have been introduced after Quarkus 3.0.0.Alpha5, but I’m not sure why, because okio seems to have behaved this way “forever”.

Expected behavior

No response

Actual behavior

No response

How to Reproduce?

No response

Output of uname -a or ver

No response

Output of java -version

No response

GraalVM version (if different from Java)

No response

Quarkus version or git rev

No response

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

About this issue

  • State: closed
  • Created a year ago
  • Comments: 33 (32 by maintainers)

Most upvoted comments

I got native to work, so now I think it’s more a matter of hardening.

Yeah, I see that. My point is that in reality, OkHttpGrpcExporter doesn’t really need to know much about gRPC.

And actually this is true, as the description of the original PR that introduced the class above says.

It’s a legit bug caused by the usage of okio/okhttp with that watchdog thread in tests.

It’s not about flushing, it’s that the watchdog thread never terminates and keeps all classes on the heap, eventually producing an OOM.

Can repro at will:

You can also attach a debugger to it and check that the Watchdog thread keeps running and never terminates.
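
As an alternative to a debugger, the same check could be done programmatically. The following is a minimal sketch, assuming the watchdog thread is named "Okio Watchdog" (an assumption about okio internals that should be verified against the okio version in use); the helper class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical diagnostic helper; not part of Quarkus, okio, or OpenTelemetry.
public class WatchdogThreadCheck {

    // Assumption: the name okio gives its watchdog thread. Verify for your okio version.
    public static final String ASSUMED_WATCHDOG_NAME = "Okio Watchdog";

    // Returns all live threads in this JVM whose name equals 'name'.
    public static List<Thread> liveThreadsNamed(String name) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && name.equals(t.getName()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints each matching thread with the class loader that loaded its class,
        // which is the loader the thread keeps reachable.
        for (Thread t : liveThreadsNamed(ASSUMED_WATCHDOG_NAME)) {
            ClassLoader loader = t.getClass().getClassLoader();
            System.out.println(t.getName() + " daemon=" + t.isDaemon() + " loadedBy=" + loader);
        }
    }
}
```

Calling `liveThreadsNamed` after a test run, for example from a test teardown hook, would show whether matching threads are still alive after tests that should have finished.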