quarkus: Sporadic "Failed to export spans. The request could not be executed. Full error message: Stream was closed"

Describe the bug

I’m seeing the following error in the logs every now and then:

SEVERE [io.qua.ope.run.exp.otl.VertxGrpcExporter] (vert.x-eventloop-thread-1) Failed to export spans. The request could not be executed. Full error message: Stream was closed

The OTLP sink is a Jaeger container (1.48.0, no collector in between); all apps are running in a Docker Compose setup. I can see these errors locally and on our central testing VMs.

I do see plenty of spans in Jaeger though.

I haven’t seen this issue before in Quarkus 2.16.

Any hints on where to start looking/debugging?

Expected behavior

No response

Actual behavior

No response

How to Reproduce?

n/a ATM

Output of uname -a or ver

No response

Output of java -version

Java 17

GraalVM version (if different from Java)

No response

Quarkus version or git rev

3.3.1

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

I’m also seeing sporadic messages like:

WARN [io.qua.ope.run.QuarkusContextStorage] (executor-thread-143) Context in storage not the expected context, Scope.close was not called correctly

and

ERROR [io.ver.cor.net.imp.ConnectionBase] (vert.x-eventloop-thread-1) NULL

But those messages are not always logged together.

About this issue

  • State: open
  • Created 10 months ago
  • Reactions: 2
  • Comments: 31 (28 by maintainers)

Most upvoted comments

I might have found the issue: it may be related to connection pooling, or rather the lack of it. I will do some more tests tomorrow, and if everything goes right I will open a PR with the fixes.

After the changes above, I am no longer seeing the following error in the server logs in my tests =]

apgrpc/zapgrpc.go:195 [core] [Server #1] grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing" {"grpc_log": true}
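Purely to illustrate the connection pooling / keep-alive hypothesis from the comment above (this is not the actual change from the PR, and the values below are made up), the relevant knobs on a Vert.x HTTP client look roughly like this:

import io.vertx.core.Vertx;
import io.vertx.core.http.HttpClient;
import io.vertx.core.http.HttpClientOptions;
import io.vertx.core.http.HttpVersion;

public class PoolingSketch {
    public static void main(String[] args) {
        // Hypothetical settings for keeping a single HTTP/2 connection alive between export batches.
        HttpClientOptions options = new HttpClientOptions()
                .setProtocolVersion(HttpVersion.HTTP_2)
                .setHttp2ClearTextUpgrade(false)   // gRPC over h2c uses prior knowledge, no upgrade
                .setHttp2MaxPoolSize(1)            // reuse one multiplexed HTTP/2 connection
                .setHttp2KeepAliveTimeout(60)      // seconds an idle connection is kept open
                .setKeepAlive(true);

        HttpClient client = Vertx.vertx().createHttpClient(options);
        System.out.println("created " + client);
    }
}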

I have been sniffing packets to see if there is any difference between OkHttp and Vert.x. The OkHttp client adds content-length, whereas Vert.x adds grpc-accept-encoding (OkHttp uses only accept-encoding).

(Screenshots of the OkHttp and Vert.x request headers omitted.)

Besides that, I didn’t find a major difference that would explain why the connection dies with Vert.x but not with OkHttp…

One thing I noted is that OkHttp has a call timeout, which is what OTel uses to apply the configured timeout, and it bounds the whole operation. Vert.x, on the other hand, is setting a read timeout with the same value as the OkHttp timeout (tracesConfig.timeout()), but a read timeout is something entirely different from a call timeout. With the Vert.x gRPC client there is no way to set a whole-operation timeout because we have no access to the RequestOptions; it doesn’t matter much, since the default timeout in the RequestOptions is 0, which means no timeout.
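To make that distinction concrete, here is a minimal illustrative sketch (not exporter code; the 10-second value is a placeholder) contrasting OkHttp’s whole-call timeout with a per-read timeout:

import java.time.Duration;

import okhttp3.OkHttpClient;

public class TimeoutSemanticsSketch {
    public static void main(String[] args) {
        // callTimeout() bounds the entire call: connect, write and reading the full response.
        OkHttpClient wholeCall = new OkHttpClient.Builder()
                .callTimeout(Duration.ofSeconds(10))
                .build();

        // readTimeout() only bounds the idle gap between individual reads on the socket,
        // roughly what a read/idle timeout on the Vert.x side corresponds to.
        OkHttpClient perRead = new OkHttpClient.Builder()
                .readTimeout(Duration.ofSeconds(10))
                .build();

        System.out.println(wholeCall.callTimeoutMillis() + " vs " + perRead.readTimeoutMillis());
    }
}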

One issue I’ve found with the current Vert.x implementation is that we have a streaming response, and in the ClientRequestOnSuccessHandler we do request.send(buffer).onSuccess, where the handler tries to parse the status and other things that won’t exist yet at that point. While debugging I found that GrpcStatus is always null and statusMessage is null as well. The fix would be to actually process the status when the response ends, i.e. run the code/message extraction in the response endHandler; the code would look something like:

request.send(buffer).onSuccess(new Handler<>() {
    @Override
    public void handle(GrpcClientResponse<Buffer, Buffer> response) {
        // Parse grpc-status/grpc-message only once the streamed response has ended;
        // at onSuccess time the trailers are not available yet.
        response.endHandler(new Handler<Void>() {
            @Override
            public void handle(Void event) {
                GrpcStatus status = getStatus(response);
                if (status == GrpcStatus.OK) {
                    exporterMetrics.addSuccess(numItems);
                    result.succeed();
                    return;
                }
                String statusMessage = getStatusMessage(response);
                if (statusMessage == null) {
                    exporterMetrics.addSuccess(numItems);
                    result.succeed();
                    return;
                }

                logAppropriateWarning(status, statusMessage);
                exporterMetrics.addFailed(numItems);
                result.fail();
            }
        }).exceptionHandler(new Handler<Throwable>() {
            @Override
            public void handle(Throwable t) {
                exporterMetrics.addFailed(numItems);
                logger.log(
                        Level.SEVERE,
                        "Failed to export "
                                + type
                                + "s. Stream failed. Full error message: "
                                + t.getMessage());
                result.fail();
            }
        });
    }
});
Still don’t think that fixing the response parsing will fix the issue… just wanted to share these findings…

In the meantime, we have internally built an OkHttp exporter extension and are using it with Quarkus 3.5, overriding (removing) the Vert.x exporter, and we are not seeing any errors. One thing I noted that might have brought errors to the default exporter on Quarkus is that the OkHttp version is pinned to 3.14.9 and okio to 1.7.9, whereas OTel uses OkHttp 4.11.0, which also brings a new okio, so there was a version mismatch there. With our extension we forced OkHttp 4.12.0 and okio 3.6.0, and with that we had no memory issues nor anything else.
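For reference, a rough sketch of what such an extension would delegate to: the upstream OtlpGrpcSpanExporter, which uses OkHttp by default in the OTel Java SDK. The endpoint and timeout below are placeholders:

import java.time.Duration;

import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;

public class OkHttpExporterSketch {
    public static void main(String[] args) {
        // Placeholder endpoint and timeout; the upstream exporter handles gRPC over OkHttp.
        OtlpGrpcSpanExporter exporter = OtlpGrpcSpanExporter.builder()
                .setEndpoint("http://localhost:4317")
                .setTimeout(Duration.ofSeconds(10))
                .build();
        System.out.println(exporter);
    }
}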

I don’t think it will help, as the PR is handling the shutdown, and those logs are happening during app execution. Even increasing the timeout and decreasing the batch size didn’t solve it; it reduced the number of log entries, but the error is still there. Since this didn’t happen with OkHttp, something must be handled differently by the Vert.x gRPC client…
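For context, the two knobs mentioned above (exporter timeout and batch size) correspond to these settings at the OTel SDK level; a sketch with placeholder values, assuming the upstream BatchSpanProcessor API:

import java.time.Duration;

import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;

public class BatchTuningSketch {
    public static void main(String[] args) {
        // Placeholder values: a longer export timeout and a smaller batch size.
        BatchSpanProcessor processor = BatchSpanProcessor
                .builder(OtlpGrpcSpanExporter.builder().build())
                .setExporterTimeout(Duration.ofSeconds(30))
                .setMaxExportBatchSize(128)
                .build();
        System.out.println(processor);
    }
}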

Hi @famod. Thanks for the debugging. I was able to reproduce the issue by killing Jaeger:

SEVERE traceId=, parentId=, spanId=, sampled= [io.qu.op.ru.ex.ot.VertxGrpcExporter] (vert.x-eventloop-thread-1) Failed to export spans. The request could not be executed. Full error message: Stream was closed

This message will show up if either end of the connection abruptly disconnects while transmitting the spans. Will create a PR…

Setting Vert.x’s ConnectionBase logger to DEBUG reveals:

[io.ver.cor.net.imp.ConnectionBase] (vert.x-eventloop-thread-5) The connection will be closed due to timeout

Trying to dig deeper…