spring-boot: Spring Boot 3 Webflux project missing traceId and spanId in logs
When upgrading to Spring Boot 3 (and Micrometer) I noticed the traceId and spanId are not added to the logs when it’s a Webflux project.
Webflux: https://github.com/sanderino666/webflux-micrometer-logs
2022-11-26T17:27:44.991+01:00 INFO [,,] 19376 --- [ctor-http-nio-3] com.example.demo.DemoApplication : home() has been called
MVC: https://github.com/sanderino666/mvc-micrometer-logs
2022-11-26T17:24:12.050+01:00 INFO [,63823dabe8475ad8d31ec1fc81853dcc,d31ec1fc81853dcc] 1848 --- [nio-8080-exec-1] com.example.demomvc.DemoMvcApplication : home() has been called
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 27
- Comments: 69 (25 by maintainers)
Yes, that would be required.
We are working on that 😃
For now, if you want to have the loggers have correlated MDC context, then yes, it will require some work. Because you could instead of doing
doOnNextjust calltap()orhandle()and then you’ll get the MDC correlated for you.I’m in touch with @OlegDokuka and the reactor team to work on something more automated that would resemble how Sleuth worked but would be more performant and less buggy. But that will take some time.
This’s way too complicated after Sleuth, still no solution to propagate same traceID between microservices ?
P.s. I’m having same issue with Java spring and graphql client
@silverwind85 I had the same problem. Lately I was able to get traceIds back in logs with spring-boot-starter version 3.0.5 without using the fix to set
What I had to do:
Hooks.enableAutomaticContextPropagation()in the application’s main classI found that as I debugged and saw that somehow the observationHandlers were not registered for the ObservationRegistry, thus making it return
truewhen called isNoop().I found that within io.micrometer.observation.SimpleObservationRegistry.isNoop()
@marcingrzejszczak Ok thank you. Last question: So this means that I can use the function:
in any place in my Spring Boot code and the Reactor context will have the correct values so that trace and span id will be included in the logs?
Even with
Hooks.enableAutomaticContextPropagation(), unable to get the traceId in logs for a WebFilter ? Is there a workaround, please ?@marcingrzejszczak using new versions, the behavior of the traceId and SpanId would be the same as Sleuth had? I tried using those but it is still not propagating the traceId and spanId to the other threads.
11:27:45->[XNIO-1 task-2] [INFO] - [63f8c961d374727e2acbbd4585e64428, 3ef262641c3e01ce] - log entry 1 11:29:57->[XNIO-1 task-2] [INFO] - [63f8c961d374727e2acbbd4585e64428, 3ef262641c3e01ce] - log entry 2 11:29:57->[reactor-http-nio-2] [INFO] - [, ] - log entry 3 11:29:57->[reactor-http-nio-2] [INFO] - [, ] - log entry 4
Still not able to see the traceid and spanid propagate to calls inside methods like .doOnError() or .onErrorResume(). Were those cases be handled as well? @marcingrzejszczak
Great to know!
One last thing, it’d be good to document how to proceed with Kotlin coroutines as well. This approach doesn’t look quite idiomatic, is there a better (more idiomatic) way to do the MDC setup in this case?
Interesting, we are switching from Spring Cloud Sleuth to Micrometer and with SCS this all worked just like it did in a MVC app. Was that sheer luck?
I am trying to understand the answers spring-projects/spring-framework#29466, does it mean we have to wrap each time we want pass the traceId and spanId to the logs?
@gabrielfmagalhaes if you require assistance for instrumenting your library, please reach out to the micrometer community on the Slack channel so we can help you.
@rsercano if there is a bug with graphql instrumentation, please share a sample application demonstrating the problem and create a new issue in the spring graphql project.
To anyone here: this issue is closed and is not meant to address any issue related to observability and Spring. Adding more comments about somewhat related problems is not helping. The issue title might be too broad in the first place.
@marcingrzejszczak do we have a release plan for that?
WebClient automatically injects trace context to the headers - https://github.com/spring-projects/spring-framework/blob/main/spring-webflux/src/test/java/org/springframework/web/reactive/function/client/WebClientObservationTests.java . We have also standalone samples for that here https://github.com/micrometer-metrics/micrometer-samples/tree/main/webclient
You don’t have to because you will pollute the threads. What we do is we require the
ObservationThreadLocalAccessor.KEYto be there in Reactor Context and then we manually correlate observations - https://github.com/spring-projects/spring-framework/blob/main/spring-webflux/src/main/java/org/springframework/web/reactive/function/client/DefaultWebClient.java#L464Have you created the WebFilter as a bean like this? https://github.com/micrometer-metrics/micrometer-samples/blob/main/webclient/src/main/java/com/example/micrometer/WebClientApplication.java#L48 Maybe your ObservationRegistry wasn’t injected. For sure you shouldn’t open scopes as you showed that in the sample.
Hi @silverwind85, I was able to use the micrometer correctly…after some time I realized it works only with reactor version: ‘3.5.3+’
Kafka and Rabbit (messaging frameworks) They need to have in the factory this setting below .setObservationEnabled(true);
Reactive programming Despite the new libraries and stuff, the property below needs to be added for each context opened. .contextCapture() Sample: objectSampleService.getObjectSampleDTO(entityId) .flatMap(objectSampleDTO -> this.sendMessage(objectSampleDTO) .doOnError(throwable -> this.fallbackProcessing(entityId, throwable)) .contextCapture() .subscribe();
I hope that helps you.
After checking with several samples, #34201 will indeed fix the problem initially reported. I’m closing this issue as superseded. We don’t expect this issue to cover all possible behavior differences with Sleuth, so others should have dedicated issues.
@fernandofraga109 please create a new issue with a sample application demonstrating the behavior. It’s hard to know without a concrete application where the problem is. It’s very likely not in Spring Boot, but probably more in Spring Framework, Reactor Netty or Micrometer directly. We can start with an issue in Spring Framework. Thanks!
@VinishKumar1 this part will be covered with #34201
Please check this entry in the docs https://micrometer.io/docs/observation#instrumentation_of_reactive_libraries . For your convenience I’m copying it below
In Reactor 3.5.3 release (through this PR) an option to turn on automated context propagation was added. To use this, please ensure that you’re using the following projects at minimum in the following versions:
To use the feature call the new Reactor’s Hook method (e.g. in your public static void main method) like this
So just manually bump the versions of all of the libraries presented above and you will have this feature available. I guess we’re planning to add this support to the next Boot release cc @bclozel ?
It won’t work at this point the way it worked with Sleuth. We’re working on improving that. Currently you need to use tap, handle or ContextSnapshot API directly https://micrometer.io/docs/observation#instrumentation_of_reactive_libraries
I am also a bit lost here… I went throught the migration guides and I can’t honestly make Spring Webflux work with tracing as I used to with Sleuth.
@codependent Yes unfortunately my solution has a downside to adapt every log statement or at least the important ones like debug, error needed for log correlation when debugging. But I did not found any other less invasive solution yet.
I think you just need to add
.contextCapture(). It work’s without it but according to the docs they are adding it after thehandleortap. But I did not figure out yet why it seems to work without it. Maybe @marcingrzejszczak can help here@codependent I’m also interested in getting this to work in a webflux+coroutine environment and I think I know why you sample code doesn’t work.
As far as I’ve understood, the
observationRegistry.asContextElement()relies on theThreadLocalto load theContextSnapshot(not 100% sure what is this, but I understand that it’s where the “observation settings and values” for the given span are “stored”). In your example, theThreadLocalwould be empty as there’s nothing that configures/decorates for that request (unless something leaks from other requests ofc) . I guess this extensions was intended to be used on the 1 thread per request model, where theThreadLocalcan be (most of the times) trusted.As we are in the “reactor” world here, we need to have a way of building this context element from the reactor context instead. I managed to get it to work by using the following snippet on filter (I’m using functional endpoints):
I’ve quickly tested this and it seems to be working fine, I’m able to log with all MDC fields, including trace and span id.
Anyway, I think there could be a simpler way to achieve this, as this is verbose and also feels that is doing more steps than needed:
contextElement.I’m deducting this based on my understanding of coroutines and reactor contexts, I don’t understand much what happens behind the scenes on the tracing part, so be aware that I could be assuming wrong things here.
It would be nice to have a better way to achieve here. Maybe an extension that could build the context element directly from the reactor context?
Would it be also nice it we could manage to get the traceId/spanId and any other baggage fields from the reactor context directly, instead of having to having to inject a tracer and doing these workarounds (on spring sleuth, it was possible to set and extract baggage values purely on the reactor context, here I couldn’t find a direct way).
@codependent You should inject the auto-configured
WebClient.Builderrather than creating a new one and then callbuild()on it to create yourWebClientbean. Using the auto-configured builder ensures that the client’s instrumented for observation.I was also surprised when upgraded and didn’t see the traceId/spanId in the logs anymore. It’d be really helpful to document the change of behaviour and how to make it work in webflux applications.