spring-boot: Spring Boot 3 Webflux project missing traceId and spanId in logs

When upgrading to Spring Boot 3 (and Micrometer) I noticed the traceId and spanId are not added to the logs when it’s a Webflux project.

Webflux: https://github.com/sanderino666/webflux-micrometer-logs

2022-11-26T17:27:44.991+01:00 INFO [,,] 19376 --- [ctor-http-nio-3] com.example.demo.DemoApplication : home() has been called

MVC: https://github.com/sanderino666/mvc-micrometer-logs

2022-11-26T17:24:12.050+01:00 INFO [,63823dabe8475ad8d31ec1fc81853dcc,d31ec1fc81853dcc] 1848 --- [nio-8080-exec-1] com.example.demomvc.DemoMvcApplication : home() has been called

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 27
  • Comments: 69 (25 by maintainers)

Most upvoted comments

@marcingrzejszczak I also tried to use that manual instrumentation however that seems to require using the following construct in every class:

Yes, that would be required.

Wondering if it is worth resurrecting the Spring Sleuth logic for Reactor but with micrometer in mind.

We are working on that 😃

It seems the change from Sleuth to Micrometer Tracing is going to need quite a lot of work on our codebases. Can you confirm this is the proper usage?

For now, if you want to have the loggers have correlated MDC context, then yes, it will require some work. Because you could instead of doing doOnNext just call tap() or handle() and then you’ll get the MDC correlated for you.

I’m in touch with @OlegDokuka and the reactor team to work on something more automated that would resemble how Sleuth worked but would be more performant and less buggy. But that will take some time.

This’s way too complicated after Sleuth, still no solution to propagate same traceID between microservices ?

P.s. I’m having same issue with Java spring and graphql client

@silverwind85 I had the same problem. Lately I was able to get traceIds back in logs with spring-boot-starter version 3.0.5 without using the fix to set

@EnableReactiveMethodSecurity(useAuthorizationManager = false)

What I had to do:

  1. call Hooks.enableAutomaticContextPropagation() in the application’s main class
  2. define a configuration like this:
@Configuration
class ObservationConfig(
    private val observationHandlers: Set<ObservationHandler<*>>
) {
    @Bean
    fun observationRegistry(tracer: Tracer): ObservationRegistry {
        val observationRegistry = ObservationRegistry.create()
        val observationConfig = observationRegistry.observationConfig()
        observationHandlers.forEach(observationConfig::observationHandler)
        return observationRegistry
    }
}

I found that as I debugged and saw that somehow the observationHandlers were not registered for the ObservationRegistry, thus making it return true when called isNoop().

I found that within io.micrometer.observation.SimpleObservationRegistry.isNoop()

@marcingrzejszczak Ok thank you. Last question: So this means that I can use the function:

suspend fun Logger.d(message: String) {
    Mono.just(Unit).handle { t, u ->
        debug(message)
        u.next(t)
    }.awaitSingleOrNull()
}

in any place in my Spring Boot code and the Reactor context will have the correct values so that trace and span id will be included in the logs?

Even with Hooks.enableAutomaticContextPropagation(), unable to get the traceId in logs for a WebFilter ? Is there a workaround, please ?

Please check this entry in the docs https://micrometer.io/docs/observation#instrumentation_of_reactive_libraries . For your convenience I’m copying it below

In Reactor 3.5.3 release (through this PR) an option to turn on automated context propagation was added. To use this, please ensure that you’re using the following projects at minimum in the following versions:

To use the feature call the new Reactor’s Hook method (e.g. in your public static void main method) like this

Hooks.enableAutomaticContextPropagation();

So just manually bump the versions of all of the libraries presented above and you will have this feature available. I guess we’re planning to add this support to the next Boot release cc @bclozel ?

@marcingrzejszczak using new versions, the behavior of the traceId and SpanId would be the same as Sleuth had? I tried using those but it is still not propagating the traceId and spanId to the other threads.

11:27:45->[XNIO-1 task-2] [INFO] - [63f8c961d374727e2acbbd4585e64428, 3ef262641c3e01ce] - log entry 1 11:29:57->[XNIO-1 task-2] [INFO] - [63f8c961d374727e2acbbd4585e64428, 3ef262641c3e01ce] - log entry 2 11:29:57->[reactor-http-nio-2] [INFO] - [, ] - log entry 3 11:29:57->[reactor-http-nio-2] [INFO] - [, ] - log entry 4

Still not able to see the traceid and spanid propagate to calls inside methods like .doOnError() or .onErrorResume(). Were those cases be handled as well? @marcingrzejszczak

It seems the change from Sleuth to Micrometer Tracing is going to need quite a lot of work on our codebases. Can you confirm this is the proper usage?

For now, if you want to have the loggers have correlated MDC context, then yes, it will require some work. Because you could instead of doing doOnNext just call tap() or handle() and then you’ll get the MDC correlated for you.

I’m in touch with @OlegDokuka and the reactor team to work on something more automated that would resemble how Sleuth worked but would be more performant and less buggy. But that will take some time.

Great to know!

One last thing, it’d be good to document how to proceed with Kotlin coroutines as well. This approach doesn’t look quite idiomatic, is there a better (more idiomatic) way to do the MDC setup in this case?

    @PostMapping
    @ResponseStatus(CREATED)
    suspend fun registerMetric(
        Mono.deferContextual { contextView: ContextView ->
            ContextSnapshot.setThreadLocalsFrom(contextView, ObservationThreadLocalAccessor.KEY)
            logger.info("registerMetric()")
            metricEventProducer.send(authentication.clientId, metric).toMono()
        }.awaitSingle()
    }

Interesting, we are switching from Spring Cloud Sleuth to Micrometer and with SCS this all worked just like it did in a MVC app. Was that sheer luck?

I am trying to understand the answers spring-projects/spring-framework#29466, does it mean we have to wrap each time we want pass the traceId and spanId to the logs?

@gabrielfmagalhaes if you require assistance for instrumenting your library, please reach out to the micrometer community on the Slack channel so we can help you.

@rsercano if there is a bug with graphql instrumentation, please share a sample application demonstrating the problem and create a new issue in the spring graphql project.

To anyone here: this issue is closed and is not meant to address any issue related to observability and Spring. Adding more comments about somewhat related problems is not helping. The issue title might be too broad in the first place.

@marcingrzejszczak do we have a release plan for that?

WebClient automatically injects trace context to the headers - https://github.com/spring-projects/spring-framework/blob/main/spring-webflux/src/test/java/org/springframework/web/reactive/function/client/WebClientObservationTests.java . We have also standalone samples for that here https://github.com/micrometer-metrics/micrometer-samples/tree/main/webclient

I guess we are expected to open the scope using the observation located in the context, aren’t we?

You don’t have to because you will pollute the threads. What we do is we require the ObservationThreadLocalAccessor.KEY to be there in Reactor Context and then we manually correlate observations - https://github.com/spring-projects/spring-framework/blob/main/spring-webflux/src/main/java/org/springframework/web/reactive/function/client/DefaultWebClient.java#L464

Have you created the WebFilter as a bean like this? https://github.com/micrometer-metrics/micrometer-samples/blob/main/webclient/src/main/java/com/example/micrometer/WebClientApplication.java#L48 Maybe your ObservationRegistry wasn’t injected. For sure you shouldn’t open scopes as you showed that in the sample.

I have the same problem like you @fernandofraga109. The first thread shows tracking Id, the second thread doesn’t ( not propagating ). You opened some issue @fernandofraga109? I don’t use spring security so this solution is not needed @EnableReactiveMethodSecurity(useAuthorizationManager = false) I have following version: > > * Reactor 3.5.4

  • Micrometer Context-Propagation 1.0.3
  • Micrometer 1.10.4
  • Micrometer Tracing 1.0.2

Hi @silverwind85, I was able to use the micrometer correctly…after some time I realized it works only with reactor version: ‘3.5.3+’

Kafka and Rabbit (messaging frameworks) They need to have in the factory this setting below .setObservationEnabled(true);

Reactive programming Despite the new libraries and stuff, the property below needs to be added for each context opened. .contextCapture() Sample: objectSampleService.getObjectSampleDTO(entityId) .flatMap(objectSampleDTO -> this.sendMessage(objectSampleDTO) .doOnError(throwable -> this.fallbackProcessing(entityId, throwable)) .contextCapture() .subscribe();

I hope that helps you.

After checking with several samples, #34201 will indeed fix the problem initially reported. I’m closing this issue as superseded. We don’t expect this issue to cover all possible behavior differences with Sleuth, so others should have dedicated issues.

@fernandofraga109 please create a new issue with a sample application demonstrating the behavior. It’s hard to know without a concrete application where the problem is. It’s very likely not in Spring Boot, but probably more in Spring Framework, Reactor Netty or Micrometer directly. We can start with an issue in Spring Framework. Thanks!

@VinishKumar1 this part will be covered with #34201

Please check this entry in the docs https://micrometer.io/docs/observation#instrumentation_of_reactive_libraries . For your convenience I’m copying it below

In Reactor 3.5.3 release (through this PR) an option to turn on automated context propagation was added. To use this, please ensure that you’re using the following projects at minimum in the following versions:

To use the feature call the new Reactor’s Hook method (e.g. in your public static void main method) like this

Hooks.enableAutomaticContextPropagation();

So just manually bump the versions of all of the libraries presented above and you will have this feature available. I guess we’re planning to add this support to the next Boot release cc @bclozel ?

It won’t work at this point the way it worked with Sleuth. We’re working on improving that. Currently you need to use tap, handle or ContextSnapshot API directly https://micrometer.io/docs/observation#instrumentation_of_reactive_libraries

I am also a bit lost here… I went throught the migration guides and I can’t honestly make Spring Webflux work with tracing as I used to with Sleuth.

@codependent Yes unfortunately my solution has a downside to adapt every log statement or at least the important ones like debug, error needed for log correlation when debugging. But I did not found any other less invasive solution yet.

I think you just need to add .contextCapture() . It work’s without it but according to the docs they are adding it after the handle or tap . But I did not figure out yet why it seems to work without it. Maybe @marcingrzejszczak can help here

@codependent I’m also interested in getting this to work in a webflux+coroutine environment and I think I know why you sample code doesn’t work.

As far as I’ve understood, the observationRegistry.asContextElement() relies on the ThreadLocal to load the ContextSnapshot (not 100% sure what is this, but I understand that it’s where the “observation settings and values” for the given span are “stored”). In your example, the ThreadLocal would be empty as there’s nothing that configures/decorates for that request (unless something leaks from other requests ofc) . I guess this extensions was intended to be used on the 1 thread per request model, where the ThreadLocal can be (most of the times) trusted.

As we are in the “reactor” world here, we need to have a way of building this context element from the reactor context instead. I managed to get it to work by using the following snippet on filter (I’m using functional endpoints):

val coContextElement = ContextSnapshot.setThreadLocalsFrom(
                coroutineContext[ReactorContext]!!.context,
                ObservationThreadLocalAccessor.KEY
            ).use {
                observationRegistry.asContextElement()
            }

withContext(coContextElement){
// Everything should work just fine inside this scope
}

I’ve quickly tested this and it seems to be working fine, I’m able to log with all MDC fields, including trace and span id.

Anyway, I think there could be a simpler way to achieve this, as this is verbose and also feels that is doing more steps than needed:

  • Get reactor context from coroutine context,
  • Get observation “settings” (not sure how to call it 😄 ) from reactor context.
  • Set Thread locals just to get the correct contextElement.
  • Shift the coroutine context to this new context element.

I’m deducting this based on my understanding of coroutines and reactor contexts, I don’t understand much what happens behind the scenes on the tracing part, so be aware that I could be assuming wrong things here.

It would be nice to have a better way to achieve here. Maybe an extension that could build the context element directly from the reactor context?

Would it be also nice it we could manage to get the traceId/spanId and any other baggage fields from the reactor context directly, instead of having to having to inject a tracer and doing these workarounds (on spring sleuth, it was possible to set and extract baggage values purely on the reactor context, here I couldn’t find a direct way).

@codependent You should inject the auto-configured WebClient.Builder rather than creating a new one and then call build() on it to create your WebClient bean. Using the auto-configured builder ensures that the client’s instrumented for observation.

I was also surprised when upgraded and didn’t see the traceId/spanId in the logs anymore. It’d be really helpful to document the change of behaviour and how to make it work in webflux applications.