kotlinx.coroutines: Possible bug with `ThreadLocal.asContextElement`

Hello, I’m observing a strange behavior with the following code:

	val t: ThreadLocal<String> = ThreadLocal.withInitial { "a" }

	@Test
	fun testContext1() = runBlocking {
		println(t.get())
		withContext(coroutineContext.plus(t.asContextElement("b"))) {
			println(t.get())
		}
		println(t.get())
	}

	@Test
	fun testContext2() = runBlocking {
		println(t.get())
		withContext(coroutineContext.plus(t.asContextElement("b"))) {
			println(t.get())
			delay(Random.nextLong(100))
			println(t.get())
		}
		println(t.get())
	}

	@Test
	fun testContext3() = runBlocking {
		println(t.get())
		async(coroutineContext.plus(t.asContextElement("b"))) {
			println(t.get())
			delay(Random.nextLong(100))
			println(t.get())
		}.await()
		println(t.get())
	}

The result is:

  • testContext1 prints aba - expected
  • testContext2 prints abbb - It looks like the value is not restored for some reason. Is this a bug? What’s the explanation?
  • testContext3 prints abba - expected

Please help!

I’m using org.jetbrains.kotlinx:kotlinx-coroutines-core:1.1.1

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 6
  • Comments: 34 (11 by maintainers)

Commits related to this issue

Most upvoted comments

We assume that kotlinx-coroutines-slf4j is used in the following way. Consider the application that is using SLF4J for logging, uses coroutines, and uses MDC. This application adds kotlinx-coroutines-slf4j to its dependencies and adds MDCContext(emptyMap()) to its top-level scope to give a “signal” to coroutine machinery that it is going to use MDCContext. After that, its MDC will be always correctly set/restored and reset and switching between different coroutines.

Until there’s a new release, it would be great to have a clear example of what this “expected usage” looks like. Specifically, the documentation for kotlinx-coroutines-slf4j is extremely short and could benefit from a warning about the caveat caused by this bug and an example of a workaround. Might be best tracked as a separate issue, not sure.

Yes, this is the existing limitation of MDCContext exposed more strictly.

Quoting our KDoc:

Note that you cannot update MDC context from inside of the coroutine simply using MDC.put. These updates are going to be lost on the next suspension and reinstalled to the MDC context that was captured or explicitly specified in contextMap when this object was created on the next resumption.

This is a by-design limitation, that become much more clearer when more than one coroutine is launched.

E.g.:

runBlocking((MDCContext()) {
    launch(someDispatcher) {
        // mutate MDC
    }.join()
    // use MDC here as well
}

do you expect mutated MDC to be available in runBlocking coroutine. What if someDispatcher is Dispatchers.Unconfined and what if it’s concurrent? And what if it’s concurrent, but it happened to be dispatched on the same thread as runBlocking coroutine?

Yes, we are going to release the fix with the next release

Hello,

I am experiencing the same issue even with the code base that is not huge and is mostly under our control. Our services use MDC logging, opencensus tracing, thread locals, and GRPC contexts. It has been a huge challenge to wire everything up correctly to work with coroutine context propagation.

For the code under our control, we need to remember to seed all the different context elements every time somebody starts a coroutine. This leads to tight coupling, every piece of code that wants to start a coroutine needs to be aware of all the thread context elements in the entire codebase. And of course, it is very error-prone.

The problem is exacerbated by GRPC launching coroutines without giving everyone the opportunity to override the behavior. I imagine other libraries wanting to do the same.

The described behavior is surprising, to say the least. I don’t totally understand why global injection is needed to fix this. In the examples above, there are clear scope/context entry and exit points. Could you explain why the restoreThreadContext is not invoked at the block’s end? It’s especially surprising behavior because if you remove delay invocation from the examples above, withContext works as expected!

Are there any updates on this issue?

We assume that kotlinx-coroutines-slf4j is used in the following way. Consider the application that is using SLF4J for logging, uses coroutines, and uses MDC. This application adds kotlinx-coroutines-slf4j to its dependencies and adds MDCContext(emptyMap()) to its top-level scope to give a “signal” to coroutine machinery that it is going to use MDCContext. After that, its MDC will be always correctly set/restored and reset and switching between different coroutines.

Unfortunately, this does not currently affect GlobalScope and requires some boiler-plate code from the application writer. E.g., if my application has code like val myScope = CoroutineScope(someContext) then I, as an application author, have to remember to add MDCContext(emptyMap()) to this scope’s context.

What we can do to improve this situation is to give integration modules like kotlinx-coroutines-slf4j and ability to globally inject context elements to all scopes (including GlobalScope). This should considerably improve usability. All you’ll need to do in this case is to add kotlinx-coroutines-slf4j to dependencies and that’s it. What do you think?

P.S. The downside of this approach to consider is that anyone who adds kotlinx-coroutines-slf4j to their dependencies will get a performance hit even if they don’t use MDC (as an afterthought – there’s no other reason to add kotlinx-coroutines-slf4j, but for MDC support).

It would also be nice to add the warning to the documentation of ThreadContextElement too. At the moment it does not mention that restoreThreadContext() may not be called when it would be expected by users implementing ThreadContextElement themselves.

And it would be much better to have thread state restored outside withContext { }, because a library may not have control over initial context of the coroutine, so this feature becomes unusable there.