kotlinx.coroutines: Thread local is not cleaned up sometimes

I’m using a ThreadContextElement that sets the value of a ThreadLocal. After #985 was resolved it worked perfectly, but after upgrading to 1.5.0 I hit a similar problem: sometimes the last value of the thread local gets stuck in a worker thread. Equivalent code:

while(true) {
    someCode {
        // here the thread local may already have a value from previous iteration
        withContext(threadLocal.asContextElement("foo")) {
            someOtherCode()
        }
    }
}

Actual code of the ThreadContextElement implementation is here.
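
For reference, the invariant the loop above relies on can be modelled in plain Kotlin without any coroutines. This is only a sketch of the save/restore contract that a ThreadContextElement is expected to honour, not the library's implementation; `modelThreadLocal` and `withThreadLocal` are hypothetical names introduced for illustration.

```kotlin
// Plain-Kotlin model of the save/restore contract; kotlinx.coroutines is not involved.
val modelThreadLocal = ThreadLocal<String?>()

// Hypothetical helper: installs `value`, runs `block`, then restores the previous value.
fun <T> withThreadLocal(value: String?, block: () -> T): T {
    val old = modelThreadLocal.get()      // the role of updateThreadContext: remember the old state
    modelThreadLocal.set(value)
    try {
        return block()
    } finally {
        modelThreadLocal.set(old)         // the role of restoreThreadContext: put the old state back
    }
}

fun main() {
    repeat(3) {
        // The invariant from the report: nothing may leak from the previous iteration.
        check(modelThreadLocal.get() == null) { "leaked value from previous iteration" }
        withThreadLocal("foo") {
            check(modelThreadLocal.get() == "foo")
        }
    }
    println("no leak")
}
```

In this single-threaded model the restore step always runs. The report is about cases where the coroutine machinery appears to skip the equivalent step on a worker thread.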

The issue is hard to reproduce, but I see it periodically in production (it may take hours or days to appear). I tested 1.5.0 and 1.5.2; both behave the same. Running with -ea.

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 2
  • Comments: 29 (13 by maintainers)

Most upvoted comments

Hi! Here I am again. The following code fails on 1.6.4:

import kotlinx.coroutines.*

val threadLocal = ThreadLocal<String>()

suspend fun main() {
    doSomeJob()
    doSomeJob()
    doSomeJob()
}

private suspend fun doSomeJob() {
    check(threadLocal.get() == null)
    withContext(threadLocal.asContextElement("foo")) {
        withTimeoutOrNull<Any>(100) {
            withContext(CoroutineName("foo")) {
                awaitCancellation()
            }
        }
    }
    println("done")
}


Thanks for both the Ktor and the regular reproducers!

The source of the issue is indeed a non-kotlinx.coroutines entry point that Ktor uses to optimize its internal machinery (SuspendFunGun). #3155 fixed a completely different bug that happened to be reproducible with the very same snippet 😃

I have a potential solution in mind (#3252) and also a future-proof plan to avoid similar problems (#3253). I believe this issue alone is enough to release 1.6.2 with a fix, though I cannot give you a strict timeline here.

Great job with the reproducer! Verified that it reproduces; we’ll fix it in 1.6.1.

Is there a planned release date for 1.6.1?

Finally I’ve managed to write a small reproducer:

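import kotlin.coroutines.resume
import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Semaphore

val threadLocal = ThreadLocal<String>()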

suspend fun main() {
    while (true) {
        coroutineScope {
            repeat(100) {
                launch {
                    doSomeJob()
                }
            }
        }
    }
}

private suspend fun doSomeJob() {
    check(threadLocal.get() == null)
    withContext(threadLocal.asContextElement("foo")) {
        val semaphore = Semaphore(1, 1)
        suspendCancellableCoroutine<Unit> { cont ->
            Dispatchers.Default.asExecutor().execute {
                cont.resume(Unit)
            }
        }
        cancel()
        semaphore.acquire()
    }
}

It completes almost instantly on my machine and takes some time on play.kotlinlang.org.

Hello.

We are seeing something like this after a few days in production…

We have a loop like this:


class RequestContextsStorage()
val threadLocalForRequestContext = ThreadLocal<RequestContextsStorage>()

class RequestContextThreadContextElement(private val storage: RequestContextsStorage) :
    ThreadContextElement<RequestContextsStorage> {

    // Key for CoroutineContext key-value storage
    private object Key : CoroutineContext.Key<RequestContextThreadContextElement>

    override val key: CoroutineContext.Key<*> get() = Key

    override fun updateThreadContext(context: CoroutineContext): RequestContextsStorage {
        val oldState = threadLocalForRequestContext.get()
        threadLocalForRequestContext.set(storage)
        return oldState
    }

    override fun restoreThreadContext(context: CoroutineContext, oldState: RequestContextsStorage) {
        threadLocalForRequestContext.set(oldState)
    }
}

private var otherThreadLocal = ThreadLocal<String?>()
private val scope = CoroutineScope(Dispatchers.IO)

scope.launch(otherThreadLocal.asContextElement("x")) {
    while (isActive) {
        delay(100)
        // sometimes we see a stale value in threadLocalForRequestContext here
        someStuff()
    }
}

Also, all builders are pretty standard. Maybe it is something to do with cancellation, exceptions, etc.

Version 1.5.1.
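
One way to narrow down where the stale value shows up in a loop like the one above is a small check at the top of each iteration. The sketch below is a diagnostic aid under stated assumptions, not a fix: `takeStale` is a hypothetical extension that is not part of kotlinx.coroutines, and it assumes that an empty thread local is the expected state at the call site.

```kotlin
// Hypothetical diagnostic helper, not part of kotlinx.coroutines.
// Returns the stale value (or null if the thread local was clean) and clears it.
fun <T : Any> ThreadLocal<T?>.takeStale(where: String): T? {
    val stale = get() ?: return null
    // Log which thread carried the leftover value so it can be correlated with other logs.
    System.err.println("Stale thread local at $where on ${Thread.currentThread().name}: $stale")
    remove()
    return stale
}

fun main() {
    val local = ThreadLocal<String?>()
    local.set("leaked")                      // simulate a value left behind on this thread
    println(local.takeStale("loop start"))   // first call reports and clears the value
    println(local.takeStale("loop start"))   // second call finds the thread local clean
}
```

Calling such a helper right after `delay(100)` would record the thread name each time a leftover value is observed, which may help correlate the leak with cancellation or timeouts elsewhere.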

And of course, we have a lot of code like this:

runBlocking(Dispatchers.IO) {
    withContext(RequestContextThreadContextElement(someValue) + otherThreadLocal.asContextElement("x")) {
        // everything seems to be fine here
    }
}
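
A side note on the element shown above: a fresh worker thread has no value in the thread local, so the saved state can be null. The following is a minimal sketch of a variant with a nullable state type that removes the entry instead of storing null. `Storage`, `storageThreadLocal` and `StorageElement` are hypothetical stand-ins for the classes in the comment, and this does not address the leak itself.

```kotlin
import kotlin.coroutines.CoroutineContext
import kotlinx.coroutines.ThreadContextElement
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical stand-in for RequestContextsStorage.
class Storage(val id: String)

val storageThreadLocal = ThreadLocal<Storage?>()

// The state type is nullable because the thread local may be empty
// when the coroutine starts running on a thread.
class StorageElement(private val storage: Storage) : ThreadContextElement<Storage?> {

    companion object Key : CoroutineContext.Key<StorageElement>

    override val key: CoroutineContext.Key<StorageElement> get() = Key

    override fun updateThreadContext(context: CoroutineContext): Storage? {
        val oldState = storageThreadLocal.get()
        storageThreadLocal.set(storage)
        return oldState
    }

    override fun restoreThreadContext(context: CoroutineContext, oldState: Storage?) {
        // Remove the entry rather than storing null when there was no previous value.
        if (oldState == null) storageThreadLocal.remove() else storageThreadLocal.set(oldState)
    }
}

fun main(): Unit = runBlocking {
    withContext(StorageElement(Storage("request-1"))) {
        println(storageThreadLocal.get()?.id)   // value installed for the duration of the block
    }
    println(storageThreadLocal.get())           // expected to be restored once the block completes
}
```

Keeping the state nullable makes the "no previous value" case explicit, which is the state the failing `check(threadLocal.get() == null)` assertions in this thread depend on.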