armeria: Fatal error in gRPC coroutines stack caused by "An error occurred while pushing a context"
Today we had a very strange fatal crash in one of our server instances (hosted in ECS/Fargate). We had to restart the instance, otherwise it kept crashing…
Our stack is Armeria 1.13.1 / kotlin 1.5.31; kotlin coroutines 1.5.2; gRPC kotlin 1.2.0; gRPC java 1.41.0;. The gRPC service is based on kotlin coroutines with armeria blocking task executor.
Seems that the initial exception that was recorded in logs, was “An error occurred while pushing a context”:
java.lang.IllegalStateException: Trying to call object wrapped with context [sreqId=e4462b39, chanId=72864b63, raddr=10.204.22.170:49513, laddr=10.204.21.148:8081][h2c://ip-10-204-21-148.ap-southeast-2.compute.internal/<service and operation name that failed>#POST], but context is currently set to [sreqId=5f01fe01, chanId=966cfccd, raddr=10.204.22.170:54644, laddr=10.204.21.148:8081][h1c://ip-10-204-21-148.ap-southeast-2.compute.internal/<full service and operation name>t#POST]. This means the callback was called from unexpected thread or forgetting to close previous context.\
at com.linecorp.armeria.internal.common.RequestContextUtil.newIllegalContextPushingException(RequestContextUtil.java:139)\
at com.linecorp.armeria.server.ServiceRequestContext.push(ServiceRequestContext.java:234)\
at com.linecorp.armeria.common.RequestContext.lambda$makeContextAware$3(RequestContext.java:546)\
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)\
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)\
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)\
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)\
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)\
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\
at java.base/java.lang.Thread.run(Unknown Source)\
After that, we had hundreds of coroutine-related exceptions - all caused by the same wrong context (sreqId=5f01fe01):
CoroutinesInternalError
Fatal exception in coroutines machinery for DispatchedContinuation[com.linecorp.armeria.common.util.DefaultBlockingTaskExecutor@34e9f91f, Continuation at io.grpc.kotlin.ServerCalls$serverCallListener$1.invokeSuspend(ServerCalls.kt)@61c5d0c6]. Please read KDoc to 'handleFatalException' method and report this incident to maintainers
....
IllegalStateException
Trying to call object wrapped with context [sreqId=067cd880, chanId=217d479a, raddr=10.204.21.9:22674, laddr=10.204.21.148:8081][h2c://ip-10-204-21-148.ap-southeast-2.compute.internal/<service and operation name>#POST], but context is currently set to [sreqId=5f01fe01, chanId=966cfccd, raddr=10.204.22.170:54644, laddr=10.204.21.148:8081][h1c://ip-10-204-21-148.ap-southeast-2.compute.internal/<the same original service and operation name>#POST]. This means the callback was called from unexpected thread or forgetting to close previous context.
com.linecorp.armeria.internal.common.RequestContextUtil in newIllegalContextPushingException at line 139
com.linecorp.armeria.server.ServiceRequestContext in push at line 234
com.linecorp.armeria.server.grpc.ArmeriaRequestCoroutineContext in updateThreadContext at line 46
com.linecorp.armeria.server.grpc.ArmeriaRequestCoroutineContext in updateThreadContext at line 30
kotlinx.coroutines.internal.ThreadContextKt$updateState$1 in invoke at line 54
kotlinx.coroutines.internal.ThreadContextKt$updateState$1 in invoke at line 52
kotlin.coroutines.CoroutineContext$Element$DefaultImpls in fold at line 70
kotlin.coroutines.AbstractCoroutineContextElement in fold at line 15
kotlin.coroutines.CombinedContext in fold at line 131
2
kotlinx.coroutines.internal.ThreadContextKt in updateThreadContext at line 72
kotlinx.coroutines.DispatchedTask in run at line 224
java.util.concurrent.Executors$RunnableAdapter in call
java.util.concurrent.FutureTask in run
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask in run
java.util.concurrent.ThreadPoolExecutor in runWorker
java.util.concurrent.ThreadPoolExecutor$Worker in run
io.netty.util.concurrent.FastThreadLocalRunnable in run at line 30
java.lang.Thread in run
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 2
- Comments: 16 (8 by maintainers)
Thanks, @kwazii1231 for the thorough analysis. I agree that https://github.com/Kotlin/kotlinx.coroutines/issues/2930 caused the problem and it is fixed by https://github.com/Kotlin/kotlinx.coroutines/releases/tag/1.6.2.
after trying to reproduce this issue with @ikhoon i found very mandatory condition of this. if project use coroutine version below, this issue can be reproduce implementation(“org.jetbrains.kotlinx:kotlinx-coroutines-core-jvm:1.5.x”)
in my case, my project use spring version 2.6.3 and gradle plugin for managing dependency plugin automatically downloaded “org.jetbrains.kotlinx:kotlinx-coroutines-core-jvm:1.5.2” after forced change version to 1.6.4 or upgrade spring boot version to 2.7.x it didn’t happen anymore.
kotlin 1.6.1 release note contain fix of this issue
Rollback of time-related functions dispatching on Dispatchers.Main. This behavior was introduced in 1.6.0 and then found inconvenient and erroneous (https://github.com/Kotlin/kotlinx.coroutines/issues/3106, https://github.com/Kotlin/kotlinx.coroutines/issues/3113). Reworked the newly-introduced CopyableThreadContextElement to solve issues uncovered after the initial release (https://github.com/Kotlin/kotlinx.coroutines/pull/3227). Fixed a bug with ThreadLocalElement not being properly updated in racy scenarios (https://github.com/Kotlin/kotlinx.coroutines/issues/2930). Reverted eager loading of default CoroutineExceptionHandler that triggered ANR on some devices (https://github.com/Kotlin/kotlinx.coroutines/issues/3180). New API to convert a CoroutineDispatcher to a Rx scheduler (https://github.com/Kotlin/kotlinx.coroutines/issues/968, https://github.com/Kotlin/kotlinx.coroutines/issues/548). Thanks @recheej! Fixed a memory leak with the very last element emitted from flow builder being retained in memory (https://github.com/Kotlin/kotlinx.coroutines/issues/3197). Fixed a bug with limitedParallelism on K/N with new memory model throwing ClassCastException (https://github.com/Kotlin/kotlinx.coroutines/issues/3223). CoroutineContext is added to the exception printed to the default CoroutineExceptionHandler to improve debuggability (https://github.com/Kotlin/kotlinx.coroutines/issues/3153). Static memory consumption of Dispatchers.Default was significantly reduced (https://github.com/Kotlin/kotlinx.coroutines/pull/3137). Updated slf4j version in kotlinx-coroutines-slf4j from 1.7.25 to 1.7.32.
maybe leak is related with one of above issues i think this one is related https://github.com/Kotlin/kotlinx.coroutines/issues/2930