kotlinx.coroutines: CancellationException can be thrown not only to indicate cancellation
https://betterprogramming.pub/the-silent-killer-thats-crashing-your-coroutines-9171d1e8f79b raises a good point: CancellationException should indicate that the coroutine in which it is raised is cancelled, but it’s not always actually the case:
fun testFoo() {
val scope = CoroutineScope(Dispatchers.IO)
val f = scope.async {
delay(1000)
}
scope.cancel()
var result = false
runBlocking {
launch {
f.await() // throws CancellationException, silently stopping the `launch` coroutine
result = true
}.join()
}
check(result) { "issue" }
}
Here, the programmer didn’t make the mistake of catching a CancellationException and rethrowing it somewhere else, but still, the CancellationException leaked from a different scope.
About this issue
- Original URL
- State: open
- Created a year ago
- Reactions: 15
- Comments: 18 (9 by maintainers)
Commits related to this issue
- Improve the explanation of how `await` throws exceptions Fixes two issues: * It is surprising for some users that the same exception can be thrown several times. Clarified this point explicitly. * ... — committed to Kotlin/kotlinx.coroutines by dkhalanskyjb 5 months ago
- Improve the explanation of how `await` throws exceptions Fixes two issues: * It is surprising for some users that the same exception can be thrown several times. Clarified this point explicitly. * ... — committed to Kotlin/kotlinx.coroutines by dkhalanskyjb 5 months ago
- Improve the explanation of how `await` throws exceptions Fixes two issues: * It is surprising for some users that the same exception can be thrown several times. Clarified this point explicitly. * ... — committed to Kotlin/kotlinx.coroutines by dkhalanskyjb 5 months ago
Please don’t make that the recommended pattern. The point of
CancellationExceptionis that it’s the only exception that safely bubbles up to the next checkpoint (be thatlaunchor sometryblock with a custom subclass ofCancellationException). This bubbling behavior is very important. The proposed pattern is bad because intermediate try-catch layers will believe there is an error instead of controlled backtracking. We useCancellationExceptionas a signal for safe abortion in our codebase because that’s the only exception which gets actively ignored by intermediate code layers.The proper solution is to fix the APIs where possible and introduce new fixed APIs where backwards compatibility is needed. The
ensureActive()workaround is imprecise and not a true causal relationship.If you are using
deferred.await()in our code and you want to distinguish the case where the coroutine you’re awaiting for was cancelled or not, you already can do it by explicitly checkingdeferred.isCancelled. However, I don’t think the need to make this distinction happens very often.IMO, the great takeaway from the original post on the “The Silent Killer” is this pattern:
That should be the recommended pattern to use instead of trying to catch
CancellationExceptionand ignoring it. That is a clear documentation thing, but we might also consider doing some utilities to help with it.However, there is also one thing that can be fixed in the coroutines library itself in a backwards compatible way. The current state of affairs is that the coroutine that dies by a rogue
CancellationExceptionindeed does so completely silently, even though it is not cancelled itself. This can be clearly demonstrated by the following code:Neither coroutine “one” nor coroutine “two” is cancelled, both fail with exception, yet only the exception for the failure of the first coroutine is shown. We can consider changing this logic to report that the second coroutine has failed with
CancelletationException, too.Overall, I agree with your points.
This is the main problem. Yes, this would be painful, to the point I personally would call hopeless.
kotlinx-coroutinesguarantees, for example, that if libraryAdepends onB, whereBonly uses stable APIs, andCdepends onkotlinx-coroutines, then upgrading the version ofkotlinx-coroutinesshould not require recompiling the libraryB. So, even if we somehow magically manage to make all code automatically behave strictly better after it’s recompiled, there’s a whole world of binary compatibility headaches of the form “what ifAsends aDeferredtoB,” or “what ifBreturns aDeferredtoA,” or “what ifBcallscancel, butAchecks the cancellation exception,” and so on, and so on. There’s a plethora of interactions we’d need to consider to even hope to do this correctly. By my estimates, this would amount to dropping everything else we’re doing and brainstorming just this task, and even then, there wouldn’t be a guarantee that we’d be able to evolve this (really ubiquitous) API.If there ever is a
kotlinx.coroutines2.0 that is allowed to break compatibility freely (NB: I don’t mean indiscriminately), I’m certain this is an area we’d do differently, more in line with what you’re saying.I agree that the pattern proposed above is suboptimal. Here’s a better one:
The pattern without
if (e is CancellationException)isn’t in line with the prompt cancellation guarantee that a lot of our functions give. In essence, prompt cancellation states that cancellation of the calling coroutine takes precedence over normal values, but not over exceptions. Without theif, the cancellation exception really does suppress the other exception, which is inconsistent and probably non-desirable.I was debugging something like this for two days(in a production environment, had to add many log/counter and deploy and wait a reproducer, do that again for at least 20 times, VERY bad experience), even suspected at one point that it was a compiler BUG…
the
all data was processedgot printed so I assume thatall data was processed(seriously, is this code looks anything wrong for anybody? but guess what, it’s not, and NO EXCEPTIONS IS LOGGED! it looks like a mysteriousbreakwas in theforloop, the coroutine just vanishes… no return, no exception, how is that possible…well, this is because deep down in the
handleData(data)callstack, someone introducedwithTimeout… a reasonable change, but blow up the whole thing 😦Anyway I diverge, I agree with @elizarov “We can consider changing this logic to report that the second coroutine has failed with CancelletationException, too”, and here’s my advice:
CancellationExceptionshould have a private mark on it / or the cancelling coroutine should have a private mark of the current propagatingCancellationExceptionCancellationExceptionpropagated, and is in the correct structured concurrency way, keep the current behaviorCancellationExceptionis not propagating in the correct structured concurrency way(for example the case mentioned in the original post, or the user grab that reference and throwing them in another coroutine etc), treat them as error(throw outwards and don’t swallow themResultor un-subclassingTimeoutCancellationExceptionfromCancellationException(backwards incompatible and very hard to find in tons ofcatch(e:CancellationException){...}2 cents on the design, please destroy everything I say.
AFAIU
CancellationExceptionis just a signal to terminate a cancelled coroutine promptly without breaking the exceptions effect system (aka run the finally blocks). If there were no exceptions, then a coroutine could be terminated by simply suspending it without resuming it. However, currently kotlinx coroutinesCancellationExceptionnot only as a signal, but also as a result (the result of a cancelled deferred coroutine can beCancellationException).CancellationException(even me manually throwing aCancellationExceptioncan result in a silent coroutine death)Kotlinx coroutines also encodes three possible states:
with a single
cause: Throwable?where the cancelled and failure states are unionized in thecause != null(cancelled=cause is CancellationException). In practice it makes sense, as the result of a couroutine can be technically an exception or a result, but IMHO it’s important to remember that we use an exception just because we are forced to do so just for the cancellation signal to not break the exception system. Once that the exception reaches the coroutine it should be unpacked to its actual meaning: is this the cancellation signal? then the coroutine state changes to cancelled. Is this not the cancellation signal? The the coroutine state changes to completed exceptionally. Again, if there were no exceptions and we implemented cancellation by just suspending the coroutine, we would likely have the cancelled state as a seperate one instead of encoding it ascause is CancellationException.If the above makes sense, then kotlinx coroutines could:
CancellationExceptionas the result of a deferred coroutine. For example, if the deferred coroutine is cancelled, thenawaitcould throw aCancelledCoroutineException(a failure).CancellationExceptionwhich allows the coroutine completed with it to verify the token and update the state in response.In an ideal world,
CancellationExceptioncould be an internal implementation detail of kotlinx coroutines which is used to implement the signal transparently. In practice, developers catchingThrowablewill always also catch theCancellationException, so they have to rethrow it.Instead of
we could have
The state of the coroutine conceptually would be:
and
Deferred.awaitwould doThis is just to say that IMHO this should not be “solved” by the developer using the
coroutineContext.ensureActive()pattern, which is just a way of condensing the following logic:This is not a great pattern IMHO, and should be handled by the library if possible, even if getting there with backwards compatibility could be painful.
@throwable-one, it seems your problem is not with how exceptions are handled in the library overall but with the specific behavior of
await. Here’s one way to implement the function you want:Just a
Value(with the implicitThrowablethat can surface at any point in a JVM program) is even more minimal.Well, this is as minimal as
Value | Throwableunion can be at the momentThis is a robust solution, but non-Kotlin-style. Kotlin tries to keep hierarchies of return types as thin as possible to save the users the boilerplate of unpacking the results. For example,
Map<Int, String?>::getdoesn’t return anOptional<String?>, it just returnsString?.The above could be implemented as an extension function:
await(probably another function since this one is taken) should returnkotlin.Resultawaitresumes with CE, then it means the current coroutine was cancelled;awaitresumes withResult, thenasyncwas completed before the current coroutine was cancelled, and it represents a value/CE/failure of theasync.Need to dive deep into the implementation to see if this is possible, but it looks like modifying
Deferredto throw a non-CancellationExceptionmay not be a good solution.Consider this simplified scenario:
Here, we possibly have a race: the global scope is cancelled, which causes the coroutines in
asyncandlaunchalso to be cancelled. However, it can happen that this sequence of events happens beforelaunchgets cancelled:asyncgets cancelled,Deferredlearns about the cancellation exception,awaitfinishes with a non-CancellationExceptionin thelaunch,launchto fail with a non-CancellationException,runBlockingfails!So, the code can fail with one exception or rarely with another. This is just one simplified example of a broader class of races that we could introduce.