reactor-core: onErrorContinue() design

Not a bug per-se, but a discussion - is onErrorContinue() (and associated onErrorStop()) the right solution, and if not, can the same functionality be designed in a more obvious way?

I keep noticing people tripping up on it, and I’d be lying if I said it also hadn’t caught me out a few times. There’s two main issues I keep seeing with onErrorContinue().

Friendly name

There’s no hint from the name or Javadoc that it’s a specialist operator that should be used with care - it seems analogous to onErrorResume(). The name seems very “friendly” - as in “hey, that name looks like what I want to do, so I’ll use it!” This is in contrast to (for example) block(), which makes it obvious it’s not something you want to do.

The Javadoc doesn’t help much either - there’s only a small note below the diagram that “this error handling mode is not necessarily implemented by all operators”. I’ve therefore understandably seen people use onErrorContinue() in complete ignorance that it’s a specialist operator they should use with care (onErrorResume() would have been a better choice in almost all those cases.)

Complex reasoning

However, I think the problem goes deeper than the above. Even for an admittedly “specialist” operator that needs to be used with care, and even for someone who knows their way around reactor reasonably, it still makes a stream incredibly hard to reason about in a whole bunch of scenarios:

Swallowing unexpected exceptions

There’s certain exceptions (or exceptions that occur in certain places) that you wouldn’t necessarily expect it to “swallow”, but it does (such as retry exceptions, as per this issue raised a while ago.)

Going “too far” up the chain

This one is a bit looser, but library writers returning a Flux will often assume, as per the reactive streams spec, that an error will always cause the Flux to halt, where they can either let the operator propagate or resume to take corrective action. If someone has called onErrorContinue() downstream, this can mess up the library code in a way the author didn’t intend, causing unexpected consequences.

This could of course be fixed by all library writers appending onErrorStop() to each publisher at the end of the chain, but in reality no-one seems to do that.

No operator chaining

Since onErrorContinue() only affects operators above it, and doesn’t chain with other onErrorContinue() calls, you wind up with counter-intuitive situations like this:

    Flux.range(1,10).flatMap(x -> Mono.error(new RuntimeException()))
            .onErrorContinue(IllegalArgumentException.class, (t,o)-> System.out.println("A")) //Not called
            .onErrorContinue(RuntimeException.class, (t,o)-> System.out.println("B")) //Not called

…which of course, won’t cause either of those blocks to be executed since the IllegalArgumentException block is declared first.

Inconsistent continuing with a single value

If the Flux just contains a single value through use of Flux.just(), continuing doesn’t seem to always work reliably:

    Flux.just(1,1)
            .flatMap(ctx -> Flux.push((sink)->{ throw new RuntimeException();}))
            .onErrorContinue((ex, obj)->{
                System.err.println("Caught"); //Called, prints twice
            })

…but:

    Flux.just(1)
            .flatMap(ctx -> Flux.push((sink)->{ throw new RuntimeException();}))
            .onErrorContinue((ex, obj)->{
                System.err.println("Caught"); //Not called
            })

…but:

    Flux.just(1)
            .flatMap(x -> Mono.error(new RuntimeException()))
            .onErrorContinue((ex, obj)->{
                System.err.println("Caught"); //Called
            })

(I can’t quite reason about what’s causing this - it feels like a bug more than expected behaviour.)

Unclear “cause” - inner or outer?

The cause object isn’t necessarily unambiguous - what’s the cause here? I’ve seen cases where people would expect it to be 1 (the top level object), but of course it’s actually A (the inner object.)

    Flux.just(1)
            .flatMap(i -> Flux.just("A").flatMap(x -> Mono.error(new IllegalArgumentException())))
            .onErrorContinue((ex, obj) -> {
                System.out.println(obj); //A
            })

Unclear “cause” #2

…but what about when there is no “cause”?

    Flux.just(1,1)
            .flatMap(i -> Flux.push((sink)->{ throw new IllegalArgumentException("bum!");}))
            .onErrorContinue((ex, obj) -> {
                System.out.println(obj); //null
            })

…it’s null, which follows the “inner” pattern of the above, but that isn’t necessarily obvious - I’ve seen some who would expect it to fallback to the outer cause (i) rather than being null.

I’m sure there’s a bunch of other non-obvious behaviour situations - and I’m deliberately omitting the more “obvious” caveats that apply (such as operator support has to be explicit, it affects above the stream not below it, etc.) My point is that while most operators are reasonably simple to reason about, onErrorContinue() is a big exception to this rule that can make writing & maintaining reactive code far harder than it needs to be.

Possible solutions

I’m not sure on this, really. Changes go from “nothing” to “light touch” to “redesign somehow”. Some possible solutions might be:

Mostly leave everything as-is, but make it much clearer in the Javadoc that onErrorContinue() is a specialist operator that should be avoided wherever possible;
Rename the method to something a bit more “warning-inducing”, such as ignoreUpstreamErrorsWherePossible() (just an “off the cuff” example)
Do something with types - maybe:
- Introduce a different type of Flux that supports onErrorContinue(), and that type only works with supported operators;
- Introduce a type that forces only one “in-bound” onErrorContinue in the whole chain at compile time (can convert to the type by means of using onErrorContinue() and back by using onErrorStop()
Introduce more overloaded methods for supported operators that take an “error handler” for continuing rather than setting it globally in the chain;
Remove onErrorContinue() entirely (not feasible I suspect, are there any situations where this behaviour simply can’t be emulated by using other operators?)

Thanks for reading the ramble - would be interesting to hear anyone’s thoughts…!

About this issue

Original URL
State: closed
Created 4 years ago
Reactions: 10
Comments: 18 (11 by maintainers)

Commits related to this issue

WIP #2184 #2148 Add scope operator to deal with Context manipulatio — committed to simonbasle/reactor-core by simonbasle 4 years ago
WIP #2184 #2148 Add scope operator to deal with Context manipulatio — committed to simonbasle/reactor-core by simonbasle 4 years ago
WIP #2184 #2148 Add scope operator to deal with Context manipulatio — committed to simonbasle/reactor-core by simonbasle 4 years ago
fix #2184 Add scope operator to limit range of Context manipulation Provide an option (now Spec) per subscriber, don't expose ctx write methods. We expose one ScopeMutableSpec per Subscriber, which ... — committed to simonbasle/reactor-core by simonbasle 4 years ago
fix #2184 Add scope operator to limit range of Context manipulation Provide an option (now Spec) per subscriber, don't expose ctx write methods. We expose one ScopeMutableSpec per Subscriber, which ... — committed to simonbasle/reactor-core by simonbasle 4 years ago
fix #2184 Add scope operator to limit range of Context manipulation Provide an option (now Spec) per subscriber, don't expose ctx write methods. We expose one ScopeMutableSpec per Subscriber, which ... — committed to simonbasle/reactor-core by simonbasle 4 years ago

Most upvoted comments

I might be missing something as I am a bit new to reactive world, but I feel continuous processing can be achieved easily using onErrorResume and flat/concat/whatnot/Map. After being burned by onErrorContinue (as it is very tricky to use) I came up with pattern like that (kotlin, but I hope it is not a problem):

fluxOfRawEvents.flatMap {
    processingThatCanThrowError(it)
        .doOnError { logger.error("How unfortunate!")}
        .onErrorResume { Flux.empty() }
}.map(::furtherProcessing)
    .subscribe()

It easily wraps any existing processing, you can see what happens when processing fails, you never mess with existing publishers (when it fails, it is done and never restored, you just subscribe to new one), and it seems to be doing what I need (which is not breaking that raw events stream when processing fails for some reason). It also gives you option to actually fail on some errors.

Is this approach ok or I am missing something? If this is ok maybe would be worth putting in docs as common pattern?

piotrplazienski on Mar 15, 2021