nom: Potential changes to nom error handling story for 6.0

I haven’t tested this stuff out yet so I’m not sure how feasible any of it is but here are the ideas I think are worth investigating for 6.0:

Move off of nom::error::ParseError and use std::error::Error instead

  • see if instead of append and add_context combinators that need to merge errors can instead use std::ops::Add (https://github.com/Geal/nom/issues/1131).
    • I’m unsure of what the difference between append and or is in practice, so this may instead apply to or.
  • use From instead of from_error_kind for converting from a root cause error to a user provided error type
    • this could possibly involve introducing a new root cause error type that functionally replaces the (I, ErrorKind) tuple.

Inter-operate with foreign error types instead of discarding

  • change map_res to try to use From into the user Error type from the error type of the closure, specifically make sure to support preserving std ParseIntError and similar errors
  • change verify to work with Result instead of bool to allow custom error messages upon verify failures.

Look into interoperating with error reporting mechanisms that require 'static

Probably with the ToOwned trait in std, I’m not sure how easy it would be to write the impl for this as a user in practice.

Alternatively: we can look into removing the lifetime bound on errors that keep the input with them, my understanding is that the main reason that these carry around references is to do offsets against the original reference, we could instead carry a ptr and require that the original reference is provided when printing the final error report to get the nicely rendered contexts.

Stop treating context as Errors

The specific location in a program / context of what it was doing is not an error, or indicative of a failure, and I don’t think it should be stored in the error chain/tree the same way the various parse failures are. Instead I’d like to see it gathered in a similar way to how I’m proposing we support Error Return Traces

Additional Comments

I’m not positive is now is the best time to apply these changes. I’m currently working on this RFC https://github.com/rust-lang/rfcs/pull/2895 to add a way to get custom context out of dyn Errors, and a few of the changes I’m imagining would probably want to use this functionality as part of the error reporting mechanism, though it gets a little hairy because nom errors like to have references in them, and the context extraction mechanisms would work for any members with lifetimes in them…

The vague idea im throwing around in my head is something like this.

/// Assuming the error has been made owned
///
/// if the input references in leaf errors are stored entirely for the offset, could we instead
/// store ptrs?
///
/// using println here to be lazy, would actually use `std::fmt` machinery
fn example_nom_error_report(input: &I, error: &dyn std::error::Error + 'static) {
    let mut cur = Some(error);
    let mut ind = 0;

    eprintln!("Error:");
    while let Some(error) = cur {
        eprintln!("    {}: {}", ind, error);

        if let Some(causes) = error.context::<[nom::error::Error]>() {
            eprintln!("    Various parse failures:");
            for (ind, cause) in causes.iter().enumerate() {
                eprintln!("        {}: {}", ind, cause.print_with(input));
            }
        }

        if let Some(contexts) = error.context::<[nom::error::Context]>() {
            eprintln!("    Various contexts:");
            for (ind, context) in contexts.iter().enumerate() {
                eprintln!("        {}: {}", ind, context.print_with(input));
            }
        }

        cur = error.source();
        ind += 1;
    }
}

This would hopefully support printing all these errors

Error:
    0: Unable to parse Type from Input
       Various parse failures:
           0: at line 2:
             "c": { 1"hello" : "world"
                    ^
           expected '}', found 1
       Various contexts:
           1: at line 2, in map:
             "c": { 1"hello" : "world"
                  ^

           2: at line 0, in map:
             { "a" : 42,

Error:
    0: Unable to parse Type from Input
    1: invalid digit found in string

Error:
    0: Unable to parse Type from Input
       Various contexts:
           1: at line 2, in map:
             "c": { 1"hello" : "world"
                  ^

           2: at line 0, in map:
             { "a" : 42,
    1: failed verify for some specific reason

The specific syntax and error report shape are not really the point here, this would probably need a lot of tweeking, but hopefully this gets the idea across.

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Reactions: 7
  • Comments: 20 (1 by maintainers)

Most upvoted comments

with af7d8fe custom error types will now be able to store the error from map_res

so, here are th recent changes around error handling:

  • 9a555fe6464887451: the finish() method that transforms Result<(I,O), nom::Err<E>> to Result<(I, O), E>, by merging the Error and Failure cases, and panicking if it encounters Incomplete (because Incomplete should never get out of the parsing/IO loop)
  • 981d036ca8dbc: moving the basic error type from (I, ErrorKind) to a struct, so I can implement std::error::Error on it
  • bf83d5d97898: implementing std::error::Error on nom::error::Error and nom::error::VerboseError. Those implementations are not particularly great, but we can improve them

@thirtythreeforty what ind of integration do you have in mind, and why are you trying to add the ParseError trait instead of the parser error type you chose? I’m probably failing to see the method, because generally, in my own code, the parse error stop at the IO layer, and do not bubble up to the user

@torokati44 you might need a custom error type with a wrapper combinator you insert here and there for that, because recording errors for many* or opt rarely make sense: you’re expecting an error here, either as a terminating signal, or because the data you’re looking for is not there but the next piece of data is still valid for the next parser. For what you want nom-trace can still work, there’s a combinator compatible with nom 5 functions: https://docs.rs/nom-trace/0.2.1/nom_trace/#nom-5-functions-support

while or combines errors coming from various branches of alt

At least, ostensibly; none of the implementations that nom ships (as of nom 5 and nom 6-alpha1) do anything with or besides discard the old branch error in favor of the incoming one

If I may add my two cents to this issue… In my mind, logically, opt(), and many* are branches just the same as alt(), so the user might be interested in the “complete” error tree (why exactly the opt() parser failed, or why many_* stopped where it did) but instead, only alt() provides this info. It might be because with alt(), all the branches are right there, in one place, so it’s easier to do - hence or() in the error type. For the other combinators, the branching is spread out a bit (opt() doesn’t know about what might be accepted after, if the optional element wasn’t), so there might be a need to add this kind of tracing to the error type? I have looked at nom-trace and nom-tracable for similar effect, but they are a bit too macro-heavy. Is there a possibility to make this error accumulation work out of the box with all branching parsers, the same as with alt()?

If you wouldn’t mind my adding my own two cents here, I would welcome an overhaul of the error handling. (&[u8], ErrorKind) is proving extremely difficult to integrate into my programs - all is fine as long as you call other Nom functions, but there is a big impedance mismatch between Nom errors and Rust errors.

This is partially because the ParseError trait isn’t very useful - there’s no way to convert it to a std::Error, and because it’s a trait, it’s difficult to put into my own error types. Instead I’m hardcoding my errors to do a clumsy mapping between (&[u8], ErrorKind), but there don’t seem to even be any helper facilities for this error type.

If there’s a better way to interoperate with std::Result, please let me know as it would eliminate a lot of contortions in my code.

this could possibly involve introducing a new root cause error type that functionally replaces the (I, ErrorKind) tuple

I just checked, and… yes 😭 Lots of tests to rewrite