nom: IResult VS Result: making Incomplete part of errors?

There have been a lot of demands that I change nom’s basic type from IResult to std::result::Result. IResult has the following definition:

pub enum IResult<I,O,E=u32> {
  Done(I,O),
  Error(Err<I,E>),
  Incomplete(Needed)
}

This was originally inspired from attoparsec’s IResult in which the Partial branch contained a closure to be called when more data is available. For various reasons, I was not able to make the closure idea work (note that Rust was very far from 1.0 at the time), so I chose to show how much data was needed to ask the user to parse again.

I open this issue to study what a change from IResult to Result would entail. I make no promise to do that change, and I will not put the issue to a vote. I will however take into account the responses I get.

The proposal is to make the Incomplete branch part of the Error branch, which would allow employing Result. I do not know yet what the end type would look like. I see two possibilities for the Err type: containing either a Needed or the other error branches, or flattening Needed at the same level.

So, to detail the arguments:

pro:

  • can reuse all of the Result methods and the code relying on it
  • less surprising for developers
  • nom users can easily ignore all of the Incomplete usage, parsers are easier to write
  • it might make the macros simpler, because in pattern matching, most combinators would handle 2 cases instead of 4: Done and Error. In some combinators, there will be only three of them: Done, Error or Incomplete(Needed::Unknown), Incomplete::Needed::Size(sz), because the calculation of needed data must still happen
  • a lot of nom parsers are stuck on an old version and likely will never update (because it does the job as is), so less code to rewrite, maybe?
  • a lot of people find Incomplete confusing, since they work on complete data (like a file completely read in memory)
  • I could add in there a “non backtrackable error” that can contain the same thing as a normal error, but would make everything return instead of testing other branches
  • it might make compilation faster (compilation time is the reason for the nom fork in syn)

con:

  • it means a nearly complete rewrite of nom. It’s not necessarily an issue, I’m willing to make the time for it if needed
  • there are still a lot of nom users relying on Incomplete, and this is a big breaking change for them
  • even parsers that do not use Incomplete would need to be updated to use Result
  • it might make some combinators confusing. All of the backtracking combinators like alt! currently return on Incomplete instead of testing the next branch. So, instead of alt! and alt_complete!, do a alt! and alt_incomplete! ?
  • nom was designed with streaming in mind, I am worried hiding Incomplete will hide this benefit of the library

also, I am not sure about the timeline here. I am doing nom 2.0 very soon and it introduces some breaking changes, but I’m worried this change might be too big and drive users away. On the other hand, a 3.0 would likely happen far in the future, and there would be even more code (in nom and in code relying on nom) to update.

so, what do people think?

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Reactions: 5
  • Comments: 26 (11 by maintainers)

Most upvoted comments

To be honest, I’d actually suggest that futures-rs has a good type that would be worth stealing, here:

pub type Poll<T, E> = Result<Async<T>, E>;

/// Return type of future, indicating whether a value is ready or not.
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum Async<T> {
    /// Represents that a value is immediately ready.
    Ready(T),

    /// Represents that a value is not ready yet, but may be so later.
    NotReady,
}

Rather than putting NotReady in the error (for similar reasons to IResult), it puts it in the Ok. In addition, it allows seamless interoperation with the (growing) futures ecosystem, and trivial implementations of Future.

A user could simply do an and_then on whatever future-based IO they are doing to get their input, and call their nom-based parser inside.

It’s worth noting that std::io::ErrorKind has some variants that represent recoverable kinds of errors, like WouldBlock and TimedOut. So such a thing isn’t inconsistent with the rest of rust.

Also I don’t see how hiding Incomplete behind an error kind would make the streaming story that much harder, besides being mentioned in the documentation, I’d imagine most people would use the built-in nom Producer/Consumer construction (IE: They’d never have to deal with the Incomplete error kind directly).

another interesting thing, while you got me worrying about struct size, I have a way to make simple_errors and verbose_errors compatible, by using the same enum in simple_errors, but with only one element. But things get more interesting: I can add more context in simple_errors (adding the current slice) without growing the structure, and make the verbose version smaller than in nom 3.0.

Basically, instead of:

type Result_err<I, O, E=u32> = Result<(I, O), Err<E>>;

pub enum Err<E=u32> {
  Error(ErrorKind<E>),
  Incomplete(Needed),
}

I would have:

type Result_err<I, O, E=u32> = Result<(I, O), Err<I,E>>;

pub enum Err<I,E=u32> {
  Error(Context<I,E>),
  Incomplete(Needed),
}

pub enum Context<I,E> {
  Code(I,ErrorKind<E>),
}

and in verbose mode:

pub enum Context<I,E> {
  Code(I,ErrorKind<E>),
  List<Vec(I,ErrorKind<E>)>),
}

That way, the verbose_errors feature would be additive, and as you can see in this playground, it’s smalller than in nom 3:

IResult: 40
IResult verbose: 64
Incomplete in Ok: 48
Incomplete in Err: 40
Incomplete in Err extended: 48

FYI this is now done, with an additional element in the Err structure: Failure(ErrorKind<E>) to prevent a parser from trying other branches and make combinators like many* fail instead of returning a value

mmmh, so, a few things here:

  • a lot of people think they do not need to handle incomplete data, especially for textual formats and languages. But it’s in fact very common to end up with large JSON, XML, or even C or PHP files that you would not want to load entirely in memory. And then, trying to retrofit a parser in streaming mode is painful
  • I don’t think the compilation time gets this large because of IResult vs Result, but mostly for the calculations of how much data is needed. Right now I’m not sure we always need to know how much data we need when a parser returns Incomplete, especially with network protocols. Maybe Needed could in fact hold the input slice and a needed number of bytes, so we only need to calculate once (offset from base slice + needed bytes) and move the calculation in a separate function. That would reduce the code size greatly
  • I’m open to moving some stuff out of nom in separate crates, but the core types will stay in nom. Streaming is a core feature of nom and it will stay there. There’s already a compilation feature to activate or not verbose error handling (with position information and such), and it’s painful to maintain. I’m not sure adding another compilation feature for streaming would be a good idea. Especially when nom can be included multiple times in a project from various libraries.
  • as for where Needed should be, I’m torn between both arguments, but I lean a bit more on putting it in Ok:
    • putting it in Err: it’s the common approach, but mainly because of historical reasons. C functions returning integer fell into the “0 is success, everything else an error” pattern that caused so many bugs. We have better types in Rust now. The ability to ignore Incomplete is great though.
    • putting it in Ok: Incomplete is not an error. It’s a very common case. I think that pattern matching on Ok(Done(i,o)) and putting Ok(Incomplete(_)) and Err in the same case (maybe with a helper function?) would not be that hard.

From my point of view, it would remove a difficulty from nom usage. Now I have no idea what will be the result but it seems very interesting!