ockam: Make Rust Errors more idiomatic
I wrote
let mut forwarding_address = String::new();
std::io::stdin().read_line(&mut forwarding_address)?;
Got
error[E0277]: `?` couldn't convert the error to `ockam::Error`
--> examples/alice.rs:15:54
|
15 | std::io::stdin().read_line(&mut forwarding_address)?;
| ^ the trait `From<std::io::Error>` is not implemented for `ockam::Error`
|
= note: the question mark operation (`?`) implicitly performs a conversion on the error value using the `From` trait
= help: the following implementations were found:
<ockam::Error as From<CredentialError>>
<ockam::Error as From<NodeError>>
<ockam::Error as From<OckamError>>
<ockam::Error as From<ockam_core::routing::error::RouteError>>
and 7 others
= note: required by `from`
Update:
I think we need to take a deeper look at the design of our Errors so that they carry source, backtrace, cause etc. as they propagate within Ockam or surface to a user. We chose the code and domain approach because we’ll need to expose them over FFI. But that shouldn’t hurt the experience of Rust users. Rust Users should get source, backtrace, cause etc.
About this issue
- Original URL
- State: open
- Created 3 years ago
- Comments: 24 (22 by maintainers)
I’ve been thinking about this a bit, and feel like I should write down my thoughts in case I’m getting something massively wrong about the
ockamcodebase — totally possible, I’m very new. A lot of this is informed by my work on the stdlib and mozilla/application-services (which had some similar constraints, but, well, fewer).I think a design like the
std::io::Errortype is probably most appropriate for this. This should allow us to avoid giving up the ability to expose things on the other side of the FFI cleanly (at least in some cases). In particular, the fact that in io::Error, you always have an ErrorKind — for our errors we could enforce always having an error code and a domain (which I would change to a&'static str, or even a&'static CStr, or something that wraps it anyway, for FFI compat).It hopefully also would allow us to avoid allocation in many cases, see the internal
io::Error::const_newfunction (or if https://github.com/rust-lang/rust/pull/87869 has landed,io::const_io_error!… unless it has been renamed, which seems possible), and even for the custom error case, eventually (perhaps not immediately), something based onunsizeorstackboxcould be used to have aBox<dyn Error+Send+Sync>that doesn’t use the heap (it’s also possible that heapless has options for non-heap-allocating boxed trait objects, I haven’t looked). It would also allow natural support for backtraces, of course, on platforms and configurations where they’re available.Unfortunately, the approach you get with something like thiserror on an enum tends to be better if you don’t have to ferry user errors through it. It also grows fairly large over time, and is hard to expose to the FFI, except manually. We could probably design our own proc macro for this (mostly for assigning error codes, which is otherwise tedious) and I do still consider that an option (I haven’t made up my mind here), but… My experience is it’s a bit of a headache, and in https://github.com/mozilla/application-services became a bit unwieldy. This approach feels sloppy and bad to use from user code unless you curtail that, which can be very difficult (although it’s probably the cleanest and simplest to write, ironically).
Also, direct use of
anyhow(or similar, such aseyre) from library crates isn’t great, since those types don’t implementError(they can’t for coherence reasons), which means users can’t wrap them. While we could invent something similar (I’m pretty familiar with the internals, and have patches in anyhow’s low level guts), it wouldn’t get the ergonomics we need, and I’m unsure that we could do it in a way that maintians both the no_std support we want, as well as the ability to expose to the FFI.All that said, one constraint I haven’t fully worked through is serialization. In #967 the errors were reworked due to the need to be deserializable, and that… complicates the design somewhat, especially the desire to have static data. Hrm.
Sorry that this is a bit disjointed, I felt like getting it down when I had the thoughts rather than waiting until morning (which may have been advisable).
(This is slightly related to #1562, which is partially about FFI error ergonomics)
Some further thoughts on this came to me over the weekend.
For one, I don’t really know what the use of the error domain is, besides helping out the other side of the FFI, perhaps.
The second takes some explanation:
One hesitation I have with the
io::Error-like scheme is that it won’t work with?. I alluded to this kind of issue before by mentioning that anyhow/eyre can’t implement Error for coherence reasons, but essentially the issue is:In order for
?to work, some error type (lets call itMyErrorbut obviously in this case it would beockam_core::Error— I’m being vague just becauseanyhow::Errorandeyre::Errorhit the same issue) needs to beFrom<UserErrorType>. If you want to handle this for all possible user error types, you want to do this asThen, to allow users to interact with our errors nicely, we need to implement
std::error::Error. This will let them put it in a type-erased error container likeBox<dyn Error>,io::Error, or even{anyhow, eyre}::Error(orfailure, from back in the day):Unfortunately, we can’t. Consider the case above where we invoke
(*1)onMyError, that is, we’re callingFrom<MyError> for MyError. Well, there are actually two of these. There’s the(*1)we wrote above, and then there’s one in the stdlib. Every typeTalready implsFrom<T>. This is partially so thatx.into()is a noop if you’re already the required type, and generally is useful for generic programming. The coherence issue is that you can’t have both of these.(There are additional complexities too —
From<E> for Box<dyn Error>existing tends not to help matters when trying to work around this issue…)There are two solutions to this:
From<E: Error>(e.g.(*1)), the wayio::Errordoes, and have users go through anError::new-constructor which hurts ergonomics.impl std::error::Error(e.g.(*2)), the wayanyhow/eyre(orfailurein the past…) do, and make it harder for your users to “get away” from your error type. I think there might be things we can do here, but we don’t really want to be in a space competing with anyhow/eyre on our error type — it would be needless.I’ve been leaning towards 1, part of this is that I don’t know how we can ever have a
Fromimpl that assigns the correctErrordomain, and assigns a reasonable Error code that the user chooses1…But the downside here is that it’s friction for anybody writing an
ockam::Workerimplementation — kind of a lot of friction. Users would have to quite frequently do.map_err(|e| ockam::Error::new(e, ...))?everywhere you need to return an error inside an implementation of yourockam::Worker, which kind of sucks2. However, the solution-space here could just be to tweak theWorkertrait, or the API onContextthat accepts a worker… Which makes me think that this is related to the problems in https://github.com/ockam-network/ockam/issues/1564 or at least could share part of a solution.That is, one way you could imagine this working is:
type Error: std::error::Error + Send + Synctoockam::Worker.ockam::Worker<Error = UserErrorType, ...>.ockam::Errorby converting the return values, which effectively type-erases theErrorparameter.This still has a few issues. The main one being that now we might end up with
ockam::Errors that wrapockam::Errors, which is kinda janky. In practice it’s not that big of a deal… but an approach based fully on adjusting the#[ockam::worker](and possibly#[ockam::node]) proc macros might be able to do some magic under the hood to avoid it (faking specialization via autoref, for example).I haven’t gotten that far in experimenting here (it basically came to me over the weekend that this approach might be solve more problems), but so far it seems promising.
1: I’m also not sure how to ensure users pick a good error code, though — even as it is we probably have cases where the error code contains a
u32payload, which depending on the value, can theoretically cause it to overlap with another error code. It’s not clear this can happen in practice, though, probably not.I also have been imagining that maybe implementing a proc-macro that auto-assigns error domains / codes like
#[ockam::error(code = ...)], similar to athiserror-style thing. I’m not sure this is worth the trouble, though.2: It’s not as bad for
io::Errorof course, because you don’t implement the io traits yourself very often, you just use them. Whereas, I get the impression that implementing ockam::Workers is a significant part of the well, “whole point”.Multiple messages are great. Don’t hold back … your train of thought is insightful and I’m learning a lot. Same for thoughts above from @antoinevg & @adrianbenavides.
Having slept on it, I think there are two approaches here I hadn’t really considered yesterday (and it’s possible that the right solution will involve both in some capacity).
Use
heapless::String. Even if we have the stdlib, it doesn’t seem unreasonable to put a max length on the “error domain” string, although maybe I’m mistaken here.Have deserialization convert a
&'static strinto aString.For example:
When serializing
Repr::Static, it would contain data independent of which variant it used, but it would deserialize asRepr::NonStatic. This means we’d need to write manual Serialize/Deserialize implementations, but I don’t think this is the end of the world.A variant of this approach is to pack the
&'static strandStringinto one bundle, similar to aCow<'static, str>. (Other crates to see along these lines are https://github.com/thomcc/arcstr, and https://github.com/maciejhirsz/beef, although I don’t think of these we’d actually use).I still have to work through my thoughts here on whether or not this actually buys us anything (aside from some perf), but it’s an interesting consideration I hadn’t thought of when making the writeup last night.
(Sorry for all the messages, this is both one of my first issues here, and also something I think is critically important to having a good-feeling API, so I’d rather over-share my thought process than under-share it. It also serves to document some of the logic for future users)
Yep, it’s a great and easily-overlooked example of Rust API design (and matklad’s whole blog is top-notch, I highly recommend it).
Great article about
std::io::Errorfor everyone (like me!) who didn’t realize just how neat the implementation is:https://matklad.github.io/2020/10/15/study-of-std-io-error.html
+1 to use
anyhowwhen writing client applications.I agree with @antoinevg that the best thing for now is to wait. Given my recent experience (which you can read below), points 1 (backtraces) and 2 (u32 to strrepr) should be easier down the road if errors don’t contain parameters. Finally, point 3 (human-readable error messages) should be feasible if the
core::fmt::Displaytrait is implemented, right?Here’s my two cents on whether enum’s variants should contain parameters to build custom messages or not and how this impact the above three points.
In the project I’ve been working on for the past two years (close to 100k lines) we use
thiserror. We don’t use any feature other than theErrorandDisplaytrait implementations to save us from writing some boilerplate code. That is, we don’t make use of thebacktraceorsourcefeatures.We follow a similar approach to what’s done in the
ockamcrates. We have astructlike the following:where:
ErrorCode: is a string code, where multiple errors can return the same code (project requirements)internal_message: prints theDisplayimplementation, which contains a detailed description of the errorfriendly_message: prints the message we want to display to the final user (API), usually a simplified version of theinternal_messageThen we have a bunch of
enumswith the errors of a given layer (api, commands, infrastructure, core, …) implementing theFrom<AbcdError> for Error, where each variant can have arbitrary parameters. For example:We really struggled for a while deciding how to deal with error handling. We were also new to Rust. And, honestly, we are not 100% happy with this approach, but it has proved to be pretty solid so far. Having a good error handling strategy can make a huge difference in the developer experience, that’s for sure.
The biggest disadvantage we’ve found: variants with parameters make debugging easier but are limiting if you want to use
thiserror’ssourceandbacktracefeatures, for example. Having parameters also prevents us from converting anErrorCodeback to the original error. So unless it’s strictly necessary, I would use the parameters to build log messages and keep the errors variants without parameters.Hey, I would like to help implement this, but I’m new to ockam. I see the error definition has an error code and domain string.
What should those be when there is a
std::io::Error?Over the weekend I’ve been revisiting the error design, which has been fairly fruitful — I didn’t really understand the issues or design constraints last time… and I think the proposals I made would not have helped.
Or, they’d help a little, but would have left many of the same problems that exist in the current design[^1]. (In my defense, it was my first task).
[^1]: For example, the error codes would be just as bad. A proc-macro to help generating them would solve essentially none of their problems, even if it would be nifty. Backtrace/SpanTrace would be nice to have (they are are better than nothing), but they are not going be magic — our notion of a “call” for us is not the same as a stack frame, and its probably not a tracing span either… Etc
Anyway, after a lot more understanding of ockam, how it’s used, and its constraints, and a bunch of research… I think I have come to a good understanding about the use cases
ockam_core::Errorneeds to support, and how to get there.Note that… this is mostly ignoring things that I think are bigger issues that I intend to address in another write up (that I have in progress), like swallowing errors, serialization, and several more.
Anyway, here’s my current thinking about what
ockam_core::Errorshould be good at:ockam_core::Errorrequirementsockam_core::Errorhas (at least) three important use cases:Communicating errors in a way that other code could interpret and interpret and handle. This should work across API and protocol boundaries, even to remote machines.
The intent behind the current error design seems to address this case, as you need to know both of those things. The implementation is very hard to use in practice though, and having only this seems to cause problems.
It is worth noting, that the current node implementation ignores errors from message handling after logging them. In order to support doing better things than this, the system probably needs to be able to distinguish between different severities of errors.
That said, I have another writeup on this that I’m working on, so I won’t get into it here too much.
Communicating an error in a way that someone who is not an expert in
ockam[^2] can understand the error quickly, and if the error indicates a problem they need to fix, resolve the problem easily – even if they are .This also should work across remote machines, but probably doesn’t need to be that much more than decent error message.
Needless to say, we
ockam_core::Error’s current implementation does not help here.Communicating an error with sufficient context and detail so that a developer working with or debugging Ockam (or ockam-using code) can identify and fix the problem.
Largely speaking, this needs to work very well on the same machine, but (at least in debug builds or when an environment variable is enabled), it should have some support for transmitting it remotely. (That said, I think most of the unserializable stuff here can be handled by converting to a string or similar).
This is also only relevant for certain kinds of errors — routine I/O failures probably don’t need this kind of handling, at least by default
As you probably already know,
ockam_core::Errorcurrently has no support for this.[^2]: Perhaps they’re a developer getting started with it, someone who is operating a ockam-using tool, or invoking the Ockam CLI tool.
How to get there (… roughly)
(Caveat: I have a bit more of the design worked out than this, enough to have cleared up most holes, but still not 100% and probably am not summarizing it in enough details anyway)
I think all three of these have different solutions. My current thinking for how to handle the use cases above is (in reverse order):
Location::caller()and#[track_caller]to get the file/line/etc where the error is generated, and even collecting a Backtrace and/or SpanTrace at the site of capture if an env var is flipped. (Anyway, I’m aware that I’m handwaving this one, and my plan here is actually pretty cool, but I haven’t finished it and want to move on)[^3]: Assuming fixing the log-and-ignore error handling we have in some places.
For the 2nd — error messages a technical non-expert can understand — probably requires either requesting an explicit error message. It might be enough to accept a
causeerror too. This seems an easy-ish one, and will help a lot especially moving forward, but if you can’t tellFinally, the first despite “sort of working for that use case” has several very significant areas where it can be improved — I don’t think our crates should be in charge of assigning their own error codes for the most part (and I think we want to store real enums, meaning we get a human readable name for the code, etc).
Fixing this without causing issues for FFI and/or targets that can’t rely on the message/metadata being present is… the hardest — see Appendix A for the current sketch of the design.
Anyway, thats all I have now.
Appendix: Replacing
ockam_core::Error::code.Basic idea: two new enums (at least) are added to
ockam_corethat describe “canonical error codes” and “canonical error origins”. Something like:enum ockam_core::ErrorOrigin, which describes the component at the level of detail likeTransport,Vault,Node, etc (probably things likeApplicationandUnknowntoo).We can probably automatically assign this in most cases based on the module_path of the code that produced the error, which we should be able to get.
Note that this explicitly is not at the level of granularity of crates, modules, or even conceptual components — (e.g,
ockam_transport_tcpandockam_transport_websocketare bothErrorOrigin::Transport)enum ockam_core::ErrorKind, which looks a lot likestd::io::ErrorKind, but with a set of errors likely to happen inside ockam, rather than a general set for all I/O.The main idea behind these is that most code that checks errors will probably prefer writing checks like
which then handles all transports, rather than having to figure out which error code number comes from TCP, which from WS, etc.
Nice to see love for dtolnay/thiserror and dtolnay/anyhow! IMO they’re excellent defaults when it comes to Rust error handling.
That said, Ockam has some special needs.
@mrinalwadhwa already mentioned the need to support FFI but our existing error handling strategy also has a number of important advantages for targeting embedded (
no_std) platforms:code: u32field only consumes 4 bytes and thedomain: Stringfield is optional. This matters because we want to be able to target resource constrained platforms. 1ockam_core::Errortype which allows us to have per-crate error definitions while avoiding theBox<dyn std::error::Error>pattern. This matters becauseno_std:std::error::Errortrait andBoxtype. 2So the big question is, without losing these advantages, how can we get features like:
It would be nice to also support these on
no_stdbut, just as with FFI targets, it’s not the end of the world if we can’t have all of them.i.e. Neither thiserror or anyhow 3 provide full
no_stdsupport but this is not necessarily a problem if it’s possible to integrate them with our existing error implementation in a way that won’t require a bunch of#[cfg(not(feature="std"))]boilerplate to avoid breakingno_stdbuilds.In general, the situation is not great when it comes to idiomatic unified error-handling and
no_stdand is unlikely to improve in the future:So maybe we should also be looking at using a collection of simple crates & code to implement our wish list individually?
A couple that spring to mind are:
no_stdsupport)no_stdCan anyone suggest some more?
Finally, Rust’s error-handling story is starting to solidify but there are still a few big changes coming (e.g. RFC-2504) so it would be nice to not lock the library or dependent apps into anything that’s going to be deprecated.
[1] One change I would like to see here is to define
domainas a&'static strrather thanStringas, again,Stringneedsallocwhen we’re targetingno_std.[2] We can get
Boxviaallocbut requiringallocfor something as fundamental as Error handling is probably not a great idea.[3] The anyhow crate does provide partial
no_stdsupport but this comes at the cost of relying onalloc.As for extending Error information with stacktrace and additional data, that would be great, but I’m not very familiar with how that works and how that can or can’t be compatible with our Actor framework. There’s some research work to be done by someone
@mrinalwadhwa I’m not sure we should implement Fromstd::io::Error for ockam::Error. That may lead to worse errors experience. E.g. in this case when std::io::Error is thrown, from a function caller side I would expect error that says “Invalid forwarder address user input” or something like that, but if we implement default conversion from std::io::Error caller will get something like “IO error”, which is less informative.
@mcastorina thank you for picking this up 🙏 @jared-s @SanjoDeundiak @antoinevg what are your thoughts on good values here.
I think we need to take a deeper look at the design of our Errors so that they carry source, backtrace, cause etc. as they propagate within Ockam or surface to a user. We chose the code and domain approach because we’ll need to expose them over FFI. But that shouldn’t hurt the experience of Rust users. Rust Users should get source, backtrace, cause etc.