embedded-hal: [RFC] Revamp `Error` types
As demonstrated by #181 and other issues there’s a real world usability issue with the Error type of the Result of operations being an associated type. An associated type means that a driver or user of an HAL impl is required to adapt to the custom Error type; while possible for applications this makes universal drivers impossible.
In my estimation the main reason for this was probably caused by embedded-hal predating the improvements to Rust type and specifically Error handling. Now we do have better possibilities and with the upcoming big 1.0 breakage it should be possible to also revamp the Error types without too much grief.
Currently our trait definition e.g. for I2C looks as follows:
pub trait Read {
/// Error type
type Error;
...
fn try_read(&mut self, address: u8, buffer: &mut [u8]) -> Result<(), Self::Error>;
}
My proposal would be to instead provide a specific Error type such as:
#[non_exhaustive]
pub enum Error {
/// An unspecific bus error occured
BusError,
/// The arbitration was lost, e.g. electrical problems with the clock signal
ArbitrationLoss,
/// A bus operation received a NACK, e.g. due to the addressed device not being online
NoAcknowledge,
/// The peripheral buffer was overrun
Overrun,
/// The peripheral buffer ran out of data
Underrun,
}
and then redefine all traits as:
pub trait Read {
fn try_read(&mut self, address: u8, buffer: &mut [u8]) -> Result<(), Error>;
}
The #[non_exhaustive] flag (available from Rust 1.40.0, cf. https://blog.rust-lang.org/2019/12/19/Rust-1.40.0.html) allows us to easily extend the reasons in the feature without being a breaking change because it requires implementations which care about the actual error to also have a wildcard match branch, which is a good idea anyway.
By having the Error types defined in embeded-hal we do some advantages:
- Drivers can now universally check the error kind reported by a HAL implementation (cf. #181 where a chip might return a NACK due to currently not being able to handle more requests, yet the driver cannot make use of this information and retry later) and act accordingly
- More uniform and exhaustive documentation of available error conditions next to the API itself
- Better guidelines for HAL authors which cases might be work distinguishing
- Improved portability of embedded applications between different hardware
Dowsides:
- More breakage right now
- MRSV 1.40
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 13
- Comments: 123 (96 by maintainers)
A few hours ago we were discussing the issue again and the solution proposed in https://github.com/rust-embedded/embedded-hal/issues/229#issuecomment-648403992 came out very much in favour.
I haven’t realised it at the time but after some careful explaining by @eldruin of ideas behind this approach, I did some exploration work to figure out whether there’s any cognitive or programming overhead or other stumbling blocks with this approach and I’m happy to report that I couldn’t.
Much to my surprise and delight it is not even necessary to
.into()theErrin case one is matching against theT, so there’s virtually no downside in comparison to fixed error types for the typical case while still allowing to go wild with custom errors for special cases.So 👍 from me.
Please refrain from personal attacks.
Let’s resume the different propositions.
The Error to rule them all
Each embedded hal trait define its Error, with no choice for the hal implementation. That’s the original @therealprof proposition.
Advantages:
Disadvantages:
Convertible Error
That’s the current proposition. Instead of forcing the Error, some bounds are imposed on the Error to the hal implementors, allowing drivers to convert the error to something fixed.
Advantages:
Drawbacks:
The last drawback afraid me in inconsistent driver implementation: some driver returning Read::Error, some driver returning Read::I2cError, some driver returning sometimes Read::Error, sometimes Read::I2cError. I suppose that the recommendation for drivers is to returns the original Read::Error to avoid lost of information, at using I2cError internally to handle some kind of errors.
An Error bound on the trait with a
kindmethodThis is my proposition. The first drawback it tries to fix is the Clone bound: in reality, the embedded hal error is just a plain enum, that can be copy, no reason to eat the original error to do the conversion, just a
&Read::Errorshould be enough, and we can remove the Clone bound.Then, the desired feature (be able to handle the Nack error in a driver) remind me the handling of the special error “broken pipe”, that might be traited as an EOF in some case (think of an implementation of grep for example). In std, broken pipe in an
io::ErrorKind. Thus, my idea is to mimick this interface, providing an ErrorKind for each trait providing the common errors that driver implementation might want to manage, and not every possible error.The ErrorKind also implements its Error trait, allowing to use the ErrorKind directly as an error if this is enough as in the proposition 2.
This proposition allows everything possible with proposition 2, without a Clone bound, and with an interface expliciting that the error must not be lost by the Driver implementation.
So we have:
Advantages:
Drawbacks:
Conclusion
I think proposition 3 is better than proposition 2. Everything that can be done with proposition 2 can be done with proposition 3, and with a more ergonomic interface. Proposition 1, as more opinionated and simpler, proposes a different alternative.
hmm the
type Error: core::fmt::Debug + Into<I2cError>;idea seems pretty interesting to me, though does seem that it would requireClonewhich is not always viable for errors (especially on linux), and how should inner types not-mappable to flat types be handled?it seems like
TryFrom<&Self::Error> for I2cErrorwould be pretty were it achievable?Nonecommunicates the case where the inner error is not representable using the flat typeThe only real issue I have is that these approaches might promote error erasure in drivers, if you have to
.into()to get theI2cErrortype the odds of a driver author returning the genericI2cErrorinstead of the underlying one would seem high. This is not an concern with thetype Error: I2cErrorbounds approach, but, could probably be mitigated in documentation, and this waymatchworks.we have always had this facility, and i at least have used it heavily in my radio work (thought the applications using this are not public 😕). even where some of these cases are not handled in the driver, you have the expressiveness to manage errors at the application, to attempt reconnections at different levels of the stack, and even just to let the developer/user know what actually happened. I would be personally v frustrating to have to connect a debugger to work out whether it’s USB or SPI or networking that failed and returned me a
BusError, or as mentioned above, to have to fork drivers to add out-of-band mechanisms for propagating the actual underlying errors.i completely understand this perspective (and have had to do this a couple of times for some cursed C/FFI work), it’s a real pain to have to add underlying error context to driver objects and, nearly impossible to do so in a way that is safe with multithreading, async, or even multiple calls in a
no_stdcontext.It’s not something to be done without consideration and deliberation, given the impact on everyone else in the ecosystem. But I too would be okay with a bump in MSRV, especially packages with another pile of breaking changes.
As far as I am concerned it’s up to the application to decide what is recoverable and what is not.
Both. Originally it was supposed to signal that a device does not exist on the bus but nowadays is sometimes is used to signal that the device is busy.
You do have a strange interpretation of “clearer”… 😅
I think both. E.g. if you have a driver talking over some interface and it is waiting for a certain stream of data to arrive and the HAL signals an overrun or a jammed line, the driver might be able to retry the last communication in order to fix the issue. If you don’t have a driver but your application is handling the data directly you similarly might want to be able to handle such situations (which is currently not possible in a generic way).
I absolutely disagree. Having a common set of error types for general use is one the achievements of the latest iterations of e.g. libstd. I think it is fundamental for portable and composable software to have a common set of error kinds to rely on and that’s also the reason why custom approaches like
failureare now deprecated.Easy: if it makes sense, is not covered by an existing error kind we’ll accept it. There should be a very low bar to including new error kinds and since additions are non-breaking I would not worry about that at all.
I am one of the persons that thinks that forwarding underlying error information must be possible. That is possible without
ImplError.ImplErroris only about standardisation of underlying error causes. That is a second level of standardisation.I believe error types would be more flexible without attempting to represent every possible hardware failure situation inside each error type. This information can be transported separately with the approach from proposal 3.
So, standardisation of underlying errors was not the most pressing need in the community.
That’s true for HALs, yes. But if you have a device driver crate that has a
driver.measure(&mut i2c)API, then that driver will not be able to return HAL-specific information because it does not know about the HAL.This would mean either the driver would need to be modified so it knows HAL-specific APIs, or the HAL would need some kind of “return the last error that happened” method. Which is horrible and doesn’t work if a driver chains multiple calls to the HAL’s APIs. It can also break completely in an async context.
The alternative being a totally implementation-defined
errno-like mechanism to get the rest of the information whenever an error pops up, which is bolted manually onto other libraries. I am sorry but to me that sounds pretty horrible.I think there are two cases that we need to consider: A generic driver (which will not be able to deal with any platform-specific or project-specific error) and a hardware-specific application.
A generic driver will be very happy about unified errors because it allows it to handle certain cases like NACKs. I think nobody disagrees with the idea that there should be a list of predefined error variants.
However, an application like the vacuum robot may want to handle different kinds of bus errors differently.
If the e-h trait does not allow nesting errors (and thus the error information is not available by using the e-h API), then the driver will need to be modified with a custom non-e-h-compatible interface that returns the needed error information. For example:
The problem is that now the driver contains “prorietary” APIs that need to be called by the application. The developer of the application would need to fork or copy-paste the driver code in order to modify it.
If instead the e-h error enum allows wrapping a generic error, then the driver crate could simply pass through the underlying error, and the application could handle it if desired.
Of course this does not mean that we should not try to cover all possible error cases, but it still offers an escape hatch for developers that don’t want to go through the “propose-change-to-e-h > get-change-accepted > wait-for-release > update-driver > wait-for-release > update-application” cycle.
@ryankurte @eldruin So I was looking into your proposal to have a generic custom error kind. I do see a few problems with that approach:
CustomorOthertype rendering the plan to a good deal toothless; we absolutely want to encourage implementers to use new shared error kinds and extend them withine-hif they’re not sufficient (I so think that with a good look at some powerful implementation we can get a very good idea of what should be included in a first throw)stdapplications are moving away from nesting due to useability issues and crates likeanyhowmaking it really easy to get a nice chain of errors iff error variants are not manually nested and embedded applications are even more simplistic than that due to the requirement that the application actually needs to know about the used peripherals and even initialise them specifically; there’s no such thing as a library hidden several layers down magically initialising an peripheral and doing fancy stuff without the application knowing anything about thatI think with the new fallible-first approach it is very important to deliver a decent set of standard error kinds, too, and not leave it to impls to come up with a wild variety of custom errors (or none at all).
Thanks for your responses. I don’t have time to read them all in detail right now but here’s one paragraph which stuck out in the notifcation going by:
I can see other applications as well. For instance if you have an application running e.g. over a UART it would be very helpful being able to see buffer overruns so a driver could start over. For GPIO (where we’re discussing things like tri-state GPIOs and changing directions) it might be useful to know that a pin is in a wrong mode for some operation.
But even other than in drivers, whenever you’re doing real error handling in an application (other than the unfortunately common plain unwrapping) this is currently not at all portable so whenever you want to run an application on a different MCU and properly handle different error situations you’d have to write MCU specific error handling – which is not great.
To define the set of enum, what about checking a few existing errors in hal, and see how we would transfer it to the corresponding ErrorKind.
For example, the error used in https://github.com/stm32-rs/stm32f4xx-hal/pull/218/files look quite protocol related and should fit in the ErrorKind type.
OTOH, the DMABufferNotInDataMemory error in nrf-hal doesn’t seem appropriate for a dedicated entry in kind, and, I think, would enter in the
Othercase.linux embedded hal is not really helpful as it seems to directly use
std::io::Erroras its error type.Otherseems to be the right name, as it is also used in std::io::ErrorKind::Other https://doc.rust-lang.org/std/io/enum.ErrorKind.html#variant.OtherSo, maybe we can begin with a small set of variant in the embedded-error kind types, try to use it in a few hal, and see if it need additions. It would be semver compatible.
The only thing to really choose are:
ImplError) and its name.ImplErroris not really talkative, maybe GenericError or something like that?Othervariant belong to the generic error, or to the specific errors.My ability to come up with examples right on the spot is beside the point. A high amount of requests within a short period of time (leading to “swamping”) is also beside the point. The point is that the number of
ImplErrorerror variants is expected to grow, which adds to my point of an unwieldy number of ever-so-slightly-different error variants.Again,
Timeoutis already part ofI2cError, notImplError. A generic driver would be able to handle it without needingImplError.If it can only point into the right direction, you will still need to process the implementation-specific error if you want to fix it or know what actually happened. We may as well cut the middle-man and just encourage applications to choose
ImplErroras their implementation error type if they are satisfied with the level of information. That would keep the errors simple.Why force an additional set of somewhat unrelated errors into every error definition if we already have a mechanism to get the underlying information?
I am sure people will come up with more.
Things like NACKs are already part of the
I2cErrorenum, notImplError. Same thing withI2cError::Timeout(although that is part of SMBus, not I2C).A driver getting an
OutputPinshould not care / cannot care about whether the pin comes from the MCU or through an I2C multiplexer. It really cannot handleImplError::PowerDown.While
ImplErrorallows to get a bit more of information, it cannot tell exactly what went wrong without makingImplErroroverwhelming (because only the implementation knows), which is precisely my point.Since it does not provide a complete solution and is also not useful for generic drivers, I would avoid it and just let I2C methods return I2C-proper errors, or
Other.MCU HALs can reuse an unified error definition like
ImplErrorby importing it themselves form a central crate likeembedded-error. I have nothing against that. Similar to therand_coredependency, for example.However, I would not make
ImplErrorpart ofI2cError.I believe the seamlessness of the associated type +
Into<_>bounds approach is one of the key elements here since you absolutely don’t want error handling to be awkward or people will find rather invent nasty shortcuts instead of doing it right.Thanks @PTaylor-us, that sounds like an interesting solution. It would allow for using only OP’s
Errorwithout needingImplErrororOther(E)by introducing an error handling “hook” which could process any implementation-specific error and if appropriate return one of the standarized I2C errors. Maybe we can break the deadlock here, then.I see a couple detail differences to what I understand from your comment, though:
embedded-haltrait implementations.NoAcknowledge->WouldBlocktransformation.But anyway these are just unimportant details and the proposal still stands.
I think it would be most interesting if we can come up with just one error handler trait which we can use for all fallible things (i.e. GPIO, SPI, etc), although we have not extended the discussion to those traits just yet.
For I2C, I understand an error handler would provide the chance to process the implementation-specific error and would itself transform that into a variant of
I2C::Erroror evenOk. So I came up with this:The trait type parameters allow for several implementations depending on the input/output errors that it should handle.
I have put together a demonstration of how an error handler could be injected into a bitbanging I2C implementation here in the playground.
Here is the handler implementation from there. It is a bit like
Into<Error>:Thoughts?
But then the driver needs to know about the HAL. If the driver should be generic (only using embedded-hal APIs, e.g. a driver crate published on crates.io) then that’s not possible without modifying the generic driver for the application-specific code.
The driver doesn’t need to handle the error. It would simply pass it on to the application, which can then decide to handle the error, or not. The driver isn’t able to do hardware specific error handling if it’s written in a generic way.
If you call
std::fs::read, then you will get back a generic error that wraps the platform-specific error. A generic library can still handle certain classes of errors without needing to know what platform the user is on. It cannot handle platform-specific errors, it just passes the error to the application. If the application targets Windows, then that application can use the Windows specific errors to do additional error handling. Nobody would expect you to fork and modify the generic library so it knows about the Windows specific I/O error messages.Note: My mental model is a hardware-specific application, using the hardware-specific HAL, but calling generic drivers (e.g. from crates.io) through embedded-hal.
What if the driver does this?
This method would ensure that the sensor goes back to sleep even if the measurement failed. If you get back an error, and then call
i2c.get_err(), how do you know whether it’s the error from thetry_write(DO_MEASUREMENT)call or from thetry_write(SLEEP)call?Also, as mentioned above, asynchronous execution could also break such an API if I2C calls are interleaved.
Yes, that’s precisely the problem. A user would need to fork all driver crates and modify them with a custom API in order to provide detailed error information.
Especially if we want to promote embedded-hal in a commercial environment, it seems a bit weird to first advertise a “production-ready ecosystem of mature generic and compatible drivers” but then say “if you want to do error handling for your hardware-specific project, you will need to fork all generic drivers and adjust them to your needs”.
No, embedded-hal can definitely provide a solution for this if we want (the solution with a generic parameter that @eldruin and others proposed). The only question is whether we want to sacrifice this use case in order for the syntax to look slightly more clean (without the type parameter).
No problem.
Yes, we would need a generic error variant like already proposed:
This is nothing really new, all drivers already define a generic
Error<E>type and to me it has been quite fine so far. For example, in a driver:Would now become:
We now have an unified error layer in between but otherwise the driver code would see very few changes.
Hmm, there is a mismatch in the types. I am not sure if it is intended. The
bitbang-hal::i2c::Read::try_read()method would returnembedded_hal::i2c::Error::BusError(embedded_hal::digital::Error(BogusPin)). A hardware I2C peripheral would returnembedded_hal::i2c::Error::BusError(Something else or just nothing).If somebody is not interested in causes for the
BusErrorand just wants to retry, they would write this:If somebody is interested in some underlying causes, then of course they have a dependency to whatever is inside, but this dependency is already there. e.g. they already know they are using bitbang-hal and that the error might be from the pins the application itself put into it. Here the code:
So the point is that code that does not care about underlying errors can just ignore them and have no dependencies and code that cares about them can use them.
Let’s take a look at a real public example I know of: In the e.ziclean-cube the MCU (stm32f101) is connected via GPIO to an I2C accelerometer (KXCJ9). In order to interact with the accelerometer, a bitbanging I2C implementation must be used (
bitbang-hal). If the MCU hal returns an error when trying to set or read a pin during an I2C operation, how would you give that information to the application?I do not know if it would be fine to just start over, but I think if this comes up frequently, it would be useful to the application to be able to know that the MCU pins are acting up, and not the accelerometer, for example, and that is the reason for some failures. This would immediately simplify diagnostics. I think it would be useful to make something like this somehow possible.
@therealprof i agree with most your points except for the error nesting bit, ime the flat approach only works in
stdbecause they nestBox<dyn Error>so you can handle flat cases without erasing useful underlying information. The critiques of the alternatives I proposed are totally valid, however, as noted above none of the modern rust error handling stuff works without alloc, which i do not think an acceptable compromise for us 😦I think it’s also important to remember that this is not only used in
no_stdcontexts? I thinklinux-embedded-halis a pretty good example, especially where async implementations are calling out tolibcandmio.This solves the, there is a specific error that may occur during normal operation of the interface that a driver may wish to handle, problem. Which may be specific to I2C, and in a way that doesn’t restrict error types or delete error information, which afaik would resolve #181, and is generalisable to other cases similar to this.
The other question about this is how you ensure people use the correct error types, and how you interpret and use those both at the driver and application level? Say we end up with
BusFaultAandBusFaultBor something, we loose abstraction in that the driver author now needs to be aware of what an underlying impl might return, and any addition to this enum that should be handled at the driver level requires a driver update, and the application author now needs to know both what the underlying impl can return, and what the driver handles / forwards. Some of this is implicit to having defined errors, some of this is a facet of the flat approach I think.A single flat non-generic set would simplify things like this which is nice, but at the same time, seems to make it functionally impossible to report meaningful errors (device disconnect, permissions, pin not mounted, etc.) at the application level, which does not seem to me to be tenable.
It would be great to improve the situation, but, I would posit that any improvement should not be worse / less-expressive than the current approach, even if only for a subset of uses 😕
From looking through the different HALs, error types that exist right now:
I don’t know enough about the different implementations (and SMBus, which seems to add some other error types) to know what all those variants mean, but it feels a bit wrong to throw away error information for error types that we as embedded-hal developers did not think of, or where platform-specific error handling could be added. Maybe some people with more experience can judge that.
I only found this RFC now, but independently suggested the same solution as @ryankurte / @eldruin:
https://github.com/rust-embedded/embedded-hal/issues/181#issuecomment-647401030
I think it’s important that the implementation can provide custom errors if it wants to, but that we can have implementation-independent error variants for things like Nack / ArbitrationLoss / etc so that I²C device drivers can use polling instead of fixed sleep times (when waiting for something to be processed).
This should probably be decided before the 1.0.0 release, right?