Request for comments: Unexpected error handling


Currently the standard library has std::error::Error for error handling, which has proved problematic for a number of reasons. I will argue that this is partly because it tries to cover two different kinds of errors (fast-path expected faults and unexpected failures), and that separating them gives a coherent strategy that is backwards compatible.

The std::error::Error trait

Copying from the standard library verbatim (except for explicit dyn):

pub trait Error: Debug + Display {
    fn description(&self) -> &str { ... }
    fn cause(&self) -> Option<&dyn Error> { ... }
}

That is, types that implement Error must also implement Debug and Display, and may optionally provide a description, and/or optionally link to another causal error.

There are problems with this trait that led to the Fail trait in the failure crate. They are:

  1. Implementations of Error are not required to be thread-safe (no Send, Sync bounds)
  2. Implementations of Error are not required to be Any, so downcasting isn’t guaranteed to work (no Any or 'static bounds).
  3. The trait contains 3 ways of formatting as text:
    1. The description method, which just returns a plain &str (in practice usually a static string),
    2. The Display impl, which may do more complex formatting,
    3. The Debug impl, similar to above.

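Because these bounds are missing from the trait itself, the standard library provides the downcasting methods separately for each trait-object variant:

impl dyn Error + 'static
impl dyn Error + Send + 'static
impl dyn Error + Send + Sync + 'static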
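
To make the missing bounds concrete, here is a minimal sketch using only the standard library. ConfigMissing, SharedError, fail_on_worker and missing_key are hypothetical names invented for illustration. The point is that Send, Sync and 'static have to be written on the trait object, because Error itself does not require them.

```rust
use std::error::Error;
use std::fmt;
use std::thread;

// Hypothetical error type, used only for illustration.
#[derive(Debug)]
struct ConfigMissing {
    key: &'static str,
}

impl fmt::Display for ConfigMissing {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "missing config key `{}`", self.key)
    }
}

impl Error for ConfigMissing {}

// The bounds that `Error` itself lacks are spelled out on the trait object.
type SharedError = Box<dyn Error + Send + Sync + 'static>;

// Produce the error on a worker thread and hand it back to the caller.
// Moving it across the thread boundary compiles only because of `Send`.
fn fail_on_worker() -> SharedError {
    thread::spawn(|| -> SharedError { Box::new(ConfigMissing { key: "port" }) })
        .join()
        .expect("worker thread panicked")
}

// Downcasting is available because the trait object is `'static`.
fn missing_key(err: &SharedError) -> Option<&'static str> {
    err.downcast_ref::<ConfigMissing>().map(|e| e.key)
}
```

Calling missing_key on the value returned by fail_on_worker recovers the concrete ConfigMissing and its key. Dropping Send from SharedError should make the thread::spawn call fail to compile.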

The failure::Fail trait

Here is the Fail trait from the failure crate:

pub trait Fail: Display + Debug + Send + Sync + 'static {
    fn cause(&self) -> Option<&dyn Fail> { ... }
    fn backtrace(&self) -> Option<&Backtrace> { ... }
    fn context<D>(self, context: D) -> Context<D>
    where
        D: Display + Send + Sync + 'static,
        Self: Sized,
    { ... }
    fn compat(self) -> Compat<Self>
    where
        Self: Sized,
    { ... }
}

The differences are:

  1. Send + Sync bound so the error can be moved/referenced between threads,
  2. 'static bound so the error can be downcast,
  3. The context method, which boxes this Fail and creates another wrapping it,
  4. The backtrace method, which returns the backtrace attached to the failure, if there is one, and
  5. The compat method, for backwards compatibility with Error

This means that Fail is perfect for unexpected errors: we can handle them anywhere, pass them around as Fails, and then try to downcast them to specific errors if we want, similar to exception handling in a language like Java:

try {
   // ...
} catch (IoException e) {
   // ...
} catch (OtherException e) {
   // ...
} catch (Exception e) {
   // ...
}

where we can behave differently depending on the type of the exception.
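
A rough Rust analogue of that catch chain might look like the sketch below. It uses std::error::Error trait objects as a stand-in for Fail, since the downcasting idea is the same. classify and parse_port are hypothetical helpers, not part of any library.

```rust
use std::error::Error;
use std::io;
use std::num::ParseIntError;

// Decide how to react based on the concrete type behind the trait object,
// much like successive `catch` clauses.
fn classify(err: &(dyn Error + 'static)) -> &'static str {
    if err.downcast_ref::<io::Error>().is_some() {
        "io"
    } else if err.downcast_ref::<ParseIntError>().is_some() {
        "parse"
    } else {
        // Plays the role of the final `catch (Exception e)`.
        "other"
    }
}

// Hypothetical helper: parses a port number and reports any failure
// as a boxed trait object, hiding the concrete error type.
fn parse_port(text: &str) -> Result<u16, Box<dyn Error>> {
    let port = text.trim().parse::<u16>()?;
    Ok(port)
}
```

A caller that receives the boxed error from parse_port can pass `&*err` to classify and branch on the result, without parse_port having to expose ParseIntError in its signature.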

This comes with a cost: trait objects have a performance and space cost, and so are not ideal for fast-path expected error recovery. I suspect this is why the Error trait was designed the way it was. However, for fast-path recovery we don’t need an Error trait! The error is handled close to the code that causes it, and the author can just use their own types to handle it. We need an error trait when the handling happens away from the source, and we want a choice about how much information we handle.
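
As an illustration of the fast-path case, here is a minimal sketch in which the error is a plain enum with no error trait at all. LookupFault, get_checked and get_or_last are hypothetical names made up for this example.

```rust
// Hypothetical fast-path fault: a plain enum with no error trait,
// no allocation, and no dynamic dispatch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LookupFault {
    Empty,
    OutOfRange { index: usize, len: usize },
}

// Return the element at `index`, or a fault the caller is expected to handle.
fn get_checked(items: &[u32], index: usize) -> Result<u32, LookupFault> {
    if items.is_empty() {
        return Err(LookupFault::Empty);
    }
    items.get(index).copied().ok_or(LookupFault::OutOfRange {
        index,
        len: items.len(),
    })
}

// The caller recovers right next to the call, matching on the concrete type.
fn get_or_last(items: &[u32], index: usize) -> u32 {
    match get_checked(items, index) {
        Ok(value) => value,
        // `OutOfRange` is only produced for non-empty slices, so `len >= 1`.
        Err(LookupFault::OutOfRange { len, .. }) => items[len - 1],
        Err(LookupFault::Empty) => 0,
    }
}
```

Nothing here needs Display, a cause chain, or a trait object: the fault never travels further than the immediate caller.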

Note that on the happy path a Box<dyn Fail> can be made to fit in a pointer (failure::Error), so it’s fine for hot code.

Add Fail to standard library

Therefore I propose

  1. Add a Fail trait (bikeshedded if necessary) that exists for unexpected errors, a.k.a. failures. Explain the concept of unexpected errors in all the documentation and the book, and explain the history of the shortcomings of the Error trait trying to be all things to all people.

    It doesn’t need to have the backtrace functionality, which is orthogonal, but backtraces are very useful in debugging.

  2. Add the context infrastructure, and explain how it can be used to add extra contextual information to errors. Explain how you can walk the error chain to get the original error, and downcast it to a specific type if you need to.
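
The second point, adding context and then walking the chain back to the original error, can be sketched with the standard library alone. WithContext, root_cause and load_settings are hypothetical names. WithContext only loosely mirrors the idea behind failure's Context type and is not its actual API. The sketch uses source, the standard library's successor to cause.

```rust
use std::error::Error;
use std::fmt;
use std::io;

// Hypothetical wrapper that attaches a message to an underlying error.
#[derive(Debug)]
struct WithContext {
    message: String,
    source: Box<dyn Error + Send + Sync + 'static>,
}

impl fmt::Display for WithContext {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.message)
    }
}

impl Error for WithContext {
    // Link to the wrapped error so callers can walk the chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        let inner: &(dyn Error + 'static) = &*self.source;
        Some(inner)
    }
}

// Walk the chain of sources until the original error is reached.
fn root_cause<'a>(err: &'a (dyn Error + 'static)) -> &'a (dyn Error + 'static) {
    let mut current = err;
    while let Some(next) = current.source() {
        current = next;
    }
    current
}

// Hypothetical failing operation whose I/O error is wrapped with context.
fn load_settings() -> Result<(), WithContext> {
    let io_err = io::Error::new(io::ErrorKind::NotFound, "settings.toml not found");
    Err(WithContext {
        message: "could not load settings".to_string(),
        source: Box::new(io_err),
    })
}
```

A caller far from the source can print the outer message, then call root_cause and downcast_ref::<io::Error>() on the result if it cares about the specific I/O failure.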

And that’s it; everything else can be iterated on in crates. Deprecate the Error trait, explaining the two use cases and how Fail handles one, while for the other (fast-path) you don’t need an error trait. Say that you can always implement Fail for any error, including fast-path errors, if it helps: implementing the trait does not in itself affect performance. Encourage the use of Box<dyn Fail> for unexpected errors if you don’t want to expose the exact type of the error across crate boundaries.

A key argument is that, as well as being backwards compatible, it makes sense to use a name other than Error, since we are specifically dealing with unexpected errors, where performance is not a concern. Error sounds like a name for all errors.

What do people think? I don’t claim any of these ideas are original, but I thought it might be useful to have an article arguing the case. If anyone has prior discussion I will add links to it.




Yeah, this thread is in some ways just a rehashing of that RFC. I’m just a bit concerned that:

  1. cause is used in other languages. It’s a shame that we lose this symmetry and increase friction for new users.
  2. The trait signatures get complicated, and the signature of the cause (source) isn’t the same as the signature of the original Error (it’s also bounded by 'static). Obviously this is necessary for backwards-compatibility, but still it increases complexity.
  3. This trait is not suitable for all error situations. Unless you want the ability to abstract over many unexpected errors, you don’t need this trait: you can just use your error type directly and avoid some complexity. That’s why I’m arguing for a distinction between an error (or fault), which may be expected, and a failure, which is an unexpected error.

So the Fail trait isn’t replicating Error; it is providing new functionality for a more specific type of error.