Unified Errors, a non-proliferation treaty, and extensible types

Hello!

I don’t have the time to write even a Pre-RFC, or an RFC, but I had this idea, probably not new, and thought perhaps it might be of interest. So this is like a Pre-Pre-RFC :smiling_imp:

Motivation

So lately, having written several crates, and still remembering how I avoided writing error boilerplate (fearing it at first, then loving it, especially with ?, BUT still not enjoying getting over that first boilerplate hump and building up the mental momentum to start writing it), I was thinking about all the different crates that wrap the IO error. So many more crates wrap so many other crates’ errors (or will, in some distant, dark future), and this seemed wasteful to me.

It seemed silly to me that so many crates create a unique type to wrap whatever types they happen to come across in their Result-based computation, which is usually quite redundant; i.e., we’re all usually wrapping IO, and then I wrap someone else’s crate that wraps IO, and so on and so forth, ending up with nested types of similar types. Really I want a flattened list of errors that we all subscribe to at some point. I.e., we all need to somehow sign a non-proliferation treaty, but how?

In a similar vein, I get really quite annoyed by this snippet, where I “unwrap” only to “wrap” again:

fn something(bytes: &[u8]) -> Result<Foo> {
  Ok(Foo::parse(bytes)?)
  // instead of just
  // Foo::parse(bytes)
}

We all know why I have to do this, but it really detracts from my flow, because it’s usually an inessential detail; I’m returning a Result, let me return a Result.
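For readers who do not know why the re-wrap is needed, here is a minimal sketch. The types `ParseError` and `MyError` are hypothetical stand-ins for a dependency's error and the crate's own error; the point is only that `?` converts the error through `From`, which a bare `Foo::parse(bytes)` would not do.

```rust
// Hypothetical error types: one from a dependency, one of our own.
#[derive(Debug, PartialEq)]
pub struct ParseError;

#[derive(Debug, PartialEq)]
pub enum MyError {
    Parse(ParseError),
}

impl From<ParseError> for MyError {
    fn from(e: ParseError) -> Self {
        MyError::Parse(e)
    }
}

#[derive(Debug, PartialEq)]
pub struct Foo(pub u8);

impl Foo {
    // Returns the dependency's error type, not ours.
    pub fn parse(bytes: &[u8]) -> Result<Foo, ParseError> {
        bytes.first().map(|b| Foo(*b)).ok_or(ParseError)
    }
}

// Returning `Foo::parse(bytes)` directly would not type-check, because its
// error type is `ParseError`. `?` unwraps the value and converts the error
// with `From`, and `Ok(..)` wraps the value again.
pub fn something(bytes: &[u8]) -> Result<Foo, MyError> {
    Ok(Foo::parse(bytes)?)
}
```

When the two error types are identical, the plain `Foo::parse(bytes)` form compiles as expected; the re-wrap only matters once a conversion is involved.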

Thought

In OCaml land, so-called “extensible variant types” were somewhat recently introduced. Technically they’ve been around for a while in the form of exceptions: you introduce a new exception, you can pattern match on it, and every exception has the same type, exn; when you introduce a new one, your variant automatically appears and can be matched against.

So the thought was, perhaps the issues above:

  1. some of the mental overhead of declaring and writing a new error type
  2. proliferation and duplication of other error types
  3. single signature/unified return type

could be addressed by extensible variant types applied to Error enums.

In other words, an Error enum, or whatever you want to call it, is introduced and becomes open and extensible such that a user can add a new error type to the Error enum, thereby maintaining the single type signature, reducing the amount of mental overhead for beginning to write a new error type, and proliferation and duplication of error types in custom errors no longer applies (because you aren’t wrapping them in your own enum anymore).

In OCaml (excuse this, I know), it looks something like:

module Res = struct

 (* this is marked as an open type now *)
  type error = ..

  type 'a t = Ok of 'a | Err of error

end

(* this is how we extend the type *)
type Res.error += MissingFile of string
type Res.error += MyCustomError of (string * int)

let f x =
  if x = 10 then
    Res.Err (MissingFile "missing")
  else if x = 20 then
    Res.Ok (x)
  else
    Res.Err (MyCustomError ("nope", x))

let _ =
  let open Res in
  let res = f 30 in
  match res with
  | Ok n -> Printf.printf "%d\n" n
  | Err (MissingFile file) -> Printf.printf "%s\n" file
  (* we must have a catch-all because the type is open *)
  | _ -> Printf.printf "unknown error\n"

So the important point here is that we can make every function now have the same signature, something like Result<T> where T is your return type, and Result is short for Result<T, Err>, where Err is the open/extensible error type.

There are definitely some issues with introducing something like this: the semantics of an open type like this in Rust, how it works with the rest of the type system, what the syntax is (all of which I’m not too keen on working out myself; I’d defer to more eager minds), but that’s the point of an RFC, pre-RFC, etc.

What I think could be particularly elegant about adopting an open Error type we all extend is that it unifies the type signature for Result<T> functions, and thereby makes them all compatible with one another. It also reduces friction when refactoring code or adding a new function that introduces a new error type with ?, as we no longer have to look up the error type, add it to our foreign_links or links, or add a new From impl, etc.: if it extends the error type it will automatically work (or at least that would be a part of the implementation/guarantee), as the signature is still the same.

Similarly, I think some really interesting results could be obtained when this is combined with Rust’s powerful trait system, where we can guarantee certain conditions on any type that enters the open type, such as implementing Display, etc. (which OCaml can’t do as yet since it doesn’t quite have a typeclass or trait analog).

Anyway, thanks for reading and let me know what you think, and sorry about the typos :smile:

I’m not familiar with OCaml, but it seems to me this approach is not directly applicable to Rust.

The problem is, errors are passed primarily by value in Rust, and changing that would limit their usefulness somewhat. That means, you need to know the complete type in order to construct a value. So if a type is extensible, to compile code you need knowledge of all extensions that can ever be linked into the final executable, which is impossible for libraries.

On the other hand, if heap allocation is acceptable, you have something almost adequate in current Rust, and that’s Box<Error>. I admit, the trait would need a few extra methods to be truly usable in error recovery, but the basics are there.
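As a minimal sketch of the boxed approach in current syntax (`Box<dyn Error>`, which is what `Box<Error>` is written as today): two unrelated standard-library error types flow through one signature with `?`. The helper names `parse_number` and `read_number` are made up for illustration.

```rust
use std::error::Error;
use std::fs;

// `std::num::ParseIntError` converts into `Box<dyn Error>` through `?`.
pub fn parse_number(text: &str) -> Result<i64, Box<dyn Error>> {
    Ok(text.trim().parse::<i64>()?)
}

// `std::io::Error` converts into the same boxed type, so both failure
// modes share one return type at the cost of a heap allocation per error.
pub fn read_number(path: &str) -> Result<i64, Box<dyn Error>> {
    let text = fs::read_to_string(path)?;
    parse_number(&text)
}
```

The caller gets a single signature, but has lost the static knowledge of which errors can occur, which is the recovery limitation mentioned above.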

What would really help, in my opinion, would be syntactic sugar for composite/extended error types themselves, so that you’d still create separate types for errors, but it would be easy and natural, something like

#[derive(Error)]
enum MyCrateError {
    io::Error, fmt::Error,
    CustomErrorCondition1(usize),
    CustomErrorCondition2,
} 

I’m fairly certain something in that vein is already possible as a procedural macro, without any additions to the language. The hard part would be deciding exactly how those included enum types are represented and accessed.

Just to clarify, the proc macro I’m imagining would also autoderive all the appropriate From<_> implementations and whatever other traits are appropriate on Error, so derive syntax is probably ill-advised. Possibly more like

declare_error!{
    MyCrateError {
        io::Error, fmt::Error,
        CustomErrorCondition1(usize),
        CustomErrorCondition2,
    }
}
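A rough sketch of how such a macro could be approximated with `macro_rules!` today. This is an assumption about one possible shape, not the syntax proposed above: every variant here must be named and wrap exactly one type, and unit variants such as `CustomErrorCondition2` are not handled.

```rust
use std::{fmt, io};

// Hypothetical `declare_error!`: generates the enum plus one `From` impl
// per wrapped type. Two variants wrapping the same type would conflict.
macro_rules! declare_error {
    ($name:ident { $($variant:ident($inner:ty)),+ $(,)? }) => {
        #[derive(Debug)]
        pub enum $name {
            $($variant($inner)),+
        }

        $(
            impl From<$inner> for $name {
                fn from(e: $inner) -> Self {
                    $name::$variant(e)
                }
            }
        )+
    };
}

declare_error! {
    MyCrateError {
        Io(io::Error),
        Fmt(fmt::Error),
        CustomErrorCondition1(usize),
    }
}

pub fn check(len: usize) -> Result<(), MyCrateError> {
    if len > 8 {
        // `usize` converts through the generated `From<usize>` impl.
        return Err(len.into());
    }
    Ok(())
}

pub fn render(n: u32) -> Result<String, MyCrateError> {
    use std::fmt::Write;
    let mut out = String::new();
    // `fmt::Error` converts to `MyCrateError::Fmt` through `?`.
    write!(out, "{}", n)?;
    Ok(out)
}
```

The representation question raised above remains: this sketch simply nests each foreign error inside a named variant.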

I haven’t had a chance to read this thread in depth yet, but highly encourage checking out https://docs.rs/error-chain/, which is currently undergoing public evaluation!

You’re basically just rewriting error-chain here.

So error chain is awesome, same with quick-error and the other boilerplate solutions that came up to get better error handling up quickly.

However, as I noted above and even mentioned w.r.t. foreign_link, it doesn’t really address any of the 3 issues I mentioned. That is, single unified type (without boxing), proliferation of the same wrapped types through crates, and mental overhead when adding new errors.

Hopefully tonight I can write an example of the OCaml version in an imaginary Rust version, to better illustrate what an extensible type would buy (perhaps nothing), and to scare away fewer readers :smiley:

That being said, and very briefly, I don’t think there’s any technical reason an open type wouldn’t work in rust due to compilation strategy of rust and static linking of rust libraries (e.g. every open type and implementors would be known at the final compile time), unless you also want them to have ABI compatibility between rust dynamic libraries, but this is semi moot anyway since rust doesn’t have a stable abi yet anyway.

Lastly, I agree that it is perhaps extreme to potentially introduce a new type theoretic construct into the language for what some might consider very little buyback, but that is why I introduced this as a pre, pre RFC, because I thought the notion of open types could elegantly solve some of the issues I see the Error/Result type convention facing. Also that’s what internals language design tags are for :slight_smile:

From a compile model perspective, this seems to have roughly the same trade-offs as monomorphization of type parameters, which we already do.

I think there is a real problem here, but I have large concerns about the language developing just too many features, and users feeling lost about whether they should be defining a trait or an open type etc.

I think this is all connected to the “virtual structs” / “fields in traits” stuff, and I suspect somewhere in the dark there is a minimal language extension covering all of these use cases that won’t introduce the problem of too many choices.

The difference is that generic types are explicit and you can decide the trade-off. If the idiomatic way to treat errors requires instantiating all library functions that can fail, then you are essentially barring the way for Rust to become a useful systems language, because you eliminate the potential for dynamic linking.

I can't follow you. Being idiomatic doesn't make something less explicit; generics are very idiomatic; this syntax for this would presumably be equally explicit, and (for certain situations) equally idiomatic. If you were concerned about dynamic linking, you would not use open types in your API, just like you wouldn't use generics.

If we implemented anything like this it would probably piggy back on traits anyway.

But I want to add also that this argument that "X would make it impossible for Rust to be used as a systems programming language" is made quite frequently, and my immediate reaction to it is always to want to ignore whatever the person was saying. It's reductive, uninformative, and emotionally charged. It's much better to make narrow, specific, technical statements.

I'll try to clarify. If I understood the concept of extensible errors correctly, then you basically can't even invoke a function that returns that error without forcing the calling function to instantiate. It would poison caller automatically, without any declaration. The end result would be a split in the crate ecosystem.

There are numerous libraries that can return errors from most functions, and preventing consumers from compiling into shared library would be quite a problem for smaller devices with limited memory. Now, this is all just a conjecture based on my possibly mistaken understanding of the proposal, so feel free to correct me.

No need to be so hostile. If you think my assessment is emotionally charged, try running a full blown Linux distro with only static executables and compare the memory requirements.

I see the concern. Because it's the same type, every subsequent user would modify the definition of the type, necessitating that we recompile every place that used the type.

I think instead we would need to have a model in which we instantiate separate "final types" during trans for each crate that uses this type, without exposing it in its public API. This is why I think something piggy backing on the idea of pattern expressions in traits is a good way to model this idea, because that's already how it works.

But if you do expose an open type in your API, yes you can't be dynamically linked. This is the same as exposing a type parameter in your API, which many libraries do already.

I'm sorry that it came off as hostile, that wasn't my intent. But I found both the previous comment and this one hostile as well. Your concern about supporting dynamic linking is valid and useful, but statements about what would be the doom of Rust, or this one about running a Linux distro, which came off to me as sarcastic, do not make your case stronger or elevate the conversation.

So while I do find these issues extremely interesting, in particular dynamic linking, I think it’s really off topic.

In case this wasn’t clear, and perhaps this is where some confusion lies: Result is already generic; exposing it through a C ABI for use as a “system” library API call via dynamic linking is inappropriate, for the reasons above, namely that Rust’s ABI, and generics in particular, are not yet specified.

Consequently, the dynamic linking concern about open types, as an argument in favor of plain Result, is both inapplicable and, again, off topic - at least for a pre RFC :slight_smile:

I have some ideas for compilation concerns, but again, discussion of this is, I think, largely misplaced, and somewhat off topic.

My reason for this thread was to note the three issues/concerns above, and why I thought open types would elegantly address this. I think I have failed. :frowning:

Consequently, I will attempt a small write up with rust examples, in particular, motivating examples for why in all three cases an open type (and however it is implemented, either traits, syntax extension, type system extension) addresses these issues head on, and in particular, why it would be nicer for beginners.

To be clear - I absolutely think the issues with compilation and dynamic linking are intensely important and also extremely interesting, but again, I feel we’re putting the cart before the horse, as they are somewhat off topic and/or inapplicable as an argument against, at least for the state of the proposal here. :smile:

Looking forward to more discussion though, as I do think there’s room, as @withoutboats said, for discussion of Rust’s error handling at both the language and crate level :slight_smile:

If that works, then I'm looking forward to seeing progress on it.

Not my intent either. I try to express the motivation for my concerns, and find words to be an inelegant and error-prone tool. Par for the course for me I'm afraid.

Dynamic linking is a method for reducing memory footprint first and foremost. It does not depend on ABI being specified or stable, nor does it have anything to do specifically with C or system libraries. The crux of the matter is that if it's possible to compile code once and use that compiled code in every consumer without change, you can have just one instance of any given library in memory, regardless of how many applications use the library. If a library is instantiated separately for each application, memory consumption grows much more steeply.

My concern is that a feature as proposed would have a broader, qualitatively different impact on the compilation process, above and beyond what monomorphization is doing. @withoutboats seems to think this can be done without such impact, so I'm eagerly waiting if anything comes of it. :slight_smile:

I'm all too keenly aware of what dynamic linking is. I wrote a dynamic linker :slight_smile:

Again though, I just think this is off topic right now. I think there are several interesting ways to deal with this, but I don't feel it's relevant at the outset to worry about, and declare as a blocker, an arguably niche concern like dynamic linking of Rust binaries and how it interacts with feature X.

Yes but Result<(), io::Error> is not generic and is suitable for dynamic linking (or would be if we had a stable ABI). These concerns are definitely relevant to this proposal.

Imaginary Rust with open types

To make things a little clearer and easier, let’s invent some syntax (this should not be understood to be a proposal for syntax additions, or how it should be implemented, etc., it’s just pedagogical) for open types in rust, and then proceed to the 3 points above, and why I think it’ll help (I’ll repeat each of the points later)

I’m not going to get into generics, or lifetimes, or anything else that might interact with the base definition, or how you’d extend it with that (these are all really important considerations for a serious proposal of course); but for demonstration purposes, let’s just assume the open type is enum-like and must be extended with elements that you would normally be allowed to add to an enum, with restrictions that they’re “simple”.

Now, let’s suppose the module core::exn defines:


pub enum Exn = ..

This, let us say, will declare the type/enum/variant Exn to be open; that is, other downstream clients can extend that type with their own, extra variants.

To do so, we might write:

use core::exn::Exn;

pub enum Exn := CustomError(String)

At compile time now, if only core::exn and the above module extending Exn with CustomError are present (and there are legitimate concerns about how this takes place; again, not my target right now), the enum Exn will look something like:

pub enum Exn {
  CustomError(String)
}

with the caveat that anyone pattern matching on a binding with x: Exn must include a catch all, as the type can always have a variant you didn’t include (it’s open, and someone else can always extend it to a type you now aren’t including in your match, if you don’t have a catch all).

So if another crate foo extends Exn:

 use core::exn::Exn;
 pub enum Exn := BadFoo{ msg: &'static str, foo_size: usize }

and with the above example, our Exn now perhaps looks like:

pub enum Exn {
  CustomError(String),
  BadFoo{ msg: &'static str, foo_size: usize }
}

I’m not going to get into what happens if two different crates define the exact same enum variant :slight_smile: (this is problematic, because two separate crates could define the same Exn variant (i.e., the type and constructor match), and you might be unlucky enough to invoke functions from both in your function which returns an Exn, with both happening to return that particular variant with populated values).

One last thing: because the original motivating factor for this was to use it for error types, we (or the module core::result) will define a final, regular convenience type for error-returning functions (like many crates already do, except it is fixed on the right hand side), called EResult (to avoid confusion, so you don’t think I’m talking about the regular Result):

use core::exn::Exn;
pub enum EResult<T> {
   Ok(T),
   Err(Exn)
}

Ok with that out of the way, I’ll move onto the motivating use case for why I thought this would be fun/interesting/nice, other concerns aside

Motivating Use Case

I’m a new user; I’m told to use the std library’s Result and Error types to return errors, and I start playing around. I first encounter this when doing IO, because read_to_end returns a Result.

So I write something like this:

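use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

pub fn eat_bytes() -> io::Result<Vec<u8>> {
  // `read_to_end` takes `&mut self`, so the file binding must be mutable
  let mut f = File::open(Path::new("foo.txt"))?;
  let mut bytes = Vec::new();
  f.read_to_end(&mut bytes)?;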
  Ok(bytes)
}

pub fn run() -> io::Result<()> {
  let bytes = eat_bytes()?;
  println!("wow, bytes: {:?}", bytes);
  Ok(())
}

pub fn main() {
  match run() {
    Ok(()) => (),
    Err(e) => { println!("Err: {:?}", e); ::std::process::exit(1); }
  } 
}

This is really cool; the annoying walls of pattern matching are gone, replaced with ?, and in my handler which calls eat_bytes I can pattern match on the kind of error that it returned, etc.

I then start to add more functionality, and realize I want to add logging; I hear about env_logger, read all about how to use it, then modify my code:

pub fn eat_bytes() -> io::Result<Vec<u8>> {
  let mut f = File::open(Path::new("foo.txt"))?;
  let mut bytes = Vec::new();
  f.read_to_end(&mut bytes)?;
  Ok(bytes)
}

pub fn run() -> io::Result<()> {
  env_logger::init()?;
  let bytes = eat_bytes()?;
  info!("wow, bytes: {:?}", bytes);
  Ok(())
}

Now, suddenly, my code won’t compile. I do some research, and learn that env_logger::init() returns Result<(), SetLoggerError>, and also discover at the same time that io::Result is a type definition for result::Result<T, Error>, and Error is its own type.

In my experience, several people do different things here, and all are legitimate, because honestly, they just want to start coding again!

One thing that people do is that they learn about io::Error, study it, understand its API, and then manually map other errors into this error they understand, using the ErrorKind and returning a custom message when a foreign error is encountered.
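A small sketch of that first approach, using only the standard library. The function name `parse_port` is hypothetical; `io::Error::new` and `io::ErrorKind::InvalidData` are real.

```rust
use std::io;

// Maps a foreign error (`ParseIntError`) into `io::Error` by picking an
// `ErrorKind` and attaching the original error as the payload.
pub fn parse_port(text: &str) -> io::Result<u16> {
    text.trim()
        .parse::<u16>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

This keeps the `io::Result` signature intact, but every foreign error needs its own manual mapping and a somewhat arbitrary choice of kind.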

Another thing people do is remove the IO error result type, and replace it with Result<(), String>, and manually map other errors into string, by either pattern matching, or other fancy methods.
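The string-based variant of the same workaround might look like the sketch below (function names are illustrative only). Everything collapses to a message, so callers can print the error but can no longer match on it.

```rust
use std::fs;

// Every error is flattened to a `String` with `map_err`.
pub fn eat_bytes(path: &str) -> Result<Vec<u8>, String> {
    fs::read(path).map_err(|e| format!("could not read {}: {}", path, e))
}

pub fn parse_len(text: &str) -> Result<usize, String> {
    text.trim().parse::<usize>().map_err(|e| e.to_string())
}
```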

Some dig deeper, and learn about rust “best practice” error handling, and read the chapter, understand how errors are enums, and can work together when you implement From, or they find error-chain, and with some fidgeting and largely through black magic macros at the top of their crate and elsewhere, they create their own custom type, and add IO, and any other encountered as the link or foreign_link.

By far the most common outcome, though, is that people forgo ? or try! and unwrap instead. Once they’ve studied a bit more (sometimes months later), and maybe finally get to that chapter on error handling in the book and wrap their heads around it, they write their own custom error type, implement From for the errors they encounter, and maintain their own Result<T>, only to have to go back through their code and remove all the unwraps they added while developing and learning.

1. Unified Error Type

Let’s now return to this first time user, but in a parallel universe where open types are used by prominent crate vendors and the std lib.

It now looks like this:

pub fn eat_bytes() -> EResult<Vec<u8>> {
  let mut f = File::open(Path::new("foo.txt"))?;
  let mut bytes = Vec::new();
  f.read_to_end(&mut bytes)?;
  Ok(bytes)
}

pub fn run() -> EResult<()> {
  let bytes = eat_bytes()?;
  println!("wow, bytes: {:?}", bytes);
  Ok(())
}

pub fn main() {
  match run() {
    Ok(()) => (),
    Err(e) => { println!("Err: {:?}", e); ::std::process::exit(1); }
  } 
}

when this user realizes they want logging, and write:

pub fn eat_bytes() -> EResult<Vec<u8>> {
  let mut f = File::open(Path::new("foo.txt"))?;
  let mut bytes = Vec::new();
  f.read_to_end(&mut bytes)?;
  Ok(bytes)
}

pub fn run() -> EResult<()> {
  env_logger::init()?;
  let bytes = eat_bytes()?;
  info!("wow, bytes: {:?}", bytes);
  Ok(())
}

It compiles without a hitch, and they’re on their merry way, ready to frob some more bits, because both std::io::Error and log::SetLoggerError extend Exn, and therefore the type signature, () -> EResult<()>, is still valid after the addition of env_logger::init(), because as we saw above, EResult includes Exn as its “right hand” Error variant. In our syntax, the extensions are as simple as:

use core::exn::Exn;
pub struct Error {
    repr: Repr,
}

enum Repr {
    Os(i32),
    Simple(ErrorKind),
    Custom(Box<Custom>),
}
pub enum Exn := IOError(Error)

and

use core::exn::Exn;
pub enum Exn := SetLoggerError(())

2. Mental Overhead Reduction, Developer Flow, and the Bureaucracy of Errors in Rust

Bureaucratic is the best way for me to describe the bookkeeping, initial setup (when starting a new lib/binary), and further adjustment when new error types enter the fray in Rust.

Jean-Yves Girard (the founder/inventor/discoverer of linear logic, whooo!) referred to the “Inessential differences in proofs as the bureaucracy of syntax”, and I think this perfectly captures my feelings when I have to change do_something() to Ok(do_something()?), or add new error types to my From impl or error macro, as in the examples below. I know why I have to do it, but it’s inessential to the task at large, and in my opinion this takes an unmeasured mental toll on me, the developer. I often find myself procrastinating on doing proper error types in an initial lib or binary, because it distracts and delays me from the fun things.

Anyway, let’s consider one of the cases where you, the new user, got motivated enough and:

  1. studied and learned in depth about rust errors
  2. wrote your own error type by hand (or used error-chain)

Your flow got broken, but you feel better for it; you’re now back to programming. You realize that you need to add another function, frobulate, but it too returns a custom error. Now, either through helpful compiler messages, or on the docs for the crate, you add the From impl, or the type name to the links or foreign links section to teach your custom error how to understand it, and thereby preserve your return type.

What happens if we had open types and frobulate also extended Exn? It’s the exact same scenario as above in 1 - your code does not change at all, except for the addition of frobulate, you leave signatures untouched, and are unmolested by the compiler, and you continue on in your zen state.

You never got distracted by adjusting your error type to understand another error type. It doesn’t need to, because when you learned about Rust using open types to model its errors, you just added this one single line to start working with your own custom error type, and you did not have to adjust any of your original function return types:

use core::exn::Exn;
pub enum Exn := MyError(String)

pub fn eat_bytes() -> EResult<Vec<u8>> {
  let mut f = File::open(Path::new("foo.txt"))?;
  let mut bytes = Vec::new();
  f.read_to_end(&mut bytes)?;
  let is_frobulant = frobulate(&bytes)?;
  if !is_frobulant { Err(MyError("Is not frobulant enough".to_owned())) }
  else { Ok(bytes) }
}

pub fn run() -> EResult<()> {
  env_logger::init()?;
  let bytes = eat_bytes()?;
  info!("wow, bytes: {:?}", bytes);
  Ok(())
}

Basically you’re already primed and ready to go to write custom errors, AND integrate other crates’ errors into your “own” error type. You don’t need to stop and think about the crate’s error type, open your error.rs file, add the type in the links section or add a From impl (and then in the match for Display, etc.); you just don’t need to really do much at all - except concentrate on the programming task at hand, and leave all the syntactic bureaucracy where it belongs, on planet Vogsphere.

3. Non-proliferation

Lastly, and perhaps not as important, another benefit of an open type is that adding new Error types is no longer that big of a deal. You might think that it already isn’t that big of a deal today.

This is straight from the amazing walkdir's error source code:

/// An error produced by recursively walking a directory.
///
/// This error type is a light wrapper around `std::io::Error`. In particular,
/// it adds the following information:
///
/// * The depth at which the error occurred in the file tree, relative to the
/// root.
/// * The path, if any, associated with the IO error.
/// * An indication that a loop occurred when following symbolic links. In this
/// case, there is no underlying IO error.
///
/// To maintain good ergnomics, this type has a
/// `impl From<Error> for std::io::Error` defined so that you may use an
/// `io::Result` with methods in this crate if you don't care about accessing
/// the underlying error data in a structured form.
#[derive(Debug)]
pub struct Error {
    depth: usize,
    inner: ErrorInner,
}

“To maintain good ergonomics” They’re being nice! They implemented From<Error> for ::std::io::Error for you, so you don’t have to add it to your foreign link, nor change your type signature if you started off using std::io::Result because you read the examples and didn’t know any better.
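A minimal sketch of that kind of courtesy impl, using a made-up `WalkError` rather than walkdir's real type (whose internals are not shown here). The conversion direction is the notable part: the crate's own error converts *into* `io::Error`, so callers using `io::Result` can still apply `?`.

```rust
use std::{error, fmt, io};

// Hypothetical crate error carrying extra context.
#[derive(Debug)]
pub struct WalkError {
    pub depth: usize,
    pub message: String,
}

impl fmt::Display for WalkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "walk failed at depth {}: {}", self.depth, self.message)
    }
}

impl error::Error for WalkError {}

// The "ergonomics" impl: lets `?` turn a `WalkError` into an `io::Error`.
impl From<WalkError> for io::Error {
    fn from(e: WalkError) -> io::Error {
        io::Error::new(io::ErrorKind::Other, e)
    }
}

pub fn walk(depth: usize) -> Result<(), WalkError> {
    if depth > 2 {
        Err(WalkError { depth, message: "too deep".to_owned() })
    } else {
        Ok(())
    }
}

// A caller that started out with `io::Result` does not need to change.
pub fn caller(depth: usize) -> io::Result<()> {
    walk(depth)?;
    Ok(())
}
```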

This entire comment, and rationale, wouldn’t even have to exist if there was an open type, Exn, which they could extend; they wouldn’t be proliferating, because there’s nothing to proliferate; there’s nothing to make ergonomic, because it would already be ergonomic.

A final point I’d like to make is that I now have a crate, scroll, which another crate goblin uses; another project, panopticon, uses goblin.

Scroll has a custom error; goblin has a custom error; panopticon has a custom error. Goblin From impls Scroll and IO; panopticon impls IO and a bunch of others.

Now, if panopticon decides to switch to scroll because it’s awesome :D, it will now impl From<scroll::Error> for its errors as well, making two crates it uses, at different levels, all have enums which wrap the same errors at various levels. I.e., if you wanted to match you could literally have:

match err {
  Error::Goblin(goblin::Error::Scroll(err)) => ..
  Error::Goblin(goblin::Error::IO(err)) => ..
  Error::Scroll(scroll::Error::IO(err)) => ..
  Error::IO(err) => ..
  Error::Scroll(err) => ..
}

This just fundamentally smacks of bad design to me, and it will just get worse, the more and more crates depend on each other, and the more they impl each other’s errors in order for better ergonomics, etc. This is the non-proliferation treaty that open types would enable us all to sign.
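To make the nesting concrete, here is a compilable sketch with hypothetical stand-ins for the three crates' error enums (these are not the real scroll, goblin, or panopticon types). Answering the simple question "was this an IO error?" already needs one arm per nesting path.

```rust
use std::io;

// Hypothetical stand-ins for the dependency error types.
pub mod scroll {
    #[derive(Debug)]
    pub enum Error {
        Io(std::io::Error),
        BadOffset(usize),
    }
}

pub mod goblin {
    #[derive(Debug)]
    pub enum Error {
        Io(std::io::Error),
        Scroll(super::scroll::Error),
    }
}

// The top-level application error wraps both, plus IO directly.
#[derive(Debug)]
pub enum Error {
    Io(io::Error),
    Scroll(scroll::Error),
    Goblin(goblin::Error),
}

// The same underlying IO error can arrive by four different routes.
pub fn is_io(err: &Error) -> bool {
    match err {
        Error::Io(_) => true,
        Error::Scroll(scroll::Error::Io(_)) => true,
        Error::Goblin(goblin::Error::Io(_)) => true,
        Error::Goblin(goblin::Error::Scroll(scroll::Error::Io(_))) => true,
        _ => false,
    }
}
```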

To be clear, I don’t think this is anyone’s fault; it’s just how things ended up, which is fine. And I am optimistic we’ll come up with a good, robust, forward looking solution for it. I don’t know if it’s open types, or some variant thereof, but I hope whatever it is, and however it’s implemented, that for me, most importantly, it can address the 3 issues I’ve talked about here in as robust and satisfying a manner as I think open types have in this minor, pre pre RFC.

Thanks folks for your time, and I’ll see you at the after party.

~m4b

I don’t think you can really avoid the heap (or at least some form of indirect allocation) this way…

First of all, if the compiler aggregates all possible error types into one giant enum, that means every error has the size of the largest error type anywhere in the program. As long as there’s some error somewhere that contains a lot of fields or a large field (e.g. an array), a type like Result<u32, MyError>, which today might be two or three words, will take up probably hundreds of bytes. With a naive implementation, that entire region of memory will be copied whenever a value of that type is moved or even returned, even if most of it was uninitialized to start with because there isn’t a big error stored in it. You might be able to optimize it by storing the used size and only copying that portion of the struct, but that logic would have its own overhead, and in any case there’s still the issue of wasting a ton of stack space.
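The size effect is easy to observe with ordinary enums. In this sketch (type names are illustrative), adding one large variant inflates every `Result` that carries the error, whether or not that variant is ever constructed.

```rust
use std::mem::size_of;

#[derive(Debug)]
pub enum SmallError {
    NotFound,
    Code(u32),
}

// Same as above plus one bulky variant, as might happen if some crate
// somewhere contributed a large payload to a universal error enum.
#[derive(Debug)]
pub enum HugeError {
    NotFound,
    Code(u32),
    Dump([u8; 512]),
}

// Returns the sizes in bytes of the two `Result` types.
pub fn sizes() -> (usize, usize) {
    (
        size_of::<Result<u32, SmallError>>(),
        size_of::<Result<u32, HugeError>>(),
    )
}
```

Exact numbers depend on the compiler's layout choices, but the second size can never be smaller than the 512-byte array it has to hold.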

Besides, how would you deal with the ‘chain’ part of error-chain? That is, how do you store both an outer error and the inner error it was caused by? Currently that’s done just by making the inner error a field of the outer error type. Usually there’s no need for recursion – i.e. the inner error type isn’t the same as the outer error type, and it can’t contain it as one of its fields either – so the inner error can be stored by value, and the result is a finitely sized type without heap allocations. But you can’t do that with one universal error type; you’d have to write

pub enum Exn := MyError { cause: Exn }

but that doesn’t work, because the compiler can’t rule out

MyError { cause: MyError { cause: MyError { cause: MyError { [etc…] } } } }

One potential workaround would be to make the universal error type an array of Exns rather than just one; the inner error would be stored in the next array element rather than actually inside the outer error. But there would have to be some (small) arbitrary limit on error count, and the types get even more bloated…

At that point it’s probably considerably more efficient to just use the heap - at least, when it’s available. (But in the sorts of environments where it’s not available, you might not have that big a stack either, in which case huge error types would create a high risk of stack overflow.)
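For comparison, this is roughly what the heap-based chain looks like in ordinary Rust. `ChainedError` is a hypothetical type; the `Box` supplies the indirection that makes the recursive definition finitely sized.

```rust
// Hypothetical self-referential error: without the `Box`, the compiler
// would reject this type as infinitely sized.
#[derive(Debug)]
pub struct ChainedError {
    pub message: String,
    pub cause: Option<Box<ChainedError>>,
}

impl ChainedError {
    pub fn new(message: &str) -> Self {
        ChainedError { message: message.to_owned(), cause: None }
    }

    // Wraps `cause` as the inner error of `self`.
    pub fn caused_by(mut self, cause: ChainedError) -> Self {
        self.cause = Some(Box::new(cause));
        self
    }

    // Counts the errors in the chain, including `self`.
    pub fn depth(&self) -> usize {
        let mut n = 1;
        let mut cur = &self.cause;
        while let Some(inner) = cur {
            n += 1;
            cur = &inner.cause;
        }
        n
    }
}
```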

But then you don’t really need a new language feature; you can just use trait objects. Box<Error> is pretty much what you’re asking for, except you can’t match on it – but if Error inherited from Any you could use the downcast methods.
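For what it is worth, the standard library today does provide `downcast_ref` on `dyn Error + 'static`, so a rough equivalent of the OCaml example can be sketched with trait objects. The error types below mirror that example and are otherwise hypothetical.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
pub struct MissingFile(pub String);

impl fmt::Display for MissingFile {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "missing file: {}", self.0)
    }
}

impl Error for MissingFile {}

#[derive(Debug)]
pub struct MyCustomError(pub String, pub i32);

impl fmt::Display for MyCustomError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} ({})", self.0, self.1)
    }
}

impl Error for MyCustomError {}

pub fn f(x: i32) -> Result<i32, Box<dyn Error>> {
    match x {
        10 => Err(Box::new(MissingFile("missing".to_owned()))),
        20 => Ok(x),
        _ => Err(Box::new(MyCustomError("nope".to_owned(), x))),
    }
}

// "Matching" is a chain of downcasts with a mandatory fallback, much like
// the catch-all arm required for an open type.
pub fn describe(res: Result<i32, Box<dyn Error>>) -> String {
    match res {
        Ok(n) => format!("ok {}", n),
        Err(e) => {
            if let Some(MissingFile(name)) = e.downcast_ref::<MissingFile>() {
                format!("missing file {}", name)
            } else {
                "unknown error".to_owned()
            }
        }
    }
}
```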

All very real issues, but I think most of them have much less complicated solutions. For one, a large part of your rationale stands on beginners not learning their options efficiently. Extensible errors will not solve that. It’s just one more different way to do errors, on top of all the others. If there is a problem with the learning curve, adding more possibilities won’t help. There are already simple, straightforward, effortless ways to deal with errors, even if they don’t fit all the requirements you stated. The only way to solve that one is to teach about errors better, regardless of how they work.

I think the only other point not addressed by existing crutches like error-chain or Box<Error> is matching on errors. I think that’s the principal issue here – how do we make errors that can wrap an arbitrary cause (so that you don’t need to declare which errors you can wrap), while still being able to match on them easily.

Extensible enums as proposed have several unanswered implementation concerns, and it is important to determine early whether and how those concerns can be solved.

As @comex pointed out, the enum would probably need to be an unsized type, otherwise it would be too unwieldy to use universally, not to mention the difficulty of determining size of a type that depends on the entire crate graph. Heap allocation would be involved every time an error is created, which may affect performance, but that’s not a major problem for most cases.

A possible way to solve discriminant allocation could be cryptographic hashing. It would be possible to create and match variants without knowing other variants. It would still be possible that two variants in separate modules are exactly identical in both name and type signature, but the only effect that would have is not being able to distinguish them. Since the type signature is the same, it would not affect safety. Even with a 64-bit discriminant – measly by crypto standards – the likelihood of collisions would be negligible.
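A toy sketch of deriving a discriminant from a variant's name and signature. Big caveat: it uses 64-bit FNV-1a purely as a small, deterministic stand-in, and FNV-1a is not a cryptographic hash; a real design would pick a proper one. The function `discriminant_of` and the string encoding of signatures are assumptions for illustration.

```rust
// 64-bit FNV-1a. NOT cryptographic; used here only because it is short
// and deterministic.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

// Hypothetical scheme: the discriminant depends only on the variant name
// and a textual form of its payload type, so each crate can compute it
// without knowing about any other variant.
pub fn discriminant_of(variant_name: &str, signature: &str) -> u64 {
    let mut key = String::with_capacity(variant_name.len() + signature.len() + 1);
    key.push_str(variant_name);
    key.push('\0'); // separator so ("ab", "c") and ("a", "bc") differ
    key.push_str(signature);
    fnv1a_64(key.as_bytes())
}
```

Two crates that both declare `BadFoo(usize)` get the same value under this scheme, which is exactly the indistinguishable-but-safe case described above.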

As for alternative approaches.

  • error-chain is good enough for most purposes and it’s readily available.
  • Dynamic typing could achieve identical results to extensible enums, with just modest syntax extensions on top of what we already have in std::any, while avoiding most or all of the big questions involved.
  • Error trait can be extended with freeform string discriminant, which could potentially be derived automatically from its definition. This would mesh well with existing Box<Error> practice. There would be a risk of namespace collisions, but without exposing attached data, this would have no impact on safety.
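The last alternative can be sketched with an extension trait. `TaggedError` and its `tag` method are hypothetical (the real `Error` trait has no such method), and the tag strings are written by hand here rather than derived.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical extension of `Error` with a freeform string discriminant.
pub trait TaggedError: Error {
    fn tag(&self) -> &'static str;
}

#[derive(Debug)]
pub struct MissingFile(pub String);

impl fmt::Display for MissingFile {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "missing file: {}", self.0)
    }
}

impl Error for MissingFile {}

impl TaggedError for MissingFile {
    fn tag(&self) -> &'static str {
        "mycrate::MissingFile"
    }
}

#[derive(Debug)]
pub struct Timeout;

impl fmt::Display for Timeout {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "timed out")
    }
}

impl Error for Timeout {}

impl TaggedError for Timeout {
    fn tag(&self) -> &'static str {
        "mycrate::Timeout"
    }
}

// Matching happens on the tag only; the attached data stays hidden, so a
// namespace collision cannot lead to misinterpreting a payload.
pub fn is_retryable(e: &dyn TaggedError) -> bool {
    match e.tag() {
        "mycrate::Timeout" => true,
        _ => false,
    }
}
```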

As a side note, the combination of heap allocation and cryptographic discriminant would completely eliminate my prior concerns, since no code would ever need to know the full range of possible variants.

One of the things that keeps rubbing me the wrong way in error handling in Rust is that I feel like the libraries have the full power and responsibility to provide the errors in their full form, but that seems wasteful if they aren’t used in the end. On the other hand, if they neglect their responsibilities, that’s gonna cause some pain downstream too.

In practice, this has to do with #![no_std], heap allocations and chaining errors. I wonder if there is some form of dependency injection that would allow the caller of the library to decide how far they are willing to go to have high-fidelity errors, or whether they prefer performance and avoiding heap usage.

Of course, generics already provide some mechanisms for things like that, but that requires the library authors always to be explicitly parametric over the error types, which is unwieldy. I think that if the Rust language itself is going to take some additional steps to improve error handling, making this kind of a thing easier and more flexible needs some serious thought.
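A sketch of the generic mechanism as it exists today, with all names hypothetical: the library reports a small detail struct, and the caller chooses, through the type parameter `E`, whether to keep a rich heap-allocated message or discard the detail entirely.

```rust
// Detail the library can report about a failure.
#[derive(Debug, PartialEq)]
pub struct BadDigit {
    pub position: usize,
    pub byte: u8,
}

// Caller option 1: zero-sized error, no heap, no detail.
#[derive(Debug, PartialEq)]
pub struct Cheap;

impl From<BadDigit> for Cheap {
    fn from(_: BadDigit) -> Self {
        Cheap
    }
}

// Caller option 2: high-fidelity error with an allocated message.
#[derive(Debug, PartialEq)]
pub struct Rich(pub String);

impl From<BadDigit> for Rich {
    fn from(e: BadDigit) -> Self {
        Rich(format!("bad byte {} at position {}", e.byte, e.position))
    }
}

// The library is parametric over the error type; the caller picks `E`.
pub fn parse_digits<E: From<BadDigit>>(input: &[u8]) -> Result<u32, E> {
    let mut value = 0u32;
    for (position, &byte) in input.iter().enumerate() {
        if !byte.is_ascii_digit() {
            return Err(E::from(BadDigit { position, byte }));
        }
        value = value.wrapping_mul(10).wrapping_add((byte - b'0') as u32);
    }
    Ok(value)
}
```

This works, but it shows the unwieldiness mentioned above: every fallible library function has to carry the extra type parameter and bound.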
