Pre-Pre-RFC: Error Set types and Error Return Traces

The Zig programming language, by Andrew Kelley and contributors, has two novel error handling concepts that I think Rust could greatly benefit from: Error Set/Union Types and Error Return Traces. However, since such large additions would likely be quite controversial this late in the language development process, I would like to start a discussion about them and get feedback before spending time writing up a more formal RFC. I'll try to communicate the basic idea behind these two concepts and why I think Rust would benefit from them. Any thoughts or feedback would be greatly appreciated.

Let's start with Error Union Types. Zig's Error Union Types are something like an inverse refinement type or an "open" enum. When writing a fallible function in Zig, you may define new error variants by simply returning them. The compiler then infers the set of possible error types that your function can return. If you want to re-raise errors without modification, you simply call a function within a try expression. And, as with Rust error enums, the Zig compiler checks for exhaustiveness when switching/matching on an error type.

The most obvious advantage of Zig's Error Union Types over Rust's error enums is the ease of adding new error cases during development. In Zig, you just write return error.InvalidChar; to define a new error variant, and this variant gets added not only to the error set of the current function, but also to the error set of any function that calls this function within a try expression, any function that calls those functions within a try expression, and so on. In Rust, you would need to add an InvalidChar variant to the current function's error enum, then alter every caller to either handle that error or convert it to an existing variant of its own error enum, and then repeat this all the way up the call stack, as sketched below.
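To make the contrast concrete, here is a minimal sketch of that ripple effect in today's Rust (all names are made up for illustration): adding the new InvalidChar variant forces a matching change in every caller that has its own error enum.

#[derive(Debug)]
enum ParseError {
	UnexpectedEof,
	InvalidChar, // the new variant added during development
}

#[derive(Debug)]
enum ConfigError {
	Io,
	Parse(ParseError), // every caller's enum must absorb the new case somehow
}

impl From<ParseError> for ConfigError {
	fn from(e: ParseError) -> Self {
		ConfigError::Parse(e)
	}
}

fn parse_digit(c: char) -> Result<u32, ParseError> {
	c.to_digit(10).ok_or(ParseError::InvalidChar)
}

fn load_config() -> Result<u32, ConfigError> {
	// `?` converts ParseError into ConfigError via the From impl above
	Ok(parse_digit('7')?)
}

fn main() {
	println!("{:?}", load_config());
}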

The most obvious corresponding disadvantage of Zig's Error Union Types is that function prototypes do not explicitly list the possible error variants. This means that a function prototype does not provide enough information to determine whether a breaking API change was made in the form of a new error variant; instead, you must rely on the compiler to determine when this occurs. However, I suspect this disadvantage could be mitigated by adding a way to explicitly list the possible error variants and give that particular error set a name.

The other novel error handling concept that Zig introduced is Error Return Traces. Zig's Error Return Traces look a bit like the stack traces displayed by many popular languages when an exception goes uncaught. Error return traces have a far smaller performance impact, though, and can even provide more information than stack traces.

In order for a stack trace to be presented when an exception goes uncaught, the entire stack trace must be captured when the exception is created (or when it is thrown/raised). This is a fairly expensive operation, since it requires traversing each stack frame and storing (at minimum) a pointer to each function in the call stack in some thread-local storage (which is typically heap-allocated). The argument usually made is that exceptions should only be thrown in exceptional cases, so the performance cost of collecting a stack trace will not significantly degrade overall program performance. In reality, though, errors are quite common, and the cost of stack traces is not negligible.
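To make the cost concrete in Rust terms, here is a minimal sketch of the capture-at-construction pattern (the error type is made up; std::backtrace::Backtrace is used here, and crates that attach backtraces to their error types do essentially the same stack walk):

use std::backtrace::Backtrace;

#[derive(Debug)]
struct MyError {
	msg: String,
	backtrace: Backtrace,
}

impl MyError {
	fn new(msg: &str) -> Self {
		MyError {
			msg: msg.to_string(),
			// The stack walk happens here, on every construction,
			// whether or not the trace is ever printed.
			// (Backtrace::capture() would first check RUST_BACKTRACE.)
			backtrace: Backtrace::force_capture(),
		}
	}
}

fn main() {
	let e = MyError::new("something failed");
	println!("{}\n{}", e.msg, e.backtrace);
}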

Nonetheless, there's no doubt that stack traces are invaluable for debugging program failures. Without stack traces, achieving a similar level of understanding about the cause of a failure has been extremely difficult... until the invention of error return traces. Error return traces convey the same or a better level of understanding about the origin and cause of an error at a much lower cost. And unlike stack traces, where the cost scales with the stack depth at the location where the exception is first raised, the cost of error return traces scales with the number of times the error value is returned. If the error is recovered one function above where it is created, then the total cost is a single memory write (Edit: this is wrong; it's actually 1 read, 2 ALU ops, and 2 writes, plus a couple of writes for initializing and clearing the return trace).

So how can an Error Return Trace provide more information than a stack trace? Take the following Rust code as an example:

extern crate rand;
use rand::Rng;

fn main() {
	a()
}

fn a() {
	b()
}

fn b() {
	match c() {
		Err(e) => panic!("c failed: {:?}", e),
		Ok(_) => {}
	}
}

fn c() -> Result<(), ()> {
	match d() {
		Ok(()) => Ok(()),
		Err(()) => {
			if rand::thread_rng().gen::<bool>() {
				Ok(())
			} else {
				Err(())
			}
		}
	}
}

fn d() -> Result<(), ()> {
	if rand::thread_rng().gen::<bool>() {
		Ok(())
	} else {
		Err(())
	}
}

One quarter of the time, this example panics with the following stack trace:

thread 'main' panicked at 'c failed: ()', src/main.rs:14:13
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:84
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:61
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1025
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1426
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:65
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:50
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:193
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:210
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:471
  11: rust_begin_unwind
             at src/libstd/panicking.rs:375
  12: std::panicking::begin_panic_fmt
             at src/libstd/panicking.rs:326
  13: playground::b
             at src/main.rs:14
  14: playground::a
             at src/main.rs:9
  15: playground::main
             at src/main.rs:5
  16: std::rt::lang_start::{{closure}}
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/rt.rs:67
  17: std::rt::lang_start_internal::{{closure}}
             at src/libstd/rt.rs:52
  18: std::panicking::try::do_call
             at src/libstd/panicking.rs:292
  19: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:78
  20: std::panicking::try
             at src/libstd/panicking.rs:270
  21: std::panic::catch_unwind
             at src/libstd/panic.rs:394
  22: std::rt::lang_start_internal
             at src/libstd/rt.rs:51
  23: std::rt::lang_start
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/rt.rs:67
  24: main
  25: __libc_start_main
  26: _start
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Note that the functions c and d are not listed anywhere in this trace. This is problematic because the error originated in d. There are, of course, ways to structure your code so that you can always determine the origin of the error, but this requires either capturing a stack trace every time you instantiate an error type, which has a significant performance impact, or a whole lot of boilerplate error handling code, which slows development.
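For reference, the boilerplate approach usually looks something like this rough sketch (the types and helper names here are made up): every propagation site manually appends its location to the error, so the origin survives without capturing a full backtrace.

#[derive(Debug)]
struct ErrorTrail {
	locations: Vec<(&'static str, u32)>, // (file, line) recorded at each return point
}

impl ErrorTrail {
	fn new() -> Self {
		ErrorTrail { locations: Vec::new() }
	}
	fn at(mut self, file: &'static str, line: u32) -> Self {
		self.locations.push((file, line));
		self
	}
}

fn d() -> Result<(), ErrorTrail> {
	Err(ErrorTrail::new().at(file!(), line!()))
}

fn c() -> Result<(), ErrorTrail> {
	// every caller has to remember to do this at every fallible call
	d().map_err(|e| e.at(file!(), line!()))
}

fn main() {
	println!("{:?}", c());
}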

If Rust had a dedicated Error Union Type with Error Return Traces, the original example above might be rewritten something like this (using a placeholder error keyword syntax that I just made up on the spot):

extern crate rand;
use rand::Rng;

fn main() {
	a()
}

fn a() {
	b()
}

fn b() {
	match c() {
		Err(error.C) => panic!("c failed"),
		Ok(_) => {}
	}
}

fn c() -> Result<(), error> {
	match d() {
		Ok(()) => Ok(()),
		Err(error.D) => {
			if rand::thread_rng().gen::<bool>() {
				Ok(())
			} else {
				Err(error.C)
			}
		}
	}
}

fn d() -> Result<(), error> {
	if rand::thread_rng().gen::<bool>() {
		Ok(())
	} else {
		Err(error.D)
	}
}

Edit: fixed the error matching patterns.

And the Error Return Trace that would be printed by the panic handler would look something like this:

thread 'main' panicked at 'c failed', src/main.rs:14:13
error return trace:
   0: src/main.rs:37 in playground::d
   	      Err(error.D)
   1: src/main.rs:26 in playground::c
   	      Err(error.C)
   2: src/main.rs:14 in playground::b
          Err(error.C) => panic!("c failed"),

stack backtrace:
   ... [stack trace would also be shown, here] ...

Note that the Error Return Trace is able to track the origin of the original error, even though a different error eventually triggers the panic. This seems to be another advantage of having a single global error set. Achieving the same functionality with standard Rust Result and error enum types might be possible, but it's not particularly obvious how it would be done; you would probably need some way to annotate enums as being tracked by the Error Return Trace. So the ease of implementing Error Return Traces seems like another advantage of Zig's Error Union Type.

And again, note that the performance cost of the Error Return Trace is small, much smaller than that of capturing a backtrace whenever an error is created, as is done by some error handling crates.

I hope this has communicated the value of these two concepts and how they might benefit the Rust programming language. I know this is far from a complete exposition, and that a formal RFC would require a lot more work. If others find these ideas interesting, I would ask and encourage you to do two things: 1) check out the Zig programming language, and consider contributing or donating; 2) offer to help write or edit a formal RFC for adding these concepts to Rust.

I look forward to hearing everyone's thoughts!

4 Likes

Andrew Kelley acknowledges Rust as the source of some of the ideas in Zig. Zig can directly compile C code and is a reasonable successor to C. I've been donating to Zig for some time.

1 Like

Have you looked at the anyhow crate? I think it already enables these sorts of features.

5 Likes

While convenient, I would also imagine this giving rise to inadvertent duplicates in large enough code bases. E.g., developer Alice writing return error.InvalidChar, while Bob introduces return error.CharInvalid for a similar purpose. ISTM these would go unnoticed rather easily in this implicit approach.

It also seems that a seemingly innocent change in one file could easily cascade outwards and trigger quite a lot of re-compilation. If errors are part of the function signature, it's quite a bit more obvious when this occurs, and it's also more obvious where follow-up changes need to be made to address such a new variant.

Regarding an explicit list of error variants: I am still hoping that we'll get something like the TypeA | TypeB | TypeC syntax variant for enums that has been floating around.

Error return traces do look spiffy, though! :slight_smile:

3 Likes

I believe both of these ideas have been discussed before under different names.

Error union types sound pretty much identical in motivation, mechanics, pros and cons to the “enum impl Trait” idea. Sadly I haven’t seen it get seriously discussed in years, but Allowing multiple disparate return types in impl Trait using unions is one example of a thread where it came up.

As for error return traces, a long-standing goal of std::error::Error reform has been to add a backtrace() method to provide exactly this:

3 Likes

The error types thing is basically global type inference. I see no good reason to restrict that to error types if it's already there. That said, I'm not a big fan of it, and I'd rather Rust didn't adopt anything like that. Local type inference with explicitly typed API boundaries is a great tradeoff between clarity and convenience.

Similarly, I don't see how this "New and Improved" kind of stack trace would require a builtin type, if that's what you are implying. It should probably be a library function, or at most a compiler intrinsic, returning a regular "user-defined" (or in this case, std-defined) type holding the necessary information.

5 Likes

An easy way to "flatten" or create a union of mulitple enums sounds like a nice feature. I'd like to have that independent of errors.

Currently, enums in Rust can be combined only by explicitly wrapping one enum in another, but that structure shows up when matching on them, and it doesn't automatically allow conversions between the enum types.

enum MyError {
   LibraryError(LibraryError),
   MyErrorCase,
}

It'd be nice to have something like:

enum MyError {
   pub use LibraryError::*; 
   MyErrorCase,
}

which would copy the variants from LibraryError as if they were variants of MyError, so you'd have MyError::InvalidChar rather than the verbose MyError::LibraryError(LibraryError::InvalidChar).

4 Likes

How is rustc going to be able to assign disjoint discriminants to the various additional error types that are added by different mods or crates to a shared base enum?

mod MyCrate {
  pub enum MyError {
    LibraryError(LibraryError),
    MyErrorCase,
  }
}

mod YourCrate {
  pub enum YourError {
    LibraryError(LibraryError),
    YourErrorCase,
  }
}

mod UnifiedCrate {
  enum UnifiedError {
    MyError(MyError),
    YourError(YourError),
  }
}

2 Likes

Good point. The only obvious way to make the syntax work for arbitrary enums would be for the resulting enums to have the same "wrapped layout" you'd get from explicitly wrapping them today, which I doubt is worth it.

I think getting the "no-wrapping layout" to work across crates would require https://github.com/rust-lang/rust/issues/60553 to be implemented and all the crates involved to be using it and happen to not use any of the same explicit discriminant values. I can see that being legitimately valuable at scale for large multi-crate projects such as Servo, where the motivating use case for #60553 itself originated, but it's likely impractical for anything in public APIs or involving multiple maintainers.

And for single crate cases, it's probably easier to just define all the enums in one place with a one-off macro anyway. So at the moment it does seem like there's no way to make "flattening" enums work in a way that's genuinely useful.

When I saw this thread, I thought maybe this was @yaahc_'s thread; her tracing-error crate is an(other) experiment on error context and traces that are more meaningful than a generic stack trace.

I'm not sure how much that matters. Any time the developer uses a feature that expresses "figure out what the type of this is, as long as it's an error", any error handling using that type is going to be best-effort regardless, usually of the form "open a pop-up window with the error message and write it to stderr, then move on".

Code that needs to react predictably based on the type of error it gets isn't going to use auto-generated enum impl (or whatever we call it) anyway.

Well, you could have a pass where the compiler automatically allocates discriminants to every single enum in every crate the current build uses (which I assume is what Zig does), but then the problem is that the generated type isn't known before that pass, which by nature needs to happen after type-checking.

Not sure how big of an obstacle that is.

Yeah, this sort of thing is a common suggestion in back-and-forths about feature proposals with an intended optimization component, and it always hits the same brick wall: separate compilation matters for many reasons, but especially because compile times matter, so in practice this is probably a non-starter.

Thanks for the suggestion! I had not seen this particular crate. However, it looks like the anyhow crate is similar to ones I've seen before that provide an exception-like error handling experience, in which a stack trace is captured when the error is instantiated. As discussed above, this is not the same thing.

That being said, this does give me an idea for how it might be possible to implement something like an error return trace on stable Rust today, by abusing the ? operator's implicit call to From. I'll give it a try when I get a chance.

I don't think there's a way to implement error set types, though, unless there's a Rust macro to get a unique number at compile time.

1 Like

That said, I think we could come up with a good-enough approach that still allows separate compilation and fast compile times. I just don't know how much effort it would be to implement it in Rust's existing type system.

Using @Tom-Phinney's code above as an example:

  • Create a global, unique identifier, eg UnifiedError. Both MyCrate and YourCrate need to "derive" from this identifier in some fashion; this is possible only if they were made by the same organization, or they use the same library; for instance, UnifiedError could be a part of std.

  • Associate a type to UnifiedError that you estimate will be large enough to hold every discriminant UnifiedError can possibly have. Since this is an estimate, you're probably going to be conservative and pick u64 to avoid compile errors down the line. If UnifiedError is a part of std, its discriminant type could be a build option.

  • For every type that "derives" from UnifiedError, eg MyError:

    • Create a global variable of type u64, eg __MyError_min_det. That variable holds the value of the first discriminant of MyError, and is uninitialized at this stage. Incidentally, that means MyError::MyErrorCase isn't const.

    • When initializing values of type MyError, use that global variable. Eg, MyError::MyErrorCase is compiled down to __MyError_min_det + 1.

    • When matching between branches of type MyError, do the inverse operation. So match myError is compiled down to switch (myError.det - __MyError_min_det). (A rough simulation of this arithmetic is sketched below.)

  • Right before the LLVM pass, list every single *_min_det variable, and assign a unique value to each.

  • During the LLVM pass, expressions like __MyError_min_det + 1 get folded away because the value of __MyError_min_det is now known.

The biggest obstacle is that enum discriminants for these project-wide enums wouldn't be known at compile time (well, until the LLVM pass). I don't know how much rustc internals would need to be tinkered with to accommodate that.

That said, I don't think there's anything inherently incompatible between project-wide enums and parallel compilation.
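
To illustrate just the arithmetic (a hand-wavy runtime simulation, not the proposed compiler lowering; the base value is hard-coded here instead of being assigned right before the LLVM pass, and the LibraryError payload is dropped to keep it short):

// stand-in for __MyError_min_det; the real scheme would fill this in late
const MYERROR_MIN_DET: u64 = 100;

#[derive(Debug, PartialEq)]
enum MyError {
    LibraryError, // payload omitted to keep the sketch small
    MyErrorCase,
}

// construction: per-enum base offset + local variant index
fn encode(e: &MyError) -> u64 {
    MYERROR_MIN_DET
        + match e {
            MyError::LibraryError => 0,
            MyError::MyErrorCase => 1,
        }
}

// matching: subtract the base offset to recover the local variant index
fn decode(det: u64) -> Option<MyError> {
    match det.checked_sub(MYERROR_MIN_DET)? {
        0 => Some(MyError::LibraryError),
        1 => Some(MyError::MyErrorCase),
        _ => None,
    }
}

fn main() {
    let det = encode(&MyError::MyErrorCase);
    assert_eq!(decode(det), Some(MyError::MyErrorCase));
    println!("MyErrorCase -> global discriminant {}", det);
}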

1 Like

When I saw this thread, I thought maybe this was @yaahc_'s thread; her tracing-error crate is an(other) experiment on error context and traces that are more meaningful than a generic stack trace.

tracing-error does look interesting! But it looks like the performance cost is even higher than capturing a stack trace every time you instantiate an error! And it scales with the number of fallible function calls?

The beauty of error return traces is that they give you tremendous insight into the origin of an error while having an extremely small performance penalty. Each time an error is returned, a single value uniquely identifying that return point (e.g. the memory location of the return) is written to the error return trace ring buffer on the stack. That costs a memory read, an add, a bitwise-and, and two memory writes, and all of the read/write locations should remain in cache for as long as the error keeps bubbling up the call stack.
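To make that cost concrete, here is a minimal sketch (not Zig's actual implementation, just an illustration of the arithmetic) of the per-return work, assuming a power-of-two ring buffer of return-point addresses:

struct ReturnTrace {
	index: usize,
	addrs: [usize; 16], // capacity kept a power of two so the mask below works
}

impl ReturnTrace {
	#[inline]
	fn record(&mut self, return_addr: usize) {
		let i = self.index;                     // 1 memory read
		self.addrs[i & (16 - 1)] = return_addr; // bitwise-and + 1 memory write
		self.index = i + 1;                     // add + 1 memory write
	}
}

fn main() {
	let mut trace = ReturnTrace { index: 0, addrs: [0; 16] };
	trace.record(0x1234); // stand-in for "address of this return point"
	println!("{} return(s) recorded", trace.index);
}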

The error types thing is basically global type inference. I see no good reason to restrict that to error types if it's already there. That said, I'm not a big fan of it and I'd rather Rust doesn't adapt anything like that. Local type inference with explicitly typed API boundaries is a great tradeoff between clarity and convenience.

This seems a bit self-contradictory. You don't see a reason to restrict this sort of type inference to error types, but you don't want this kind of type inference in the general case. Do you disagree with the advantage of this approach for error handling that I attempted to communicate? If not, then perhaps there is a reason to restrict this sort of inference to error types?

Similarly, I don't see how this "New and Improved" kind of stack trace would require a builtin type, if that's what you are implying. It should probably be a library function, or at most a compiler intrinsic, returning a regular "user-defined" (or in this case, std-defined) type holding the necessary information.

As I tried to communicate, it's not impossible to support return tracing of arbitrary types, but it's less obvious how it should work if there is not a dedicated error type. You would likely need to annotate types that should be tracked by the error return tracing mechanism.

Also, there are two main reasons error return traces (ideally) require compiler support: 1) the compiler can determine how large to make the return trace ring buffer based on the potential stack depth (and using a heuristic for recursive calls). 2) the compiler can insert the writes to the return trace before each error return point.

As mentioned above, I have an idea of how you might be able to get something similar on stable Rust today, but it will not be as good or as streamlined as if the compiler supported it directly, for the two reasons listed above. You'd have to either hard-code the size of the return trace ring buffer or heap-allocate it, and you'd have to abuse the ? operator to insert the needed writes to the ring buffer (which will partially break if you forget to use the ? operator on newly instantiated error types).

Edit: any library implementation of error return tracing would also need to use thread local storage for the ring buffer, rather than implicitly passing a pointer to the struct on the stack, as Zig does; I'm not sure if this would cause more overhead or not.
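For concreteness, here is a very rough sketch of what that kind of library-level approximation might look like on stable Rust (all names are made up, and this is nowhere near a complete design): a thread-local buffer stands in for the ring buffer, and a wrapper error type's From impl records an entry each time the ? operator converts into it.

use std::any::type_name;
use std::cell::RefCell;

thread_local! {
	// stand-in for the per-thread ring buffer; a Vec keeps the sketch short
	static RETURN_TRACE: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

#[derive(Debug)]
struct Traced<E>(E);

impl<E> From<E> for Traced<E> {
	fn from(e: E) -> Self {
		// Each time `?` converts an E into Traced<E>, note it in the trace.
		// Recording the actual return location would need #[track_caller]
		// or a macro; the type name is enough to show the mechanism.
		RETURN_TRACE.with(|t| t.borrow_mut().push(type_name::<E>()));
		Traced(e)
	}
}

#[derive(Debug)]
struct ParseFailed; // made-up leaf error type

fn parse() -> Result<u32, ParseFailed> {
	Err(ParseFailed)
}

fn load() -> Result<u32, Traced<ParseFailed>> {
	// the `?` here calls From::from, which appends to RETURN_TRACE as a side effect
	Ok(parse()?)
}

fn main() {
	let _ = load();
	RETURN_TRACE.with(|t| println!("error return trace: {:?}", t.borrow()));
}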

... so global program analysis establishes both of:

  • exact list of errors a given function can return
  • maximum function call depth across the whole application

... since the data structure recording a return trace is allocated once per thread and is large enough for the worst case?

It seems this second type of global analysis is another thing Rust currently doesn't have...

Rust does not have global program analysis, by intent. However, a safe variant of Rust is likely to require such analysis as part of its safety proofs. Even so, I'm fairly certain that, in the worst case, determining the actual maximum call depth of a recursive function is equivalent to the halting problem, thus impossible.

1 Like

Not in the least; it's just an ordering of preferences. For me:

  • Best case: no global type inference.
  • Worse: global type inference but at least it's a uniform and stand-alone language feature without artificial partial restrictions and without affecting or special-casing other elements of the language.
  • Worst: global type inference twisted in a way that it's only usable with certain types, which now need to be made special.

And to answer the second question, yes, I disagree that your proposal would improve error handling.

3 Likes