The Zig programming language, by Andrew Kelley and contributors, has two novel error-handling concepts that I think Rust could greatly benefit from: Error Set/Union Types and Error Return Traces. However, since such large additions would likely be quite controversial this late in the language development process, I would like to start a discussion about them and get feedback before spending time writing up a more formal RFC. I'll try to communicate the basic idea behind these two concepts, and why I think Rust would benefit from them. Any thoughts or feedback are greatly appreciated.
Let's start with Error Union Types. Zig's Error Union Types are something like an inverse refinement type, or an "open" enum. When writing a fallible function in Zig, you may define new error variants simply by returning them. The compiler then infers the set of possible error types that your function can return. If you want to re-raise errors without modification, you simply call a function within a `try` expression. And, like with Rust error enums, the Zig compiler checks for exhaustiveness when switching/matching on an error type.
The most obvious advantage of Zig's Error Union Types over Rust's error enums is the ease of adding new error cases during development. In Zig, you just write `return error.InvalidChar;` to define a new error variant, and this variant gets added not only to the error set of the current function, but also to the error set of any function that called this function within a `try` expression, and any functions that called those functions within a `try` expression, and so on. In Rust, you would need to add an `InvalidChar` variant to the current function's error enum, and then alter every caller to either handle that error or convert it to an already existing variant of its own error enum, and then repeat this all the way up the call stack.
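To make the Rust side of this comparison concrete, here is a minimal sketch of the boilerplate involved (the `ParseError`/`ConfigError` names and functions are hypothetical, invented for illustration). Adding a new variant like `InvalidChar` means editing the enum, and every layer above it needs its own wrapping variant plus a `From` impl (or a `map_err`) so that `?` can convert:

```rust
// Leaf-level error enum; adding a new case means touching this enum
// and the error type of every caller up the stack.
#[derive(Debug, PartialEq)]
enum ParseError {
    InvalidChar,
}

// The caller's error enum needs a wrapping variant for the leaf error.
#[derive(Debug, PartialEq)]
enum ConfigError {
    Parse(ParseError),
    Missing,
}

// Each layer needs a From impl so the `?` operator can convert.
impl From<ParseError> for ConfigError {
    fn from(e: ParseError) -> Self {
        ConfigError::Parse(e)
    }
}

fn parse_digit(c: char) -> Result<u8, ParseError> {
    c.to_digit(10).map(|d| d as u8).ok_or(ParseError::InvalidChar)
}

fn read_port(s: &str) -> Result<u8, ConfigError> {
    let c = s.chars().next().ok_or(ConfigError::Missing)?;
    Ok(parse_digit(c)?) // `?` uses the From impl above
}

fn main() {
    assert_eq!(read_port("7"), Ok(7));
    assert_eq!(read_port("x"), Err(ConfigError::Parse(ParseError::InvalidChar)));
}
```

In Zig, the equivalent of all of this is the single `return error.InvalidChar;` statement, with the enum, the wrapping variant, and the conversion all inferred.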
The most obvious corresponding disadvantage of Zig's Error Union Types is that function prototypes do not explicitly list the possible error variants. This means that function prototypes do not provide enough information to determine whether a breaking API change was made in the form of a new error variant; instead, you must rely on the compiler to detect when this occurs. However, I suspect this disadvantage could be mitigated by adding a way to explicitly list the possible error variants and give that particular error set a name, as Zig's named error sets already allow.
The other novel error handling concept that Zig introduced is Error Return Traces. Zig's Error Return Traces look a bit like the stack traces displayed by many popular languages when an exception goes uncaught. Error return traces have a far smaller performance impact, though, and can even provide more information than stack traces.
In order for a stack trace to be presented when an exception goes uncaught, the entire stack trace must be captured when the exception is created (or when it is thrown/raised). This is a fairly expensive operation, since it requires traversing each stack frame and storing (at minimum) a pointer to each function in the call stack in some thread-local storage (which is typically heap-allocated). The argument usually made is that exceptions should only be thrown in exceptional cases, so the performance cost of collecting a stack trace will not significantly degrade overall program performance. In reality, though, errors are quite common, and the cost of stack traces is not negligible.
Nonetheless, there's no doubt that stack traces are invaluable for debugging program failures. Without stack traces, achieving a similar level of understanding about the cause of a failure has been extremely difficult... until the invention of error return traces. Error return traces convey the same or better level of understanding about the origin and cause of an error at a much lower cost. And unlike stack traces, where the cost scales with the stack depth at the location where the exception is first raised, the cost of error return traces scales with the number of times the error value is returned. If the error is recovered one function above where it is created, then the total cost is a single memory write (Edit: this is wrong; it's actually 1 read, 2 ALU ops, and 2 writes, plus a couple of writes for initializing and clearing the return trace).
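To illustrate why the cost scales with the number of returns rather than the stack depth, here is a rough sketch in today's Rust of what the mechanism does under the hood. This is purely illustrative: the `RETURN_TRACE` thread-local and the `trace_return` helper are hypothetical stand-ins for what a `try`-like operator with return-trace support would emit automatically.

```rust
use std::cell::RefCell;

// Hypothetical sketch: a thread-local return trace. Each time an error
// value is returned up a frame, the propagation site appends one entry.
// The per-return cost is a bounded append, not a full stack walk.
thread_local! {
    static RETURN_TRACE: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

// Stand-in for what a `try`/`?` with return traces might do automatically:
// record the return site only when an error actually propagates.
fn trace_return<T, E>(r: Result<T, E>, site: &'static str) -> Result<T, E> {
    if r.is_err() {
        RETURN_TRACE.with(|t| t.borrow_mut().push(site));
    }
    r
}

fn d() -> Result<(), ()> {
    trace_return(Err(()), "d: src/main.rs:37")
}

fn c() -> Result<(), ()> {
    trace_return(d(), "c: src/main.rs:26")
}

fn main() {
    let _ = c();
    RETURN_TRACE.with(|t| {
        // The trace records each propagation site, innermost first.
        assert_eq!(t.borrow().as_slice(),
                   &["d: src/main.rs:37", "c: src/main.rs:26"][..]);
    });
}
```

Zig's real implementation is cheaper still (a fixed-size ring buffer written by compiler-generated code rather than a `Vec`), but the scaling behavior is the same: work happens only on the error path, once per return.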
So how can an Error Return Trace provide more information than a stack trace? Take the following Rust code as an example:
```rust
extern crate rand;

use rand::Rng;

fn main() {
    a()
}

fn a() {
    b()
}

fn b() {
    match c() {
        Err(e) => panic!("c failed: {:?}", e),
        Ok(_) => {}
    }
}

fn c() -> Result<(), ()> {
    match d() {
        Ok(()) => Ok(()),
        Err(()) => {
            if rand::thread_rng().gen::<bool>() {
                Ok(())
            } else {
                Err(())
            }
        }
    }
}

fn d() -> Result<(), ()> {
    if rand::thread_rng().gen::<bool>() {
        Ok(())
    } else {
        Err(())
    }
}
```
One quarter of the time, this example panics with the following stack trace:
```
thread 'main' panicked at 'c failed: ()', src/main.rs:14:13
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:84
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:61
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1025
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1426
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:65
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:50
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:193
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:210
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:471
  11: rust_begin_unwind
             at src/libstd/panicking.rs:375
  12: std::panicking::begin_panic_fmt
             at src/libstd/panicking.rs:326
  13: playground::b
             at src/main.rs:14
  14: playground::a
             at src/main.rs:9
  15: playground::main
             at src/main.rs:5
  16: std::rt::lang_start::{{closure}}
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/rt.rs:67
  17: std::rt::lang_start_internal::{{closure}}
             at src/libstd/rt.rs:52
  18: std::panicking::try::do_call
             at src/libstd/panicking.rs:292
  19: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:78
  20: std::panicking::try
             at src/libstd/panicking.rs:270
  21: std::panic::catch_unwind
             at src/libstd/panic.rs:394
  22: std::rt::lang_start_internal
             at src/libstd/rt.rs:51
  23: std::rt::lang_start
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/rt.rs:67
  24: main
  25: __libc_start_main
  26: _start
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```
Note that the functions `c` and `d` are not listed anywhere in here. This is problematic because the error originated in `d`. There are, of course, ways to structure your code so that you can always determine the origin of the error, but this requires either capturing a stack trace every time you instantiate an error type, which has a significant performance impact, or a whole lot of boilerplate error-handling code, which slows development.
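For reference, the eager-capture workaround looks something like the sketch below, using `std::backtrace::Backtrace` from the standard library (the `TracedError` type and its functions are hypothetical names for illustration). The key point is that `Backtrace::force_capture()` walks the entire stack at construction time, on every error, even when the caller recovers immediately:

```rust
use std::backtrace::Backtrace;

// Sketch of the eager-capture approach: the backtrace is taken at the
// moment the error is constructed, so the origin is never lost, but the
// full stack walk is paid even on errors that are recovered right away.
#[derive(Debug)]
struct TracedError {
    msg: &'static str,
    backtrace: Backtrace,
}

impl TracedError {
    fn new(msg: &'static str) -> Self {
        // Captures unconditionally, regardless of RUST_BACKTRACE.
        TracedError { msg, backtrace: Backtrace::force_capture() }
    }
}

fn d() -> Result<(), TracedError> {
    Err(TracedError::new("d failed"))
}

fn main() {
    let err = d().unwrap_err();
    // Unlike the panic-time backtrace above, this trace includes `d`,
    // because it was captured where the error was created.
    println!("{}: {}", err.msg, err.backtrace);
}
```

Some error-handling crates bundle exactly this pattern, which is convenient but does not remove the cost.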
If Rust had a dedicated Error Union Type with Error Return Traces, the above example might be written something like this (using a placeholder `error` keyword syntax that I just made up on the spot):
```rust
extern crate rand;

use rand::Rng;

fn main() {
    a()
}

fn a() {
    b()
}

fn b() {
    match c() {
        Err(error.C) => panic!("c failed"),
        Ok(_) => {}
    }
}

fn c() -> Result<(), error> {
    match d() {
        Ok(()) => Ok(()),
        Err(error.D) => {
            if rand::thread_rng().gen::<bool>() {
                Ok(())
            } else {
                Err(error.C)
            }
        }
    }
}

fn d() -> Result<(), error> {
    if rand::thread_rng().gen::<bool>() {
        Ok(())
    } else {
        Err(error.D)
    }
}
```
Edit: fixed the error matching patterns.
And the Error Return Trace that would be printed by the panic handler would look something like this:
```
thread 'main' panicked at 'c failed', src/main.rs:14:13
error return trace:
  0: src/main.rs:37 in playground::d
        Err(error.D)
  1: src/main.rs:26 in playground::c
        Err(error.C)
  2: src/main.rs:14 in playground::b
        Err(e) => panic!("c failed: {:?}", e),
stack backtrace:
... [stack trace would also be shown, here] ...
```
Note that the Error Return Trace is able to track the origin of the original error, even though a different error eventually triggers the panic. This seems to be another advantage of having a single global error set. Achieving the same functionality with standard Rust `Result` and error enum types might be possible, but it's not particularly obvious how it would be done; you would probably need some way to annotate enums as being tracked by the Error Return Trace. So the ease of implementing Error Return Traces seems like another advantage of Zig's Error Union Type.
And again, note that the performance cost of the Error Return Trace is small: much smaller than that of capturing a backtrace when creating any error, as is done by some error-handling crates.
I hope this has communicated the value of these two concepts and how they might benefit the Rust programming language. I know this is far from a complete exposition, and that a formal RFC would require a lot more work. If others find these ideas interesting, I would encourage you to do two things: 1) check out the Zig programming language, and consider contributing or donating; 2) offer to help write or edit a formal RFC for adding these concepts to Rust.
I look forward to hearing everyone's thoughts!