Could we use padding bits of Result
and/or unused patterns to store the creation location of Err/None? This could give an extremely cheap backtrace, costing nothing if Ok is returned and only a few bit shifts if Err is returned. The cost: Slightly larger debug information and a slightly more expensive Result::unwrap()
, which shouldn't matter (I'm not sure about the complexity cost in the compiler).
Example output when calling Result::unwrap()
(can be shortened/lengthened if desired):
thread 'main' panicked at src/main.rs:43:23:
called `Result::unwrap()` on an `Err` value: 5
converted at src/main.rs:13:23 // e.g. `.map_err(...)?`, perhaps only with RUST_BACKTRACE=1
converted at src/main.rs:23:23 // e.g. `.map_err(...)?`, perhaps only with RUST_BACKTRACE=1
caused by `Err(5)` by src/main.rs:99:23 // May not be present if call stack was too deep
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Motivation
Errors in Rust are often propagated to the caller (?
operator). This can hide their origin because when the application does end up unwrapping the error the backtrace only shows the .unwrap()
location. The eyre crate solves this by storing the entire error together with additional information behind a pointer, but this has downsides:
- Creating the error report has an overhead (though it does result in a lot of useful information). This overhead (heap allocation + backtrace) happens every time an error is created, even if it is gracefully handled/discarded afterwards.
- The Error type is always 8 bytes large and 8 byte aligned (with the 0x00 pattern unused), even if the underlying error is only a 1-byte enum.
The impact of the second 8-byte alignment depends on T
:
// Error enum used in size/alignments below
enum E {
// Up to 255 variants (just as an example)
}
Type | Size | Alignment | Type (without eyre) | Size | Alignment | Padding bits | Unused patterns |
---|---|---|---|---|---|---|---|
eyre::Result<()> |
8 | 8 | Result<(), E> |
1 | 1 | 0 | 0 |
eyre::Result<u8> |
16 | 8 | Result<u8, E> |
2 | 1 | 7 | 0 |
eyre::Result<NonZeroU8> |
16 | 8 | Result<NonZeroU8, E> |
2 | 1 | 7 | 0/1? |
eyre::Result<u64> |
16 | 8 | Result<u64, E> |
16 | 8 | 7+56 | 0 |
eyre::Result<NonZeroU64> |
16 | 8 | Result<NonZeroU64, E> |
16 | 8 | 7+56 | 0/1? |
eyre::Result<[u8; 4]> |
16 | 8 | Result<[u8; 4], E> |
5 | 1 | 7+24 | 0 |
eyre::Result<[u8; 8]> |
16 | 8 | Result<[u8; 8], E> |
9 | 1 | 7+56 | 0 |
eyre::Result<E> |
16 | 8 | Result<E, E> |
2 | 1 | 7 | 0? |
Larger T
(Result<T, E>
) types usually mean the additional 8-byte alignment requirement from the eyre
pointer has little to no impact, but the size difference between T and E also tends to increase significantly.
For Option<T>
there usually is less padding and often no unused patterns.
Internal padding
If I understood it correctly we cannot make use padding bytes added to ensure alignment, but Result
and Option
often already have internal padding bits/unused patterns (especially when the Ok
variant is larger than the Err
variant).
In rare cases these extra bits/patterns are used, but usually not more than a single pattern. For example Option<Option<T>>
or Option<Result<T>>
. But more often than not these bits are unused.
The bytes needed due to the size difference between T
and E
are never used (as far as I can tell).
Proposal
Let's use these 7 bits + (some of the) extra padding to fit the OK variant to store where the Err
was created:
// **Conceptually**, we obviously can't change the signature of `Result`.
enum Result<T, E> {
Ok(T),
Err{
// Whatever we have in padding bits from our own enum +
// extra bytes needed by `T` but not `E`.
error_origin: ???,
value: E,
},
}
- For every function (ideally after inlining) give each location of error creation (
Err(_)
) and propagation/manipulation (?
, return, ...,.map_err(_)
) an index (unique for this function, not including called functions), see example below. If a function has only one error creation/propagation/manipulation site the function does not shift error_origin, its index has a length of 0 bits. - Store this index as part of the debugging information (or somewhere else in static/constant/read-only memory if needed/desired on release builds):
HashMap<(function_identifier, index), function_identifier>
- At codegen: At error creation sites (
Err(_)
) set the the error-origin location bits to this index (left-aligned) - At codegen: At propagation/manipulation sites: shift the existing bits to the right to make space for the new index. If the new Result type is smaller than the old one the least significant bits are truncated.
- If
T
is larger thanE
it should be possible to put the overflown bits into those bytes, instead of discarding them. Storing the error_origin entirely in those bytes would also be an option, and would likely be desired, as it can make the shifting more efficient.
In the following I'm assuming 4 bits are available to store the error_origin.
fn outer() {
let res = middle(); // No error creation or propagaion => No index
res.unwrap(); // Maps the index to line numbers and prints it
}
fn middle() -> Result<(), ()> { // 3 bits needed to store the error origin
inner()?; // index: 0, error_origin: 0000 or 0001
let x = do_something_else()?; // index: 1, error_origin: 001X
if (x) {
return Err(()); // index: 2, error_origin: 0100 (left aligned)
}
do_something()?; // index: 3, error_origin: 011X
do_something().map_err(|_| ())?; // index: 4, error_origin: 100X or 1000, see below
inner().map_err(|_| {
// Alternatively, this can be seen as part of the same function and
// thus be index 5 (with the one below using index 6),
// e.g. due to inlining.
// This closure only has one error propagation location and thus
// needs 0 bits to store the origin.
inner() // index: 0, error_origin: XXXX (in practice: 0000 or 1000)
})?; // index: 5, error_origin: 101X (1010 if some_condition returns true, otherwise 1011)
Ok(())
}
fn inner() -> Result<(), ()> { // 1 bit needed to store the error origin
if some_condition() {
Err(()) // index: 0, error_origin: 0000
}
Err(()) // index 1, error_origin: 1000 (left aligned)
}
Not shown in the example above: If the 4 bits are not sufficient to store the entire error location the least significant bits are dropped:
- If error_origin would only be 3 bits you'd know the error was created somewhere in
inner()
- If error_origin would only be 2 bits you'd know that the error was created somewhere in
middle()
, and you would know if it is in the first or second half of the function.
By itself these bits are not really useful (unless you manually analyze the functions starting at the .unwrap()
location. But they can be used to print (part of) a backtrace to where the error originated:
impl<T, E> Result<T, E> {
pub fn unwrap(self) -> T
where
E: fmt::Debug,
{
match self {
Ok(t) => t,
Err(e) => {
let bt = self.creation_backtrace();
unwrap_failed(...)
},
}
}
// Pseudocode
pub fn creation_backtrace(&self) -> Vec<...> {
let mut error_origin = ...; // Read from memory representation
let mut location = read_from_debug_info(get_caller());
let mut backtrace = vec![];
while error_origin > 0 {
let index = first_n_bits(error_origin, location.bitcount);
error_origin = error_origin << location.bitcount;
location = read_from_debug_info_indexed(location, index)
backtrace.push(location);
}
}
}
Additional considerations:
is_err()
-like comparisons are still as cheap as they are now, since we can store zeroes in the error_origin, thus this is still just a comparison with 0.- Types like
Result<(), NonZeroU8>
make use of the unused 0x00 pattern inE
and thus do not have any unused bits. This type cannot store the error_origin and behaves as it does today. - The same logic can be used for recursive functions (error propagation shifting the bits). There may be additional complexity when the stack frame of a recursive function is overwritten (I forgot how that was called, Tail recursion optimization?), as the error_origin bits would have to be shifted multiple times when the function returns (storing the path took during the recursion). This can be simplified by not storing the entire path/backtrace, effectively skipping all recursive calls and only inserting one index. This may even be the desired behavior, as the number of iterations and the path took is probably less relevant than which line of the recursion caused the error.
- It may be desirable to limit the number of bits to a reasonable value, to limit the amount of bit shifting that is needed. For example:
Result<[u8; 64], ()>
has 64 bytes of space for the error_origin, but all those 64 bytes would need to be bit-shifted at every error propagation site. 4 bytes (32-bit) are probably sufficient in most cases and only causes minimal overhead if an error is returned. - As far as I can tell this is currently not possible/feasible without changes to the compiler: Every error creation/modification/propagation site (including
?
would need to be a macro that additionally builds up some global/static table to map the index back to line numbers. - Because this error_origin is stored in the unused bits from
Result
this information is lost when the errorE
is stored outside of a Result (e.g. in a match pattern). This isn't really desirable but can be circumvented by adding the ability to extract those bits from asResult
:fn error_origin(&self) -> BacktraceBits32
(just an example) - If the
Result
value is stored outside of local variables (e.g. in a global value, channel or a struct held by a function higher up in the callstack) this compressed backtrace is only of limited use, unless all places that put stuff into this variable have a unique index they add to it before storing it. In that case the compressed backtrace can show where the value was read out of this storage location but not who put it there. The bits containing the indicies would still be there but it is not possible to know which function they belong to. Storing the location that stored it there using a global index (e.g. unique to that storage location, if that's possible) could still be done, but it would likely need many bits and would thus probably get lost before theunwrap()
.
Extensions
The stdlib could additionally provide a type that allows increasing the number of bits that can be used. For example:
pub struct BacktraceBits32(u32);
pub struct BacktraceBits64(u64);
pub struct BacktraceBitsAlternative<const N: usize>([u8; N]);
// Usage:
// This function can return up to 39 bits for the backtrace,
// at the cost of an increased return value size
fn do_something() -> Result<(), (MyError, BacktraceBits32)> {}
// or
fn do_something() -> Result<(), BacktraceBits32<MyError>> {}
A cargo config value (or an attribute) could be used to set/override the maximum size of error_origin, thus allowing longer (compressed) backtraces at the cost of a larger performance impact when errors are returned.
Option<T>
could use the same mechanism to store the None
location + compressed backtrace, but there are usually fewer bits available to store this.
Backwards compatibility
As far as I know the memory layout of Result
is not set in stone and subject to change. The biggest impact would probably be that Option<Result<T, E>>
could get bigger (unless only 6 bits are used for storing the error_origin). Other than that I can't think of types that make use of the unused patterns Result
has, as even MyEnum<Result<(), ()>>
already needs 2 bytes.
(My) Conclusion
I think this adds a significant benefit - even if it cannot store the complete backtrace - with minimal overhead and would fit well into the language.