Using unused bits in `Result` for error origin backtraces

Could we use the padding bits and/or unused patterns of Result to store the creation location of an Err/None? This could give an extremely cheap backtrace: it costs nothing if Ok is returned and only a few bit shifts if Err is returned. The cost: slightly larger debug information and a slightly more expensive Result::unwrap(), which shouldn't matter (I'm not sure about the complexity cost in the compiler).

Example output when calling Result::unwrap() (can be shortened/lengthened if desired):

thread 'main' panicked at src/main.rs:43:23:
called `Result::unwrap()` on an `Err` value: 5
converted at src/main.rs:13:23 // e.g. `.map_err(...)?`, perhaps only with RUST_BACKTRACE=1
converted at src/main.rs:23:23 // e.g. `.map_err(...)?`, perhaps only with RUST_BACKTRACE=1
caused by `Err(5)` by src/main.rs:99:23 // May not be present if call stack was too deep
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Motivation

Errors in Rust are often propagated to the caller (the ? operator). This can hide their origin: when the application does end up unwrapping the error, the backtrace only shows the .unwrap() location. The eyre crate solves this by storing the entire error together with additional information behind a pointer, but this has downsides:

  • Creating the error report has an overhead (though it does result in a lot of useful information). This overhead (heap allocation + backtrace) happens every time an error is created, even if it is gracefully handled/discarded afterwards.
  • The Error type is always 8 bytes large and 8-byte aligned (with the 0x00 pattern unused), even if the underlying error is only a 1-byte enum.

The impact of the second point (8-byte size and alignment) depends on T:

// Error enum used in size/alignments below
enum E {
    // Up to 255 variants (just as an example)
}
Type (with eyre)         | Size (bytes) | Alignment | Type (without eyre)   | Size (bytes) | Alignment | Padding bits | Unused patterns
eyre::Result<()>         | 8            | 8         | Result<(), E>         | 1            | 1         | 0            | 0
eyre::Result<u8>         | 16           | 8         | Result<u8, E>         | 2            | 1         | 7            | 0
eyre::Result<NonZeroU8>  | 16           | 8         | Result<NonZeroU8, E>  | 2            | 1         | 7            | 0/1?
eyre::Result<u64>        | 16           | 8         | Result<u64, E>        | 16           | 8         | 7+56         | 0
eyre::Result<NonZeroU64> | 16           | 8         | Result<NonZeroU64, E> | 16           | 8         | 7+56         | 0/1?
eyre::Result<[u8; 4]>    | 16           | 8         | Result<[u8; 4], E>    | 5            | 1         | 7+24         | 0
eyre::Result<[u8; 8]>    | 16           | 8         | Result<[u8; 8], E>    | 9            | 1         | 7+56         | 0
eyre::Result<E>          | 16           | 8         | Result<E, E>          | 2            | 1         | 7            | 0?
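
The right-hand columns can be spot-checked with std::mem; here is a minimal sketch (the E below is my stand-in for the 1-byte error enum from above, and the asserted numbers are what current rustc produces, not stable guarantees):

use std::mem::{align_of, size_of};
use std::num::NonZeroU8;

// Stand-in for the 1-byte error enum `E` from the table.
#[allow(dead_code)]
enum E {
    A,
    B,
    C, // ... up to 255 variants
}

fn main() {
    // Niche optimization: one of E's unused patterns encodes the Ok variant.
    assert_eq!(size_of::<Result<(), E>>(), 1);
    // `u8` uses all 256 bit patterns, so an extra tag byte is needed.
    assert_eq!(size_of::<Result<u8, E>>(), 2);
    assert_eq!(align_of::<Result<u8, E>>(), 1);
    // The 8-byte alignment of `u64` pushes the size up to 16 bytes.
    assert_eq!(size_of::<Result<u64, E>>(), 16);
    assert_eq!(align_of::<Result<u64, E>>(), 8);
    // 1 tag byte + 4 payload bytes.
    assert_eq!(size_of::<Result<[u8; 4], E>>(), 5);
    // NonZeroU8's valid values overlap E's, so its niche doesn't help here.
    assert_eq!(size_of::<Result<NonZeroU8, E>>(), 2);
}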

For larger T (in Result<T, E>) the additional 8-byte alignment requirement from the eyre pointer usually has little to no impact, but the size difference between T and E also tends to increase significantly.

For Option<T> there usually is less padding and often no unused patterns.

Internal padding

If I understood it correctly we cannot make use of padding bytes added to ensure alignment, but Result and Option often already have internal padding bits/unused patterns (especially when the Ok variant is larger than the Err variant).

In rare cases these extra bits/patterns are already used, but usually not more than a single pattern, for example in Option<Option<T>> or Option<Result<T, E>>. More often than not these bits are unused.

The bytes needed due to the size difference between T and E are never used (as far as I can tell).
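
As a small illustration of that rare case (observed with current rustc's niche layout, not a guarantee): wrapping a type that has unused patterns in Option does not grow it, because each additional None is encoded in one of those patterns:

use std::mem::size_of;

fn main() {
    // `bool` only uses the bit patterns 0 and 1, leaving 254 unused ones.
    // Each nested `Option` spends one of them on its `None`, so the size
    // stays at a single byte.
    assert_eq!(size_of::<bool>(), 1);
    assert_eq!(size_of::<Option<bool>>(), 1);
    assert_eq!(size_of::<Option<Option<bool>>>(), 1);
}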

Proposal

Let's use these 7 bits plus (some of) the extra padding needed to fit the Ok variant to store where the Err was created:

// **Conceptually**, we obviously can't change the signature of `Result`.
enum Result<T, E> {
    Ok(T),
    Err {
        // Whatever we have in padding bits from our own enum + 
        // extra bytes needed by `T` but not `E`.
        error_origin: ???,
        value: E,
    },
}
  • For every function (ideally after inlining) give each location of error creation (Err(_)) and propagation/manipulation (?, return, ..., .map_err(_)) an index (unique for this function, not including called functions), see example below. If a function has only one error creation/propagation/manipulation site the function does not shift error_origin, its index has a length of 0 bits.
  • Store this index as part of the debugging information (or somewhere else in static/constant/read-only memory if needed/desired on release builds): HashMap<(function_identifier, index), function_identifier>
  • At codegen: at error creation sites (Err(_)) set the error_origin bits to this index (left-aligned).
  • At codegen: at propagation/manipulation sites shift the existing bits to the right to make space for the new index (a sketch of this packing scheme follows this list). If the new Result type is smaller than the old one the least significant bits are truncated.
  • If T is larger than E it should be possible to put the overflowing bits into those extra bytes instead of discarding them. Storing the error_origin entirely in those bytes would also be an option, and would likely be desired, as it can make the shifting more efficient.
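
A minimal sketch of that packing/shifting, assuming a 32-bit error_origin; push_index/pop_index are hypothetical helpers standing in for what codegen would emit at each site:

/// Assumed width of the error_origin in bits.
const ORIGIN_BITS: u32 = 32;

/// Error creation/propagation site: shift the existing indices to the right
/// and place this site's index in the most significant bits (left-aligned).
/// `bits` is the index width of the current function; a function with only a
/// single site uses 0 bits and would skip this entirely.
fn push_index(origin: u32, index: u32, bits: u32) -> u32 {
    debug_assert!(bits >= 1 && bits < ORIGIN_BITS);
    (origin >> bits) | (index << (ORIGIN_BITS - bits))
}

/// Decoding (e.g. in unwrap()): read the outermost index first and shift it out.
fn pop_index(origin: &mut u32, bits: u32) -> u32 {
    debug_assert!(bits >= 1 && bits < ORIGIN_BITS);
    let index = *origin >> (ORIGIN_BITS - bits);
    *origin <<= bits;
    index
}

fn main() {
    // Reproduces "inner()?; // index: 0, error_origin: 0001" from the example
    // below: inner() fails at its second Err site (index 1 of 2, 1 bit), then
    // middle() propagates it at its first `?` (index 0 of 6, 3 bits).
    let mut origin = push_index(0, 1, 1);
    origin = push_index(origin, 0, 3);
    assert_eq!(origin >> (ORIGIN_BITS - 4), 0b0001);

    // Decoding walks back out, outermost function first.
    assert_eq!(pop_index(&mut origin, 3), 0); // middle(): its first `?` site
    assert_eq!(pop_index(&mut origin, 1), 1); // inner(): its second Err site
}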

In the following I'm assuming 4 bits are available to store the error_origin.

fn outer() {
    let res = middle(); // No error creation or propagation => No index
    res.unwrap(); // Maps the index to line numbers and prints it
}
fn middle() -> Result<(), ()> { // 3 bits needed to store the error origin
    inner()?; // index: 0, error_origin: 0000 or 0001
    let x = do_something_else()?; // index: 1, error_origin: 001X
    if x {
        return Err(()); // index: 2, error_origin: 0100 (left aligned)
    }
    do_something()?; // index: 3, error_origin: 011X
    do_something().map_err(|_| ())?; // index: 4, error_origin: 100X or 1000, see below
    inner().map_err(|_| {
        // Alternatively, this can be seen as part of the same function and
        // thus be index 5 (with the one below using index 6),
        // e.g. due to inlining.
        // This closure only has one error propagation location and thus
        // needs 0 bits to store the origin.
        inner() // index: 0, error_origin: XXXX (in practice: 0000 or 1000)
    })?; // index: 5, error_origin: 101X (1010 if some_condition returns true, otherwise 1011)
    Ok(())
}
fn inner() -> Result<(), ()> { // 1 bit needed to store the error origin
    if some_condition() {
        return Err(()); // index: 0, error_origin: 0000
    }
    Err(()) // index 1, error_origin: 1000 (left aligned)
}

Not shown in the example above: if the 4 bits are not sufficient to store the entire error origin, the least significant bits are dropped:

  • If error_origin were only 3 bits you'd know the error was created somewhere in inner().
  • If error_origin were only 2 bits you'd know that the error was created somewhere in middle(), and you would know whether it is in the first or second half of the function.

By themselves these bits are not really useful (unless you manually analyze the functions starting at the .unwrap() location). But they can be used to print (part of) a backtrace leading to where the error originated:

impl<T, E> Result<T, E> {
    pub fn unwrap(self) -> T
    where
        E: fmt::Debug,
    {
        match self {
            Ok(t) => t,
            Err(ref e) => {
                let bt = self.creation_backtrace();
                unwrap_failed(...)
            },
        }
    }

    // Pseudocode
    pub fn creation_backtrace(&self) -> Vec<...> {
        let mut error_origin = ...; // Read from memory representation
        let mut location = read_from_debug_info(get_caller());
        let mut backtrace = vec![];
        while error_origin > 0 {
            let index = first_n_bits(error_origin, location.bitcount);
            error_origin = error_origin << location.bitcount;
            location = read_from_debug_info_indexed(location, index);
            backtrace.push(location);
        }
        backtrace
    }
}
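
To make the lookups in the pseudocode above a little more concrete, here is a hypothetical sketch of the side table described earlier (HashMap<(function_identifier, index), ...>); the identifiers, bit counts, and the Location struct are all made up for illustration:

use std::collections::HashMap;

/// Hypothetical debug-info entry for one function/callsite.
#[derive(Clone, Copy)]
struct Location {
    function_identifier: u32, // stand-in for a real function/line identifier
    bitcount: u32,            // index width used inside that function
}

fn main() {
    // (function_identifier, index) -> the location that index refers to.
    // In practice this would be emitted by the compiler into debug info.
    let table: HashMap<(u32, u32), Location> = HashMap::from([
        // middle() is id 1; its index 0 is the `inner()?` site, leading into inner() (id 2).
        ((1, 0), Location { function_identifier: 2, bitcount: 1 }),
    ]);

    // One step of the decoding loop from `creation_backtrace`:
    let caller = Location { function_identifier: 1, bitcount: 3 }; // middle(), from the unwrap() caller
    let mut error_origin: u32 = 0b0001 << 28; // the "0001" case from the example above
    let index = error_origin >> (32 - caller.bitcount);
    error_origin <<= caller.bitcount;
    let next = table[&(caller.function_identifier, index)];
    assert_eq!(next.function_identifier, 2); // the error came through inner()
    assert_eq!(error_origin >> (32 - next.bitcount), 1); // inner()'s second Err site created it
}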

Additional considerations:

  • is_err()-like comparisons are still as cheap as they are now: since we can store zeroes in the error_origin, this is still just a comparison with 0.
  • Types like Result<(), NonZeroU8> make use of the unused 0x00 pattern in E and thus do not have any unused bits. This type cannot store the error_origin and behaves as it does today.
  • The same logic can be used for recursive functions (error propagation shifting the bits). There may be additional complexity when the stack frame of a recursive function is reused (I forget what that is called; tail call optimization?), as the error_origin bits would have to be shifted multiple times when the function returns (storing the path taken during the recursion). This can be simplified by not storing the entire path/backtrace, effectively skipping all recursive calls and only inserting one index. This may even be the desired behavior, as the number of iterations and the path taken are probably less relevant than which line of the recursion caused the error.
  • It may be desirable to limit the number of bits to a reasonable value, to limit the amount of bit shifting that is needed. For example: Result<[u8; 64], ()> has 64 bytes of space for the error_origin, but all those 64 bytes would need to be bit-shifted at every error propagation site. 4 bytes (32 bits) are probably sufficient in most cases and only cause minimal overhead if an error is returned.
  • As far as I can tell this is currently not possible/feasible without changes to the compiler: every error creation/modification/propagation site (including ?) would need to be a macro that additionally builds up some global/static table to map the index back to line numbers.
  • Because this error_origin is stored in the unused bits of Result, this information is lost when the error E is stored outside of a Result (e.g. after being bound in a match pattern). This isn't really desirable but can be circumvented by adding the ability to extract those bits from a Result: fn error_origin(&self) -> BacktraceBits32 (just an example).
  • If the Result value is stored outside of local variables (e.g. in a global value, a channel, or a struct held by a function higher up in the call stack) this compressed backtrace is only of limited use, unless all places that put values into this storage add a unique index before storing. Without that, the compressed backtrace can show where the value was read out of this storage location but not who put it there: the bits containing the indices would still be present, but it is not possible to know which functions they belong to. Storing the location that put it there using a global index (e.g. one unique to that storage location, if that's possible) could still be done, but it would likely need many bits and would thus probably get lost before the unwrap().

Extensions

The stdlib could additionally provide a type that allows increasing the number of bits that can be used. For example:

pub struct BacktraceBits32(u32);
pub struct BacktraceBits64(u64);
pub struct BacktraceBitsAlternative<const N: usize>([u8; N]);

// Usage:
// This function can return up to 39 bits for the backtrace,
// at the cost of an increased return value size
fn do_something() -> Result<(), (MyError, BacktraceBits32)> {}
// or
fn do_something() -> Result<(), BacktraceBits32<MyError>> {}

A cargo config value (or an attribute) could be used to set/override the maximum size of error_origin, thus allowing longer (compressed) backtraces at the cost of a larger performance impact when errors are returned.

Option<T> could use the same mechanism to store the None location + compressed backtrace, but there are usually fewer bits available to store this.

Backwards compatibility

As far as I know the memory layout of Result is not set in stone and is subject to change. The biggest impact would probably be that Option<Result<T, E>> could get bigger (unless only 6 bits are used for storing the error_origin). Other than that I can't think of types that make use of the unused patterns Result has, as even MyEnum<Result<(), ()>> already needs 2 bytes.

(My) Conclusion

I think this adds a significant benefit - even if it cannot store the complete backtrace - with minimal overhead and would fit well into the language.


To me, the building block this wants is move-only fields, aka fields to which you're not allowed to take references and with which the compiler would thus have more flexibility in layout.

I think if I abstract the request, it's that you'd like a Result<T, (MiniLocation, E)>, just in a way that the MiniLocation can bitpack with the tag in a nice way that's not allowed if that's literally a tuple there and the E forces the size up because of alignment.

But eyre's own docs tell you not to use it for errors you're planning on handling:

Use eyre if you don’t think you’ll do anything with an error other than report it.
~ https://docs.rs/eyre/latest/eyre/#comparison-to-thiserror

Note that this is also a big help for API reasons, since it allows eyre::Result<()> to be returned in a single register.


No: I was thinking about always including those MiniLocation bits in a Result without having to specify that (since it should be possible without breaking changes in Result, as long as unsafe code cannot rely upon the memory layout Result has today). Effectively allowing the following:

<Result<u8, u8>>::error_origin() // Up to 7 bit
<Result<u16, u8>>::error_origin() // Up to 14 bit (maybe even 23 bit)
<Result<(), u8>>::error_origin() // Up to 7 bit
<Result<u8, ()>>::error_origin() // Up to 7 bit

// One of these (both mean the same; the first makes it easiest to show
// that the bits are only relevant for Err, the second is probably closer to
// how it would be implemented). With regard to pattern matching and
// everything else except for memory layout it should be identical to the
// current `Result`.
<Result<u8, (BacktraceBits32, u8)>>::error_origin() // Up to 32 bit
<Result<u8, u8, 32>>::error_origin() // Up to 32 bit

Maybe the following better showcases what I mean:

pub type Result<T, E> = ResultWithMoreBits<T, E, 0>;

// N specifying the minimum number of unused bits for the error
// variant (however that'd work)
enum ResultWithMoreBits<T, E, const N: usize> {
    Ok(T),
    Err(E) // Use all unused (up to 64?) bits to store the indexes
}
impl<T, E, const N: usize> ResultWithMoreBits<T, E, N> {
    fn error_origin(&self) -> Option<...> {
        let indices = std::mem::unused_bits_collect!(self, ResultWithMoreBits::Err);
        let size = indices.bitcount();
        // ...
    }
}

As far as I know it's not possible to describe this cleanly in Rust without manually manipulating the memory layout. But with a bit of help from the compiler to allow accessing/manipulating the unused bits of a type it should be doable (just an example of how it could look, in case it helps):

// Read
// Bits that are always unused
std::mem::unused_bits_collect!(self, ResultWithMoreBits)
// Bits that are unused if the variant is Err
std::mem::unused_bits_collect!(self, ResultWithMoreBits::Err)

// Write
std::mem::unused_bits_store!(self, ResultWithMoreBits)
std::mem::unused_bits_store!(self, ResultWithMoreBits::Err)

// in-place shift (if faster)
std::mem::unused_bits_shift!(self, ResultWithMoreBits)
std::mem::unused_bits_shift!(self, ResultWithMoreBits::Err)

Those bits are effectively outside of the E generic from Result<T, E> (and not visible in the API), but they only exist while in the Err variant. (I hope you see why I'm having trouble representing that in current Rust code, where those padding bits are completely hidden from the developer.)

Edit: With structs this would be relatively easy, but it doesn't work for enums (at least not in a backwards-compatible way), as the extra field would have to be listed in pattern matches. An alternative to the hidden/macro-based approach above would be to allow fields that don't have to be listed in match arms (allowing access to the same padding bytes in a way that is more obvious when looking at the type definition):

struct Example { // named only so the sketch parses
    a: u8,
    padding_bits: u8,
    b: u16,
}
// As an example:
enum Result<T, E> {
    Ok(T),
    // May need the ability to not be of a fixed size but instead adjust
    // to how much space is left, otherwise it wouldn't work well for `Result`.
    Err(E, u8),
}
// Which could allow the following match arms:
// Err(e) => {}
// Err((a, b)) => {}
// Err(e, padding) => {}
// Err((a, b), padding) => {}

Yes, that's why I think it would make sense to have a system like this: to be used if you plan to handle the error but don't know for sure (which is probably most of the time). It's a cheap way to get that context if your plan to handle the error falls through for some reason. Maybe enabled for debug builds and disabled for release builds (by default).

True, but that's also only helpful if your error variant is larger than 8 bytes (with one unused bitpattern, assuming a 64-bit system).

It cannot always be done, because there are some stable layout guarantees: https://doc.rust-lang.org/std/result/#representation.


True. Though unless I'm mistaken that only applies when the discriminant fits into T or E, and then there would be no padding bits to use anyway (and thus no layout change). The good thing is that the proposed solution works for any number of unused bits. If there are no bits available you cannot get this additional information for (near) free. If you really need/want it you have to adjust the Result type so that there are extra bits (implicitly or explicitly); otherwise you get exactly as much information as today, the same as when the call stack is too deep.

You're absolutely right: This cannot be done for every possible Result, sometimes there are no unused bits or they are restricted due to existing layout guarantees (or for other optimizations). The goal is to use the bits that are available to provide extra context where/if possible.