Tiny non-Box panic payloads

Panicking code paths are not cheap, especially in terms of binary size. Most of them construct fmt::Arguments and gather Location, so every panicking branch adds several instructions, constants, and additional strings to the executable.

From what I can tell, in libstd the allocation of Box<dyn Any> is unavoidable. Even though PanicPayload could avoid it, __rust_start_panic and __rust_panic_cleanup must get a Box, not just the &mut dyn PanicPayload. So that's not great either, especially for OOM panics.

And in libstd there are hundreds of uses of .expect() and .unwrap(). Some of them, like bounds checks, are hit frequently and need good error messages and a precise location. However, there are also a lot of checks for really rare situations, or assertions that may never be hit unless there's a bug in libstd (e.g. reference count overflow on 64-bit).


I'm wondering whether there could be a fast/lean path for panics that "throw" just an integer error code or a thin static pointer.

For example, what if Box<dyn Any + Send> could be replaced with:

enum PanicPayload {
    Fat(Box<dyn Any + Send>),
    Lean(ErrorCode), // or Lean(&'static ErrorInfo),
}

and those internals like .expect("hmmm, this should never happen"), instead of constructing and allocating fmt::Arguments for the whole string, could be .oops(error!(E1235)). That could translate to just an integer in release builds, and possibly still have fully-featured error descriptions in debug builds.
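
To make the two-variant idea a bit more concrete, here is a minimal sketch of how a handler might treat each variant. The ErrorCode newtype, the describe method, and the "E1235"-style rendering are assumptions made up for illustration, not anything libstd has today.

```rust
use std::any::Any;

/// Hypothetical compact error code (not an existing std type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ErrorCode(pub u16);

/// The two-variant payload from the post above.
pub enum PanicPayload {
    Fat(Box<dyn Any + Send>),
    Lean(ErrorCode),
}

impl PanicPayload {
    /// Renders the payload roughly the way a panic handler might.
    pub fn describe(&self) -> String {
        match self {
            // Fat payloads need a downcast to recover the message.
            PanicPayload::Fat(any) => {
                if let Some(s) = any.downcast_ref::<&'static str>() {
                    (*s).to_string()
                } else if let Some(s) = any.downcast_ref::<String>() {
                    s.clone()
                } else {
                    "Box<dyn Any>".to_string()
                }
            }
            // Lean payloads carry only the integer code.
            PanicPayload::Lean(ErrorCode(n)) => format!("E{n:04}"),
        }
    }
}
```

The Lean arm never touches the allocator until something asks for a String, which is the property the proposal is after.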

This relies on global enumeration of the possible non-fmt panic messages, so it'd be difficult to do in an extensible manner. However, the experimental work around dyn* might be of assistance. Niching in a &'static impl Sized variant would also work. The interesting part would be constructing it in such a way that debug messages could be stripped, if that's desired. (A downside of not using nul terminated strings is that you can't just replace a static string with b'\0' to erase it less destructively.)

Also note that the actual panic payload essentially must fit into two pointers in order to ride the EH ABI. Additionally, the Itanium EH ABI effectively requires a dynamic allocation in order to unwind (although using a static allocation pool just for this instead of the general purpose allocator is possible).

Finally, catch_unwind's signature means that the panic payload will still need to eventually be boxed even if this can be deferred until after some cleanup has been done.
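
For reference, this is what the catch_unwind constraint looks like with today's stable API: the error side of the result is always a Box<dyn Any + Send>, and the caller downcasts it. ErrorCode and run_and_extract_code are hypothetical names used only for this sketch.

```rust
use std::any::Any;
use std::panic::{self, UnwindSafe};

/// Hypothetical compact error code (not an existing std type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ErrorCode(pub u16);

/// Runs `f` and, if it panicked with an `ErrorCode` payload, returns that code.
pub fn run_and_extract_code<F>(f: F) -> Option<ErrorCode>
where
    F: FnOnce() + UnwindSafe,
{
    // The signature fixes the payload type: it is already boxed here.
    let result: Result<(), Box<dyn Any + Send + 'static>> = panic::catch_unwind(f);
    match result {
        Ok(()) => None,
        // `downcast` hands back the original box if the type does not match.
        Err(payload) => payload.downcast::<ErrorCode>().ok().map(|boxed| *boxed),
    }
}
```

A caller would write something like run_and_extract_code(|| std::panic::panic_any(ErrorCode(1235))). Whatever lean representation is used while unwinding, it has to be converted into this Box by the time it reaches the caller.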

Program-global enumeration is an unsolved problem in Rust, but I think a custom solution for libcore + libstd could work, even if the panic error codes were assigned manually (the regular E0000 compiler errors already have such a table).

Box<dyn Any> has a niche, so it can fit in an enum with something else while remaining 16 bytes large. The downside is that Option<PanicPayloadEnum> is 24B in safe code, but maybe there could be an unsafe hack that casts everything to *mut dyn Any for ABI purposes, and before Box::from_raw checks the type to know whether that's a real Box or a static/leaked ErrorCode.

Information for the libstd panics by code could have a lookup table like

extern "Rust" { static DETAILS_LOOKUP_TABLE: &'static [ExtraInfo]; }

and rustc or Cargo would need to know to link either all_the_details.rlib or empty_table.rlib depending on the build profile.
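
A rough single-crate approximation of that table, assuming cfg(debug_assertions) stands in for the "which rlib gets linked" decision. ExtraInfo's fields, the details function, and the second table entry are invented for this sketch.

```rust
/// Hypothetical compact error code (not an existing std type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ErrorCode(pub u16);

/// Hypothetical shape of one table entry.
pub struct ExtraInfo {
    pub code: ErrorCode,
    pub message: &'static str,
}

/// Stand-in for all_the_details.rlib: full messages in debug builds.
#[cfg(debug_assertions)]
pub static DETAILS_LOOKUP_TABLE: &[ExtraInfo] = &[
    ExtraInfo { code: ErrorCode(1235), message: "hmmm, this should never happen" },
    ExtraInfo { code: ErrorCode(1236), message: "reference count overflow" },
];

/// Stand-in for empty_table.rlib: no strings end up in release builds.
#[cfg(not(debug_assertions))]
pub static DETAILS_LOOKUP_TABLE: &[ExtraInfo] = &[];

/// Looks up the description for a code, if the table was compiled in.
pub fn details(code: ErrorCode) -> Option<&'static str> {
    DETAILS_LOOKUP_TABLE
        .iter()
        .find(|entry| entry.code == code)
        .map(|entry| entry.message)
}
```

Because the whole table is one static, any use of details keeps every entry alive.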

This would mean that every program which has any std panic messages from this table would have all of them, because ordinary dead code elimination cannot subset this table. That seems likely to be bad, because many programs will not use many parts of std (e.g. std::process or std::sync::mpsc).

I think I missed something here. Box<dyn Any> is 16 bytes, it has a niche for null, which gives 8 spare bytes; &'static dyn Any doesn’t fit in that, but &'static StaticPanicInfo does, and then there’s still another niche for null, which fits None. Or did I analyze that incorrectly?
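
One way to settle the size question is to ask the compiler. The layout of default-repr enums is not guaranteed, so this sketch reports the numbers instead of assuming them. StaticPanicInfo's fields and layout_report are hypothetical names.

```rust
use std::any::Any;
use std::mem::size_of;

/// Hypothetical static description of a panic site.
pub struct StaticPanicInfo {
    pub code: u16,
    pub message: &'static str,
}

/// Either a regular boxed payload or a thin pointer to static info.
pub enum Payload {
    Fat(Box<dyn Any + Send>),
    Lean(&'static StaticPanicInfo),
}

/// Returns (type name, size in bytes) pairs for the current compiler and target.
pub fn layout_report() -> [(&'static str, usize); 4] {
    [
        ("Box<dyn Any + Send>", size_of::<Box<dyn Any + Send>>()),
        ("Option<Box<dyn Any + Send>>", size_of::<Option<Box<dyn Any + Send>>>()),
        ("Payload", size_of::<Payload>()),
        ("Option<Payload>", size_of::<Option<Payload>>()),
    ]
}
```

Printing layout_report() on the toolchain in question shows whether Payload and Option<Payload> stay at two pointers or grow by a word.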

well, if you're being particularly terrible, you can have more than just that. e.g.:

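// nightly-only: needs the ptr_metadata and sync_unsafe_cell features
#![feature(ptr_metadata, sync_unsafe_cell)]
use std::{any::Any, cell::SyncUnsafeCell, mem, ptr};

// mutable so we know it won't overlap with any compiler generated vtables,
// not sure if just immutable static u8 is sufficient
static MY_CUSTOM_VTABLE: SyncUnsafeCell<u8> = SyncUnsafeCell::new(0u8);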

// or something else ptr-sized that's non-zero
#[repr(usize)]
pub enum MyCustomErrorCode {
    A = 1,
    B = 2,
}

/// it's UB to drop the Box, or to dyn upcast it, or a bunch of other things, be really careful
pub unsafe fn pass_off_as_box(v: MyCustomErrorCode) -> Box<dyn Any> {
    // Box::from_raw takes a *mut pointer, so build one with from_raw_parts_mut
    Box::from_raw(ptr::from_raw_parts_mut(
        v as usize as *mut (),
        mem::transmute(&MY_CUSTOM_VTABLE),
    ))
}

pub fn unwrap_my_custom_error_code(v: Box<dyn Any>) -> Result<MyCustomErrorCode, Box<dyn Any>> {
    unsafe {
        let p = Box::into_raw(v);
        let m = ptr::metadata(p);
        let m: *const SyncUnsafeCell<u8> = mem::transmute(m);
        if ptr::addr_eq(m, &MY_CUSTOM_VTABLE) {
            Ok(mem::transmute(p as *const () as usize))
        } else {
            Err(Box::from_raw(p))
        }
    }
}

I think you could pull off some linker hackery, something like:

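#[link_section = ".rodata.std-messages"]
static MESSAGE_START: [u8; 0] = [];
#[link_section = ".rodata.std-messages.123"]
static MESSAGE_123: [u8; 4] = *b"....";

// pseudocode: the offset is only known at link time, so this is not a valid Rust constant
static MESSAGE_123_ID = &MESSAGE_123 - &MESSAGE_START;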

But I'm sure there's a less insane way to do it.

I like the idea, but wouldn't the ID then depend on the actual messages that a given program uses? Any elided message would result in IDs shifting around.

I'm not sure a non-Box payload is necessary for the stated benefits. There could just be a function fn panic_with_code(ErrorCode) -> !, which allocates a corresponding error object and throws it. This would mostly have the same pros and cons as the original proposal. That is – Pros: saves binary size, stack usage for fmt::Arguments, and other costs associated with panic paths. Cons: you're missing the full error info.
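
A minimal sketch of that call shape on stable Rust, assuming a hypothetical ErrorCode type and using std::panic::panic_any as the throwing primitive. Note that panic_any still boxes the payload and goes through the normal panic hook, so this only illustrates what call sites would look like, not the size savings a real std-internal version could get.

```rust
use std::panic::panic_any;

/// Hypothetical compact error code (not an existing std type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ErrorCode(pub u16);

/// Single out-of-line entry point: call sites only pass an integer,
/// and the boxing happens once, here, off the hot path.
#[cold]
#[inline(never)]
pub fn panic_with_code(code: ErrorCode) -> ! {
    panic_any(code)
}

/// Example call site: a reference-count style increment that "cannot" overflow.
pub fn checked_increment(count: usize) -> usize {
    match count.checked_add(1) {
        Some(next) => next,
        None => panic_with_code(ErrorCode(1235)),
    }
}
```

The panicking branch in checked_increment is just a call with one integer argument, with no format string or fmt::Arguments built at the call site.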

A non-Box payload would be better for OOM panics, but you'd still need to allocate a Box if someone uses catch_unwind. Might be better to just preallocate a Box at startup and have this new panic_with_code function throw it.

However…

Simplifying the payload would have serious benefits for no_std code. I would love to have a way to build libcore without any string formatting machinery, which would imply that core::panic::PanicInfo couldn't use fmt::Arguments.

So long as it's consistently just used to get the message (or whatever error info you like) that should be fine, which seems to have been the goal? You're just trying to smuggle a static pointer in a smaller integer, basically.

It's probably a much better idea to ensure you can use alignment to get an enum discriminant into a niche, so that the Lean(&'static ErrorInfo) alternative in the original post works. I dunno if that currently works?