Pre-RFC: mem::trailing_padding!

Problem statement

It is quite common in Rust to have an array of enums that is iterated over. It is also quite common for some variants of that enum to be much larger than others. Implemented naively, this wastes a lot of space on padding. This can be avoided by defining a full bytecode syntax, but that requires implementing a bunch of encoding and decoding logic, which costs CPU cycles and introduces opportunities for programmer error.

Guide level explanation

Add a trailing_padding! macro to core::mem. This macro would take the name of a type or enum variant and evaluate to an integer literal.

This literal would be the number of trailing "padding" bytes in that enum variant.

Padding bytes can have any value, so this would allow the creation of a crate that implements a data structure that carefully overlays the padding at the end of one enum with the start of another. Because padding can have any value, and because this data structure would be append-only, this would be sound. Combined with #[repr(C)], this would essentially allow easily creating an opcode-prefixed bytecode representation through nothing more than an enum and a derive macro.
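To make the wasted space concrete, here is a minimal sketch (the Op enum and its variants are hypothetical names, not part of the proposal): with #[repr(C, u8)], a small variant occupies only its tag byte but still takes size_of::<Op>() bytes in an array.

```rust
use core::mem::size_of;

// Hypothetical bytecode-style enum; variant sizes differ wildly.
#[repr(C, u8)]
enum Op {
    Nop,                              // only the 1-byte tag is meaningful
    Call { target: u64, argc: u32 },  // forces the whole enum to be large
}

fn main() {
    // Every array element is sized for the largest variant, so a `Nop`
    // drags along size_of::<Op>() - 1 bytes of trailing padding.
    let nop_trailing_padding = size_of::<Op>() - 1;
    println!("size_of::<Op>() = {}", size_of::<Op>());
    println!("Nop trailing padding = {nop_trailing_padding} bytes");
    assert!(nop_trailing_padding > size_of::<Op>() / 2);
}
```

That trailing-padding count is exactly what the proposed trailing_padding!(Op::Nop) would evaluate to for the Nop variant.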

Alternatives

A const function that accepts a mem::Discriminant and returns the number of trailing padding bytes.


Reminds me of https://lang-team.rust-lang.org/frequently-requested-changes.html#size--stride, where I hypothesized something like this that would give a Range<usize> for a type or an instance saying which bytes are sufficient to copy, with the trivial 0..sizeof(T) implementation always being legal if a compiler wants.

Importantly, this change would not affect how Vec or any other pre-existing type functions.

Another alternative would be full layout reflection, or at least saying where all the padding bytes are.

You could almost implement this today in a derive macro by looking at the offsets and sizes of all fields and finding the last occupied byte. The catch is that you can't ask where the enum discriminant is stored (or any other non-field repr complications that might be introduced in the future), but that's well-defined for #[repr(C)] and #[repr(uN)] enums. That would be sufficient to write a prototype of this feature without modifying the compiler.
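A sketch of that prototype approach, assuming a #[repr(C, u8)] enum and the as-if struct layout from RFC 2195 (all names here are hypothetical): the trailing padding of a variant is the enum's size minus the end of the variant's last field.

```rust
use core::mem::{offset_of, size_of};

#[repr(C, u8)]
enum Op {
    Push(u64),
    Pop,
}

// As-if struct for the `Push` variant per RFC 2195: tag first, then fields.
#[repr(C)]
struct PushAsIf {
    tag: u8,
    value: u64,
}

// Trailing padding = enum size - (offset of last field + size of last field).
fn push_trailing_padding() -> usize {
    size_of::<Op>() - (offset_of!(PushAsIf, value) + size_of::<u64>())
}

fn pop_trailing_padding() -> usize {
    // `Pop` has no fields; only the tag byte is occupied.
    size_of::<Op>() - 1
}

fn main() {
    // `Push` is the largest variant, so it has no trailing padding here.
    assert_eq!(push_trailing_padding(), 0);
    println!(
        "pad(Push) = {}, pad(Pop) = {}",
        push_trailing_padding(),
        pop_trailing_padding()
    );
}
```

A derive macro could generate such as-if structs for every variant and compute these numbers without any compiler support.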


If you compact enums like that, won't this ruin alignment? You won't be able to repr(C)-read any except the first one?

Important thing to be careful of: if a tail field is not Freeze (i.e. it is shared-mutable, containing an UnsafeCell), then some of the trailing padding may be writable from safe code, and any calculations need to be careful to consider that.

Whether the padding is valid to write to with unsafe code doesn't matter, since it is still unsound to write through a &T other than via the UnsafeCell API, even when the write happens to be generally valid.[1]

Note that it's possible to use repr(C) and/or repr(uN) to create a bytecode which can load safe enums with ptr::read_unaligned, although it might be somewhat less optimal than a manual bytecode definition. The RFC text defining these layouts: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html
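A minimal sketch of that loading pattern (the Op enum here is hypothetical): a #[repr(C, u8)] enum round-trips through an arbitrary byte offset with ptr::write_unaligned/ptr::read_unaligned.

```rust
use core::ptr;

#[repr(C, u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Pop,
    Push(u64),
}

// Store `op` at an unaligned offset in a byte buffer, then load it back.
fn roundtrip_unaligned(op: Op) -> Op {
    let mut buf = [0u8; 64];
    unsafe {
        // Offset 1 is misaligned for `Op`; the *_unaligned APIs allow that.
        ptr::write_unaligned(buf.as_mut_ptr().add(1).cast::<Op>(), op);
        ptr::read_unaligned(buf.as_ptr().add(1).cast::<Op>())
    }
}

fn main() {
    assert_eq!(roundtrip_unaligned(Op::Push(42)), Op::Push(42));
    assert_eq!(roundtrip_unaligned(Op::Pop), Op::Pop);
    println!("unaligned round-trip ok");
}
```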

Ideally this would be a function like mem::size_of, but that would require “enum variants as types” in order to be usable as intended for the target problem space.

Or similarly, an “*_of_val” function. The limitation, of course, being that such a function needs an instance of the variant to get the trailing padding, just as you currently need an instance to get a Discriminant value.

There's no real requirement for this to be a macro like there is for offset_of!, so I'd personally prefer a solution which doesn't use a macro in the stdlib. A crate can of course provide a wrapping macro that sticks the fn in a const{} block[2].

Given the definition is as-if defined with struct and union, a proc-macro with access to the enum definition (e.g. a derive) could accomplish the goal with just a little bit of Layout math on said as-if types.

I didn't even think of that possibility, which gives a bit tighter of a bound than just using the size of the variant payload.

You can almost get this today with just offset_of! and Layout of the fields' types, if you know the field name/types (i.e. #[derive]). The macro auto(de)ref specialization tricks I'm conceptualizing to get a nice ergonomic behavior are honestly kinda frightening.

Technically, that is all entirely private unstable impl details, so it doesn't matter whether they change their impl strategy or not. They should do whatever has the most consistent/predictable “zero overhead” performance characteristics.

read_unaligned or repr(packed) solve that (although I don't think we actually define a repr(C, packed) for enums), or if the variants are sufficiently different in size (given the desire to overlap, they likely are), it's possible to align the next instance properly while still starting in the padding of the prior.
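The "start in the padding of the prior element" arithmetic can be sketched like this (the numbers are purely illustrative): round the end of the occupied bytes up to the next element's alignment, which may still land before the end of the previous element's padded size.

```rust
// Round `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

fn main() {
    // Illustrative numbers: the previous element's occupied bytes end at 9,
    // but its padded size extends to 16. A next element with alignment 4
    // can start at 12 -- inside the prior element's trailing padding.
    let occupied_end = 9;
    let padded_end = 16;
    let next_start = align_up(occupied_end, 4);
    assert_eq!(next_start, 12);
    assert!(next_start < padded_end);
    println!("next element starts at {next_start}, inside padding {occupied_end}..{padded_end}");
}
```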


  1. This is the stated position of T-opsem, and the used justification for immutable static constant promotion of UnsafeCell-free enum variants even when the full type is not Freeze and potential opsem would make writes valid when not promoted to immutable memory. (And a trivial proof/example: I can retag memory as a Freeze array of bytes and then refcast it back to the type in question, and this should be considered sound if the reachable safe API cannot cause a write to the now immutable memory.) ↩︎

  2. And because of various impl details, doing so can actually be better for compile time, since it reduces the amount of MIR cost to the function. ↩︎


This isn't a problem unless you are

  1. using a “deep, not shallow” definition of trailing padding that looks into field types rather than treating them as opaque except for ordinary size_of, and
  2. looking through UnsafeCell<T> to the padding of T.

I'd say this operation shouldn’t do (2), in exactly the same way looking for niches doesn’t, and possibly shouldn't do (1) either.


And so Vec (and even more so, slices) could not do this, because it is expected to offer O(1) random read-write access which depends on uniformly-sized elements. Collections like BTreeMap could try, but they’d have to expand out to full-size nodes on any get_mut() or iter_mut() (because you can never hand out a &mut reference to these padding-overlapped enums without potentially smashing following elements). In general, std doesn’t deal in append-only or “immutable” collections, so there aren’t many opportunities to use this kind of layout in std.

the reason i prefer a macro solution is because of the aforementioned limitation of needing an instance of the enum. i want this to be usable in derive macros, and a function-based solution would struggle, especially if the enum variant contained runtime-only values.

of course, another alternative would be first-class curried enum variants. it's already possible to instantiate values that are a variant that is missing its value, but only with tuple enums:

enum En {
    Um(u8)
}

fn main() {
    let _x = En::Um;
}

That's a function item or so, not a variant value. You can let _y = |b| En::Um(b); too.

offset_of is a macro, so I don't see why this wouldn't be as well.

it's an opaque type that implements Fn. i don't know a ton about the compiler, but theoretically it could implement other traits too.

let _x: En = En::Um; doesn't compile, and the error says it's fn(u8) -> En.

The type error should be saying not fn(u8) -> En but fn(u8) -> En {En::Um}. This is the (not nameable in surface syntax) function item type of the variant constructor. Every fn foo(){} item, and every tuple-struct or tuple-variant constructor, has a unique function item type.[1]
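A small sketch of those properties, reusing the En/Um example from above: the constructor's function item type is zero-sized, and it coerces to a function pointer on demand.

```rust
enum En {
    Um(u8),
}

fn main() {
    // `En::Um` is a value of the unique, zero-sized function item type
    // `fn(u8) -> En {En::Um}`, not a value of `En` itself.
    assert_eq!(core::mem::size_of_val(&En::Um), 0);

    // It coerces to an ordinary (pointer-sized) function pointer on demand.
    let f: fn(u8) -> En = En::Um;
    let En::Um(x) = f(7);
    assert_eq!(x, 7);
}
```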

There is no type-system obstacle to deciding that constructor function item types should implement another trait that gives information about the variant (perhaps via returning mem::Discriminant?).


  1. One reason why such unique types exist is so that they can be zero-sized types, so that when functions are passed to generic code and stored in generic data structures, the function takes up no space and is easily inlinable after monomorphization. Function pointers can also sometimes be bypassed by LLVM optimization of code, but not removed from struct layout. ↩︎
