Another case for smaller structs

Continuation of Towards even smaller structs

I realized there's one more use-case for #[pack] / #[flatten] / #[cant_take_address_of_this] attribute for fields. In today rust, you sometimes need to denormalized struct with an enum to make it smaller. To give an example from rowan, where saving extra bytes matters:

pub(crate) enum GreenChild1 {
    Node { rel_offset: TextSize, node: GreenNode },
    Token { rel_offset: TextSize, token: GreenToken },
}
static_assert!(mem::size_of::<GreenChild>() == mem::size_of::<usize>() * 2);

pub(crate) struct GreenChild2 {
    rel_offset: TextSize,
    node_or_token: Either<GreenNode, GreenToken>,
}
static_assert!(mem::size_of::<GreenChild>() == mem::size_of::<usize>() * 3);

Both versions define the same set of values. The second version is nicer code, but it takes more memory. I't be cool if I can say

pub(crate) struct GreenChild2 {
    rel_offset: TextSize,
    #[packed]
    node_or_token: Either<GreenNode, GreenToken>,
}

to shrink it by one usize.

Source code.

Trivia: realized that I need to post it while explaining how rust-analyzer syntax trees work.

7 Likes

Why is that? I am not familiar with that crate. The types look like they should occupy the same amount of space.

Also would be nice to say explicitly this calls for non-transitive compacting (what I termed #[repr(inline)]), and think for a while if transitive compacting may yield further advantages.

Though having drafted that proto-RFC for a while, I don’t think it’s more use cases that this feature needs to land. What it needs is careful consideration how to reconcile it with other Rust features, like borrow returning, UnsafeCell, smart pointers, etc.

(Although a couple of better examples in the RFC wouldn’t hurt. This is probably the weakest point of my draft.)

Both GreenNode and GreenToken are pointers, so size_of::<Either<Node, Token>> is 2 usizes: pointer + tag + alignment. For c: GreenChild2, the &c.node_or_token thus should return a reference to proper either, so just node_or_token field occupies two usizes. Then, you need to add rel_offest on top of that, which is again aligned to 3*usize.

For the first case, while the data is logically Either<Node, Token>, we can't take a reference to that. So the compiler can stuff the discriminant into the padding between rel_offest and node or token.

Here's a minimized example:

enum E1 {
    A { x: i32, y: *const () },
    B { x: i32, y: *const () },
}

struct E2 {
    x: i32,
    ab: Either<*const (), *const ()>,
}

enum Either<L, R> { Left(L), Right(R) }

fn main() {
    println!("{}", std::mem::size_of::<E1>()); // 16
    println!("{}", std::mem::size_of::<E2>()); // 24
}
9 Likes

Another situation where some way to control this would be extremely useful is a struct where all/many fields are Option<T>. If there's a way to guarantee that the field never has a reference to it, the Some/None bits could be bitpacked into a single field. I've benchmarked this before: it does not have a performance impact in my experience.

Bitpacking "is None" bits can also be useful when you just have a lot of Option<T> variables. For instance, from this users- thread,

struct LandmarkFileColumns {
    addr: usize,
    longitude: usize,
    latitude: usize,
    distance: usize,
}

fn select_columns(header: &csv::StringRecord, location: &str)
    -> Result<LandmarkFileColumns, LandmarkFileMissingColumnsError> {

    let mut addr: Option<usize> = None;
    let mut lon:  Option<usize> = None;
    let mut lat:  Option<usize> = None;
    let mut dist: Option<usize> = None;

    for (index, field) in header.iter().enumerate() {
        match field {
            "addr"             => { addr = Some(index); },
            "longitude"        => { lon  = Some(index); },
            "latitude"         => { lat  = Some(index); },
            f if f == location => { dist = Some(index); },
            _                  => {},
        }
        if let (Some(addr), Some(lon), Some(lat), Some(dist)) =
            (addr, lon, lat, dist) {
                return Ok(LandmarkFileColumns { addr, lon, lat, dist });
            }
    }
    return Err(LandmarkFileMissingColumnsError(/* ... */));
}

Because usize has no niches, the compiler needs 8 words for the 4 Option<usize> variables, and (on x86-64) we run out of registers. Hand-coded assembly would pack all four of the "is None" bits into one register. It's not that big of a deal (this particular function is executed only once per run of the program it was extracted from) but I do kinda wish the compiler knew how to do that.

3 Likes

Same idea, but limiting it to structs/enums at first seems more plausible. I've previously suggested doing this automatically if no references are taken, but some expressed concern about size_of::<T>() not being deterministic in that case.

One advantage of starting with local stack variables is that this has no observable side effects (aside from the generated assembly). Applying this to a struct means that attempting to take the address must result in a compilation error (or prevent the optimization, but that only makes sense if it's automatically applied in which case I agree with the concern around size_of). But this can be applied to local stack variables automatically be checking if any references are taken (no references → optimization can be applied; yes references → optimization cannot be applied) with zero impact to whether or not the code compiles or what size_of returns.

Won't LLVM SROA already do this for locals? Probably not bitpacking, exactly, but if it splits the aggregate tag+value, it can also optimize that tag in whatever way it sees fit.

I'm not good at reading LLVM IR dumps, but I'm pretty sure, based on the generated assembly, that SROA does split up the aggregates in the example. It's the further optimization of "these four abstract variables could share a register" that's missing.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.