Pre-RFC: repr(bitpacked) for automatic bitflags

Very informal, as I'm not great at writing and new to the process.

Currently Rust's struct layout wastes many bytes on structs with many booleans or many enum values. This is because each boolean has to take up an entire byte to allow getting a &mut to them individually. This is also the case with enum discriminants, which often take up an entire byte with many niches, due to the &mut problem.

My proposed solution to this would be a new repr value, bitpacked, which would disallow getting references to boolean fields, and only allow references to enum data variants via match_ergonomics style matches. These restrictions allow for each boolean to be packed into single bits and enum discriminants to take up only the required number of bits.

Open questions:

  • Is this worth it? In my opinion, this will allow many programs to become more memory efficient with a slight trade off for cpu time, possibly. Memory usage is often overlooked by developers but is becoming more and more important as devices continue to ship with virtual memory swapping using a lot of CPU time. This will also be opt-in, as is has to be to prevent breaking existing code getting references to boolean fields.

Notes: I have already developed a proof-of-concept proc macro for performing this packing for bool and Option<bool> but it cannot support arbitrary enums and relies on getters and setters that repr(bitpacked) wouldn't have to rely on, alongside being somewhat janky with other derives.

4 Likes

bitflags is in the top most downloaded crates, which I think is a good indicator that there is a need for something like this.

6 Likes

Has anyone written up a list of the issues with using bitflags that could be solved by a native implementation? If it's Good Enough :tm: as an external crate is there an actual need to integrate it into the compiler?

1 Like

The problems I see right now are that it:

  1. Requires significant refactoring of a struct to use
  2. Cannot work to pack enum discriminants, without a significant amount of unsafe boilerplate for every struct.

It is a great crate, and my proof of concept macro just punts the actual bitflag work to that crate, but having the layout behaviour built-in without having to prescribe tags manually and supporting enums would be much better.

3 Likes

Perhaps a slightly different approach to the issue would be ability to define copy-only fields, i.e. a field that cannot be referenced with &/&mut, and where struct.field is equivalent to running a magical getter that returns a copy of the data.

This way rustc would be free to group together booleans, apply further enum optimizations, and also make #[repr(packed)] usable with unaligned data.

8 Likes

This is, I think, the right approach. It mixes better for things like struct Foo(HashSet<u32>, move bool, move bool); where you really want to have &muts to the hash set, but you want the bools to pack together.

The repr level doesn't seem like the right place. Especially because it would need a bunch of the "and it means that the repr changes the normal rules" problems that are part of what makes repr(packed) bad.

15 Likes

My vision for this involves utilizing the concept of restrictions. It should be possible to place #[restrict(ref)] on a field, which would let the compiler do whatever it wants for optimization.

1 Like

The problem I see with copy only fields is that it doesn't allow for the match_ergonomics magic I envision for repr(bitpacked). This means less fields can take advantage of it, eg: Option<String>. I guess copy only fields would be a better language feature, but bitpacked is a better optimisation tool.

Would you elaborate on exactly what you had in mind that interacts with match ergonomics?

I think such fields could be made to work with match ergonomics — they wouldn't support ref, and therefore they'd always match via copy/move instead, even if matching through a reference.

8 Likes

Or, to the extent they allow references, they'd have to be &move references (or whatever we end up calling them), meaning "you can extract a value of this type but the memory at the location doesn't have the layout of the original type".

4 Likes

I mean that for a

#[repr(bitpacked)]
struct Example {
    opt_str: Option<String>,
    bool_1: bool,
}

You could get access to a reference to the string by doing

let example: Example;
match &example.opt_str {
    Some(my_str) => {} // my_str is &String,
    None => {}
}

as the string value is still stored perfectly normally for a reference, it's just the option tag that is packed.

1 Like

They can't even be &move bool, really, because any function taking a &move bool would need it to be represented the same way, and if you bitpacked multiple bools into a byte, there's no way for the thing to know what the decoding scheme is.

(Unless &move bool is implemented as passing by value, but that feels weird.)

2 Likes

For &move to be useful, it would have to allow not representing it the same way. That could be done with a wide pointer that has metadata, or with by-value in some cases, or with an implicit generic (things taking a &move end up with multiple implementations); there may be other possibilities as well.

As cool as &move could be, it would:

  • Not be able to optimise enum discrim tagging
  • Would be a significant change to the language
  • Would be significantly harder to implement

If anyone wants to open another thread for that, please do so, but this proposal is for specifically a new repr to affect struct layout as I believe that would be a better idea.

It would, that's part of the point. Having "move-only types" that you can't reference in place, and can only move/copy in or out, allows optimizing those types more aggressively, including packing bitfields, renumbering enums, and similar. You don't even need to have "move references" for that, they just provide further convenience.

repr(packed) has been a substantial source of complexity, user traps, and corner cases. Adding a new mechanism that's similar to packed but more heavily packed seems likely to have similar issues and similar complexity. In addition, we'd need the same logic for "can't take a reference to this".

1 Like

I should have been more specific, move only references would be too restrictive to be usable for the enum discrim tagging I made this PR in mind for, specifically types like Option<String>. Unless move only references allowed the match ergonomics I talked about above unconditionally, but that could be too restrictive for optimizations.

The language already has repr(packed), and I don't believe it's a mistake to write off, so bitpacked could work with the teaching of repr(packed) to not have to be taught from scratch.