Bitfields: wanted but hated [Pre-pre-RFC discussion]


#1

I have noticed that there have been several attempts at getting things like bit-fields, such as n-bit numbers, into Rust, but these have either been postponed or rejected as too complicated.

So this post is to facilitate discussion on what would be wanted from these sorts of fields.

Descriptiveness:

  • C/C++ have bitfields with a standard format: type name : n-bits;. This is good because it is very similar to how other fields are defined in those languages. In Rust this might look something like name : type : n-bits;
  • In documentation, fields are generally defined not as a list of fields with sizes but as labels for specific bits or ranges of bits within a register. In Rust this might look like this: name : 1 as type; or name : type 3..=6 as type;.

So within Rust we could use either, and each has pros and cons. The C/C++ version gives us a more recognizable syntax and way of describing the layout. But I would argue that when using bitfields the location of the field is more important than how large it is. A position-based syntax also makes the Rust code easier to compare against documentation.
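To make the position-based view concrete, here is a minimal sketch in today's Rust of reading and writing a field identified by an inclusive bit range. The helper names (`mask`, `get_bits`, `set_bits`) are made up for this illustration, and treating bit 0 as the least significant bit is an assumption, not part of the proposal.

```rust
use std::ops::RangeInclusive;

/// Builds a mask with ones in the given inclusive bit range of a 64-bit word.
/// Assumption of this sketch: bit 0 is the least significant bit.
fn mask(range: &RangeInclusive<u32>) -> u64 {
    let (lo, hi) = (*range.start(), *range.end());
    assert!(lo <= hi && hi < 64, "range must lie within 0..=63");
    let width = hi - lo + 1;
    // Shifting u64::MAX right avoids the overflow that `1 << 64` would cause
    // for a field covering the whole word.
    (u64::MAX >> (64 - width)) << lo
}

/// Reads the unsigned field that occupies `range` in `word`.
fn get_bits(word: u64, range: RangeInclusive<u32>) -> u64 {
    (word & mask(&range)) >> *range.start()
}

/// Returns `word` with the field at `range` replaced by the low bits of `value`.
/// Bits of `value` that do not fit in the range are dropped.
fn set_bits(word: u64, range: RangeInclusive<u32>, value: u64) -> u64 {
    let m = mask(&range);
    (word & !m) | ((value << *range.start()) & m)
}
```

With these helpers, `get_bits(reg, 3..=6)` reads like a register description that says "bits 3 to 6", which is the readability argument made above.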

The size of these structures is also a factor when deciding how to lay out the data. In C/C++ the size of the structure is the sum of the field widths rounded up (generally to bytes or words). It makes sense to have a similar rule in Rust, but it might also make sense for bitfields to have a declared maximum size that is separate from the layout. This is especially true if opting for a range-based layout, since the remaining bits of a 64-bit register might be ignored or reserved for later use.

Proposal:

// bit size
struct name (64) {
    ...
};

// byte size
struct name (8) {
    ...
};

Types:

  • In C/C++ any integral type (char, int, short, …) is allowed to be the type of a bitfield. This disallows structures and classes, which makes sense.
  • Another restriction (in strict mode) is that the number of bits cannot be more than the size of the type, which makes sense but only insofar as the holding type should be able to hold it.
  • I would say that whether it is signed or unsigned is more important, and the maximum number of bits should be the maximum size that the architecture and language can handle (64, or 128 on some systems). For this reason I think that only signed and unsigned should be used (or whatever i stands for in i32).

Mixing:

  • Should non-bit fields be allowed to be mixed with bit fields within the same structure? I would argue that this should not be allowed, since they serve different sorts of purposes.

Usability

Casting:

  • Should casting between bit field structures be allowed?
  • I would say yes, on the condition that the sizes of the two structures are the same. This is because sometimes two different layouts are used depending on some field within the structure. Which layout applies obviously cannot be determined statically in that case.
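As a rough illustration of the "layout depends on a field" case, the following sketch models it in current Rust with newtypes over the same 64-bit storage. All names (`Raw`, `SmallPage`, `LargePage`, `Entry`) and the choice of bit 7 as the selector are hypothetical; the point is only that the reinterpretation happens at runtime and that the equal-size condition can be checked at compile time.

```rust
/// Untyped 64-bit storage, before we know which layout applies.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Raw(u64);

/// Two hypothetical layouts that share the same storage size.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SmallPage(u64);
#[derive(Clone, Copy, Debug, PartialEq)]
struct LargePage(u64);

// Compile-time check of the "same size" condition for the cast.
const _: () = assert!(std::mem::size_of::<SmallPage>() == std::mem::size_of::<LargePage>());

/// The result of looking at the selector bit at runtime.
#[derive(Debug, PartialEq)]
enum Entry {
    Small(SmallPage),
    Large(LargePage),
}

impl Raw {
    /// Assumption of this sketch: bit 7 selects which layout is in use.
    fn decode(self) -> Entry {
        if (self.0 >> 7) & 1 == 1 {
            Entry::Large(LargePage(self.0))
        } else {
            Entry::Small(SmallPage(self.0))
        }
    }
}
```

Calling `Raw(bits).decode()` then yields one of the two views, and the `match` on `Entry` makes the runtime decision explicit.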

Setting fields:

  • What should happen if a value that is unrepresentable in a bit field is assigned to it at runtime (one that cannot be determined statically)?
  • Several options: panic (very strange, rust doesn’t do this), bound (meaning that it is set to the max value), masked (ignore bits that cannot be represented, this has problems with signed numbers), Result<(), Err> (have the assignment return a result which can be bubbled up), fail and do nothing.
  • I would say that bound is the best choice because of ergonomic considerations.
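The options listed above can be compared side by side with a small sketch. The 3-bit width, the function names and the 4-bit signed field are all chosen only for illustration; nothing here is proposed syntax.

```rust
/// Width of a hypothetical unsigned bit field, chosen only for illustration.
const WIDTH: u32 = 3;
/// Largest value the field can hold: 0b111 = 7.
const MAX: u8 = (1 << WIDTH) - 1;

/// "Bound": clamp to the largest representable value.
fn store_saturating(value: u8) -> u8 {
    value.min(MAX)
}

/// "Masked": keep only the low WIDTH bits.
fn store_masked(value: u8) -> u8 {
    value & MAX
}

/// "Result": refuse the value and hand it back to the caller.
fn store_checked(value: u8) -> Result<u8, u8> {
    if value <= MAX { Ok(value) } else { Err(value) }
}

/// Masks a signed value into a 4-bit two's-complement field.
fn store_masked_signed4(value: i8) -> u8 {
    (value as u8) & 0x0F
}

/// Reads a 4-bit two's-complement field back, sign-extending it
/// with an arithmetic right shift.
fn read_signed4(bits: u8) -> i8 {
    ((bits << 4) as i8) >> 4
}
```

For the input 9, bounding stores 7, masking stores 1, and the checked version returns `Err(9)`. The signed problem mentioned above shows up when 9 is masked into a 4-bit signed field: it reads back as -7.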

References and Pointers:

  • Like in C/C++ I don’t think that (at least at first) references should be able to be taken for bitfields since their size is not uniform with other types.

Final Example:

struct Bitfield(64) {
    present : unsigned as 0,
    page_size : unsigned as 3..=5,
    address : unsigned as 16..=46,
    other : signed as 60..=63
}
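For comparison, this is roughly what the same layout costs to write by hand in current Rust. It is only a sketch: the return types (`bool`, `u8`, `u32`, `i8`) and the convention that bit 0 is the least significant bit are assumptions, since the proposal above does not pin them down.

```rust
/// Hand-written stand-in for the proposed `Bitfield(64)` layout.
#[derive(Clone, Copy, Default)]
struct Bitfield(u64);

impl Bitfield {
    /// Bit 0.
    fn present(self) -> bool {
        (self.0 & 1) != 0
    }

    /// Bits 3..=5 (3 bits, unsigned).
    fn page_size(self) -> u8 {
        ((self.0 >> 3) & 0x7) as u8
    }

    /// Bits 16..=46 (31 bits, unsigned).
    fn address(self) -> u32 {
        ((self.0 >> 16) & 0x7FFF_FFFF) as u32
    }

    /// Bits 60..=63 (4 bits, signed). Casting to i64 first makes the
    /// right shift arithmetic, which sign-extends the top nibble.
    fn other(self) -> i8 {
        ((self.0 as i64) >> 60) as i8
    }
}
```

The signed field is the only one that needs care: a top nibble of 0xF reads back as -1 rather than 15.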

#2

First off, someone has to ask this: why not a custom derive?

#[derive(BitField)]
#[bitfield(present(unsigned, 0),
           page_size(unsigned, 3, 5),
           address(unsigned, 16, 46),
           other(signed, 60, 63))]
struct MyFlags(u64);

Expands to:

struct MyFlags(u64);

impl MyFlags {
    fn present(&self) -> bool /* or u8 */ {
        (self.0 >> 0) & 0x01 != 0
    }
    fn set_present(&mut self, bit: bool /* or u8 */) {
        self.0 = (self.0 & !0x01) | ((bit as u64) << 0)
    }

    fn page_size(&self) -> u8 {
        // narrow to u8 only after masking out the 3-bit field
        ((self.0 >> 3) & 0x07) as u8
    }
    fn set_page_size(&mut self, bits: u8) {
        // truncation is just an example, more on this later
        self.0 = (self.0 & !(0x07 << 3)) | (((bits & 0x07) as u64) << 3)
    }

    // etc.
}

You can’t reasonably take the address of the field anyway, so using a function is not a limitation compared to a direct field access.


If, however, bitfield types are to be added to the language, I think at this point (based on the proposal) they are so fundamentally different from regular structs that they should be a separate kind of type altogether. If field “types” don’t really correspond to types anymore, if their address can’t be taken, if the syntax is generally just very different from structs, if the bit-typed fields aren’t allowed to be mixed with regular fields, then why should we forcibly try to unify the two concepts?

Casting between bitfield structures should probably require a stronger condition than equal size (e.g. the existence of all identically-named and identically-typed fields lying on the exact same bit positions). If you just allow casting one to another based purely on the predicate that they have the same size, then people will be free to cast completely unrelated bitfields, which seems like a huge source of errors and doesn’t seem to have its place in a strictly-typed language where there are (rightfully) no implicit integer coercions.

For a fallible operation, typically returning Result is the right default choice. Again, second-guessing the user is Bad™, and just truncating the bitfield sounds pretty much like silently second-guessing the user. Perceived (or even real) “ergonomics” should not take precedence over correctness. Another advantage of the derive-based approach is that the setter is a real function which has no problem of returning a value.
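A sketch of what such a Result-returning setter could look like for the page_size field (bits 3 to 5) of the MyFlags example. The `OutOfRange` error type and the associated constants are invented for this illustration and are not part of any existing derive.

```rust
/// Hypothetical error type describing a rejected assignment.
#[derive(Debug, PartialEq)]
struct OutOfRange {
    value: u64,
    max: u64,
}

struct MyFlags(u64);

impl MyFlags {
    const PAGE_SIZE_SHIFT: u32 = 3;
    const PAGE_SIZE_MAX: u64 = 0x07;

    fn page_size(&self) -> u8 {
        ((self.0 >> Self::PAGE_SIZE_SHIFT) & Self::PAGE_SIZE_MAX) as u8
    }

    /// Stores `bits` only if it fits in the 3-bit field.
    /// On failure the stored value is left untouched.
    fn set_page_size(&mut self, bits: u8) -> Result<(), OutOfRange> {
        let value = u64::from(bits);
        if value > Self::PAGE_SIZE_MAX {
            return Err(OutOfRange { value, max: Self::PAGE_SIZE_MAX });
        }
        let cleared = self.0 & !(Self::PAGE_SIZE_MAX << Self::PAGE_SIZE_SHIFT);
        self.0 = cleared | (value << Self::PAGE_SIZE_SHIFT);
        Ok(())
    }
}
```

Because the setter is an ordinary method, the caller can use `?` to bubble the error up or explicitly choose a fallback, instead of having a policy picked for them.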

By the way, assigning a dynamically incorrect value would be impossible with the introduction of arbitrary-bit-width integers, but if we don’t have them, then I don’t really see the value in a special bitfield type, because assigning dynamically incorrect values to its fields is still possible, i.e. it’s not an improvement over a getter-and-setter-with-uN-or-iN-based approach.


#3

C allows mixing bitfields and non-bitfields in the same structure. I would argue for the following requirement:

  • It should be possible to take a C struct that includes bitfields (and other things like unnamed unions and unnamed structs), convert it to a C-compatible Rust struct, and not have to invent any new named types or new named fields.

#4

If Rust is going to finally address bit-fields that are not exactly 2^N bits wide, where 3 ≤ N ≤ 7, I’d really like an approach like Zig’s that makes uN and iN first-class integer types for all N ≤ 128. (Zig does this for N ≤ 65535, but such large bit fields probably would entail more significant changes in rustc and perhaps LLVM than limiting to the currently-largest u128 and i128 integer types.)
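Without language support, the closest approximation today is probably a const-generic wrapper. The `UInt<N>` type below is a hypothetical library stand-in for a first-class uN, backed by u128; it is not how Zig or any existing crate implements this.

```rust
/// Hypothetical stand-in for a first-class `uN` type, for 1 <= N <= 128.
/// The value is stored in a u128 and checked on construction.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct UInt<const N: u32>(u128);

impl<const N: u32> UInt<N> {
    /// Largest value representable in N bits.
    /// Using N = 0 or N > 128 fails to compile when this constant is evaluated.
    const MAX: u128 = u128::MAX >> (128 - N);

    /// Returns `None` when `value` needs more than N bits.
    fn new(value: u128) -> Option<Self> {
        if value <= Self::MAX {
            Some(Self(value))
        } else {
            None
        }
    }

    fn get(self) -> u128 {
        self.0
    }
}
```

A wrapper like this moves the range check to the point of construction, so a setter taking `UInt<3>` cannot be handed an out-of-range value. What it cannot do is occupy fewer than 128 bits of storage or participate in ordinary integer arithmetic, which is where first-class types would differ.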


#5

Bit-fields are typically packed, which means they don’t have well-formed addresses. Expressing this with non-power-of-two integer types introduces a nasty corner case to the language, where certain types can’t be addressed.

How would you feel about something like

struct MyBitField(..);
#[repr(C)]
struct Foo {
  a: A,
  #[inline] b: MyBitField,
  c: C,
}

where the #[inline] in field position causes the field to behave “as if” it was inlined for the purposes of layout? I suspect this might result in some hilarious addressing problems if the &MyBitField you get out of your &Foo has different padding…


#6

+1 for a procedural macro approach. The Chrome OS codebase has a pretty slick implementation of bitfields that works like this:

#[bitfield]
#[derive(Clone, Copy, PartialEq)]
pub struct Trb {
    parameter: B64,
    status: B32,
    cycle: B1,
    flags: B9,
    trb_type: B6,
    control: B16,
}

const_assert!(mem::size_of::<Trb>() == 16);


// expands by replacing the above with:
pub struct Trb {
    data: [u8; 16],
}
impl Trb {
    pub fn new() -> Trb;
    pub fn get_parameter(&self) -> u64;
    pub fn set_parameter(&mut self, val: u64);
    pub fn get_status(&self) -> u32;
    pub fn set_status(&mut self, val: u32);
    pub fn get_cycle(&self) -> bool;
    pub fn set_cycle(&mut self, val: bool);
    pub fn get_flags(&self) -> u16;
    pub fn set_flags(&mut self, val: u16);
    pub fn get_trb_type(&self) -> u8;
    pub fn set_trb_type(&mut self, val: u8);
    pub fn get_control(&self) -> u16;
    pub fn set_control(&mut self, val: u16);
}
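The generated getters and setters have to read and write bit ranges that straddle byte boundaries in the backing array. The following is a deliberately simple, bit-at-a-time sketch of that idea; it is not the Chrome OS implementation, and the bit numbering (bit 0 is the least significant bit of `data[0]`) is an assumption of this example.

```rust
/// Reads `width` bits (at most 64) starting at absolute bit `offset`.
/// Assumption: bit 0 is the least significant bit of `data[0]`.
fn get_bits(data: &[u8], offset: usize, width: usize) -> u64 {
    let mut out = 0u64;
    for i in 0..width {
        let bit = offset + i;
        let set = (data[bit / 8] >> (bit % 8)) & 1;
        out |= u64::from(set) << i;
    }
    out
}

/// Writes the low `width` bits (at most 64) of `value` starting at bit `offset`.
/// Neighbouring bits in the same bytes are left unchanged.
fn set_bits(data: &mut [u8], offset: usize, width: usize, value: u64) {
    for i in 0..width {
        let bit = offset + i;
        let mask = 1u8 << (bit % 8);
        if (value >> i) & 1 == 1 {
            data[bit / 8] |= mask;
        } else {
            data[bit / 8] &= !mask;
        }
    }
}
```

Under that numbering, and assuming fields are packed in declaration order, the Trb fields would sit at offsets 0 (parameter), 64 (status), 96 (cycle), 97 (flags), 106 (trb_type) and 112 (control), so a getter such as get_flags could be `get_bits(&self.data, 97, 9) as u16`.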

#7

There’s also a custom derive-based bitfield crate:

https://crates.io/crates/bitfield


#8

Bitfields, and bits in general, are extremely important in the embedded context. It’s worth checking out how this is implemented in ARM HALs: https://rust-embedded.github.io/discovery/07-registers/type-safe-manipulation.html

BTW, in C the bit ordering in a bitfield is implementation-defined.


#9

This may not be such a nasty corner case if you also forbid other things, like returning them from functions or composing them with other types. They would still be ‘special’, but you could probably contain that by localizing it to a struct definition and inherent methods.

(Though as a disclaimer, I personally don’t have any pressing need for bitfields, so I’m not confident this wouldn’t cause problems for the people who do need them.)


#10

I’ll repeat the conclusions from one of the previous RFCs (not sure which exactly, perhaps all of them :smile:).

There are two primary use cases for bit-fields, that are very different from each other:

  • Bitfields specifying precise layout - bit ranges that can be specified by position and length, or by start and end positions, containers for bit-fields with well-defined sizes (32, 64, …), perhaps more fancy stuff - read-only bits, always-one / always-zero bits, sticky bits (think of emulating hardware registers).
    This problem is well solvable with DSLs created using procedural macros.
    More than that, I think we should have a well established macro library and DSL for this as a reference point before starting thinking about pulling this into the language.
  • Bitfields compatible with C, for FFI. These bitfields have totally unspecified layout, lack any fancy stuff beyond what C provides, but match what C ABI does exactly, that’s their only purpose.
    I suspect it’s possible to incorporate this ABI knowledge into a procedural macro library/DSL as well, but maybe it’s better suited to the compiler.
    At least the feature set is well-defined in this case - what C allows and no more.

#11

As stated in my previous post:

Having to create the separate MyBitField type makes that onerous, compared to simply inlining the bitfields as fields.


#12

I don’t want to make the perfect the enemy of the good. I would suggest that we start out with C-compatible bitfields, and then people who want precise layout can use the same base syntax together with either some additional language extensions or procedural macros.


#13

I, however, don’t think it is desirable to be unable to tell how the data is laid out.

Sebastian Malton


#14

Oops, I misread your reply, I thought you wanted to not mix normal fields and bitfields. I’m only really against making “partial byte numbers” behave like real types, because they’re barely types in the Rust sense. I think some attribute-ad-hoc thing makes sense, like

#[repr(packed)]
struct K {
  a: A,
  #[bits(3)] b1: u8,
  #[bits(1)] _: u8, // We'd allow _ as a field
                    // name only for this purpose, i.e.,
                    // equivalent of `uint8_t :1;`
  #[bits(4)] b2: u8,
  b: B,
}

Note that since the layout is implementation-defined in C, we might want to make the packed layout opt-in.

