Reviving the bit-data discussion

The main language-level bit structure mechanism I'm aware of is Erlang's

I just wanted to +1 this, and also note that the real power of the Erlang bit syntax is combining it with pattern matching. Here's an example from the linked page:

case Dgram of 
    <<?IP_VERSION:4, HLen:4, SrvcType:8, TotLen:16, 
      ID:16, Flgs:3, FragOff:13,
      TTL:8, Proto:8, HdrChkSum:16,
      SrcIP:32,
      DestIP:32, RestDgram/binary>> when HLen>=5, 4*HLen=<DgramSize ->
        OptsLen = 4*(HLen - ?IP_MIN_HDR_LEN),
        <<Opts:OptsLen/binary,Data/binary>> = RestDgram,
    ...
end.

As @cbiffle said, this is primarily useful specifically for the case of destructuring binary data i.e. protocol parsing as seen in the example above, but it's an amazing feature of Erlang and one I think would fit quite well into Rust.

1 Like

Personally I strongly dislike the idea of adding u1, u2, etc. to the language. In my opinion better approach will be introduction of something like BitArray<N> where N is the type level integer. If we add additional restraints on N (e.g. from this RFC) we could define convenience methods for conversion to other types with necessary limits on N if needed. (e.g. to_be_u8 and to_le_u8 will be defined only for 0<=N<=8, and to_bool only for N=1) So color example will look something like this:

// bitpacked will join booleans and BitArray, if struct has integers fields
// they will be stored as usual
#[bitpacked] 
struct RGB565 {
    r: BitArray<5>, g: BitArray<6>, b: BitArray<5>
}

impl RGB565 {
    fn r(self) -> u8 { self.r.to_le_u8() }
    fn g(self) -> u8 { self.g.to_le_u8() }
    fn b(self) -> u8 { self.b.to_le_u8() }
    // if r=>32 ignore significant bits
    fn set_r(&mut self, r: u8) { self.r.set_le_u8(r) }
    fn set_g(&mut self, g: u8) { self.r.set_le_u8(g) }
    fn set_b(&mut self, b: u8) { self.r.set_le_u8(b) }
}

@newpavlov I see that you dislike the idea of {u,i}N syntax, but you donā€™t explain why that is, so I am not sure how to respond. Note that the word ā€œbit-arrayā€ has a very different connotation than what we are talking about here; we are thinking of u5 as a type for values {0, 1, ..., 31}, not as an array of bits.

Having said that; I could imagine combinations such as:

  • UInt<N> and Int<N>
  • UnsignedInt<N> and SignedInt<N>

The problem is that these types need a representation within the type-checker, and as such, they are ā€œprimitive types,ā€ not something that resides in userland. The compiler needs to know the bit-size. So perhaps the right syntax should follow the lower-case convention:

Short | Long
------+---------
u<N>  | uint<N>
i<N>  |  int<N>

Keep in mind, the longer the type names are, you will get a more cluttered and unreadable type specification. I think that is something you want to avoid. As a matter of fact, I would probably always type define

type u1 = u<1>;
type u2 = u<2>;
type u3 = u<3>;
// and so on
#[bitpacked(network_order)] 
struct DataGramHdr {
  ip_version: u4, hdr_len: u4,
  srvc_type: u8,
  total_len : u16,
  id : u16,
  flags: u3, frag_offset: u13,
  ttl: u8, proto: u8, 
  hdr_checksum: u16,
  src_ip, dst_ip: u32  
}

{
  if dgram.hdr_len >= 5 && 4 * dgram.hdr_len <= dgram_size {
    // ...
  }
}

This would be a jarring departure from Rust's behavior with other integral types. Note that today, underflow and overflow result in an unspecified value in Rust: by default they panic in Debug and wrap in Release, but the hope is to get them to always panic by default (once performance is sorted out).

A specific wrapping operation (or wrapper type) is necessary to get the wrapping behavior.

If you wish for i1 or u4, I strongly advise following the semantics of their predecessors.

@matthieum Thanks for pointing that out!

I wish there was a more ergonomic way of writing wrapping, saturating, and overflowing adds in Rust, but thatā€™s for another RFC.

The main reason why I donā€™t like {u,i}N syntax is that it blends with existing types while behaving quite differently. Also I donā€™t know any languages which implement such types and this is a clearly a worrying sign.

In my opinion such types will add too much implicit magic behind curtains compared to something like BitArray which explicitly shows what we are working with bits with explicit conversion between bits to integers with regard to big and little endians. In other words cost of such solution will be too high for, lets be honest, quite niche use case.

Less extreme solution could be use of [bool; N] with compiler support for bit packing and some trait which will add convenience methods for bits conversion and representation. Yes, it will be significantly more verbose, but it can be done without drastic changes to the language and it will follow Rustā€™s explicitness inclination.

First, thank you for clarifying your position. I do see an issue with the difference between the native u8 type and the bitfield type u<8>, and I agree - unless the semantic of those types are the same, we canā€™t use the same syntax.

However, Iā€™d like to challenge you on the ā€œquite niche use case,ā€ quote. Iā€™m quite certain that it depends on what you program. If youā€™ve ever had the chance to work on microcontrollers, or low-level interfaces to hardware and binary formats, you would not think that way. As a matter of fact, for me, that comment comes off almost rude, though Iā€™m sure thatā€™s not what you intended!

So, let me just put it out there:

  • For system programming dealing with low-level hardware and binary data, the main pain-point in using Rust is the lack of ergonomic bit-level access to binary data.

This is what we are trying to solve here. And that is why some people get excited about the idea of proper bit-field support in Rust.

1 Like

I think you missed the part where the Erlang version is performing a pattern matching operation (see the original page for additional context). How would you unpack the binary data with pattern matching ala:

match dgram {
  DataGramHdr(...) => ...
}

I agree that mixing real integer types with arbitrary bit lengths is a bad idea, but I also find this point very persuasive:

So what about using b instead of i or u?

#[bitpacked(network_order)] 
struct DataGramHdr {
  ip_version: b4, hdr_len: b4,
  srvc_type: b8,
  total_len : b16,
  id : b16,
  flags: b3, frag_offset: b13,
  ttl: b8, proto: b8, 
  hdr_checksum: b16,
  src_ip, dst_ip: b32  
}

My original RFC had something akin to normal match:

match dgram {
  DataGramHdr { hdr_len: hlen, .. } => ...
}

IMO they shouldn't behave differently because they are bitsized, but because they are contained in a #[bitpacked] struct or array. Otherwise they should just be padded and behave like every other type.

That should actually be the case for all types. Also u8 should behave differently in a #[bitpacked] struct, because it might not by aligned to the byte boundaries.

Let's give every type a size in bits. For most types that corresponds to 8 times the size in bytes, but not for all. In normal usage they should be padded to a full byte and behave like current types. This has additional advantages: A #[bitpacked] struct can itself have an irregular bitsize and it can again be packed efficiently.

That way the uN types are not special and restrictions only apply if types are actually used in #[bitpacked] structs or arrays.

6 Likes

@troplin is exactly right. Iā€™d also like to add that our existing integer types can already be somewhat magical as in the case of u64 on x86-32, or float with soft float.

Perhaps at least with nightly, overflower could help you?

Iā€™m bumping this again as Iā€™ve now been waiting for any kind of proper bit support in Rust since 2014 and thereā€™s been next-to-no movement. Iā€™m happy and willing to take this through the prototyping and RFC process, but would likely need a mentor to help me along the way.

Not sure I can mentor, but maybe I can give you a good starting direction.

To begin with, you really need to be clear about what problems youā€™re trying to solve. Some of these proposals have started with ā€œit would be nice if we had thisā€, but they include somewhat contrived examples, wishlist items that depend on other major language features (like type-level integers), or arenā€™t really specific enough about how theyā€™d be used in the real world.

So a good start would be to gather some real-world examples, possibly edited down for clarityā€™s sake, but based on actual code people have written in earnest rather than hypothetical examples. These examples could be Rust code that you find cumbersome, or code in other languages that have some kind of native bitfield support, like C or the bitfield destructuring in Erlang, to show how that native support makes the code a lot more readable.

Once you have some examples and real-world use-cases, it would be good to compare them against existing macro based solutions in Rust, like rust-bitfield, pnet_macros, bitfield-register, or solutions that provide bitflag support like bitflags or enumflags.

If you still find that those donā€™t meet your needs, or arenā€™t quite seamless enough to use and still would make the code cumbersome, then it would be a good idea to draft a possible solution. It doesnā€™t have to be complete, but should show the basics of how it would work, and answer some questions like:

  1. If there are some kind of special narrow types, can they be used outside of a packed struct? Or is it like C, where the type is a regular type, but thereā€™s a bit width specifier for how many bits its packed into in the struct? Either way, how do you handle overflow (if there are types for bitfields, then arithmetic overflow when manipulating them, if itā€™s C like, then when packing values into it)?
  2. Would it help more to have bitfields in structures like in C, treated as members of the structure which just have limited things you can do with them like never taking a reference, or is what is needed just the ability to do Erlang-style convenient destructuring and construction of byte arrays?
  3. Can part of your proposal, or a subset of it, be prototyped using macros, the type system, and operator overloads, even if the syntax isnā€™t quite as convenient?
  4. Could all of it be implemented that way if given some other lower level feature, like extensible destructuring and matching support (for the Erlang-style approach), or maybe std::ops::Assign that would let you have some kind of dummy type that would let you do pixel.red() = pixel.green()/2 + pixel.blue()/2?

Here are a couple of starting ideas, as well as everything else discussed in this thread and previous ones, but the real work will be in both finding some real-world examples to see how these work for real problems, and thinking through how all of the corner cases will work and it will fit in with the rest of Rust, including overflow behavior, endianness, whether this will provide a zero-cost abstraction or whether people interested in efficiency will need to do the bit-twiddling by hand.

A new type of repr:

#[repr(bitpacked(r = "5", g = "6", b = "5"))]
struct RGB565 {
    r: u8, g: u8, b: u8,
}

Some special type that repr(packed) knows can be packed at a sub-byte level:

use std::bitfield::{self, Bitfield, FiveBits, SixBits, Saturate};

#[repr(packed)]
struct RGB565 {
    r: Bitfield<u8, FiveBits, Saturate>,
    g: Bitfield<u8, SixBits, Saturate>,
    b: Bitfield<u8, FiveBits, Saturate>,
}

A macro that could expand into something using std::ops traits, including a new std::ops::Assign (not sure this could work, after testing out @hannobraunā€™s suggestion from earlier, you get an ā€œinvalid expression on left-hand sideā€ for something like color.red() += 1):

#[bitfields(overflow = "saturate")]
struct RGB565 {
    #[bitfield(width = "5")]
    r: u8,
    #[bitfield(width = "6")]
    g: u8,
    #[bitfield(width = "5")]
    b: u8,
}

Erlang style destructuring support (really unsure of the syntax that would fit in Rust, just throwing out something crazy):

let data: [u8; 2] = [0 as 5bits, 63 as 6bits, 31 as 5bits];

match data {
    [r as 5bits, g as 6bits, 0 as 5bits] => ...,
    [r as 5bits, g as 6bits, b as 5bits] => ...,
}
2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.