Reviving the bit-data discussion
Rust can be a great language for systems-level programming, possibly taking over roles that C and C++ have typically held. With the emergence of the “Internet of Things” (IoT), there are efforts to support microcontrollers such as ATmega (Arduino) and ARM-family chips (STM32, Kinetis, etc.). However, these efforts are somewhat hampered by Rust's rather weak support for exact representation of data, especially at the bit level.
A few years ago, we had multiple RFCs that tried to improve things, but all of these were closed or postponed. One of the reasons was that many do not understand the problem space. Responses have ranged from “I don’t know why you can’t use macros for this” to “I’ve never had the need to do this.” Therefore, to be successful in a future RFC, we need to educate and exemplify the problems faced when dealing with “low-level” details, such as bit-specifications, memory layout, endianness, access rules, and ergonomics in this space.
Exact Representation of Data
There are many examples of the need for an accurate memory representation:
- binary data formats (network and in-memory)
- manipulating hardware registers
- machine code in a JIT or compiler setting
- data sent to and from GPUs or other processors
- data sent to and from C or other languages (FFI)
What is common to all of these is that the exact memory representation has to be observed and that there must be a guarantee that the compiler follows the specification. The compiler must not rearrange data in any way. We already see this implemented in #[repr(C)], which we support due to FFI needs.
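To illustrate what #[repr(C)] already guarantees, here is a minimal sketch. The Header struct and its fields are made up for illustration, and the offsets in the comments assume a typical target where u32 is 4-byte aligned and u16 is 2-byte aligned.

```rust
use std::mem::size_of;

// Hypothetical packet header, not taken from any real protocol.
// With #[repr(C)] the fields are laid out in declaration order,
// with padding inserted according to the C ABI of the target.
#[repr(C)]
struct Header {
    tag: u8,    // offset 0, followed by 3 padding bytes
    len: u32,   // offset 4
    flags: u16, // offset 8, followed by 2 trailing padding bytes
}

/// Byte offset of each field, measured from the start of the struct.
fn field_offsets() -> (usize, usize, usize) {
    let h = Header { tag: 0, len: 0, flags: 0 };
    let base = &h as *const Header as usize;
    (
        &h.tag as *const u8 as usize - base,
        &h.len as *const u32 as usize - base,
        &h.flags as *const u16 as usize - base,
    )
}

/// Total size of the struct in bytes, including padding.
fn header_size() -> usize {
    size_of::<Header>()
}
```

Note that #[repr(C)] only fixes field order and padding at byte granularity. It has no way to say that a field is five bits wide, which is exactly the gap this thread is about.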
Example
A relatively common image format used by GPUs is RGB565. Five bits each are used for red and blue, while six are used for green. In total, this is 16 bits, so it fits in a u16. To manipulate these values, one has to understand bit-manipulation:
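#[derive(Clone, Copy)]
struct RGB565(u16);
impl RGB565 {
    // Layout, from the most significant bit: rrrrrggg gggbbbbb
    fn r(self) -> u8 { ((self.0 >> 11) & 0x1f) as u8 }
    fn g(self) -> u8 { ((self.0 >> 5) & 0x3f) as u8 }
    fn b(self) -> u8 { (self.0 & 0x1f) as u8 }
    // Each setter clears the field's bits, then ORs in the masked new value.
    fn set_r(&mut self, r: u8) { self.0 &= !(0x1f << 11); self.0 |= ((r & 0x1f) as u16) << 11 }
    fn set_g(&mut self, g: u8) { self.0 &= !(0x3f << 5); self.0 |= ((g & 0x3f) as u16) << 5 }
    fn set_b(&mut self, b: u8) { self.0 &= !0x1f; self.0 |= (b & 0x1f) as u16 }
}
fn test() {
    let mut c = RGB565(0);
    c.set_r(10);
    c.set_g(c.g() + 1);
}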
Compare this to another format, RGB888:
struct RGB888 {
    r: u8, g: u8, b: u8
}
fn test() {
    let mut c = RGB888 { r: 10, g: 0, b: 0 };
    c.g += 1;
}
Which one is easier to understand and use? Which one is more likely to have errors? I think you can all agree that it would have been much simpler to have something like bit-data:
bitdata RGB565 : u16 {
    r: u5, g: u6, b: u5
}
fn test() {
    let mut c = RGB565 { r: 10, g: 0, b: 0 };
    c.g += 1;
}
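For comparison, here is a rough sketch of what can be done in today's Rust to at least centralize the mask-and-shift logic. The helpers mask, get_bits and set_bits are hypothetical names introduced for this example, not part of any library, and they assume shift + width <= 16.

```rust
/// Hypothetical helper: a mask with the lowest `width` bits set (1..=16).
const fn mask(width: u32) -> u16 {
    ((1u32 << width) - 1) as u16
}

/// Hypothetical helper: reads `width` bits starting at bit `shift`.
const fn get_bits(word: u16, shift: u32, width: u32) -> u16 {
    (word >> shift) & mask(width)
}

/// Hypothetical helper: returns `word` with the field replaced by `value`.
/// Bits of `value` that do not fit in `width` bits are silently dropped.
const fn set_bits(word: u16, shift: u32, width: u32, value: u16) -> u16 {
    let m = mask(width) << shift;
    (word & !m) | ((value << shift) & m)
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RGB565(u16);

impl RGB565 {
    fn new(r: u8, g: u8, b: u8) -> Self {
        let mut c = RGB565(0);
        c.set_r(r);
        c.set_g(g);
        c.set_b(b);
        c
    }
    // Each field is described by a (shift, width) pair written out by hand.
    fn r(self) -> u8 { get_bits(self.0, 11, 5) as u8 }
    fn g(self) -> u8 { get_bits(self.0, 5, 6) as u8 }
    fn b(self) -> u8 { get_bits(self.0, 0, 5) as u8 }
    fn set_r(&mut self, r: u8) { self.0 = set_bits(self.0, 11, 5, r as u16) }
    fn set_g(&mut self, g: u8) { self.0 = set_bits(self.0, 5, 6, g as u16) }
    fn set_b(&mut self, b: u8) { self.0 = set_bits(self.0, 0, 5, b as u16) }
}
```

This removes the duplicated masks, but the shift and width of every field are still repeated in each getter and setter, nothing checks that the fields do not overlap, and call sites still read c.set_g(c.g() + 1) rather than c.g += 1.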
How much of a problem is this?
The short answer is: it depends on the ratio of bit-fiddling code to non-bit-fiddling code. If most of what the program does is at the i8, i16, i32 and i64 level, then you probably won’t care. But if a significant portion of the code needs to work with individual bits, it will be a problem.
In Rust, we can now write for i in 0..n { ... }. While a devil’s advocate could argue that this nice syntax could be replaced with C’s for (i = 0; i < n; i++) { ... } syntax, it is quite clear why the former is better: i is referenced once, resulting in far fewer errors and much more readable code.
I hope the RGB565 example above showcases the same issue for bit-data.
Goals
I need to collect other examples. If you know about code that exemplifies bit-data issues, please respond to this thread. Other wants or needs in this area are also very much appreciated.