@japaric Thanks! That’s a lot of great information I wasn’t aware of.
Dynamic memory allocation
One of the outcomes of novemb.rs this last weekend was the alloc-cortex-m crate which lets you easily plug in an allocator to access dynamic memory allocation (Box) and standard-ish collections (e.g. Vec).
This is good stuff, and I will certainly make use of it, but it doesn’t appear to address the point I was trying to make in my previous post. On the desktop all memory is basically the same, and due to virtual memory/MMU the heap appears as one contiguous block of memory. In embedded systems there can be multiple discontiguous memories, each with different size and performance characteristics … and there’s no MMU.
What I’d like to see is some way to set up multiple heaps. Then, based on type, size, or some other attribute, the allocator would choose the appropriate heap to allocate from. Types that are small and used very often would allocate from the smaller/faster memory; those that are large and used less often would allocate from the larger/slower heap. I see from the alloc-cortex-m crate that I should be able to make custom __rust_allocate/__rust_deallocate functions that have such logic. I only bring it up to raise awareness that this use case exists and might be useful to keep in mind when designing language and library features.
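To make the idea concrete, here is a minimal sketch of a size-based policy over two regions. Everything in it is an assumption for illustration: the `Region` and `MultiHeap` names, the bump-allocation strategy, and the idea of tracking regions purely as address ranges. It does not show how such a policy would be hooked into Rust's allocator entry points.

```rust
/// A bump allocator over one memory region, tracked purely as addresses.
/// (Hypothetical type; nothing is ever freed in this sketch.)
#[derive(Debug)]
pub struct Region {
    next: usize,
    end: usize,
}

impl Region {
    pub const fn new(start: usize, len: usize) -> Self {
        Region { next: start, end: start + len }
    }

    /// Reserve `size` bytes aligned to `align` (must be a power of two).
    pub fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        let start = (self.next + align - 1) & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.end {
            return None;
        }
        self.next = end;
        Some(start)
    }
}

/// Two discontiguous heaps with a size threshold deciding which one to try.
pub struct MultiHeap {
    pub fast: Region,
    pub slow: Region,
    /// Requests up to this many bytes are tried in `fast` first.
    pub small_limit: usize,
}

impl MultiHeap {
    pub fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        if size <= self.small_limit {
            if let Some(addr) = self.fast.alloc(size, align) {
                return Some(addr);
            }
        }
        // Large requests, or small ones that no longer fit in `fast`.
        self.slow.alloc(size, align)
    }
}
```

The same shape would work with a policy keyed on type or access frequency instead of size; size is simply the easiest attribute to demonstrate.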
BitFields
I keep hearing requests for bitfields, mainly I think from C developers, but I, personally, haven’t found use for them, at least in the MMIO space.
I don’t like C bitfields, and never use them. I’d like to see the ability to do something like this:
struct Register {
bit_field4 : BitField { 31, 16, ro }, // half-word bit field, half-word aligned
bit_field3 : BitField { 15, 8, rw }, // byte bit field, byte aligned
bit_field2 : BitField { 7, 1, rw }, // 7-bit bit field, not aligned
bit_field1 : BitField { 0, rc_w1 } // Single bit bitfield
}
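Something close to this can be approximated in Rust with `const fn`, assuming a toolchain where `assert!` is allowed in const contexts. The sketch below is only one possible encoding: the `BitField` and `Access` types and their method names are made up for illustration, and the register is modeled as a plain `u32` value rather than a volatile MMIO location.

```rust
/// Access semantics as they might appear in a datasheet (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Access {
    ReadOnly,
    ReadWrite,
    ReadClearWrite1,
}

/// Description of one bitfield inside a 32-bit register.
#[derive(Clone, Copy, Debug)]
pub struct BitField {
    pub msb: u32,
    pub lsb: u32,
    pub access: Access,
}

impl BitField {
    pub const fn new(msb: u32, lsb: u32, access: Access) -> Self {
        // Evaluated at compile time when used to initialize a `const`.
        assert!(msb >= lsb && msb < 32);
        BitField { msb, lsb, access }
    }

    pub const fn width(&self) -> u32 {
        self.msb - self.lsb + 1
    }

    pub const fn mask(&self) -> u32 {
        // Shifting u32::MAX right avoids overflow when the width is 32.
        (u32::MAX >> (32 - self.width())) << self.lsb
    }

    /// Extract this field from a register value.
    pub const fn get(&self, reg: u32) -> u32 {
        (reg & self.mask()) >> self.lsb
    }

    /// Return `reg` with this field replaced by `value`.
    pub const fn set(&self, reg: u32, value: u32) -> u32 {
        (reg & !self.mask()) | ((value << self.lsb) & self.mask())
    }
}

pub const BIT_FIELD4: BitField = BitField::new(31, 16, Access::ReadOnly);
pub const BIT_FIELD3: BitField = BitField::new(15, 8, Access::ReadWrite);
pub const BIT_FIELD2: BitField = BitField::new(7, 1, Access::ReadWrite);
pub const BIT_FIELD1: BitField = BitField::new(0, 0, Access::ReadClearWrite1);

// Metadata queries like this one are resolved entirely at compile time.
const _: () = assert!(BIT_FIELD3.width() == 8);
```

Because the descriptors are constants, queries such as `BIT_FIELD2.width()` or `BIT_FIELD4.access` cost nothing at runtime when the field is known at the call site.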
Ergonomics and Convenience
- Easily cross-reference back to the datasheet
- Easily walk the hierarchy with ‘.’ notation.
UART.register.bitfield1 = 0xFFFF;
- See the list of bitfields in a given register using Racer/RLS after typing register.
- See the documentation for each bitfield by hovering your mouse over the bitfield in your code editor.
- Query information about the bitfield at runtime or compile-time
if register.bitfield1.width > 8 { do_something() } else { do_something_else() }
if register.bitfield1.most_significant_bit == 3 { do_something() } else { do_something_else() }
if register.bitfield1.least_significant_bit == 1 { do_something() } else { do_something_else() }
if register.bitfield1.is_read_only { do_something() } else { do_something_else() }
register.bitfield1.reset(); // restores the bitfield's reset value as specified in the data sheet
// many other possibilities here
Optimization
Elide read-modify-write When Possible
There is also great opportunity for optimization with compile-time features like metaprogramming, static-if, and CTFE.
If the register is byte addressable, the bit field is exactly 8 bits wide, AND the bitfield falls on a byte boundary, compile-time features can generate code to access the bitfield with a single read or single write, avoiding the overhead of read-modify-write of the whole register. This will decrease binary size and increase runtime efficiency. It also has the benefit of making the access to the bitfield atomic.
static if register.is_byte_addressable
&& register.bitfield3.width == 8
&& (register.bitfield3.least_significant_bit % 8) == 0 {
// generate optimized code for this unique case
}
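Without a `static if`, a similar effect can be sketched with a `const fn` predicate and an ordinary `match`; when the field position is a constant at the call site, an optimizing compiler can typically drop the unused branch. The names `byte_lane` and `write_field` are hypothetical, and the register is modeled as four little-endian bytes in ordinary memory rather than a real volatile register.

```rust
/// If the field is exactly one byte wide and byte aligned, return the index
/// of the byte it occupies; otherwise return `None`.
pub const fn byte_lane(lsb: u32, width: u32) -> Option<usize> {
    if width == 8 && lsb % 8 == 0 {
        Some((lsb / 8) as usize)
    } else {
        None
    }
}

/// Write `value` into a field of a register stored as little-endian bytes.
/// `width` must be in 1..=32 and the field must fit inside the register.
pub fn write_field(reg: &mut [u8; 4], lsb: u32, width: u32, value: u32) {
    match byte_lane(lsb, width) {
        // Fast path: a single byte store, no read of the register needed.
        Some(lane) => reg[lane] = value as u8,
        // General path: read-modify-write of the whole 32-bit register.
        None => {
            let mask = (u32::MAX >> (32 - width)) << lsb;
            let old = u32::from_le_bytes(*reg);
            let new = (old & !mask) | ((value << lsb) & mask);
            *reg = new.to_le_bytes();
        }
    }
}
```

Both paths produce the same register contents; the difference is only in how many memory accesses are needed to get there.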
Many ARM Cortex-M microcontrollers (Cortex-M3 and Cortex-M4 parts, for example) also have a bit-banding feature that allows one to address individual bits in memory. Compile-time features can create bit-banded optimizations as well, for the same benefits.
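As a sketch of the bit-banding case, the alias address of a bit can be computed at compile time with the usual Cortex-M3/M4 mapping: alias base + byte offset × 32 + bit number × 4. The function name `bit_band_alias` is made up, and the region addresses below are the standard ones for the 1 MiB SRAM and peripheral bit-band regions; a given part may not implement the feature at all, so check its reference manual.

```rust
pub const SRAM_BASE: u32 = 0x2000_0000;
pub const SRAM_BB_ALIAS: u32 = 0x2200_0000;
pub const PERIPH_BASE: u32 = 0x4000_0000;
pub const PERIPH_BB_ALIAS: u32 = 0x4200_0000;
/// Each bit-band region covers 1 MiB of address space.
const BB_REGION_SIZE: u32 = 0x0010_0000;

/// Compute the bit-band alias word for bit `bit` (0..=7) of the byte at
/// `addr`, or `None` if the byte lies outside both bit-band regions.
pub const fn bit_band_alias(addr: u32, bit: u32) -> Option<u32> {
    if bit > 7 {
        return None;
    }
    let (base, alias) = if addr >= SRAM_BASE && addr - SRAM_BASE < BB_REGION_SIZE {
        (SRAM_BASE, SRAM_BB_ALIAS)
    } else if addr >= PERIPH_BASE && addr - PERIPH_BASE < BB_REGION_SIZE {
        (PERIPH_BASE, PERIPH_BB_ALIAS)
    } else {
        return None;
    };
    // Every bit in the region gets its own 32-bit word in the alias area.
    Some(alias + (addr - base) * 32 + bit * 4)
}
```

A write of 0 or 1 to the returned word address then clears or sets just that one bit, without a software read-modify-write.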
Setting Multiple BitFields with One Write
Consider the following code. If I’m not mistaken, this code will perform as stated in the comments:
let r = register.read(); // read
r.bit_field1 = value1; // read-modify-write
r.bit_field2 = value2; // read-modify-write
register.write(r); // write
I suspect that if the two middle assignment expressions were not volatile, they might be optimized by the compiler, but it’s not obvious, and may not be reliable or portable.
What I’d like to see in Rust is something like this:
register.write(bit_field1: value1, bit_field2: value2);
- Compiler combines the bit masks for bit_field1 and bit_field2 into one 32-bit composite bit mask
- Compiler combines value1 and value2 into one 32-bit composite value
- Compiler generates only one read-modify-write using the composite bit mask and composite value
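The three steps above can be sketched with a `const fn` that folds several field updates into one mask and one value. The `Field` type and the `compose` and `modify` functions are hypothetical names, the fields are assumed not to overlap, and the register is a plain `u32` here; real MMIO code would use one volatile read and one volatile write in `modify`.

```rust
/// Position of a bitfield inside a 32-bit register (hypothetical type).
#[derive(Clone, Copy)]
pub struct Field {
    pub lsb: u32,
    pub width: u32, // must be in 1..=32
}

impl Field {
    pub const fn mask(self) -> u32 {
        (u32::MAX >> (32 - self.width)) << self.lsb
    }
}

/// Fold a list of (field, value) updates into (composite mask, composite value).
pub const fn compose(updates: &[(Field, u32)]) -> (u32, u32) {
    let mut mask = 0;
    let mut value = 0;
    let mut i = 0;
    // `while` is used because `for` loops are not allowed in `const fn`.
    while i < updates.len() {
        let (field, v) = updates[i];
        mask |= field.mask();
        value |= (v << field.lsb) & field.mask();
        i += 1;
    }
    (mask, value)
}

/// Apply all updates with a single read-modify-write.
pub fn modify(reg: &mut u32, updates: &[(Field, u32)]) {
    let (mask, value) = compose(updates);
    *reg = (*reg & !mask) | value;
}

pub const BIT_FIELD1: Field = Field { lsb: 0, width: 1 };
pub const BIT_FIELD2: Field = Field { lsb: 1, width: 7 };
```

When the updates are constants, `compose` can be evaluated in a `const` item, so the mask and value end up as immediates in the generated code.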
Other Uses
Bitfields are not only useful for memory-mapped IO; they can also be used for parsing binary datagrams from communication channels such as TCP/IP, USB, and UART, or for reading packed data such as RGB565 LCD frame buffers and file allocation tables on mass storage devices.
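Taking RGB565 as an example: a pixel packs 5 bits of red, 6 bits of green, and 5 bits of blue into 16 bits, with red in the most significant bits. A hand-written sketch of the unpacking that a bitfield facility would generate might look like this (the `Rgb565` type is hypothetical):

```rust
/// One RGB565 pixel split into its channels (each left at its native width).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rgb565 {
    pub r: u8, // 5 bits, 0..=31
    pub g: u8, // 6 bits, 0..=63
    pub b: u8, // 5 bits, 0..=31
}

impl Rgb565 {
    /// Split a raw 16-bit pixel into channels.
    pub const fn unpack(raw: u16) -> Self {
        Rgb565 {
            r: ((raw >> 11) & 0x1F) as u8,
            g: ((raw >> 5) & 0x3F) as u8,
            b: (raw & 0x1F) as u8,
        }
    }

    /// Pack the channels back into a raw 16-bit pixel.
    pub const fn pack(self) -> u16 {
        ((self.r as u16 & 0x1F) << 11) | ((self.g as u16 & 0x3F) << 5) | (self.b as u16 & 0x1F)
    }
}
```

The shifts and masks here are exactly the kind of boilerplate that a declarative bitfield description would remove.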
I also think it would be quite cool to be able to iterate over bitfields.
for (bit_field: BitField) in register {
do_something(bit_field);
}
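One way this could be approximated is a static table of field descriptions with an iterator over it. This is only a sketch: the `NamedField` type, the `REGISTER_FIELDS` table, and the `decode` function are invented for illustration, and the layout reuses the example register from earlier in this post.

```rust
/// A named bitfield inside a 32-bit register (hypothetical type).
pub struct NamedField {
    pub name: &'static str,
    pub msb: u32,
    pub lsb: u32,
}

/// Field table for the example register.
pub static REGISTER_FIELDS: [NamedField; 4] = [
    NamedField { name: "bit_field4", msb: 31, lsb: 16 },
    NamedField { name: "bit_field3", msb: 15, lsb: 8 },
    NamedField { name: "bit_field2", msb: 7, lsb: 1 },
    NamedField { name: "bit_field1", msb: 0, lsb: 0 },
];

/// Iterate over every field of the register, yielding its name and the
/// value it holds in `raw`.
pub fn decode(raw: u32) -> impl Iterator<Item = (&'static str, u32)> {
    REGISTER_FIELDS.iter().map(move |f| {
        let width = f.msb - f.lsb + 1;
        let value = (raw >> f.lsb) & (u32::MAX >> (32 - width));
        (f.name, value)
    })
}
```

Something like `for (name, value) in decode(raw) { ... }` would then be enough to dump a register in a debugger or log.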
There’s lots of potential here to make programming with bitfields a joy. With the right language features (static-if, CTFE, metaprogramming, etc…) there’s no need to build this into the language; one could probably implement it as a crate.
Hardware Abstraction
I think the right way to model this is with traits as traits specify interfaces. AFAICS, there are two approaches here:
I think there’s opportunity to push the envelope here, iff Rust has the right features and they are stabilized.
Almost everything about the hardware (core, peripherals, board, memory, etc…) should be known at compile-time, and with the right compile-time features (static-if, CTFE, metaprogramming), there’s no reason why any of this information needs to be sorted out at runtime.
In an attempt to illustrate, imagine the current supply chain for a microcontroller board:
- ARM releases a core (e.g. Cortex-M4F) - We know core peripherals and features at compile-time
- ST creates the silicon (e.g. STM32F407VG) - We know optional core features, pin count, memory sizes, and ST peripheral features at compile-time
- Board manufacturer creates a board - We know our clock crystal, which pins are soldered where, and what features can be used where at, you guessed it, compile-time
With all of this information available at compile-time, we should be able to aggregate the crates we need, fill out all clock, pin, memory, etc… info, and let the compiler generate the code using the compile-time logic supplied by the crates’ authors. Crate authors can add a static assert to check for any misconfigurations and display a friendly message at, once again, compile-time.
Allow me to illustrate with C++ template-like syntax (I don’t know how else to describe it):
// Crate as created by ARM
// ----------------------------------------------------------------------
class ARMCortexM;
template <bool FPU, bool DSP, bool MPU>
class ARMCortexM4 : ARMCortexM {
static if (FPU) {
// Add FPU features
}
static if (DSP) {
// Add DSP features
}
static if (MPU) {
// Add MPU features
}
}
// Crate as created by silicon vendor: ST
// ----------------------------------------------------------------------
template <uint FlashSize, package Package>
class STM32F4 : ARMCortexM4<true, true, false> {
static if (Package == LQFP100) {
// Do whatever is required for LQFP100 package
}
else {
static assert(false, "Sorry this model is only available in the LQFP100 package.")
}
}
typedef STM32F407VG STM32F4<1024*1024, LQFP100>
// Crate as created by board manufacturer
// ----------------------------------------------------------------------
template <typename TMCU, typename TCrystal, typename TPinLayout>
class Board;
typedef ST_Disco_PinLayout PinLayout<PinA0<Out, PullUp>, PinA1<AF:I2C1, PullUp>, etc...>
typedef ST_Disco_Crystal Crystal<25MHz>
typedef ST_Disco Board<STM32F407VG, ST_Disco_Crystal, ST_Disco_PinLayout>
// Finally the programmer can start writing his/her application code
// ----------------------------------------------------------------------
main() {
ST_Disco.init();
// After calling ST_Disco.init(), all core, MCU, and board peripherals are
// properly initialized and we're ready to roll.
}
No more need for tools like ST’s CubeMX to generate code for you. The necessary compile-time features are already in the Rust language and the authors of the crates (ARM Cortex-M4 crate, STM32F407VG crate, PinLayout crate, etc…) already wrote all the necessary compile-time code-gen logic for you; you just have to snap together the Legos. The resulting code generated by the compiler will be highly specialized and highly optimized to your hardware because the compiler has everything it needs to do so.
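For comparison, here is a rough sketch of the same Core → Silicon → Board layering in Rust, assuming a toolchain with const generics and const panics. All of the trait and type names are hypothetical, the DSP flag and pin layout are left out to keep it short, and the 4 to 26 MHz crystal range is an illustrative constraint to be checked against the real datasheet.

```rust
use core::marker::PhantomData;

// "Crate" from the core vendor.
pub trait Core {
    const HAS_FPU: bool;
    const HAS_MPU: bool;
}
pub struct CortexM4<const FPU: bool, const MPU: bool>;
impl<const FPU: bool, const MPU: bool> Core for CortexM4<FPU, MPU> {
    const HAS_FPU: bool = FPU;
    const HAS_MPU: bool = MPU;
}

// "Crate" from the silicon vendor.
pub trait Mcu {
    type Core: Core;
    const FLASH_BYTES: u32;
    const HSE_MIN_HZ: u32;
    const HSE_MAX_HZ: u32;
}
pub struct Stm32F407VG;
impl Mcu for Stm32F407VG {
    type Core = CortexM4<true, false>;
    const FLASH_BYTES: u32 = 1024 * 1024;
    const HSE_MIN_HZ: u32 = 4_000_000; // illustrative limits
    const HSE_MAX_HZ: u32 = 26_000_000;
}

// "Crate" from the board manufacturer.
pub trait Crystal {
    const HZ: u32;
}
pub struct Hse<const F: u32>;
impl<const F: u32> Crystal for Hse<F> {
    const HZ: u32 = F;
}

pub struct Board<M: Mcu, X: Crystal>(PhantomData<(M, X)>);

/// What `init` decided; a stand-in for actually touching hardware.
pub struct Config {
    pub hse_hz: u32,
    pub enable_fpu: bool,
    pub flash_bytes: u32,
}

impl<M: Mcu, X: Crystal> Board<M, X> {
    // Plays the role of `static assert`: a bad combination fails to compile.
    const CHECK: () = assert!(
        X::HZ >= M::HSE_MIN_HZ && X::HZ <= M::HSE_MAX_HZ,
        "crystal frequency is outside the range this MCU supports"
    );

    pub fn init() -> Config {
        let () = Self::CHECK; // forces the check for this Board type
        Config {
            hse_hz: X::HZ,
            enable_fpu: <M::Core as Core>::HAS_FPU,
            flash_bytes: M::FLASH_BYTES,
        }
    }
}

pub type StDisco = Board<Stm32F407VG, Hse<25_000_000>>;
```

Calling `StDisco::init()` compiles and yields a configuration with the FPU enabled, while a type such as `Board<Stm32F407VG, Hse<50_000_000>>` would be rejected at compile time as soon as its `init` is used.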
I think Zinc had a similar goal to what I’ve envisioned here, and I think it was a pretty good one. Zinc’s demise was not caused by any fundamental flaws in its design, IMO. After reading the post-mortem, I would argue that it was Rust’s difficulty fleshing out its compile-time/metaprogramming story (or lack thereof) that killed Zinc. Zinc, or something like it, may rise from the ashes after Rust eventually works these features out.
Action Items:
So after all this dreaming, what am I actually suggesting for 2017 Roadmap?
- Finalize Compile-time/Metaprogramming Features - Figure out Rust’s compile-time and metaprogramming story, make it great, and stabilize it. Consider the bitfield and hardware abstraction ideas I just mentioned while fleshing out features and design.
- Create a Rich Bitfield Crate - Leverage those compile-time and metaprogramming features to make a super cool bitfield crate that programmers might actually want to use.
- Start Creating the Ecosystem - Leverage those compile-time and metaprogramming features to make hardware abstracted crates analogous to the Core–>Silicon–>Board supply chain, so board designers “solder” their software components together at compile time just like they solder their board components together, resulting in a highly specialized and highly optimized board support package (BSP) crate.
(Actually, that looks like a multi-year roadmap; perhaps we will see some progress on item 1 in 2017)