[Roadmap 2017] Needs of no-std / embedded developers

I’m very new to Rust, but I’ve been spending my evenings with Rust for the past 2 months. My primary motivation for exploring Rust is its potential for programming everything from microcontrollers to cloud services, and to do so with safety, efficiency, and productivity.

In general, I think the term embedded systems, as it is often thrown around today, can be broken into 2 platforms:

  1. Application Processors
  2. Microcontrollers

Application Processors

I find that software development for these systems is only slightly different than that for your typical desktop PC, however…

  • Developing with a Cross-toolchain - Your toolchain typically does not run on the target (though for some targets, it probably could). You typically develop on a much more powerful desktop PC and deploy/debug the target remotely (USB, Serial, or Ethernet)
  • Bare Metal or OS - These products are almost always running an operating system like Linux, Android, Windows CE/Embedded Compact/10 IoT, or a specialized RTOS (e.g. VxWorks). Though, they can be bare-metal programmed also.
  • 3D Graphics Acceleration - Nowadays, these systems often have a built-in GPU, but not always.
  • Single-Purpose Application - It is not uncommon for them to run only 1 application and run it all day every day
  • Energy Efficiency - Energy efficiency is sometimes a concern, sometimes not; it depends on device, application, and environment
  • Boot Time - Very fast boot times can be an excellent selling point for these devices, depending, as always, on the type of application.


Microcontrollers


I primarily work with ARM Cortex-M microcontrollers. Here’s a summary of the different profiles (ordered by least performant to most performant) so you can see their vast range.

  • High energy efficiency, low cost
    • Cortex-M0 - 48MHz, 16KB~256KB ROM, 4KB~32KB RAM
    • Cortex-M0+
    • Cortex-M23
  • Balance between cost, efficiency and performance. May have DSP instruction set.
    • Cortex-M3
    • Cortex-M33
    • Cortex-M4
  • High performance. DSP instruction set. Hardware floating point. 2D Graphics Acceleration
    • Cortex-M4F
    • Cortex-M7 - 216MHz, 512KB~2MB ROM, 256KB~512KB RAM

Software development for these microcontrollers is quite a bit different than that for desktop PCs and the previously mentioned application processors.

  • Bare Metal or OS - In my experience microcontrollers are most often bare-metal programmed, but it is also common to see them employing a specialized RTOS (e.g. FreeRTOS, µC/OS, ChibiOS/RT, NuttX, etc…). These RTOSs are also quite different from those used with the previously mentioned application processors.
  • Connectivity - These products are typically a node in a larger system or serve as a peripheral to a larger system. They often interface to their larger system via USB, WiFi, Bluetooth, Radio, Infrared, RS-232/422/485, CAN, SPI, just to name a few.
  • Developing with a Cross-toolchain - Very unlikely these systems will run your toolchain. You typically develop on a much more powerful desktop PC and deploy/debug the target remotely most often using JTAG or SWD with an in-circuit-emulator/debug probe (J-Link, ST-Link). That being said, I’m sure someone somewhere has embedded a C compiler in their microcontroller already.
  • Binary Size - Binary size is important due to the very limited amount of Flash ROM (can be less than 16KB)
  • Dynamic Memory Allocation - Some applications need dynamic memory allocation, some don’t; it depends on the application.
  • Energy Efficiency - Energy efficiency is sometimes important, sometimes not; it depends on the application. That being said, you will find energy efficiency to be more often a concern with microcontrollers than with the application processors. Software efficiency will play an important role in energy efficiency.
  • Single-Purpose Application - These products most often execute only a single binary, and it is often required to run all day, every day. For power efficient systems, the processor spends much of its time in a standby state to conserve energy.
  • Output Console (a.k.a stdout) - Almost all of these systems have some kind of console port used for debugging/monitoring the application. It is most often used for output, but is sometimes used for input also
    • ARM Semihosting - A clever trick using the BKPT instruction to trigger an operation on a host computer. Demonstration in Rust.
    • ARM Instruction Trace Macrocell - ARM proprietary peripheral for monitoring the execution of code. Faster than semihosting and less intrusive to the running application than other alternatives.
    • Serial (UART) - All microcontrollers that I’ve used have always had a UART that can be used for this purpose
    • USB - Nowadays many of these microcontrollers have a built-in USB device peripheral, so it can be programmed for serial communication (CDC device), or as some USB class for this purpose.
  • Floating Point - Some microcontrollers have hardware floating point, some don’t. Some applications need floating point, some don’t. Some can get by with software floating point, some can’t.
  • Linker Scripts - Developers will almost always need to write a linker script (a.k.a scatter file) to tell the linker how to organize the binary.
  • Memory-mapped IO - Almost all peripherals are controlled via memory-mapped IO. Volatile semantics are important here, and so is some way of modeling bitfields.

My suggestions to those paving the future of Rust

  • No Runtime - Rust’s “no runtime” feature is great in this domain. Please keep it that way. There’s no reason for “Hello World!” to be more than a 100 byte binary, or more than 20 lines of code. Bare-metal development should be pay-as-you-go.
  • No Dependencies - Rust’s libcore “no dependencies” library is also great. Again, please keep it that way. Don’t make developers pay too heavy a penalty for adding a crate to their project, especially when they may only need a small part of it.
  • Inline ASM - Inline ASM is important. It is often needed to take full advantage of the hardware (e.g. Semihosting, synchronization/memory barriers, DSP, SIMD, etc…)
  • Binary Size - Don’t be careless with binary size; someone might want to run your code on a microcontroller with only 16KB of flash memory or even less.
  • Modular Software Components - Keep things modular with as few dependencies as possible. The range of applications and hardware capabilities is vast. Let the developer aggregate only what they need for their application (i.e. pay-as-you-go). Principles of good software engineering like high cohesion, low coupling are especially valuable here.
  • Setting Up a Development Environment - Don’t make setting up the toolchain or adding software components a hassle. rustup and cargo have been a breath of fresh air for me; thank you! Also, the recent ARM Thumb target additions were most welcome; thank you!
  • Memory-mapped IO and Bitfields - Almost all aspects of the hardware are controlled by manipulating bitfields in memory-mapped IO registers. Some kind of bitfield support would be nice, but it needs to be done right for people to want to use it. Bitfields are typically described in the MCU’s datasheet with most-significant bit index and least-significant bit index or width; not as fields relative to each other. Developers will want to easily cross-reference their code with the MCU’s datasheet; don’t make them count bits.
  • CTFE and Metaprogramming - Compile-time execution and metaprogramming can be used very effectively to reduce binary size, increase runtime efficiency, and reduce boiler-plate. Zinc’s ioreg in Rust, Ken Smith’s register access in C++, and my own memory mapped IO experiment in D are good examples of this. It would be nice if we didn’t have to learn another metalanguage or API to do this. D is very nice in this regard (especially with features like CTFE and static-if). There’s no separate language to learn for metaprogramming; it all feels like one cohesive language in D.
  • Hardware Abstraction - Most libraries written for microcontrollers leave some functions unimplemented so they can be ported/adapted to the underlying hardware platform. This may not be the best way to create a hardware abstraction, and I think there are opportunities to do this differently with more modern tools and programming techniques. The software that comes to mind for me here is Anti-Grain Geometry (AGG).

Anti-Grain Geometry is not a solid graphic library and it’s not very easy to use. I consider AGG as a “tool to create other tools”. It means that there’s no “Graphics” object or something like that, instead, AGG consists of a number of loosely coupled algorithms that can be used together or separately. All of them have well defined interfaces and absolute minimum of implicit or explicit dependencies.

  • RAM (The Stack and the Heap) - My latest project has 3 discontiguous RAM regions:
    1. Core-coupled memory (CCRAM) - Super fast SRAM coupled right to the processor core… but small (64KB).
    2. Internal SRAM - Not as fast as CCRAM, but still fast…(192KB)
    3. External DRAM - Slower than I’d like it to be…(4MB)

    I use the CCRAM for my stack. I had to write a custom malloc to make use of both the Internal SRAM and External DRAM for my heap so I could control what types are allocated where. I’d like to rethink dynamic memory allocation to support different, discontiguous heaps for allocation based on type, size, etc…
  • Microcontrollers are Different and More Diverse Today - Microcontrollers and their applications are very different today from what they were just 10 years ago. Many microcontrollers today have more resources and power than the PCs I used to use, while some are still very constrained. I caution against making decisions based on tired, old rules of thumb. Dynamic memory allocation, exceptions, runtime type information, etc… can all be useful or harmful in this domain; it all depends on the application. Let the developer choose.

My Current Project

I am currently working on a microcontroller-based application. It has two components: an Editor that runs on a host PC, and a microcontroller with a runtime in its firmware. The user creates a project using the Editor, uploads project files and data to the microcontroller, and their project runs on the microcontroller.

The microcontroller firmware/runtime is written in C++. The Editor application is written in C# and C++. The C++ firmware is recompiled for the PC so the user can simulate their project in the editor.

I almost always have to write my own tools that run on the PC to help me develop, test, and make good use of my microcontroller products. I also have to write tools for my customers to make using and interfacing with my microcontroller products more seamless and productive.

I love the power of C++, but it’s too inconvenient to use. I love the convenience of C#, but it’s just not powerful and portable. I’m tired of having to learn different programming languages for different domains. I wish I could program everything from tiny microcontrollers with 4KB of RAM to huge distributed cloud services with one and only one safe, efficient, and productive programming language. Maybe I can with Rust…we’ll see.


@JinShil Thanks for the insightful comment!

The rust-embedded community has been working on several of the points in your post:

Microcontroller Platforms

As you probably know, Rust has excellent support for ARM Cortex-M microcontrollers but doesn’t fully support other microcontroller architectures yet.

Minimal MSP430 support landed last week but there’s no documentation on how to actually write/flash/debug Rust programs for that arch (yet).

And AVR support is WIP. Upstream LLVM now supports this architecture but we haven’t yet updated our LLVM submodule.

Dynamic Memory Allocation

One of the outcomes of novemb.rs this last weekend was the alloc-cortex-m crate which lets you easily plug in an allocator to access dynamic memory allocation (Box) and standard-ish collections (e.g. Vec).

As you mention, dynamic allocation is opt-in. Nothing can implicitly allocate if you don’t opt into an allocator.

Output Console (a.k.a stdout): ARM Instruction Trace Macrocell

The f3 crate makes use of the ITM and provides a family of iprint macros to send formatted messages through the ITM. On the PC side, there is the itmdump tool which receives and parses ITM packets and prints those formatted messages to stdout.

In the future, I’d like to move this functionality into a single crate that works for all Cortex-M microcontrollers (if possible).

Linker Scripts

Writing linker scripts is hard if you have never written one before. This Cargo project template has a sensible linker script that Just Works; most people will only need to tweak two lines of it (the origin and length of Flash memory and RAM) to make it work for their device.

Setting Up a Development Environment

There’s some WIP to embed lld, LLVM’s linker, in rustc. If/when that lands, it would become possible to build and link executables without having to install arm-none-eabi-gcc.

Last week, I learned about cargo-sym which is a Cargo subcommand you can use to inspect and disassemble ELF files and the best part is that it can, in theory, disassemble ELF files of any architecture. It’s in an early stage but I think it could replace all the binutils one has to install to work with different architectures.

Memory-mapped IO and Bitfields

I keep hearing requests for bitfields, mainly I think from C developers, but I, personally, haven’t found use for them, at least in the MMIO space. One of the arguments I see for them is ergonomics as you can write

// register is a 32-bit register
register.bitfield = true;
register.bitfield = false;

to set/reset one bit. But once you have to encode volatile operations and invariants like read-only or write-only access you’ll end with something like this:

let mut r = register.read(); // volatile read
r.bit1 = true;
r.bit2 = false;
register.write(r); // volatile write

And if you want to encode the idea of reserved bits then the API becomes even more complex and the ergonomics mostly disappear. If you want to preserve all those invariants you’ll probably end up with an API that looks like the one svd2rust generates but that API doesn’t use bitfields at all and works with the features Rust has today.
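To make the "invariants in the API" point concrete, here is a minimal sketch of how read-only / write-only access can be encoded with plain types and volatile operations. It is not what svd2rust actually generates; `Reg`, `ReadOnly`, `WriteOnly`, `ReadWrite`, `Readable`, and `Writable` are hypothetical names invented for this illustration.

```rust
// On a no_std target these paths would be `core::` instead of `std::`.
use std::marker::PhantomData;
use std::ptr;

/// Marker types describing what a register permits (hypothetical names).
pub struct ReadOnly;
pub struct WriteOnly;
pub struct ReadWrite;

pub trait Readable {}
pub trait Writable {}
impl Readable for ReadOnly {}
impl Readable for ReadWrite {}
impl Writable for WriteOnly {}
impl Writable for ReadWrite {}

/// A 32-bit register at a fixed address, tagged with its access mode.
pub struct Reg<A> {
    addr: *mut u32,
    _access: PhantomData<A>,
}

impl<A> Reg<A> {
    /// Safety: `addr` must be valid and aligned for volatile 32-bit access
    /// for as long as the returned value is used.
    pub unsafe fn new(addr: *mut u32) -> Self {
        Reg { addr, _access: PhantomData }
    }
}

impl<A: Readable> Reg<A> {
    /// Volatile read; only exists for readable registers.
    pub fn read(&self) -> u32 {
        unsafe { ptr::read_volatile(self.addr) }
    }
}

impl<A: Writable> Reg<A> {
    /// Volatile write; only exists for writable registers.
    pub fn write(&self, value: u32) {
        unsafe { ptr::write_volatile(self.addr, value) }
    }
}

impl<A: Readable + Writable> Reg<A> {
    /// One volatile read followed by one volatile write.
    pub fn modify<F: FnOnce(u32) -> u32>(&self, f: F) {
        self.write(f(self.read()));
    }
}
```

With this shape, calling `write` on a `Reg<ReadOnly>` is a compile error rather than a runtime surprise, and no bitfield language feature is involved.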

CTFE and Metaprogramming

The approach we have been taking is to generate a register manipulation API from SVD files, once, into a crate (example, note the documentation) and then just use that crate as a dependency.

In contrast with ioregs, this is slightly faster, as code doesn’t have to be regenerated each time the ioregs syntax extension is expanded. One also doesn’t have to write the whole memory map using a syntax extension, as SVD files contain all the information one needs to build an API.

Also, plugins like ioregs are prone to breakage due to their reliance on compiler internals. This constant breakage has been one of the reasons the Zinc project met its demise.

Hardware Abstraction

I think the right way to model this is with traits as traits specify interfaces. AFAICS, there are two approaches here:

Each crate defines a trait and presents an API that’s “generic” around that trait. Then the user has to implement that trait for their microcontroller to be able to use that particular crate.

Or we settle on a common interface for peripherals like Serial, I2C, etc. and create a single crate with only traits that specifies such interfaces. Then crates (libraries) can be written against those traits and users only have to implement the traits in that crate for their microcontroller once and have access to a bunch of crates (libraries) on crates.io.

The community is currently discussing the latter approach in this thread.
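A minimal sketch of the second approach, to show the split of responsibilities. The trait and function names (`SerialWrite`, `write_line`, `BufferSerial`) are hypothetical and not taken from any existing crate; `BufferSerial` is a host-side stand-in for what would be a real UART implementation on a device.

```rust
/// "Interface crate": a minimal blocking serial-output trait (hypothetical).
pub trait SerialWrite {
    type Error;
    fn write_byte(&mut self, byte: u8) -> Result<(), Self::Error>;
}

/// "Library crate": written only against the trait, so it works on any
/// microcontroller that implements `SerialWrite`.
pub fn write_line<S: SerialWrite>(serial: &mut S, text: &str) -> Result<(), S::Error> {
    for &b in text.as_bytes() {
        serial.write_byte(b)?;
    }
    serial.write_byte(b'\r')?;
    serial.write_byte(b'\n')
}

/// "Device crate": implemented once per microcontroller. This stand-in just
/// records the bytes; a real one would write to the UART data register.
pub struct BufferSerial {
    pub sent: Vec<u8>,
}

impl SerialWrite for BufferSerial {
    type Error = ();
    fn write_byte(&mut self, byte: u8) -> Result<(), ()> {
        self.sent.push(byte);
        Ok(())
    }
}
```

Because `write_line` is generic, the compiler monomorphizes it for the concrete serial type, so the abstraction does not by itself require dynamic dispatch.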

One important thing we are missing is a single place with all this information (how to start, what crates one should use, etc). We already have a website for this: areweembeddedyet; it’s just missing the content :sweat_smile:.

@japaric Thanks! That’s a lot of great information I wasn’t aware of.

Dynamic memory allocation

One of the outcomes of novemb.rs this last weekend was the alloc-cortex-m crate which lets you easily plug in an allocator to access dynamic memory allocation (Box) and standard-ish collections (e.g. Vec).

This is good stuff, and I will certainly make use of it, but it doesn’t appear to address the point I was trying to make in my previous post. On the desktop all memory is basically the same, and due to virtual memory/MMU the heap appears as one contiguous block of memory. In embedded systems there can be multiple discontiguous memories, each with different size and performance characteristics … and there’s no MMU.

What I’d like to see is some way to set up multiple heaps. Then based on type, size, or some other attribute, the allocator will choose the appropriate heap to allocate from. Those types that are small and used very often would allocate from the smaller/faster memory. Those that are large and not used so often would allocate from the larger/slower heap. I see from the alloc-cortex-m crate that I should be able to make custom __rust_allocate/__rust_deallocate functions that have such logic. I only bring it up to raise awareness that this use case exists and might be useful to keep in mind when designing language and library features.
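To illustrate the routing logic only, here is a rough sketch with two bump-style regions and a size threshold. It does address arithmetic and never touches memory or frees anything, so it is nowhere near a real allocator. `Region`, `Heaps`, and `small_limit` are hypothetical names, and the base addresses used in the example are placeholders rather than values from any particular memory map.

```rust
/// One contiguous RAM region managed by a bump pointer (never frees).
pub struct Region {
    pub name: &'static str,
    base: usize,
    size: usize,
    used: usize,
}

impl Region {
    pub const fn new(name: &'static str, base: usize, size: usize) -> Self {
        Region { name, base, size, used: 0 }
    }

    /// Returns the address of `len` bytes aligned to `align` (a power of
    /// two), or `None` if the region is exhausted.
    pub fn alloc(&mut self, len: usize, align: usize) -> Option<usize> {
        let start = (self.base + self.used + align - 1) & !(align - 1);
        let end = start.checked_add(len)?;
        if end > self.base + self.size {
            return None;
        }
        self.used = end - self.base;
        Some(start)
    }
}

/// Two discontiguous heaps with a size-based routing policy.
pub struct Heaps {
    pub fast: Region,
    pub slow: Region,
    /// Requests up to this many bytes try the fast region first.
    pub small_limit: usize,
}

impl Heaps {
    /// Returns the name of the region used and the allocated address.
    pub fn alloc(&mut self, len: usize, align: usize) -> Option<(&'static str, usize)> {
        if len <= self.small_limit {
            if let Some(addr) = self.fast.alloc(len, align) {
                return Some((self.fast.name, addr));
            }
        }
        // Large requests, or small ones that no longer fit in fast memory.
        self.slow.alloc(len, align).map(|addr| (self.slow.name, addr))
    }
}
```

The same policy could just as well key off a type attribute instead of size; size is used here only because it needs no language support.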


I keep hearing requests for bitfields, mainly I think from C developers, but I, personally, haven’t found use for them, at least in the MMIO space.

I don’t like C bitfields, and never use them. I’d like to see the ability to do something like this:

    struct Register {
        bit_field4 : BitField { 31, 16, ro    },   // half-word bit field, half-word aligned
        bit_field3 : BitField { 15,  8, rw    },   // byte bit field, byte aligned
        bit_field2 : BitField {  7,  1, rw    },   // 7-bit bit field, not aligned
        bit_field1 : BitField {      0, rc_w1 }    // Single bit bitfield
    }

Ergonomics and Convenience

  • Easily cross-reference back to the datasheet
  • Easily walk the hierarchy with ‘.’ notation.

UART.register.bitfield1 = 0xFFFF;

  • See the list of bitfields in a given register using Racer/RLS after typing register.
  • See the documentation for each bitfield by hovering your mouse over the bitfield in your code editor.
  • Query information about the bitfield at runtime or compile-time

if register.bitfield1.width > 8                  { do_something() } else { do_something_else() }
if register.bitfield1.most_significant_bit == 3  { do_something() } else { do_something_else() }
if register.bitfield1.least_significant_bit == 1 { do_something() } else { do_something_else() }
if register.bitfield1.is_read_only               { do_something() } else { do_something_else() }

register.bitfield1.reset();  // restores the bitfield's reset value as specified in the data sheet

// many other possibilities here
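As a rough approximation of the queries above in plain Rust, a field can be described by its datasheet-style most/least significant bit indices and interrogated through `const fn`s. This is only a sketch: `BitField` and its methods are hypothetical, the access mode is simplified to a single `read_only` flag (so `rc_w1` is not modelled), and it assumes `msb >= lsb` and `msb <= 31`.

```rust
/// A bit field described the way datasheets do: [msb:lsb] within 32 bits.
#[derive(Clone, Copy)]
pub struct BitField {
    pub msb: u32,
    pub lsb: u32,
    pub read_only: bool,
}

impl BitField {
    pub const fn new(msb: u32, lsb: u32, read_only: bool) -> Self {
        BitField { msb, lsb, read_only }
    }

    /// Number of bits in the field.
    pub const fn width(&self) -> u32 {
        self.msb - self.lsb + 1
    }

    /// Mask with ones over [msb:lsb]. The u64 intermediate avoids overflow
    /// for a full 32-bit field.
    pub const fn mask(&self) -> u32 {
        (((1u64 << self.width()) - 1) << self.lsb) as u32
    }

    /// Extract the field's value from a register value.
    pub const fn get(&self, reg: u32) -> u32 {
        (reg & self.mask()) >> self.lsb
    }

    /// Return `reg` with the field replaced by `value` (excess bits dropped).
    pub const fn set(&self, reg: u32, value: u32) -> u32 {
        (reg & !self.mask()) | ((value << self.lsb) & self.mask())
    }
}

// The register layout from the proposal above, as constants.
pub const BIT_FIELD4: BitField = BitField::new(31, 16, true);
pub const BIT_FIELD3: BitField = BitField::new(15, 8, false);
pub const BIT_FIELD2: BitField = BitField::new(7, 1, false);
pub const BIT_FIELD1: BitField = BitField::new(0, 0, false);

// Because the methods are `const fn`, this is computed at compile time.
pub const FIELD3_WIDTH: u32 = BIT_FIELD3.width();
```

The bit indices in the constants read the same as they would in the datasheet, which is the cross-referencing property asked for above.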


Elide read-modify-write When Possible

There is also great opportunity for optimization with compile-time features like metaprogramming, static-if, and CTFE.

If the register is byte addressable, the bit field is exactly 8 bits wide, AND the bitfield falls on a byte boundary, compile-time features can generate code to access the bitfield with a single read or single write, avoiding the overhead of read-modify-write of the whole register. This will decrease binary size and increase runtime efficiency. It also has the benefit of making the access to the bitfield atomic.

static if register.is_byte_addressable 
	   && register.bitfield3.width == 8 
	   && (register.bitfield3.least_significant_bit % 8) == 0 {

    // generate optimized code for this unique case
}

ARM Cortex-M microcontrollers also have a bit-banding feature that allows one to address individual bits in memory. Compile-time features can create bit-banded optimizations also for the same benefits.
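Both optimizations come down to a little arithmetic that can be done in `const fn`s. The sketch below is an assumption-laden illustration: the function names are hypothetical, the byte-wide write assumes a little-endian target and a byte-addressable register, and the bit-band formula (alias base + byte offset × 32 + bit number × 4) only applies on cores that implement bit-banding and only within the bit-band region.

```rust
/// True when the field [msb:lsb] is exactly one byte wide and sits on a
/// byte boundary, so it can be accessed without read-modify-write.
/// Assumes `msb >= lsb`.
pub const fn is_byte_field(msb: u32, lsb: u32) -> bool {
    msb - lsb + 1 == 8 && lsb % 8 == 0
}

/// Write an aligned 8-bit field with a single byte-wide volatile store.
///
/// Safety: `reg` must be valid for writes and byte-addressable; `lsb` must
/// be 0, 8, 16, or 24. Byte order is assumed to be little-endian.
pub unsafe fn write_byte_field(reg: *mut u32, lsb: u32, value: u8) {
    unsafe {
        let byte_ptr = (reg as *mut u8).add((lsb / 8) as usize);
        std::ptr::write_volatile(byte_ptr, value);
    }
}

/// Address of the bit-band alias word for bit `bit` of the word at `addr`.
/// `region_base` and `alias_base` are the starts of the bit-band region
/// and of its alias region.
pub const fn bit_band_alias(region_base: u32, alias_base: u32, addr: u32, bit: u32) -> u32 {
    alias_base + (addr - region_base) * 32 + bit * 4
}
```

Evaluating `is_byte_field` in a `const` item means the choice between the byte store and the full read-modify-write is settled at compile time and the unused path can be dropped.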

Setting Multiple BitFields with One Write

Consider the following code. If I’m not mistaken, this code will perform as stated in the comments:

let r = register.read();       // read
r.bit_field1 = value1;         // read-modify-write
r.bit_field2 = value2;         // read-modify-write
register.write(r);             // write

I suspect that if the two middle assignment expressions were not volatile, they may be optimized by the compiler, but it’s not obvious, and may not be reliable and portable.

What I’d like to see in Rust is something like this:

register.write(bit_field1: value1, bit_field2: value2);

  • Compiler combines the bit masks for bit_field1 and bit_field2 into one 32-bit composite bit mask
  • Compiler combines value1 and value2 into one 32-bit composite value
  • Compiler generates only one read-modify-write using the composite bit mask and composite value
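The three steps above can be approximated today with an ordinary builder type rather than new syntax. This is a sketch under assumptions: `Update`, `field`, and `apply` are hypothetical names, fields are given as datasheet-style [msb:lsb] ranges, and the combining happens in normal code that the optimizer is free to fold when the arguments are constants.

```rust
/// Accumulates field updates so they can be applied in one read-modify-write.
#[derive(Clone, Copy)]
pub struct Update {
    mask: u32,  // composite mask of every field queued so far
    value: u32, // composite value, already shifted into position
}

impl Update {
    pub fn new() -> Self {
        Update { mask: 0, value: 0 }
    }

    /// Queue `value` for the field occupying bits [msb:lsb].
    /// Assumes `msb >= lsb` and `msb <= 31`.
    pub fn field(mut self, msb: u32, lsb: u32, value: u32) -> Self {
        let width = msb - lsb + 1;
        let mask = (((1u64 << width) - 1) << lsb) as u32;
        self.mask |= mask;
        self.value = (self.value & !mask) | ((value << lsb) & mask);
        self
    }

    /// Apply everything with exactly one volatile read and one volatile write.
    ///
    /// Safety: `reg` must be valid and aligned for volatile 32-bit access.
    pub unsafe fn apply(self, reg: *mut u32) {
        unsafe {
            let old = std::ptr::read_volatile(reg);
            std::ptr::write_volatile(reg, (old & !self.mask) | self.value);
        }
    }
}
```

Usage reads close to the wished-for call: `Update::new().field(0, 0, value1).field(7, 1, value2).apply(reg)`.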

Other Uses

Bitfields are not only useful for memory-mapped IO; they can also be used for parsing binary datagrams from communication channels such as TCP/IP, USB, and UART, or for reading packed data such as RGB565 LCD frame buffers and file allocation tables on mass storage devices.

I also think it would be quite cool to be able to iterate over bitfields.

for (bit_field: BitField) in register {
}

There’s lots of potential here to make programming with bitfields a joy. With the right language features (static-if, CTFE, metaprogramming, etc…) there’s no need to build this into the language; one could probably implement it as a crate.

Hardware Abstraction

I think the right way to model this is with traits as traits specify interfaces. AFAICS, there are two approaches here:

I think there’s opportunity to push the envelope here, iff Rust has the right features and they are stabilized.

Almost everything about the hardware (core, peripherals, board, memory, etc…) should be known at compile-time, and with the right compile-time features (static-if, CTFE, metaprogramming), there’s no reason why any of this information needs to be sorted out at runtime.

In an attempt to illustrate, imagine the current supply chain for a microcontroller board:

  1. ARM releases a core (e.g. Cortex-M4F) - We know core peripherals and features at compile-time
  2. ST creates the silicon (e.g. STM32F407VG) - We know optional core features, pin count, memory sizes, and ST peripheral features at compile-time
  3. Board manufacturer creates a board - We know our clock crystal, which pins are soldered where, and what features can be used where at, you guessed it, compile-time

With all of this information available at compile-time, we should be able to aggregate the crates we need, fill out all clock, pin, memory, etc… info, and let the compiler generate the code using the compile-time logic supplied by the crate’s authors. Crate authors can add a static assert to check for any misconfigurations and display a friendly message at, once again, compile-time.

Allow me to illustrate with C++ template-like syntax (I don’t know how else to describe it):

// Crate as created by ARM
// ----------------------------------------------------------------------
class ARMCortexM;

template <bool FPU, bool DSP, bool MPU>
class ARMCortexM4 : ARMCortexM {
    static if (FPU) {
        // Add FPU features
    }
    static if (DSP) {
        // Add DSP features
    }
    static if (MPU) {
        // Add MPU features
    }
}

// Crate as created by silicon vendor: ST
// ----------------------------------------------------------------------
template <uint FlashSize, package Package>
class STM32F4 : ARMCortexM4<true, true, false> {
    static if (Package == LQFP100) {
        // Do whatever is required for LQFP100 package
    }
    else {
        static assert(false, "Sorry this model is only available in the LQFP100 package.")
    }
}

typedef STM32F407VG STM32F4<1024*1024, LQFP100>

// Crate as created by board manufacturer
// ----------------------------------------------------------------------
template <typename TMCU, typename TCrystal, typename TPinLayout>
class Board;

typedef ST_Disco_PinLayout PinLayout<PinA0<Out, PullUp>, PinA1<AF:I2C1, PullUp>, etc...>
typedef ST_Disco_Crystal Crystal<25MHz>
typedef ST_Disco Board<STM32F407VG, ST_Disco_Crystal, ST_Disco_PinLayout>

// Finally the programmer can start writing his/her application code
// ----------------------------------------------------------------------
main() {
    // After calling MyBoard.init(), all core, MCU, and board peripherals are 
    // properly initialized and we're ready to roll.
}

No more need for tools like ST’s CubeMX to generate code for you. The necessary compile-time features are already in the Rust language and the authors of the crates (ARM Cortex-M4 crate, STM32F407VG crate, PinLayout crate, etc…) already wrote all the necessary compile-time code-gen logic for you; you just have to snap together the Legos. The resulting code generated by the compiler will be highly specialized and highly optimized to your hardware because the compiler has everything it needs to do so.
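For comparison, here is roughly how a small slice of this could be spelled in Rust itself, assuming a compiler that supports const generics and `assert!` in const contexts. Every name here (`Stm32F4`, `LQFP100`, `Board`, `StDisco`, and so on) is hypothetical, and the sketch only covers the flash-size, package, and crystal parameters, not pins or peripherals.

```rust
/// Hypothetical package identifier (the pin count is used as the value).
pub const LQFP100: u32 = 100;

/// "Silicon vendor crate": an MCU family parameterized at compile time.
pub struct Stm32F4<const FLASH_BYTES: usize, const PACKAGE: u32>;

impl<const FLASH_BYTES: usize, const PACKAGE: u32> Stm32F4<FLASH_BYTES, PACKAGE> {
    /// Compile-time check playing the role of `static assert`.
    const CHECK: () = assert!(
        PACKAGE == LQFP100,
        "Sorry this model is only available in the LQFP100 package."
    );

    /// Flash size in KiB; referencing `CHECK` forces the assertion to be
    /// evaluated for every instantiation that is actually used.
    pub const fn flash_kib() -> usize {
        let () = Self::CHECK;
        FLASH_BYTES / 1024
    }
}

/// One concrete part, analogous to the typedef in the sketch above.
pub type Stm32F407VG = Stm32F4<{ 1024 * 1024 }, LQFP100>;

/// "Board crate": only the crystal frequency is modelled here.
pub struct Board<const CRYSTAL_HZ: u32>;

impl<const CRYSTAL_HZ: u32> Board<CRYSTAL_HZ> {
    pub const fn crystal_mhz() -> u32 {
        CRYSTAL_HZ / 1_000_000
    }
}

pub type StDisco = Board<25_000_000>;
```

Instantiating `Stm32F4` with a different package value and calling `flash_kib()` would fail the build with the assertion message instead of misbehaving at runtime.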

I think Zinc had a similar goal to what I’ve envisioned here, and I think it was a pretty good one. Zinc’s demise was not caused by any fundamental flaws in its design, IMO. After reading the post-mortem, I would argue that it was Rust’s difficulty fleshing out its compile-time/metaprogramming story (or lack thereof) that killed Zinc. Zinc, or something like it, may rise from the ashes after Rust eventually works these features out.

Action Items:

So after all this dreaming, what am I actually suggesting for 2017 Roadmap?

  1. Finalize Compile-time/Metaprogramming Features - Figure out Rust’s compile-time and metaprogramming story, make it great, and stabilize it. Consider the bitfield and hardware abstraction ideas I just mentioned while fleshing out features and design.
  2. Create a Rich Bitfield Crate - Leverage those compile-time and metaprogramming features to make a super cool bitField crate, that programmers might actually want to use.
  3. Start Creating the Ecosystem - Leverage those compile-time and metaprogramming features to make hardware abstracted crates analogous to the Core–>Silicon–>Board supply chain, so board designers “solder” their software components together at compile time just like they solder their board components together, resulting in a highly specialized and highly optimized board support package (BSP) crate.

(Actually, that looks like a multi-year roadmap; perhaps we will see some progress on item 1 in 2017)

I’ve written some type-safe bit field code that may suit your needs: https://gist.github.com/hannobraun/ab5804e6b7a54b70997b761a02acbdfd

This is a quick copy-paste from my internal repository. I’m going to properly document and release this soon-ish.

What I’d like to see is some way to set up multiple heaps.

Right now you can only plug in a custom global allocator but the long term plan is to make allocators part of the type signature of collections. With this you’ll be able to use several custom allocators and pick which allocator to use with which collection instance. Something like this:

// in the standard libraries
trait Allocator { .. }
struct Vec<T, A> where A: Allocator { .. }

// in your library crate
// Allocator that uses SRAM
struct SramAllocator { .. }
impl Allocator for SramAllocator { .. }

// repeat for other allocators

// in your application
// allocate in SRAM
let xs: Vec<i32, SramAllocator> = Vec::new();
// allocate in CCRAM
let ys: Vec<i32, CcramAllocator> = Vec::new();
// allocate in external DRAM
let zs: Vec<i32, DramAllocator> = Vec::new();

See RFC #1398 for details. The RFC has been accepted but not yet implemented. IIRC, some details about integration with GCs need to be ironed out before work on the implementation starts.


I’d like to see the ability to do something like this:

You might be able to do that just using macros (if you are a wizard), which are stable, and you certainly can do that using syntax extensions but those are unstable.

I personally prefer generating API from a standard format like SVD files (they are just XML files) as they can be used from other languages and are more amenable to (non-Rust specific) tooling.

Ergonomics and Convenience

Easily cross-reference back to the datasheet

Easily walk the hierarchy with ‘.’ notation.

See the list of bitfields in a given register using Racer/RLS after typing register.

See the documentation for each bitfield by hovering your mouse over the bitfield in your code editor.

All this should be doable once RLS is more mature (Or perhaps it already works; I haven’t tried RLS). At least the API that svd2rust generates contains (or could contain) all this information in the form of doc comments.

Query information about the bitfield at runtime or compile-time

LLVM has a “constant folding pass” (I think that was the name) that will evaluate most of the expression you have written (the if expressions) at compile time without me having to complicate my code with a bunch of const fn (functions that are always evaluated at compile time).

I’d suggest looking at the disassembly before “adding CTFE everywhere to make things faster” because it may not be necessary at all.

There are some places where CTFE would be necessary though. Like deciding the type signature of something:

static FOO: [u8; mem::size_of::<SomeType>()] = ..;
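A compilable version of that idea, as a small sketch: the array length is computed by a `const fn`, so it has to be known at compile time. `SomeType`'s fields and the `round_up_to_word` helper are made up for the example.

```rust
// On a no_std target this would be `core::mem`.
use std::mem;

/// Placeholder type; 6 bytes of fields, padded to 8 with 4-byte alignment.
#[repr(C)]
pub struct SomeType {
    pub a: u32,
    pub b: u16,
}

/// Hypothetical helper: round a byte count up to a multiple of 4.
pub const fn round_up_to_word(n: usize) -> usize {
    (n + 3) & !3
}

/// The length is part of the array's type, so it must be a constant.
pub const FOO_LEN: usize = round_up_to_word(mem::size_of::<SomeType>());

pub static FOO: [u8; FOO_LEN] = [0; FOO_LEN];
```

Here constant folding in LLVM would not help, because the value is needed by the type checker before any code is generated.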


Elide read-modify-write When Possible

That’s a nice use of CTFE! You certainly won’t be able to do that with Rust’s CTFE right now and I’m not sure if it will support exactly the same features in the future.

I’d personally provide the same functionality, today, with more automatically generated API ;-).

Setting Multiple BitFields with One Write

Consider the following code. If I’m not mistaken, this code will perform as stated in the comments:

Nah, Rust doesn’t follow the C model of “static volatile variables” (static volatile int FOO). Instead, it uses LLVM’s model of “volatile operations”: you have two intrinsics (they are just functions), read_volatile and write_volatile, to perform volatile operations on memory; all other memory accesses (*foo = 42, *foo |= 1) are non-volatile.

So, instead, if you write that code like this:

let mut r = read_volatile(&register);
r.bit_field1 = value1;
r.bit_field2 = value2;
write_volatile(&mut register, r);

You get a single RMW operation and LLVM is able to optimize everything between the read_volatile and the write_volatile calls.
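Spelled out with explicit masks instead of bitfield syntax, the same sequence looks like the sketch below. The field positions (bit 0 and bits 7:1) and the function name are assumptions made up for the example; only `read_volatile` and `write_volatile` are real library functions.

```rust
// On a no_std target this would be `core::ptr`.
use std::ptr;

const FIELD1_MASK: u32 = 0x0000_0001; // bit 0
const FIELD2_SHIFT: u32 = 1;
const FIELD2_MASK: u32 = 0x0000_00FE; // bits 7:1

/// Update two fields of a register with one volatile load and one volatile
/// store. Everything in between operates on a local copy.
///
/// Safety: `register` must be valid and aligned for volatile 32-bit access.
pub unsafe fn update_fields(register: *mut u32, value1: u32, value2: u32) {
    unsafe {
        let mut r = ptr::read_volatile(register); // the only load
        // Plain, non-volatile arithmetic on a local value: the compiler is
        // free to merge these two steps.
        r = (r & !FIELD1_MASK) | (value1 & FIELD1_MASK);
        r = (r & !FIELD2_MASK) | ((value2 << FIELD2_SHIFT) & FIELD2_MASK);
        ptr::write_volatile(register, r); // the only store
    }
}
```

The volatile calls pin down how many times the hardware is touched; the number of source-level assignments between them does not matter.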

What I’d like to see in Rust is something like this:

register.write(bit_field1: value1, bit_field2: value2)

That’s not valid Rust syntax so it won’t work :-). You could certainly write macros to provide that syntax though.

I’d personally avoid macros that make use of Yet Another Syntax that people would need to learn and try to solve things using types and API.

Other Uses

they can also be used for parsing binary datagrams

I can’t really comment on this but I have seen it mentioned before. IIRC, Erlang has bitfield pattern matching or something like that.

Hardware Abstraction

OK, this seems pretty different from what I mentioned, which was more about generic programming. What you wrote here is mainly about “configuration”.

I think the goal is great: avoid misconfiguration, prevent misuse at compile time, optimize initialization, etc. But

No more need for tools like ST’s CubeMX to generate code for you.

I don’t see why using code generation to solve a problem is a bad or inferior solution. Or, conversely, why solving the problem using metaprogramming is a better solution. I think we will have to try both before reaching a conclusion.

And even if metaprogramming has been proven to be the superior solution elsewhere, Rust simply doesn’t have the required features right now. So, I think we should try to tackle this problem using code generation instead of waiting for features that only $DEITY knows how long will take to land in the compiler.

This last part is aimed at this comment of yours:

Start Creating the Ecosystem

Leverage those compile-time and metaprogramming features

I don’t think we should wait for those. “Don’t let perfect be the enemy of the good”, as they say.


I’ve been spending my time for the past two weeks exploring options for modeling memory-mapped I/O registers and have come away with some perspective on Rust’s features for volatile memory which may be of value to the 2017 Roadmap.

While Rust’s read_volatile/write_volatile are usable for modeling volatile memory, Rust does not provide any volatile guarantees, nor does it have rich enough modeling features to properly encapsulate volatile access in a type.

What Does Volatile Mean?

There seem to be several different interpretations of what volatile means. Java’s definition is different from C/C++'s definition, and LLVM’s and Rust’s implementation implies (I couldn’t find a formal definition) a different definition still. So I’m going to abandon all current definitions and try to obtain an understanding independently, as it relates to bare-metal programming and embedded systems.

There are generally 3 kinds of memory that the typical microcontroller employs today: ROM, RAM, and memory-mapped I/O.

  • ROM is typically Flash memory where the program is stored so its contents are retained between power cycles. For most applications it is treated as read-only memory, only written to when updating the program. The program typically runs directly out of this Flash memory; there is no need to copy it to RAM for execution.
  • RAM is the memory used for the program’s stack, heap, and mutable data. All contents are lost between power cycles.
  • Memory-mapped I/O is the memory used to interact with the hardware peripherals (UART, digital inputs/outputs, analog inputs/outputs, etc…). The fact that this is called memory is actually a misnomer. It’s I/O that just happens to be exposed to the programmer as memory. In some cases it can’t even store state. It is primarily used to control and acquire data from the hardware.

With RAM, the CPU, for all intents and purposes, is the only reader and writer. With memory-mapped I/O the CPU is not the only reader/writer; the memory is shared with the physical world. For example, when an electrical signal tied to one of the MCU’s pins changes, the contents of the memory-mapped I/O corresponding to that pin also change, regardless of what the CPU is doing. Because this memory’s contents can change outside of the control of the CPU, the compiler should never optimize away or re-order any access to a memory-mapped I/O register.

Examples (warning: phony syntax):

'static mmio: mut u32;

// If the compiler optimized away the second `x = mmio` statement, the
// program would return the value 0 instead of the correct value 1
fn getValue() -> u32 {
    let mut x = mmio;  // CPU reads value 0
    x = mmio;          // CPU reads value 1 because the physical world
                       // changed the value since the previous read
    return x;
}

// The compiler may think that since it already set mmio to 1, there's
// no reason to read it back into x.  This would be incorrect because the
// physical world may have changed the contents of mmio since the CPU last
// wrote the value 1 to it.
fn getValue() -> u32 {
    mmio = 1;
    let x = mmio;
    return x;
}

(There are quite a few other scenarios, but I suspect most reading this thread are already well aware of them.)
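For comparison, a sketch of how the two phony examples could be spelled in today's Rust with core::ptr::read_volatile and write_volatile. The function names are made up, and the demo passes the address of an ordinary variable as a stand-in, so nothing actually changes the value behind the program's back here; the point is only that the compiler must emit every volatile access, in order:

```rust
use core::ptr::{read_volatile, write_volatile};

/// Performs two volatile loads; neither may be merged or removed.
///
/// # Safety
/// `mmio` must be valid and aligned for reads of `u32`.
pub unsafe fn get_value_twice(mmio: *const u32) -> u32 {
    unsafe {
        let _first = read_volatile(mmio); // first load is still emitted
        read_volatile(mmio) // second load is not folded into the first
    }
}

/// Stores 1, then loads again instead of assuming the result is 1.
///
/// # Safety
/// `mmio` must be valid and aligned for reads and writes of `u32`.
pub unsafe fn write_then_read(mmio: *mut u32) -> u32 {
    unsafe {
        write_volatile(mmio, 1);
        read_volatile(mmio)
    }
}

/// Exercises both functions on an ordinary variable.
pub fn demo() -> (u32, u32) {
    let mut stand_in: u32 = 7;
    let a = unsafe { get_value_twice(&stand_in) };
    let b = unsafe { write_then_read(&mut stand_in) };
    (a, b)
}
```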

So, my definition of volatile is “the memory’s contents can change at any time outside of the control or knowledge of the CPU”. Therefore, volatility is a property of the memory itself, not the way it is accessed. In fact read_volatile and write_volatile don’t read or write memory any differently than any other load/store functions; they just ensure the order and inclusion of instructions is as-the-programmer-wrote-it, so the compiler shouldn’t get clever about reordering or optimizing away instructions. One other way to ensure correct treatment of volatile memory is to compile code that accesses it without any optimizations (though that can have other consequences).

Volatile and Memory Safety

Adding volatile accessor functions to the programmer’s toolbox does not provide any guarantee that all accesses to volatile memory are done through those functions; it is up to the programmer to get it right. read_volatile and write_volatile provide no more help for programming volatile memory than malloc and free do for dynamic memory allocation and management.

To quote @briansmith 's comment from a prior newsgroup discussion:

A huge part of the value proposition of Rust is that it uses its type system to prevent common types of bugs in error-prone aspects of programming. It seems to me that a useful constraint is “this memory must only be accessed via volatile loads and stores, not normal loads and stores”; that is, there should be some way to use types to force the exclusive use of volatile reads and writes and/or to disable the * dereferencing operator for volatile objects in favor of explicit volatile_load, volatile_store, nonvolatile_load, and nonvolatile_store methods. Yet, the current volatile load/store proposal does not take advantage of Rust’s type system to help programmers avoid mistakes at all. This doesn’t seem “rustic“ to me.

If we want to include volatile memory safety in Rust’s definition of memory safety, we are forced to admit that Rust is currently not safe.

Learning From Others (The D Programming Language)

Some time ago, the D programming language went through this same consideration. D Improvement Proposal 62 was probably one of the most well-written proposals the D language ever saw. It is well worth the read for anyone seriously researching this topic. To quote:

Volatility is a property of a memory location. For example, if you have a memory mapped register at 0xABCD which represents the current time as a 32bit uint, then every read from that memory will return a different result. The memory at address 0xABCD is volatile, it can change at any time and therefore does not behave like normal memory. The compiler should not optimize reads from this address. All accesses to 0xABCD can’t be optimized.

If you read the formal discussion, you’ll find the following comment from Andrei Alexandrescu:

The DIP is correct in mentioning that “volatility” is a property of the memory location so it should apply to the respective variable’s type, as opposed to just letting the variable have a regular type and then unsafely relying on calls to e.g. peek and poke.

D ultimately opted for volatile_load and volatile_store intrinsics, not because they disagreed with DIP 62’s premise, but because of the complexity it would introduce into the type system.

Walter Bright:

Volatile has caused a major increase in the complexity of the C++ type system - a complexity far out of proportion to the value it provides. It would have a similar complex and pervasive effect on the D type system.

Andrei Alexandrescu:

Clearly an approach that adds a qualifier would have superior abstraction capabilities, but my experience with C++ (both specification and use) has been that the weight to power ratio of volatile is enormous.

The Purist vs The Practitioner

While I have presented a case here arguing that volatility is a characteristic of the memory itself and not its access, I have to acknowledge Walter Bright’s and Andrei Alexandrescu’s conclusions that such a specialized feature does not justify the complexity it may introduce into the type system. This may also have been why LLVM chose to implement volatile load and store operations over a memory qualifier.

read_volatile and write_volatile do provide the tools needed to implement some abstraction (e.g. VolatileCell) over volatile memory in the same way shared_ptr, unique_ptr, etc… provide abstractions over the management of dynamically allocated memory in C++. volatile-register, Tock, Zinc, have all provided such implementations, and they do allow programmers to write excellent software for their domains. However, they leak their implementation to users in the form of get(), set() or read(), write() methods. This forces the users to trade familiar idioms like a = a + b for the much less ergonomic a.set(a.get() + b.get()) or set(a, get(a) + get(b)).
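To make the ergonomics trade-off concrete, here is a minimal sketch of such a cell. It is not the code of volatile-register, Tock, or Zinc; the type and its get/set methods are written from scratch for illustration:

```rust
use core::cell::UnsafeCell;
use core::ptr::{read_volatile, write_volatile};

/// A cell whose contents are only ever touched with volatile operations.
#[repr(transparent)]
pub struct VolatileCell<T> {
    value: UnsafeCell<T>,
}

impl<T: Copy> VolatileCell<T> {
    /// Wraps a value; real code would overlay the cell on a register address.
    pub fn new(value: T) -> Self {
        VolatileCell { value: UnsafeCell::new(value) }
    }

    /// Volatile load of the current contents.
    pub fn get(&self) -> T {
        unsafe { read_volatile(self.value.get()) }
    }

    /// Volatile store of a new value.
    pub fn set(&self, value: T) {
        unsafe { write_volatile(self.value.get(), value) }
    }
}

/// The `a = a + b` idiom, spelled with explicit accessors.
pub fn demo() -> u32 {
    let a = VolatileCell::new(2u32);
    let b = VolatileCell::new(40u32);
    a.set(a.get() + b.get());
    a.get()
}
```

The private field means there is no way to reach the contents with a plain load or store, at the cost of the get/set noise visible in demo.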

D, with its assignment operator overloading and property accessor features, can better hide function call syntax from the user, so it was more practical for D to adopt volatile accessor functions over a volatile memory qualifier.

Suggestion for the 2017 Roadmap

If the 2017 Roadmap will include any progress in defining Rust’s memory model, please revisit Rust’s definition of volatile and how it can provide guarantees over read_volatile and write_volatile in the same way Rust provides guarantees over malloc, new and free, delete.


I’m not so sure there is anything wrong with the compiler doing these optimizations.

There is no synchronization here so, supposing your CPU memory accesses were infinitely fast like the ideal computer, the last load would give back the same value again because the physical world hasn’t had time to change. The compiler here is just making your computer really fast for a bit, which is generally its entire job.

The main thing I think volatile means is that the memory access “may have side effects”. In reality, most memory mapped loads don’t need to be volatile as they don’t have side effects. Only things where reading actually removes the value from some FIFO, or similar, need volatile reads in that regard.

Another thing volatile is used for, with Rust at the moment, is when the value of this access is unrelated to another value from an access at the same address. This is needed, for example, when a write to an address writes to register A but a read from that address reads register B. So the compiler won’t make the optimization in the second example, not because the value might change between the write and the read, but because they are accessing completely different things.

I don’t really think the second use I mention should be handled with volatile; rather, it might be handled with address spaces, i.e., tag the pointer used to write A with a different address space than the pointer used to read B, even though they point to the same address. This would hopefully allow the compiler to optimize 2 subsequent writes to A into 1, unlike with volatile.

So my suggestion for the 2017 roadmap is to expose LLVM address spaces.

EDIT: Note, I’m not really sure if LLVM address spaces do exactly what I think they do, the LLVM manual says they are target specific. What I really mean is it would be good to have a separate concept for the second case I mention, whether or not LLVM address spaces are the answer, I’m not sure at the moment.

A key property of volatile loads is that they only read the given memory once. This isn’t just because of possible side-effects: with normal (non-atomic) loads, the compiler assumes that data in memory does not change unless explicitly written to. This is not the case for many memory-mapped registers which can have different values from one read to another.

Non-CPU hardware may read/write RAM through DMA, so this is not entirely correct.

In reality, most memory mapped loads don’t need to be volatile as they don’t have side effects.

Interrupt acknowledgment on ARM is (traditionally) done with MMIO reads, and it very much has side effects :smile:.

Yes, but this is just the case of a value changing from outside your thread of execution, and you should handle that like normal multi-threaded code, i.e., using an atomic load or the like.

That is another example of why I say most. The general case of reading some control register, counter value or similar, like in JinShil’s example, doesn’t have side effects.

I would argue that this is (1) not all that different from the situation with other safe abstractions, and (2) not necessarily a bad thing.

I will illustrate the first point with the malloc and free situation you mentioned. There is actually nothing in the Rust language (insofar as a language can be separated from its library) that provides a safe abstraction around heap allocation and deallocation. Box used to be a magic built-in thing back in the days of ~T and @T, but that language has little to do with the language of today. Since Rust 1.0, AFAIK the only thing about Box that distinguishes it from an ordinary library type is the ability to move out of *some_box, and that’s just because nobody wrote the RFC for a DerefMove trait and implemented it; AFAIK the consensus is that this ability should be available to libraries eventually. The safety of Box arises from more general features of Rust (lifetimes, ownership) that are not specific to heap memory management, much less to Box.

Regarding the second point, while I fully understand how frustrating it is to have to type get and set (or equivalents) everywhere, I believe it has merit (even if it’s not ideal) that volatile accesses don’t look like any other variable reference. Memory accesses in ordinary programming (especially in high-level languages) are a means to an end. As long as the algorithm still works, programmers and compilers gladly combine, duplicate, move, add and remove reads and writes. Access to a volatile location is something else entirely. As ugly as it can be en masse, a method call is already universally recognized as saying: watch out, something non-trivial might be happening here.

That is not to say I’m happy with VolatileCell et al., but I am not sufficiently unhappy with it that I’d support complicating the language to give it syntactic sugar.



This forces the users to trade familiar idioms like a = a + b for the much less ergonomic a.set(a.get() + b.get()) or set(a, get(a) + get(b)).

Is arithmetic with MMIO registers that common that we should introduce a new language feature (overloaded assignments) to make this particular pattern easier to use? I, for one, haven’t encountered anything like this in my MMIO related code. (I do believe overloaded assignments would be useful in other contexts like Linear Algebra libraries with the ergonomics and performance of e.g. Eigen)

volatile-register, Tock, Zinc, have all provided such implementations

I think it’s important to note that several implementations of seemingly the “same thing” exist. I’d like us to settle on a single implementation but the problem I see is that we don’t agree on which operations are safe and which are unsafe. I’d like the unsafe code guidelines team to look at this issue as a goal for 2017 (much earlier if possible). But the embedded community needs to write something like an RFC with all the involved details (volatile operations, read/write “permissions”, etc.) first so the “unsafe team” can comment on it understanding all the constraints first.


The problem with current Rust is that it is easier and prettier to access volatile memory the wrong way than it is to access it the right way. I don’t think we necessarily need to make accessing it the right way easier + prettier, but we do need to make accessing it the wrong way harder and uglier.


Syntactic sugar is one possible approach, but it is also the least attractive, and ultimately inadequate if volatile memory safety is a priority.

What would be ideal is some way to mark memory as volatile and have the compiler add the read_volatile and write_volatile accessors for you, removing any possibility of human error. That can take the form of a #[volatile] attribute, a volatile keyword modifier, a compiler intrinsic, or perhaps some other form that minds greater than mine can think of.

@parched alluded to the fact that LLVM’s address space feature may have a way of telling the optimizer “this is volatile memory, don’t reorder or optimize access to it”. I’d be very interested in knowing from one of the LLVM developers if LLVM address spaces provide such a thing, and if the backend developers plan on implementing it.

If any feature even mildly similar to those I just mentioned existed, syntactic sugar wouldn’t even be a consideration. All familiar Rust idioms would apply equally well to volatile memory as they do to normal memory. Rust code that makes use of volatile memory wouldn’t look any different than Rust code that doesn’t (aside from the volatile marker at the declaration), and it would be guaranteed at compile time to do the right thing, further raising Rust’s pillar of memory safety.


Access to volatile memory is trivial, and doesn’t need any special consideration by the programmer. It needs special consideration by the compiler.

Atomic loads permit more optimizations than volatile loads. For example, an atomic 32-bit load followed by a mask with 0xff can be transformed into an atomic 8-bit load, or two atomic 32-bit loads to adjacent addresses can be transformed into an atomic 64-bit load (alignment permitting). Both of these are likely to crash or produce incorrect results for hardware registers. Also, atomic stores permit all sorts of silly optimizations, and if you’re using a volatile wrapper for stores, you may as well use it for loads too. Even without optimizations, atomic accesses tend to expand to different instruction sequences than volatile accesses, which is usually harmless (albeit wasteful) if used on hardware registers, but not always: for example, on PowerPC, running a load-acquire instruction on an address within a direct-store segment causes a fault.

Access to volatile memory is trivial

I disagree. Accessing MMIO registers can cause side effects. And I don’t think volatile operations should be “transparent” and look like normal memory accesses. For example:

REGISTER |= 1;
REGISTER |= 1 << 1;
REGISTER |= 1 << 2;


let mut t = REGISTER;
t |= 1;
t |= 1 << 1;
t |= 1 << 2;

Are these equivalent? You can’t tell just by looking at this code! If REGISTER is a “volatile chunk of memory” (as per your model) then the first snippet does 3 RMW operations and the latter only does one; more importantly, these snippets will likely have different semantics because writing to REGISTER could cause a side effect (e.g. turning on one/some of N LEDs).
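A host-side sketch that makes the difference observable. MockRegister is a made-up stand-in that records every store, the way write-sensitive hardware would "see" each one; the final write-back in one_rmw is an assumption added here so the second variant completes its single RMW:

```rust
use std::cell::{Cell, RefCell};

/// Stand-in for a write-sensitive register: remembers every value stored.
pub struct MockRegister {
    value: Cell<u32>,
    writes: RefCell<Vec<u32>>,
}

impl MockRegister {
    pub fn new() -> Self {
        MockRegister { value: Cell::new(0), writes: RefCell::new(Vec::new()) }
    }

    pub fn read(&self) -> u32 {
        self.value.get()
    }

    pub fn write(&self, v: u32) {
        self.value.set(v);
        self.writes.borrow_mut().push(v);
    }

    /// Every value the "hardware" observed, in order.
    pub fn writes(&self) -> Vec<u32> {
        self.writes.borrow().clone()
    }
}

/// Three separate read-modify-write operations on the register.
pub fn three_rmw(reg: &MockRegister) {
    reg.write(reg.read() | 1);
    reg.write(reg.read() | (1 << 1));
    reg.write(reg.read() | (1 << 2));
}

/// One read, local modifications, one write.
pub fn one_rmw(reg: &MockRegister) {
    let mut t = reg.read();
    t |= 1;
    t |= 1 << 1;
    t |= 1 << 2;
    reg.write(t);
}
```

Both functions leave the same final value in the register, but the first produces three stores with two intermediate states and the second produces one.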

But if instead you saw this:

let mut t = REGISTER.read();
t |= 1;
t |= 1 << 1;
t |= 1 << 2;

You’d wonder “Why isn’t this using direct assignments?”. Then you’d go to the documentation and learn about volatile operations, MMIO and the side effects they could cause. That’s how good Rust code should be: beginner friendly.

Overall, :-1: from me to making volatile operations transparent. That hides side effects which are important to note when reading code. Making code shorter to write at the expense of readability goes against unwritten rules of writing good Rust code.


A load that is merely atomic won’t actually get to main memory (in this case, your memory mapped registers) until it’s bumped out of the level 3 cache. Clearly, you don’t want to delay turning an LED on like that.

I think that expecting OOM to panic is okay IFF libcore includes an unwinding library that:

  • has ZERO OS dependencies and can be used in kernel mode
  • does not perform any allocations itself
  • is very small

What would be ideal is some way mark memory as volatile

Also, this already exists today in the form of VolatileCell<T> (or its variants); all read/writes through this wrapper will lower to read_volatile/write_volatile operations.

If the argument is “library authors may forget to expose chunks of memory as VolatileCell<T> and instead expose them as plain static variables” then you can replace VolatileCell<T> with #[volatile] and the problem will persist. (I think the solution to this problem is encouraging the use of code generators like svd2rust that provide an API that uses volatile operations under the hood)
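A rough sketch of what an API that "uses volatile operations under the hood" might look like. This is not the output of svd2rust; the RO/RW wrappers, the Gpio block, and its two-register layout are assumptions invented for the example, and an array stands in for the peripheral's memory:

```rust
use core::cell::UnsafeCell;
use core::ptr::{read_volatile, write_volatile};

/// Read-write register: both accessors are volatile.
#[repr(transparent)]
pub struct RW<T: Copy> {
    cell: UnsafeCell<T>,
}

impl<T: Copy> RW<T> {
    pub fn read(&self) -> T {
        unsafe { read_volatile(self.cell.get()) }
    }

    pub fn write(&self, value: T) {
        unsafe { write_volatile(self.cell.get(), value) }
    }
}

/// Read-only register: there is deliberately no `write` method.
#[repr(transparent)]
pub struct RO<T: Copy> {
    cell: UnsafeCell<T>,
}

impl<T: Copy> RO<T> {
    pub fn read(&self) -> T {
        unsafe { read_volatile(self.cell.get()) }
    }
}

/// Hypothetical GPIO register block: input at offset 0, output at offset 4.
#[repr(C)]
pub struct Gpio {
    pub input: RO<u32>,
    pub output: RW<u32>,
}

impl Gpio {
    /// # Safety
    /// `base` must point to a valid, writable block for the lifetime `'a`.
    pub unsafe fn from_ptr<'a>(base: *mut [u32; 2]) -> &'a Gpio {
        unsafe { &*(base as *const Gpio) }
    }
}

/// Overlays the block on an array standing in for the peripheral.
pub fn demo() -> u32 {
    let mut fake_block: [u32; 2] = [0b0101, 0];
    let gpio = unsafe { Gpio::from_ptr(&mut fake_block) };
    gpio.output.write(gpio.input.read() << 1);
    // gpio.input.write(0); // would not compile: RO has no `write`
    gpio.output.read()
}
```

Because the fields inside RO and RW are private, users of the block cannot bypass the volatile accessors, and the read/write permissions are checked at compile time.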

Ok @japaric, you got me. I have to agree. Touché!