What are the semantics of `write_volatile` with compound types?

The documentation for write_volatile says:

Volatile operations are intended to act on I/O memory, and are guaranteed to not be elided or reordered by the compiler across other volatile operations.

I'm also assuming it's not allowed to duplicate operations or generate spurious stores.

For primitive types, I assume it will use a store instruction which matches the size of the type where possible, which will result in a correspondingly sized bus operation. (There's still a question of how primitive types that are too large for the hardware are handled, like u64 on a 32-bit CPU, but I'll leave that for now.)

(Other questions which I'm leaving aside for now: what is "I/O memory" precisely, whether write_volatile makes any guarantees on visible store ordering, or specifically sized bus transactions.)

But what are the semantics for compound types?

Structs

If T is a struct, will it treat this as a single monolithic object with a given size, and generate as few store operations as possible, or will it write each separate field as a distinct volatile store operation?

In the first case, what happens if the structure has padding? Will it also generate stores for the padding fields? Typically these are considered uninitialized, and accessing their memory explicitly is UB, but the implementation can choose to read/write them if it's more convenient to do so. But if it's a write_volatile, then generating store instructions which cover uninitialized padding will effectively be generating spurious stores which don't correspond to any Rust-level object, possibly resulting in unintended behaviour.

This can be resolved by treating each field separately, so that they're individually stored in a volatile way. But this raises the question of order since "volatile operations are [...] guaranteed to not be [...] reordered by the compiler across other volatile operations". So what's the canonical order of the fields? Is it the textual order in the source, or the layout the compiler chooses for the structure? (Any case where the precise layout matters will use #[repr(C)], so maybe this is moot, but as a matter of specification it would be good to clarify.)
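To make the padding concern concrete, here is a minimal sketch. The struct `Example` and the helper `padding_bytes` are hypothetical names invented for illustration, and the 3-byte figure assumes a target where u32 is 4-byte aligned.

```rust
use core::mem::size_of;

// Hypothetical two-field struct, used only to show where padding appears.
#[repr(C)]
pub struct Example {
    pub flag: u8, // offset 0
    // Assumption: u32 is 4-byte aligned on this target, so 3 padding
    // bytes sit here and belong to no Rust-level field.
    pub value: u32, // offset 4
}

/// Number of bytes in `Example` that are not covered by any field.
pub const fn padding_bytes() -> usize {
    size_of::<Example>() - (size_of::<u8>() + size_of::<u32>())
}
```

A whole-struct write_volatile of `Example` that is lowered to one or two wide stores would have to cover those padding bytes as well, which is exactly the spurious-store case described above.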

Or is it simply that if you do a write_volatile on a structure then you're giving up all this fine-grained control, and you should do your own per-field volatile stores if it really matters to you?

Arrays

Likewise with arrays, if T is an array or slice, will it attempt to write the whole thing as a single object, or each element individually? The same questions about padding apply to gaps between elements. And if each element is treated as a separate store, what order are they stored in? Is it guaranteed to be low index to high?

Or, like structures, if you're doing volatile writes on a whole array then you're giving up control, and you should write each element individually if you care? (And if it's an array of structures, each field in each element.)

(Others)

I'm not as concerned about tuples/enums/unions since they're much less likely to be used in a way where the exact volatile semantics matter, but it would be nice to resolve all these questions for them as well for completeness.

Short answer: don't use write_volatile with compound types if you can avoid it.

Long answer: The exact semantics are not defined, and it'd be a significant effort to ever stabilize. It uses an LLVM store instruction with the volatile flag, which does not specify how the actual write occurs. This is all it says about a store volatile:

the optimizer is not allowed to modify the number or order of execution of this store with other volatile operations

So, the "number of stores" for aggregates is unspecified, but it's not allowed to make optimizations regarding the number or order. I presume this means that if a write would, unoptimized, take 4 stores, it will never optimize it to fewer. The actual output is pretty bad for structs even when aligned. The only way to avoid unnecessary code size cost is to use #[repr(transparent)] to ensure an identical ABI when you only have a single field.
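As a rough sketch of the #[repr(transparent)] suggestion: the wrapper `Control` and the function `write_control` below are hypothetical names. #[repr(transparent)] guarantees the same layout and ABI as the single u32 field; that the generated store matches a plain write_volatile::<u32> is the informal expectation in this thread, not a documented guarantee.

```rust
use core::ptr;

// Hypothetical single-field wrapper around a 32-bit register value.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Control(pub u32);

/// Volatile-writes a `Control` value through `dst`.
///
/// # Safety
/// `dst` must be valid for writes and properly aligned for `Control`.
pub unsafe fn write_control(dst: *mut Control, value: Control) {
    // SAFETY: validity and alignment are guaranteed by the caller.
    unsafe { ptr::write_volatile(dst, value) }
}
```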

So, the "number of stores" for aggregates is unspecified, but it's not allowed to make optimizations regarding the number or order. I presume this means that if a write would, unoptimized, take 4 stores, it will never optimize it to fewer.

I think your godbolt examples disprove both of these points. For the unaligned [u8; 4] structure, it emits single byte load/stores, but in the order 0, 3, 2, 1.

For the aligned structure with [u8; 4] it uses a single word load/store.

The example that we ran into that prompted this post was a struct (with just a bunch of normal small scalar fields) write to IO memory that rustc/llvm turned into vector stores, which made things very confused.

I don't think so, it just highlights how few guarantees there are about memory setting of aggregates. store volatile only forbids reordering with other volatile store/loads, and doesn't specify how the store of the aggregate itself is affected. Definitely take a look at the LLVM IR here. The unaligned [u8; 4] does a store i32 with align 1, the aligned [u8; 4] is a store i32 with align 4, and the struct of 4 u8 emits 4 separate store volatile i8.

Yes, since it's aligned, it can always use a single store. This isn't very scientific, but you could describe this as the way to store an aligned [u8; 4] and not a matter of optimization.

Note that LLVM describes this in terms of the LLVM IR data model. Rust's data model doesn't have a stably defined lowering to the LLVM IR data model, and Rust [u8; 4] could lower to LLVM <4 x i8> or i32 where Rust (u8, u8, u8, u8) lowers to LLVM { i8, i8, i8, i8 }, with only the latter being an aggregate type in LLVM parlance, IIRC.

I didn't check any actual details, but the details don't particularly matter since they aren't guaranteed. The informal expectation is that write_volatile::<T> will correspond to a specific write sequence for each T, using the "obvious" one for primitive T, but unspecified for other T (that aren't repr-transparent wrappers of a primitive).

Is there any chance of a guarantee that the write sequence for a given write_volatile::<T> is consistent within a single compiled binary?

That is, given the following code, what guarantees can I lean on?

fn example(target: *mut [u8; 4]) {
    let block: [u8; 4] = [1, 2, 3, 4];
    // Write 1
    unsafe { target.write_volatile(block) };

    let block: [u8; 4] = [5, 6, 7, 8];
    // Write 2
    unsafe { target.write_volatile(block) };
}

Is it true that if "Write 1" writes in order 1, 3, 2, 4, then "Write 2" will write in order 5, 7, 6, 8? Or could the two writes of the same type T to the same target happen in different orders?

If you want 4 in-order u8 writes then do a loop of 4 write_volatile::<u8>(). If you want it to be a single instruction then do the appropriate endianness conversion and do a write_volatile::<u32>().
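A minimal sketch of those two options, with hypothetical helper names. The choice of `from_ne_bytes` is an assumption that the in-memory byte order should match the array; use `from_le_bytes` or `from_be_bytes` if the device defines a specific register endianness.

```rust
use core::ptr;

/// Four byte-sized volatile stores, lowest address first.
///
/// # Safety
/// `dst` must be valid for writes of 4 bytes.
pub unsafe fn write_bytes_in_order(dst: *mut [u8; 4], block: [u8; 4]) {
    let base = dst.cast::<u8>();
    for (i, byte) in block.iter().enumerate() {
        // SAFETY: `i < 4`, so `base.add(i)` stays inside the array.
        unsafe { ptr::write_volatile(base.add(i), *byte) };
    }
}

/// One 32-bit volatile store covering the same 4 bytes.
///
/// # Safety
/// `dst` must be valid for writes and aligned for `u32`.
pub unsafe fn write_as_word(dst: *mut u32, block: [u8; 4]) {
    // Assumption: native byte order, so memory ends up holding `block` as-is.
    // SAFETY: validity and alignment are guaranteed by the caller.
    unsafe { ptr::write_volatile(dst, u32::from_ne_bytes(block)) };
}
```

Either helper states the intended access width and order in the source, rather than leaving it to however write_volatile::<[u8; 4]> happens to be lowered.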

Volatile is currently so underspecified that you should make your intent very explicit.

Also note that volatiles are not atomics.

And what if I don't care whether it's 4 in-order u8 writes or a single instruction? I just care that it's consistently one or the other, and that write_volatile doesn't pick a different option from one call to the next. Is there any form of guarantee that write_volatile will perform the same writes for a given target address and type, or do I have to be fully explicit about the primitive types I'm using in order to guarantee a consistent write order?

Neither Rust nor LLVM documents any such guarantee, so regardless of whether Rust works like that today, this is not something you could depend on for future compiles. I strongly recommend you write a method, e.g. write_to_volatile, for your structs that implements specific write semantics.
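One possible shape for such a method, as a sketch only. The struct `Regs`, its fields, and the chosen order (declaration order) are all hypothetical; a real device would dictate its own fields and order.

```rust
use core::ptr::{self, addr_of_mut};

// Hypothetical register block; field names are illustrative only.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Regs {
    pub ctrl: u32,
    pub data: u32,
    pub status: u16,
}

impl Regs {
    /// Writes each field with its own volatile store, in declaration order.
    /// Trailing padding after `status` is never written.
    ///
    /// # Safety
    /// `dst` must be valid for writes and properly aligned for `Regs`.
    pub unsafe fn write_to_volatile(&self, dst: *mut Regs) {
        // SAFETY: the caller guarantees `dst` is valid and aligned, so each
        // field pointer derived from it is valid and aligned too.
        unsafe {
            ptr::write_volatile(addr_of_mut!((*dst).ctrl), self.ctrl);
            ptr::write_volatile(addr_of_mut!((*dst).data), self.data);
            ptr::write_volatile(addr_of_mut!((*dst).status), self.status);
        }
    }
}
```

Because every store here is a volatile write of a primitive type, the sequence relies only on the "obvious" primitive behaviour discussed above and not on how an aggregate write is lowered.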
