Add volatile operations to core::arch::x86_64

There are a couple of ongoing discussions about how to do MMIO, inter-process shared memory, etc. efficiently and with guaranteed semantics, without introducing undefined behavior.

The main issue with ptr::read/write_volatile<T> is that their semantics aren’t precisely specified yet. While we should fix that, the semantics do depend on the actual T being read, the architecture being targeted, and whether the pointer is aligned or not (right now ptr::read/write_volatile<T> do not support unaligned reads or writes, but we could add _unaligned variants to support that).
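As a sketch of what an _unaligned variant could look like, one can decompose an unaligned access into byte-wise volatile reads, each of which is trivially aligned. The name read_volatile_u32_unaligned below is hypothetical, not an existing API, and the decomposition is of course not a single atomic access:

```rust
use core::ptr;

/// Hypothetical `_unaligned` variant: an unaligned 32-bit volatile
/// read performed as four aligned byte-wise volatile reads. This is
/// an illustration only; it is not a single atomic access.
unsafe fn read_volatile_u32_unaligned(src: *const u32) -> u32 {
    let p = src as *const u8;
    let mut bytes = [0u8; 4];
    for (i, b) in bytes.iter_mut().enumerate() {
        // Each byte-sized volatile read is naturally aligned.
        *b = ptr::read_volatile(p.add(i));
    }
    u32::from_ne_bytes(bytes)
}

fn main() {
    // Reading at offset 1 guarantees the u32 is unaligned.
    let buf = [0u8, 0x78, 0x56, 0x34, 0x12, 0];
    let v = unsafe { read_volatile_u32_unaligned(buf.as_ptr().add(1).cast()) };
    assert_eq!(v, u32::from_ne_bytes([0x78, 0x56, 0x34, 0x12]));
}
```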

One of the options that we have available is to add intrinsics with guaranteed semantics for each architecture to core::arch. For example, for x86_64, it could look like this:

/// (Module documentation)
///
/// Semantics of volatile memory operations: volatile memory 
/// operations (reads and writes) must be emitted by the 
/// compiler and cannot be re-ordered across other volatile 
/// operations.
/// 
/// The result of a volatile read is frozen (`freeze` is applied to it). 
/// That is, reading uninitialized memory via a volatile read never 
/// returns uninitialized memory, the result of the read is picked
/// non-deterministically.
///
/// When volatile reads participate in data-races with any write 
/// operation, the result of what is read is picked non-deterministically. 
/// When volatile writes participate in data-races with other 
/// volatile-write operations, the result that is written is picked 
/// non-deterministically. If volatile writes participate in data-races
/// with other write operations, the behavior is undefined.
/// A race between:
///
/// * a volatile atomic read with volatile atomic writes reads the 
///   content of memory either before or after any of the writes - the 
///   read will not observe partial modifications because the reads 
///   and the writes are atomic
/// * a volatile atomic write with other volatile atomic writes results
///   in the content of any of the writes being written to memory - the 
///   memory won't contain partial results of the different writes
///
/// Non-atomic volatile operations perform the reads and writes as a 
/// sequence of smaller volatile atomic reads or writes. Data races 
/// can result in partial results being read from or written to memory.

/// Volatile atomic 8-bit load
///
/// 8-bit volatile atomic load from `x`.
///
/// If the load introduces a data-race, the result is picked 
/// non-deterministically.
unsafe fn volatile_atomic_load_u8(x: *const u8) -> u8; 

/// Volatile aligned atomic load u16/u32/u64
///
/// 16/32/64-bit volatile atomic load from aligned `x`.
///
/// If the load introduces a data-race, the result is picked 
/// non-deterministically.
///
/// If `x` is not aligned, the behavior is undefined.
unsafe fn volatile_atomic_load_u16(x: *const u16) -> u16;
unsafe fn volatile_atomic_load_u32(x: *const u32) -> u32;
unsafe fn volatile_atomic_load_u64(x: *const u64) -> u64;

/// Volatile unaligned load u16/u32/u64/u128/256/512
///
/// Volatile 16/32/64/128/256/512-bit unaligned load.
/// 
/// This operation is not necessarily a single atomic load. 
/// The memory is read in a data-race free way by performing 
/// either a single volatile atomic load, or multiple smaller volatile 
/// atomic loads in an unspecified order.
///
/// If the load introduces a data-race, the result is picked 
/// non-deterministically.
unsafe fn volatile_load_u16_unaligned(x: *const u16) -> u16;
unsafe fn volatile_load_u32_unaligned(x: *const u32) -> u32;
unsafe fn volatile_load_u64_unaligned(x: *const u64) -> u64;
#[target_feature(enable = "sse")]
unsafe fn volatile_load_u128_unaligned(x: *const u128) -> u128;
#[target_feature(enable = "avx")]
unsafe fn volatile_load_256_unaligned(x: *const [u8; 32]) -> [u8; 32];
#[target_feature(enable = "avx512f")]
unsafe fn volatile_load_512_unaligned(x: *const [u8; 64]) -> [u8; 64];

/// Volatile atomic 8-bit write
///
/// 8-bit volatile atomic write of `x` to `ptr`.
///
/// If there is a data-race with another volatile atomic write to `ptr`, 
/// the memory written to `ptr` is picked 
/// non-deterministically.
unsafe fn volatile_atomic_write_u8(ptr: *mut u8, x: u8);

/// Volatile aligned atomic write u16/u32/u64
///
/// 16/32/64-bit volatile atomic write of `x` to `ptr`.
///
/// If there is a data-race with another volatile atomic write to `ptr`, 
/// the memory written to `ptr` is picked non-deterministically.
///
/// If `ptr` is not aligned, the behavior is undefined.
unsafe fn volatile_atomic_write_u16(ptr: *mut u16, x: u16);
unsafe fn volatile_atomic_write_u32(ptr: *mut u32, x: u32);
unsafe fn volatile_atomic_write_u64(ptr: *mut u64, x: u64);

/// Volatile unaligned write u16/u32/u64/u128/256/512
///
/// Volatile 16/32/64/128/256/512-bit unaligned write of `x` to `ptr`.
/// 
/// This operation is not necessarily a single atomic write. The 
/// memory is written to `ptr` in a data-race free way by performing
/// either a single volatile atomic write, or multiple 
/// smaller volatile atomic writes in an unspecified order.
///
/// If there is a data-race with another volatile write to `ptr`, 
/// the memory written to `ptr` is picked non-deterministically.
unsafe fn volatile_write_u16_unaligned(ptr: *mut u16, x: u16);
unsafe fn volatile_write_u32_unaligned(ptr: *mut u32, x: u32);
unsafe fn volatile_write_u64_unaligned(ptr: *mut u64, x: u64);
// Optionally:
#[target_feature(enable = "sse")]
unsafe fn volatile_load_128(x: *const [u8; 16]) -> __m128;
#[target_feature(enable = "avx")]
unsafe fn volatile_load_256(x: *const [u8; 32]) -> __m256;
#[target_feature(enable = "avx512f")]
unsafe fn volatile_load_512(x: *const [u8; 64]) -> __m512;

Users could then either use these intrinsics directly to get guaranteed semantics, or use them to build more generic abstractions in a library.

For example, one could use these to build a generic read_volatile_atomic<T: SizeBound> API in a library that rejects reads not supported by the target, e.g., by only implementing the trait SizeBound for, say, u64 on targets that support the operation on that type.
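A minimal sketch of such a library abstraction, with the trait and function names made up for illustration. Here ptr::read_volatile stands in for the proposed per-arch intrinsics, which do not exist yet:

```rust
use core::ptr;

/// Hypothetical bound: only implemented for integer sizes on which
/// the target supports a volatile atomic load.
pub trait SizeBound: Copy {
    unsafe fn volatile_atomic_load(src: *const Self) -> Self;
}

macro_rules! impl_size_bound {
    ($($t:ty),*) => {$(
        impl SizeBound for $t {
            unsafe fn volatile_atomic_load(src: *const Self) -> Self {
                // A real implementation would forward to the proposed
                // per-arch intrinsic (e.g. `volatile_atomic_load_u32`
                // on x86_64), gated with `#[cfg(target_arch = ...)]`.
                ptr::read_volatile(src)
            }
        }
    )*};
}

// On a target without, e.g., atomic 64-bit loads, `u64` would simply
// not get an impl, so such reads would fail to compile.
impl_size_bound!(u8, u16, u32, u64);

/// Generic front-end: rejects unsupported `T` at compile time.
pub unsafe fn read_volatile_atomic<T: SizeBound>(src: *const T) -> T {
    T::volatile_atomic_load(src)
}

fn main() {
    let x: u32 = 0xdead_beef;
    assert_eq!(unsafe { read_volatile_atomic(&x) }, 0xdead_beef);
}
```

The compile-time rejection is the point of the trait bound: a call on an unsupported type is a type error rather than a runtime fallback with weaker semantics.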


Your text is wider than the code block, making the comments in there really hard to read. Could you try to improve the formatting?

the result of what is read or written is indeterminate

“indeterminate” is not a word that we used in Rust so far I think, and given its history of confusion and ambiguity in C, I’d rather avoid it. “uninitialized” is the term I usually use. Or if you want to say it returns an unspecified sequence of bits, the result is “picked non-deterministically”.

When volatile reads and writes participate in data-races with other volatile operations

Your data race semantics only cover volatile-volatile pairs of accesses, why that? For volatile reads, we can say that if this is in a data race with any other write access, the result is NOT UB but instead some non-deterministically chosen but initialized data is returned. And in fact we need this for the use-case that triggered this.

The use of the term “atomic” is a bit confusing since the operation is not atomic in the sense of having an atomic memory ordering. I don’t have a good idea for a better name though.

And finally, note that implementing these intrinsics requires several things that LLVM does not do or at least not commit to:

  • Volatile reads always return frozen data. (This seems to be the case in current LLVM but I have seen no documentation that this is more than an accident.)
  • Volatile writes do not cause UB when being in a race with another write. (I wouldn’t even know how to check if LLVM currently does this the way we want.)

I don’t think we should accept intrinsics that we cannot implement with the desired semantics in a way that is explicitly guaranteed by the backend.


Done.

“uninitialized” is the term I usually use. Or if you want to say it returns an unspecified sequence of bits, the result is “picked non-deterministically”.

I’ve changed that to “picked non-deterministically”, but maybe we should give these values a name (non-deterministic value). I suspect we would at least use it for defining freeze. Are we using these terms in some other parts of the documentation already?

Your data race semantics only cover volatile-volatile pairs of accesses, why that?

Logic bug, fixed. EDIT: with “When volatile reads participate in data-races with any write operation, the result of what is read is picked non-deterministically. When volatile writes participate in data-races with other volatile-write operations, the result that is written is picked non-deterministically. If volatile writes participate in data-races with non-volatile write operations, the behavior is undefined.”

The use of the term “atomic” is a bit confusing since the operation is not atomic in the sense of having an atomic memory ordering.

I agree.

  • Volatile reads always return frozen data. (This seems to be the case in current LLVM but I have seen no documentation that this is more than an accident.)

You are assuming that we would lower these to volatile load/stores in LLVM-IR. While we could do that, we don’t have to. These are architecture-specific, so we can always just insert the right machine code using asm!(... : "volatile"). That would have the defined semantics, although it might inhibit more optimizations than necessary. Alternatively, we could lower them to volatile load/store, and then “launder” the undef (e.g. by using freeze if we ever get that, or by using an asm! block just for that).
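A rough sketch of the asm-based lowering on x86_64, written with today’s core::arch::asm syntax rather than the old asm!(... : "volatile") form. This is an illustration of the idea, not a vetted implementation: it relies on the fact that an aligned 32-bit MOV is atomic on x86_64, and on the asm block being opaque to the optimizer so the access can be neither elided nor reordered:

```rust
#[cfg(target_arch = "x86_64")]
use core::arch::asm;

/// Sketch of `volatile_atomic_load_u32` implemented with inline
/// assembly: the access is always emitted because the compiler
/// cannot see through the asm block.
#[cfg(target_arch = "x86_64")]
unsafe fn volatile_atomic_load_u32(x: *const u32) -> u32 {
    let out: u32;
    // An aligned 32-bit MOV is guaranteed atomic on x86_64.
    asm!(
        "mov {out:e}, dword ptr [{ptr}]",
        ptr = in(reg) x,
        out = out(reg) out,
        options(nostack, readonly)
    );
    out
}

#[cfg(target_arch = "x86_64")]
fn main() {
    let v: u32 = 42;
    assert_eq!(unsafe { volatile_atomic_load_u32(&v) }, 42);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```

options(nostack, readonly) still tells the optimizer the block reads memory and touches no stack, so some optimizations around the block remain possible.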

  • Volatile writes do not cause UB when being in a race with another write. (I wouldn’t even know how to check if LLVM currently does this the way we want.)

The documentation does not mention anywhere that write-write races are undefined behavior either. The only thing that AFAIK is currently documented is that a load-write race makes the load return undef, which would be ok for us (see above). FYI, I’ve opened a bug against the LLVM documentation about this, but I don’t expect it to be resolved any time soon: https://bugs.llvm.org/show_bug.cgi?id=42435

Worst case, we can prevent all re-ordering by using inline assembly as well to implement the writes.


Where are you reading that the read is volatile, but the write is not? AFAICT, the read is indeed volatile, but the write must be volatile as well for that use case to ever work.

EDIT: Expanding on that: if the write isn’t volatile, the thread of execution doing the write has UB, and it doesn’t really matter what the code doing the read does. If the compiler can prove that the write cannot be observed without invoking UB (e.g. due to a data-race), it can optimize the write away. While this misoptimization can’t wreak havoc in the thread of execution doing the read (the compiler can’t optimize the volatile read away), undefined behavior is undefined.


I agree finding a less unwieldy name would be good. There’s nothing special about the value. If I tell you that the relevant bitstring is 0b00001111, you cannot tell if it was picked non-deterministically by freeze or whether the user used the literal 15u8 in the source. It’s the process of picking the value that we have to find a name for.

This is in strong contrast to the initial value stored in uninitialized memory, 0bUUUUUUUU (8 uninitialized bits). That’s a very special value, but the process of picking it was rather mundane (the standard fixes the initial content of all memory to be exactly that).

This seems like a problem. That means if I do a volatile write and the untrusted thread does a non-atomic non-volatile write, we have UB. Didn’t we want to avoid that?

(Your next post indicates you were confused by what I was trying to say here.)

Fair.

That’s a very good point! They are UB in C++, so I assumed LLVM would copy that, but given that they diverge from C++ for read-write races, who knows what they do for write-write races.

However, to my knowledge, the formalizations of the LLVM memory model make write-write races UB. I will email some people and ask about this.

That would however mean leaving the space where we could have any hope of saying anything formal or definite (with the kinds of specs we currently have).

See above – I meant that our write is volatile, but the one we are racing with is not.


I think this should be saying that the memory gets uninitialized. That’s how I read LLVM’s spec when it says

  • Otherwise, if there is no write to the same byte that happens before Rbyte, Rbyte returns undef for that byte.
  • Otherwise, if Rbyte may see exactly one write, Rbyte returns the value written by that write.
  • Otherwise, if R is atomic, and all the writes Rbyte may see are atomic, it chooses one of the values written. See the Atomic Memory Ordering Constraints section for additional constraints on how the choice is made.
  • Otherwise Rbyte returns undef.

Arguably, when “reading from a write-write race”, you see more than one write, and not all of them are atomic, so you end up in the last case. So all reads from a write-write race must return undef, which is achieved by putting undef in memory.


It’s the process of picking the value that we have to find a name for.

Makes sense. Might be worth opening an issue in the UCG repo for this.

If I understand the use case correctly, we have a trusted thread of execution that only does volatile reads from shared memory (no writes to it), and untrusted threads of execution that write to that shared memory. So I don’t understand what you mean by “our write is volatile”.

The trusted thread of execution does not invoke UB doing volatile reads, but if an untrusted thread of execution uses a normal write to write to the shared memory, that write is unsynchronized and non-volatile. With what’s proposed here, that’s UB (e.g. as mentioned the compiler can just remove that write, or determine that all execution paths leading to that write are unreachable).

In the use case from @hsivonen, both threads of execution are part of the same program, so the whole program has UB.

To avoid the UB, we would need to define what the behavior of those racy non-volatile unsynchronized writes is. One way to do that would be to say that the write is not required to happen, but if it does, it writes a bit-pattern picked non-deterministically. That allows the compiler to optimize the writes away, but does not make the writes themselves UB, so that the code is still reachable.

That would however mean leaving the space where we could have any hope of saying anything formal or definite (with the kinds of specs we currently have).

The specified semantics would still be that volatile cannot be reordered across volatile. The implementation might be more strict, and never reorder anything across these intrinsics. We could relax that up to the specified semantics in the future.

I think this should be saying that the memory gets uninitialized. That’s how I read LLVM’s spec when it says

[…]

Arguably, when “reading from a write-write race”, you see more than one write, and not all of them are atomic, so you end up in the last case.

What the proposal intended to say (but failed to) is that when a volatile write races with another volatile write, what gets written is picked non-deterministically, and that this race is not undefined behavior. Note that per the definition of volatile, LLVM must emit both writes.

Now, if there are reads, then we might have a data-race, and that’s UB. But if the reads are all volatile, the first line of the LLVM spec says:

  • If R is volatile, the result is target-dependent.

The intrinsics being proposed are target specific - we know to which instructions they lower to for this particular target, and we use that to guarantee that the value read is picked non-deterministically without invoking undefined behavior.

For other read operations, the behavior would probably be undefined.


Rereading @hsivonen’s post, I think the writes coming from the untrusted thread are being performed by JITted code, which probably provides stronger guarantees about data races than we do.

Since the use case is “multithreaded Wasm”, it sounds like there would be nothing preventing two Wasm threads from accessing the same memory simultaneously, as opposed to the mentioned scenario of one Wasm thread and one native thread. In that case, if the Wasm code were executed by an interpreter using regular Rust writes/reads, it would already be well into UB territory. Instead, an interpreter would have to use volatile or atomic accesses to be safe, while a JIT would have to ensure that the instructions it generates are safe in the face of data races – or at least that any misbehavior caused by them doesn’t escape the sandbox. (Not sure exactly what the Wasm spec requires here.)

Assuming the Wasm VM they’re using does provide such guarantees (because it would be completely unsafe if not), all we have to worry about is whether the trusted thread causes UB. In other words, the untrusted thread can be treated as if it were part of a different process even if it really isn’t.

Edit: That said, it would be nice to provide guarantees in the “volatile read / nonvolatile write” case even if it isn’t necessary in @hsivonen’s scenario.

Yes, it would be nice to guarantee this. It makes the wasm interpreter/JIT easier to implement and allows it to generate better code, and it is mandatory for other use cases like OS kernels where the adversary can use arbitrary machine code.

Many people are satisfied with the guarantee being de facto provided by all current implementations, but it is unsatisfying to have a memory model that does not really support those very important “supervisor code” real-world use cases and to have to defer to the implementation like this.


I’m not sure I follow what this has to do with WASM. These are core::arch::x86_64 intrinsics, which do not exist if the target is wasm32 (only core::arch::wasm32 exists there).

If you are modifying a shared buffer across two processes, what matters in the end is the semantics of the instructions being executed by the CPU. Rust provides guarantees based on Rust operations; WASM does the same for WASM operations. If those guarantees have requirements, then you have to think about how the WASM guarantees satisfy Rust’s requirements and vice versa. If a Rust guarantee requires an atomic write from the WASM side, but WASM doesn’t have any instruction with that guarantee, then unless Rust defines the behavior for the case in which this does happen, all bets are off.

AFAICT, WASM supports optimizing machine code generators, so if you emit WASM code with two consecutive writes to the same memory, e.g., using LLVM volatile, the machine code generator might optimize one of the writes out. Also, all WASM writes support unaligned memory: if an unaligned access faults, the generated machine code will catch the CPU exception and retry the access (or it might just always emit unaligned writes, etc.). So if you want to write to shared memory from WASM, you probably need to use WASM atomics, and hope that two atomic writes to the same memory location don’t get optimized out.

Oh, I see. I thought the trusted thread would also want to write.

If the trusted thread only reads, then we don’t need any new write volatile intrinsics. So we might hold off on finalizing them until we know more about the constraints. Sometimes trusted threads also have to write to memory shared with untrusted code, so that does seem like a use-case to consider eventually.

That part has nothing to do with memory or volatile though. It also affects that thread doing out-of-bounds accesses or so. Whatever solution we have for that needs to apply to all of that, and I see no way in which we could do something specifically with volatile accesses that would help here. It’s just a separate discussion.

And what I am saying is that I don’t see how to get those semantics from LLVM. But what we maybe can get is that if two volatile writes race, that memory becomes uninitialized.
