Loads and stores to/from outside the memory model

In theory, you can use llvm.memcpy.element.unordered.atomic with element_size == 1 so that the copy as a whole doesn't have to be indivisible (only each individual byte is atomic), and it could optimize to regular loads/stores.

In practice... it looks like LLVM will currently perform that optimization only if the pointer is known to be aligned, even though it would be just as valid for an unaligned pointer.

And of course, for now Rust doesn't even expose that intrinsic.
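For illustration, the closest thing you can write by hand in stable Rust is probably a loop of per-byte relaxed atomic accesses. This is only a sketch: the function names are made up, and Relaxed (LLVM's monotonic) is the weakest ordering Rust exposes, so it is slightly stronger than the intrinsic's unordered.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical helper: copy out of a shared region one byte at a time.
/// Each byte is its own atomic access, so the copy as a whole is NOT
/// indivisible; a concurrent writer can be observed mid-update.
fn atomic_bytewise_load(src: &[AtomicU8], dst: &mut [u8]) {
    assert_eq!(src.len(), dst.len());
    for (d, s) in dst.iter_mut().zip(src) {
        *d = s.load(Ordering::Relaxed);
    }
}

/// Hypothetical helper: the matching store direction.
fn atomic_bytewise_store(dst: &[AtomicU8], src: &[u8]) {
    assert_eq!(src.len(), dst.len());
    for (d, s) in dst.iter().zip(src) {
        d.store(*s, Ordering::Relaxed);
    }
}
```

Whether the optimizer turns a loop like this into wider loads/stores is a separate question; nothing here guarantees it.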

Taking a lock in that case is actually necessary, because you're requesting an atomic load/store, and SIMD loads/stores are not guaranteed to be atomic at the architecture level. However, in theory, LLVM should be able to create SIMD loads/stores out of either llvm.memcpy.element.unordered.atomic, or a series of regular atomic loads/stores (e.g. loading from a pair of AtomicU64s in sequence, at least on x86 where atomicity is free at a 64-bit granule).

In practice... it doesn't know how to do that. So you're better off with volatile.
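To make the two options concrete, here is a rough sketch of both: a pair of relaxed AtomicU64 loads in sequence, and a single 16-byte volatile access. The function names are hypothetical, and how the volatile access gets lowered (one vector instruction or several narrower ones) is up to the backend, not something this code guarantees.

```rust
use std::ptr;
use std::sync::atomic::{AtomicU64, Ordering};

/// Option A: two 64-bit atomic loads in sequence. Each half is atomic,
/// but the pair is not; the halves can come from different writes.
fn load_pair(halves: &[AtomicU64; 2]) -> [u64; 2] {
    [
        halves[0].load(Ordering::Relaxed),
        halves[1].load(Ordering::Relaxed),
    ]
}

/// Option B: one volatile 16-byte read, with no atomicity requested.
///
/// # Safety
/// `src` must be valid for reads of 16 bytes.
unsafe fn volatile_load_16(src: *const u8) -> [u8; 16] {
    // [u8; 16] has alignment 1, so no extra alignment requirement on `src`.
    unsafe { ptr::read_volatile(src.cast::<[u8; 16]>()) }
}

/// The matching volatile store.
///
/// # Safety
/// `dst` must be valid for writes of 16 bytes.
unsafe fn volatile_store_16(dst: *mut u8, val: [u8; 16]) {
    unsafe { ptr::write_volatile(dst.cast::<[u8; 16]>(), val) }
}
```

In real use `src`/`dst` would point into the shared memory segment; a local buffer works as a stand-in when trying this out.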

I'm not aware of any way that a malicious process could cause volatile loads from a shared memory segment to misbehave on any architecture I've heard of. (Not that it matters, but that includes Itanium. Itanium has an 'uninitialized' bit for registers, but not for memory.)

The following might be more controversial, but since the definition of volatile is basically 'defer to hardware semantics', I believe that it would be illegal for any future version of LLVM to assume that concurrent volatile accesses to the same memory can't happen* – as long as it's targeting current architectures. After all, current architectures have semantics that explicitly allow concurrent loads and stores. Instead, I'd say that concurrent volatile accesses (that are not also atomic) are merely non-portable, since hypothetically there could be some bizarre architecture where they would trap or even cause broader UB.

* Not that there would be much benefit in LLVM making that assumption even if it could, considering that it's very limited in what kinds of optimizations it can perform on volatile accesses. But that's beside the point.
