With some help from the UCG group, I've recently been trying to improve the imperfect state of Rust's volatile by providing an alternative to ptr::[read|write]_volatile that builds on LLVM's atomic volatile loads and stores rather than non-atomic ones.
The expected benefits of this alternative are that...
- Data races become unambiguously well-defined behavior, instead of being in this weird situation where LLVM claims that volatile data races are UB but happens to always compile them into something sensible in practice.
- Accidental load/store tearing becomes a thing of the past, as asking for a load/store which is not supported at the target level is now a compile-time error.
- The proposed API makes it hard to accidentally mix volatile and non-volatile accesses to a memory location, which is almost always a mistake in some volatile usage scenarios.
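To make the last point concrete, here is a minimal sketch of the kind of wrapper that prevents mixing volatile and non-volatile accesses. `VolatileCell` and its `load`/`store` methods are hypothetical names for illustration, not the proposed API. Atomic volatile accesses are not exposed in stable Rust, so this stand-in uses today's non-atomic `ptr::read_volatile` and `ptr::write_volatile`.

```rust
use std::cell::UnsafeCell;
use std::ptr;

/// Hypothetical wrapper (not the proposed API): the inner value can only be
/// reached through volatile accesses, so volatile and non-volatile accesses
/// to the same location cannot be mixed by accident.
#[repr(transparent)]
pub struct VolatileCell<T: Copy> {
    inner: UnsafeCell<T>,
}

impl<T: Copy> VolatileCell<T> {
    /// Wraps an initial value.
    pub const fn new(value: T) -> Self {
        Self { inner: UnsafeCell::new(value) }
    }

    /// Volatile load. Stand-in: today's non-atomic `ptr::read_volatile`.
    pub fn load(&self) -> T {
        // SAFETY: `inner.get()` is valid, aligned and initialized for `T`.
        unsafe { ptr::read_volatile(self.inner.get()) }
    }

    /// Volatile store. Stand-in: today's non-atomic `ptr::write_volatile`.
    pub fn store(&self, value: T) {
        // SAFETY: `inner.get()` is valid and aligned for writes of `T`.
        unsafe { ptr::write_volatile(self.inner.get(), value) }
    }
}
```

Because it wraps an `UnsafeCell`, this type is not `Sync`, so safe code cannot share it across threads and race on it. An atomic volatile version would presumably lift that restriction, which is exactly where the data-race question above matters.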
One open question, however, is whether the proposed API can fully replace ptr::[read|write]_volatile in all circumstances as currently specified, in which case the former API could just be reimplemented in terms of its successor.
What's unclear here is whether there is a way to specify atomic volatile accesses so that they work on targets that have a very weak memory model without global cache coherence, where even Relaxed atomics require special load/store instructions or synchronization. Examples of such targets are GPUs (NVPTX, AMDGPU...) and virtual machines (WASM, JVM...).
I've tried to come up with atomic-but-super-weakly-ordered semantics which are morally equivalent to those of LLVM's unordered atomics, in the hope that this is weak enough for such targets. There is no explicit commitment to translating into them, since we don't want to commit to supporting LLVM's unordered at this point in time. But I'm not sure if that is enough, and I need input from people familiar with such targets:
- Do these targets guarantee atomicity of native load/store instructions, in the sense that when a load and store race against each other, the load can observe either the former or the new value of the target memory region but nothing else?
- Has there been any study on whether LLVM's unordered atomic ordering specifically (since that's what we'll most likely use for the initial implementation) is implementable without using special load and store instructions or extra synchronization on those architectures?
- I know that LLVM's unordered has specifically been designed to accommodate the needs of the JVM, so I guess the answer is yes in that case, but I don't know if it is also an appropriate model for other VMs like WASM, or for "normal" GPU loads & stores.
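To illustrate the atomicity property the first question asks about, here is a small sketch that races stores against loads and counts loads that observe anything other than the old or the new value. It uses stable Rust's `Relaxed` atomics on the host CPU as a stand-in. It does not use LLVM's unordered ordering or volatile accesses, and it says nothing about GPU or VM targets. The function name `count_torn_loads` and the two bit patterns are assumptions made for this example.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Two bit patterns that differ in every bit, so a torn load is easy to spot.
const OLD: u64 = 0x0000_0000_0000_0000;
const NEW: u64 = 0xFFFF_FFFF_FFFF_FFFF;

/// Races relaxed stores against relaxed loads and returns how many loads
/// observed a value that is neither `OLD` nor `NEW`, i.e. a torn value.
/// Note: `AtomicU64` only exists on targets with 64-bit atomics.
pub fn count_torn_loads(iterations: usize) -> usize {
    let cell = Arc::new(AtomicU64::new(OLD));
    let done = Arc::new(AtomicBool::new(false));

    let writer = {
        let cell = Arc::clone(&cell);
        let done = Arc::clone(&done);
        thread::spawn(move || {
            // Alternate between the two patterns while the reader is loading.
            for i in 0..iterations {
                let value = if i % 2 == 0 { NEW } else { OLD };
                cell.store(value, Ordering::Relaxed);
            }
            done.store(true, Ordering::Release);
        })
    };

    let mut torn = 0;
    while !done.load(Ordering::Acquire) {
        let seen = cell.load(Ordering::Relaxed);
        if seen != OLD && seen != NEW {
            torn += 1;
        }
    }
    writer.join().unwrap();
    torn
}
```

With atomic accesses, `count_torn_loads(100_000)` must return 0 on any conforming target. The open question is whether plain native loads and stores on the weakly ordered targets listed above give the same guarantee.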