Original thread: Bit-wise reasoning for atomic accesses
(Meta: It's annoying that the auto-lock for old threads blocks not only replying, but even "reply as linked topic".)
Somewhat recently, I discovered three things:
- I was partly mistaken about what optimizations LLVM could theoretically apply to `Relaxed` accesses. I was right that LLVM could legally coalesce multiple loads into one (assuming no potentially-aliasing stores or barriers in between), yet currently does not. IIRC, I looked at `bytes` and concluded that that optimization would probably be enough to fix the specific performance regression observed, although I don't remember the details. However, for an atomic load to be "zero-overhead" compared to a normal load, LLVM should also be able to remove unused loads entirely. And I recently discovered that doing so would be illegal for `Relaxed`, thanks to an edge case involving fences.
- LLVM has another ordering, `unordered`, which is even weaker than `Relaxed` (which it calls `monotonic`). It was designed for Java and is not currently exposed in Rust or C. Somehow I misread the spec and thought that `unordered` reads didn't respect happens-before but could return literally any value ever written to the memory address. In fact, they do respect happens-before; makes sense, since even non-atomic loads respect happens-before. Derp. As such, it appears to be suitable for the `bytes` use case.
- For `unordered` loads, LLVM already performs the kinds of optimizations we want! See appendix.
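To make the fence edge case concrete, here is a sketch of my own construction (not from the original discussion) showing how a `Relaxed` load can carry a happens-before edge via fence-to-fence synchronization, which is why deleting even an apparently unused `Relaxed` load is illegal in general:

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicU32, Ordering};
use std::thread;

static FLAG: AtomicBool = AtomicBool::new(false);
static DATA: AtomicU32 = AtomicU32::new(0);

fn main() {
    let writer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);
        fence(Ordering::Release);            // release fence
        FLAG.store(true, Ordering::Relaxed);
    });
    // Spin until the Relaxed load observes the store.
    while !FLAG.load(Ordering::Relaxed) {}
    // The acquire fence synchronizes with the release fence *through*
    // that Relaxed load, establishing happens-before.
    fence(Ordering::Acquire);
    // So this is guaranteed to see 42. Deleting a Relaxed load can
    // delete such a synchronization edge.
    assert_eq!(DATA.load(Ordering::Relaxed), 42);
    writer.join().unwrap();
}
```

Presumably an `unordered` load does not establish such fence-based edges, which is what makes removing an unused one legal.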
Anyway, this is really encouraging. Exposing `unordered` as a new variant of Rust's `atomic::Ordering` seems like an easy step forward that, if I'm right, would let us fix the `bytes` UB immediately. It wouldn't solve all our problems in the area of "intentionally racy reads" – in particular, it wouldn't help with non-atomic SIMD loads – but nor does it seem likely to conflict with any solution to that problem.
One potential downside is that adding a new memory ordering could complicate the job of non-LLVM backends. Also, since C doesn't support `unordered`, it may be a problem for backends (whether LLVM-based or not) that compile to C – or to C-derived bytecodes, like WebAssembly. However:
- `unordered` semantics are strictly weaker than `Relaxed`, so you can treat the new ordering the same way as `Relaxed` if you don't mind the slight loss of optimization potential.
- It's not like C has some alternative way to get the same effect. The concern is only that Rust would be encouraging programmers to do something that C doesn't support, and Rust already does plenty of that; consider its approach to aliasing. (On the other hand, consider tail calls.)
- The difference between `unordered` and `Relaxed` is mainly in high-level optimization semantics. So if you run optimizations before compiling to C/bytecode/etc., the loss will be lower or nonexistent. For C, you might not want to perform optimizations if you want the output to be human-readable, but oh well.
- If you're compiling to C with GNU extensions, you can actually approximate the optimization semantics of an `unordered` load with an appropriately-marked `asm` block, or potentially with other hacks.
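The asm-block trick can also be sketched in Rust's own inline assembly (hypothetical, my construction; the bullet above concerns GNU C, but the mechanism is the same). Marking the asm `pure` and `readonly` tells the compiler it has no side effects beyond reading memory, so an unused result may be deleted and identical invocations may be merged; roughly the optimization latitude an `unordered` load would grant:

```rust
use std::arch::asm;

/// Racy load via inline asm (x86-64 sketch, names hypothetical).
#[cfg(target_arch = "x86_64")]
unsafe fn load_racy(p: *const u32) -> u32 {
    let v: u32;
    asm!(
        // Perform the load inside the asm block, as a single access.
        "mov {v:e}, dword ptr [{p}]",
        p = in(reg) p,
        v = out(reg) v,
        // pure + readonly: deletable when unused, mergeable when repeated.
        options(pure, readonly, nostack),
    );
    v
}

/// Portable stand-in so the sketch compiles on other architectures.
#[cfg(not(target_arch = "x86_64"))]
unsafe fn load_racy(p: *const u32) -> u32 {
    p.read()
}
```

This only approximates the intended semantics; it says nothing about the memory-model side, just about what the optimizer is allowed to do with the load.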
Appendix: Testing LLVM optimizations on atomics
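The test source isn't reproduced above, so here is a plausible reconstruction (function names taken from the asm output below; the original toggled `USE_UNORDERED` to switch to LLVM's `unordered` via an unstable intrinsic, while this sketch shows only the stable `Relaxed` variant):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

pub fn test_remove_unused_load(x: &AtomicU32) {
    // A dead load: removable under `unordered`, but not under `Relaxed`.
    let _ = x.load(Ordering::Relaxed);
}

pub fn test_coalesce_redundant_loads(x: &AtomicU32) -> u32 {
    // Two loads of the same location with nothing in between:
    // coalescible into one load under `unordered`.
    x.load(Ordering::Relaxed) + x.load(Ordering::Relaxed)
}
```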
`USE_UNORDERED = false`:

```
playground::test_remove_unused_load:
    movl (%rdi), %eax   // The load is still here.
    retq
playground::test_coalesce_redundant_loads:
    movl (%rdi), %ecx   // Load #1.
    movl (%rdi), %eax   // Load #2.
    addl %ecx, %eax     // Add the two values.
    retq
```
`USE_UNORDERED = true`:

```
playground::test_remove_unused_load:
    retq                // Do nothing, just return.
playground::test_coalesce_redundant_loads:
    movl (%rdi), %eax   // Only one load.
    addl %eax, %eax     // Add the result to itself.
    retq
```
Edit: improved appendix. Rust already supports unordered loads as an unstable intrinsic, no need to hack up LLVM IR.