Original thread: Bit-wise reasoning for atomic accesses
(Meta: It's annoying that the auto-lock for old threads blocks not only replying, but even "reply as linked topic".)
Somewhat recently, I discovered three things:
1. I was partly mistaken about what optimizations LLVM could theoretically apply to `Relaxed` atomic loads. I was right that LLVM could legally coalesce multiple loads into one (assuming no potentially-aliasing stores or barriers in between), yet currently does not. IIRC, I looked at `bytes` and concluded that that optimization would probably be enough to fix the specific performance regression observed, although I don't remember the details. However, for the atomic load to be "zero-overhead" compared to a normal load, LLVM should also be able to remove unused loads entirely. And I recently discovered that doing so would be illegal for `Relaxed`, thanks to an edge case involving fences.
2. LLVM has another ordering, `unordered`, which is even weaker than `Relaxed` (which it calls `monotonic`). It was designed for Java and is not currently exposed in Rust or C. Somehow I misread the spec and thought that `unordered` reads didn't respect happens-before but could return literally any value ever written to the memory address. In fact, they do respect happens-before; that makes sense, since even non-atomic loads respect happens-before. Derp. As such, `unordered` appears to be suitable for the `bytes` use case.
3. For `unordered` loads, LLVM already performs the kinds of optimizations we want! See the appendix.

In fact, @RalfJung suggested `unordered` in the original thread, but I didn't test it then because of points 1 and 2 above.
Anyway, this is really encouraging. Exposing `unordered` as a new variant of Rust's `atomic::Ordering` seems like an easy step forward that, if I'm right, would let us fix the `bytes` UB immediately. It wouldn't solve all our problems in the area of "intentionally racy reads" – in particular, it wouldn't help with non-atomic SIMD loads – but nor does it seem likely to conflict with any solution to that problem.
One potential downside is that adding a new memory ordering could complicate the job of non-LLVM backends. Also, since C doesn't support `unordered`, it may be a problem for backends (whether LLVM-based or not) that compile to C – or to C-derived bytecodes, like WebAssembly. However:
- `unordered` semantics are strictly weaker than `Relaxed`, so you can treat the new ordering the same way as `Relaxed` if you don't mind the slight loss of optimization potential.
- It's not like C has some alternative way to get the same effect. The concern is only that Rust would be encouraging programmers to do something that C doesn't support, and Rust already does plenty of that; consider its approach to aliasing. (On the other hand, consider tail calls.)
- The difference between `unordered` and `Relaxed` is mainly in high-level optimization semantics. So if you run optimizations before compiling to C/bytecode/etc., the loss will be lower or nonexistent. For C, you might not want to perform optimizations if you want the output to be human-readable, but oh well.
- If you're compiling to C with GNU extensions, you can actually approximate the optimization semantics of an `unordered` load with an appropriately-marked `asm` block, or potentially with other hacks (`__attribute__((pure))`).
Thoughts?
Appendix: Testing LLVM optimizations on atomics
With `USE_UNORDERED = false`:

```asm
playground::test_remove_unused_load:
    movl (%rdi), %eax   // The load is still here.
    retq

playground::test_coalesce_redundant_loads:
    movl (%rdi), %ecx   // Load #1.
    movl (%rdi), %eax   // Load #2.
    addl %ecx, %eax     // Add the two values.
    retq
```
With `USE_UNORDERED = true`:

```asm
playground::test_remove_unused_load:
    retq                // Do nothing, just return.

playground::test_coalesce_redundant_loads:
    movl (%rdi), %eax   // Only one load.
    addl %eax, %eax     // Add the result to itself.
    retq
```
Edit: improved appendix. Rust already supports `unordered` loads as an unstable intrinsic, so there's no need to hack up LLVM IR.