It would be nice to know exactly what optimization LLVM is failing to perform. Does LLVM just not coalesce redundant relaxed loads at all, or is there something more subtle going on?
@carllerche do you have a benchmark that shows the 30% perf regression that the comment talks about?
The comment mentions 10%. It was a long time ago, but I would guess the benches in the repo would show it.
Yep, it doesn't:

```rust
use std::sync::atomic::{Ordering::*, AtomicUsize};

pub fn load_twice(x: &AtomicUsize) -> (usize, usize) {
    let a = x.load(Relaxed);
    let b = x.load(Relaxed);
    (a, b)
}
```

->

```asm
playground::load_twice:
	movq	(%rdi), %rax
	movq	(%rdi), %rdx
	retq
```
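At the source level, the coalescing we would hope for amounts to rewriting the function by hand like this (a sketch of the desired transformation, not actual compiler output):

```rust
use std::sync::atomic::{Ordering::*, AtomicUsize};

// What a redundant-load-coalescing pass could produce: nothing happens
// between the two adjacent relaxed loads, so the second may legally
// return the same value as the first.
pub fn load_twice(x: &AtomicUsize) -> (usize, usize) {
    let a = x.load(Relaxed);
    (a, a)
}
```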
That sucks. It may be possible to partially work around this on rustc's end by marking the load function with LLVM's `readonly` attribute, which is equivalent to `__attribute__((pure))` in C. The semantics are that the function can read through its arguments and global state, but must not modify global state and must be deterministic. This means the compiler can fold multiple calls with the same arguments into one if there were no intervening writes. And, based on this C equivalent, LLVM does do so, even if the function is marked `always_inline` (in which case I feared it would inline the calls before it had a chance to merge them).

In theory we'd like stricter semantics, where the function can read through its arguments but not read global state. That way, LLVM could theoretically merge reads even if there were intervening writes to other locations, if it could prove the addresses didn't alias. (This would only be valid for relaxed loads.) Unfortunately, the only stricter option is `readnone`, aka `__attribute__((const))` in C, which requires that the function not read anything from memory, not even through its arguments.
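To make the distinction concrete, here is a minimal sketch (my example, with a hypothetical function name) of the case only the stricter semantics would cover:

```rust
use std::sync::atomic::{AtomicUsize, Ordering::*};

// Under `readonly` semantics, the store to `y` blocks folding the two
// loads of `x`, since a `readonly` function may read arbitrary global
// state that the store might have changed. Under "reads only through its
// arguments" semantics, proving `x` and `y` don't alias would suffice.
pub fn load_store_load(x: &AtomicUsize, y: &AtomicUsize) -> (usize, usize) {
    let a = x.load(Relaxed);
    y.store(42, Relaxed); // intervening write to a different location
    let b = x.load(Relaxed); // not foldable with `a` under `readonly`
    (a, b)
}
```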
Anyway, as for actually fixing the issue in LLVM, this bug report looks on point:
https://bugs.llvm.org/show_bug.cgi?id=37716
I went ahead and left a comment there.
Can't you get this pattern of writes with a normal atomic?
```rust
// mut s_a: AtomicU32; /* initial value should not matter */
// mut s_b: AtomicPtr<AtomicU32> = AtomicPtr::new(ptr::null_mut());
// x: Option<u32>, local to thread B

// Thread A
// these stores are non-atomic because we have &mut access
s_a = AtomicU32::new(1);
s_a = AtomicU32::new(0);
// but these stores are atomic
s_b.store(&mut s_a, Ordering::Release);
s_a.store(0, Ordering::Relaxed);

// Thread B
let mut x = None;
let tmp = s_b.load(Ordering::Acquire); // load the pointer, not s_a
if !tmp.is_null() {
    x = Some(unsafe { (*tmp).load(Ordering::Relaxed) });
}
```
I'm quite sure that here, `x` should either be `None` or `Some(0)`. If removing the second non-atomic store (as a Thread A modification, ignoring anything thread B does) were legal, then thread B could see `Some(1)` (or worse, `Some(uninitialized)`).
Good point! So the optimization likely cannot break `bytes`. Interesting.
What we actually want for `bytes` is some sort of `load_freeze` that returns a non-`undef` value. @RalfJung thinks that having per-bit `undef` for loads and combining it with LLVM's `freeze` would work.
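A rough sketch of what such an API might look like (hypothetical; `load_freeze` does not exist, and the volatile read below is only a placeholder that LLVM does not currently guarantee to freeze):

```rust
use std::sync::atomic::AtomicUsize;

// Hypothetical API: a racy load whose result is never `undef`; each bit
// is frozen to some arbitrary but stable value, as with LLVM's `freeze`.
// A real implementation would need compiler support; `read_volatile`
// stands in here and does NOT actually carry this guarantee today.
pub fn load_freeze(x: &AtomicUsize) -> usize {
    // AtomicUsize has the same in-memory representation as usize.
    unsafe { std::ptr::read_volatile(x as *const AtomicUsize as *const usize) }
}
```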
I've needed that in `jemalloc-sys`: liballoc_jemalloc nallocx should be ReadOnly · Issue #46046 · rust-lang/rust · GitHub. So maybe having a `#[rustc_readonly]` attribute would be useful.
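For the `nallocx` case, that might look roughly like this (the attribute is hypothetical; only the `nallocx` signature is real):

```rust
use std::os::raw::c_int;

extern "C" {
    // Hypothetical: #[rustc_readonly] would lower to LLVM's `readonly`
    // attribute, so repeated calls with the same arguments (and no
    // intervening writes) could be folded into one.
    // #[rustc_readonly]
    fn nallocx(size: usize, flags: c_int) -> usize;
}
```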
How hard would it be to experiment with LLVM's `unordered` ordering for this? I am not sure how this is formally modeled, but it might actually be the "no-`undef`-on-read-write-races" semantics that we need.
If it can be used from Rust, probably not terribly hard.
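If I recall correctly, nightly has exposed `unordered` through a perma-unstable intrinsic, so an experiment could be as small as this (a sketch, assuming that intrinsic is available):

```rust
#![feature(core_intrinsics)]

// Assumed: the perma-unstable `atomic_load_unordered` intrinsic, which
// lowers to an LLVM load with the `unordered` ordering.
use std::intrinsics::atomic_load_unordered;

pub unsafe fn unordered_read(p: *const u32) -> u32 {
    atomic_load_unordered(p)
}
```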
It would help if the Rust compiler had an equivalent to `READ_ONCE` and `WRITE_ONCE` as defined in the Linux kernel. Those force the compiler to perform a single load or store, respectively, while not using a machine atomic. With those, you can safely make assumptions like "if `READ_ONCE` races with `WRITE_ONCE` or an atomic write, you'll get either the old value or the new value, never a mix of both".
@josh The Linux kernel uses volatile accesses for that, and Rust has those.
The problem is that neither LLVM nor C actually specifies volatile accesses to have the effect you described when it comes to data races. No interaction between the concurrency model and volatile accesses is specified.
C doesn't, LLVM doesn't, but in practice every real architecture does, and thus the Linux kernel does. That's the reason those primitives exist.
There has been discussion in the C standards committee of standardizing something similar to the Linux kernel's memory model.
That just doesn't matter. We are not talking about architecture semantics here, we are talking about programming language semantics. See this example by @alexcrichton for why arguing with architectures just does not make any sense in this discussion.
We need a commitment from LLVM, or else this will be a hack that "happens to work" but remains UB and could stop working any time LLVM adds a new optimization.
I'm talking about programming language semantics as well. My point is precisely that we need the facilities to implement `READ_ONCE` and `WRITE_ONCE` primitives we can rely on.
C doesn't specify that interaction with respect to `volatile` accesses, but in practice that is the interaction on every existing compiler and target, so the Linux kernel counts on that.
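Concretely, a minimal sketch of those primitives on top of Rust's volatile accesses (relying on exactly that in-practice behavior; LLVM's formal model still says nothing about racing volatile accesses):

```rust
use std::ptr;

// READ_ONCE / WRITE_ONCE analogues: a volatile access forces the compiler
// to emit exactly one load or store, never duplicated, merged, or elided.
// For machine-word-sized types, real targets perform these untorn, which
// is what the kernel relies on, but it is not a language-level guarantee.
pub unsafe fn read_once<T: Copy>(p: *const T) -> T {
    ptr::read_volatile(p)
}

pub unsafe fn write_once<T: Copy>(p: *mut T, v: T) {
    ptr::write_volatile(p, v);
}
```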
I agree it'd be reasonable for LLVM to guarantee that a volatile read will never have `undef`/`poison` in it; it will always be frozen. These reads cannot be duplicated, after all.
Has anybody ever talked with the LLVM devs about this? Is that something they would be willing to guarantee?
I just found this, seems very related: there is no such thing as a "benign" data race. In particular, it has an explicit section on "disjoint bit manipulation" (Cc @carllerche).
Note that the example in that section is valid for normal reads/writes, in which case a racing read that could observe the speculative store would be undefined behavior, but not if the read and write are both atomic (even if relaxed), in which case the read is guaranteed to observe some value that was previously written.
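For illustration, a minimal sketch of that guarantee (my example):

```rust
use std::sync::atomic::{AtomicU32, Ordering::Relaxed};

static FLAGS: AtomicU32 = AtomicU32::new(0);

// Writer: sets one bit. With non-atomic accesses, a compiler could
// legally let a temporary garbage value become visible in FLAGS (e.g.
// through a speculative store), since any racing non-atomic read would
// be UB anyway.
fn set_bit() {
    FLAGS.fetch_or(0b0001, Relaxed);
}

// Reader: because both accesses are atomic, this is guaranteed to observe
// some value that was actually written, never an invented intermediate.
fn read_flags() -> u32 {
    FLAGS.load(Relaxed)
}
```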
Sure, relaxed accesses don't make a data race (just a race condition, and these are allowed).
But this thread is precisely about code using a non-atomic (volatile) racy access.