Bit-wise reasoning for atomic accesses

comex · November 20, 2018, 2:48am

It would be nice to know what optimization exactly LLVM is failing to perform. Does LLVM just not coalesce redundant relaxed loads at all, or is there something more subtle going on?

gnzlbg · November 20, 2018, 8:14am

@carllerche do you have a benchmark that shows the 30% perf regression that the comment talks about ?

carllerche · November 20, 2018, 3:22pm

The comment mentions 10%. It was a long time ago, but I would guess the benches in the repo would show it.

comex · November 21, 2018, 12:28am

Yep, it doesn't:

use std::sync::atomic::{Ordering::*, AtomicUsize};
pub fn load_twice(x: &AtomicUsize) -> (usize, usize) {
    let a = x.load(Relaxed);
    let b = x.load(Relaxed);
    (a, b)
}

->

playground::load_twice:
	movq	(%rdi), %rax
	movq	(%rdi), %rdx
	retq

That sucks. It may be possible to partially work around this on rustc's end by marking the load function with LLVM's readonly attribute, which is equivalent to __attribute__((pure)) in C. The semantics are that the function can read through its arguments and global state, but must not modify global state and must be deterministic. This means the compiler can fold multiple calls with the same arguments into one if there were no intervening writes. And, based on this C equivalent, LLVM does do so, even if the function is marked always_inline (in which case I feared it would inline the calls before it had a chance to merge them).

In theory we'd like a stricter semantics, where the function can read through its arguments but not read global state. That way, LLVM could theoretically merge reads even if there were intervening writes to other locations, if it could prove the addresses didn't alias. (This would only be valid for relaxed loads.) Unfortunately, the only stricter option is readnone, aka __attribute__((const)) in C, which requires that the function not read anything from memory, not even from its arguments.

Anyway, as for actually fixing the issue in LLVM, this bug report looks on point:

https://bugs.llvm.org/show_bug.cgi?id=37716

I went ahead and left a comment there.

arielb1 · November 26, 2018, 9:51am

Can't you get this pattern of writes with a normal atomic?

// mut s_a: AtomicU32; /* initial value should not matter */
// mut s_b: AtomicPtr<AtomicU32> = AtomicPtr::null();

// mut x: u32

// Thread A

// these stores are non-atomic because we have &mut access
s_a = AtomicU32::new(1);
s_a = AtomicU32::new(0);
// but these store are atomic
s_b.store(&mut s_a, Ordering::Release);
s_a.store(0, Ordering::Relaxed);


// Thread B
let mut x = None;
let tmp = s_a.load(Ordering::Acquire);
if tmp != ptr::null() {
    x = Some((*tmp).load(Ordering::Relaxed));
}

I'm quite sure that here, x should either be None or Some(0). If removing the second non-atomic store (as a Thread A modification, ignoring anything thread B does) would be legal, then thread B could see Some(1) (or worse, Some(uninitialized)).

RalfJung · November 26, 2018, 11:30am

Good point! So the optimization can likely not break bytes. Interesting.

arielb1 · November 26, 2018, 11:45am

What we actually want for bytes is some sort of load_freeze, that returns a non-undef value. @RalfJung thinks that having per-bit undef for loads and combining it with LLVM’s freeze would work.

gnzlbg · November 27, 2018, 6:19pm

I've needed that in jemalloc-sys: liballoc_jemalloc nallocx should be ReadOnly · Issue #46046 · rust-lang/rust · GitHub

So maybe having a #[rustc_readonly] attribute would be useful.

RalfJung · December 13, 2018, 5:01pm

How hard would it be to make experiments with LLVM the unordered ordering for this? I am not sure about how this is formally modeled, but it might actually be the "no-undef-on-read-write-races" that we need.

carllerche · December 13, 2018, 7:12pm

If it can be used from Rust, probably not terribly hard.

josh · December 14, 2018, 7:57pm

It would help if the Rust compiler had an equivalent to READ_ONCE and WRITE_ONCE as defined in the Linux kernel. Those force the compiler to perform a single load or store respectively, while not using a machine atomic. With those, you can safely make assumptions like “if READ_ONCE races with WRITE_ONCE or an atomic write, you’ll get either the old value or new value, never a mix of both”.

RalfJung · December 15, 2018, 9:57am

@josh The Linux kernel uses volatile accesses for that, and Rust has those.

The problem is that neither LLVM nor C actually specify volatile accesses to have the effect you described when it comes to data races. There is no interaction between the concurrency model and volatile accesses specified.

josh · December 15, 2018, 10:04am

C doesn’t, LLVM doesn’t, but in practice every real architecture does, and thus the Linux kernel does. That’s the reason those primitives exist.

There has been discussion in the C standards committee of standardizing something similar to the Linux kernel’s memory model.

RalfJung · December 15, 2018, 10:49am

That just doesn't matter. We are not talking about architecture semantics here, we are talking about programming language semantics. See this example by @alexcrichton for why arguing with architectures just does not make any sense in this discussion.

We need a commitment from LLVM, or else this will be a hack that "happens to work" but remains UB and could stop working any time LLVM adds a new optimization.

josh · December 15, 2018, 10:57am

I’m talking about programming language semantics as well. My point is precisely that we need the facilities to implement READ_ONCE and WRITE_ONCE primitives we can rely on.

C doesn’t specify that interaction with respect to volatile accesses, but in practice that is the interaction on every existing compiler and target, so the Linux kernel counts on that.

RalfJung · December 15, 2018, 12:10pm

I agree it’d be reasonable for LLVM to guarantee that a volatile read will never have undef/poison in it, it will always be frozen. These reads cannot be duplicated, after all.

Has anybody ever talked with the LLVM devs about this? Is that something they would be willing to guarantee?

RalfJung · February 23, 2019, 12:59pm

I just found this, seems very related: there is no such thing as a “benign” data race. In particular this has an explicit section on “disjoint bit manipulation” (Cc @carllerche)

comex · February 23, 2019, 5:50pm

Note that the example in that section is valid for normal reads/writes, in which case a racing read that could observe the speculative store would be undefined behavior, but not if the read and write are both atomic (even if relaxed), in which case the read is guaranteed to observe some value that was previously written.

RalfJung · February 24, 2019, 11:45am

Sure, relaxed accesses don’t make a data race (just a race condition, and these are allowed).

But this thread is precisely about code using a non-atomic (volatile) racy access.

system · May 25, 2019, 11:45am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
`unordered` as a solution to "Bit-wise reasoning for atomic accesses" Unsafe Code Guidelines	24	3594	December 22, 2024
Reading and erasing bits outside of an allocation in inline asm Unsafe Code Guidelines	2	301	August 27, 2024
Blog post: Thoughts on trusting types and unsafe code Unsafe Code Guidelines	47	5969	March 25, 2019
Reordering of writes via differently-typed pointers Unsafe Code Guidelines	14	2025	March 25, 2019
Type-safe atomics usage libs	17	1826	August 23, 2021

Bit-wise reasoning for atomic accesses

Related topics