Should this code be legal?
The main thread sends a pointer to its local variable to another thread using relaxed AtomicPtr operations; the second thread only writes to that variable, then notifies the main thread using a Release/Acquire pair.
Since the first thread does a single write before starting the second thread and never writes afterwards, there should not be any need to synchronize the reads the second thread makes from the variable, even if it had read it.
Since the second thread only writes to the variable, it should not need to acquire it (i.e. make its current value visible to the second thread), so it should not require synchronization with the sender.
Why is this not allowed?
use std::sync::atomic::Ordering::*;
use std::sync::atomic::*;
use std::sync::*;

static P: AtomicPtr<u8> = AtomicPtr::new(core::ptr::null_mut());
static MUTEX: Mutex<()> = Mutex::new(());
static CONDVAR: Condvar = Condvar::new();
static IS_FINISHED: AtomicBool = AtomicBool::new(false);

fn main() {
    for _ in 0..1000 {
        P.store(core::ptr::null_mut(), Relaxed);
        IS_FINISHED.store(false, Relaxed);
        let mut val: u8 = 0;
        let _t1 = std::thread::spawn(|| {
            while P.load(Relaxed).is_null() {
                std::hint::spin_loop();
            }
            unsafe {
                let ptr = P.load(Relaxed);
                // Access 2
                *ptr = 127;
            }
            IS_FINISHED.store(true, Release);
            let _g = MUTEX.lock().unwrap();
            CONDVAR.notify_one();
        });
        // Access 1
        P.store(&mut val, Relaxed);
        let mut guard = MUTEX.lock().unwrap();
        while !IS_FINISHED.load(Acquire) {
            guard = CONDVAR.wait(guard).unwrap();
        }
        // Access 3
        assert_eq!(val, 127);
    }
}
error: Undefined Behavior: Data race detected between (1) non-atomic write on thread `main` and (2) non-atomic write on thread `<unnamed>` at alloc1927. (2) just happened here
--> src\main.rs:24:17
|
24 | *ptr = 127;
| ^^^^^^^^^^ Data race detected between (1) non-atomic write on thread `main` and (2) non-atomic write on thread `<unnamed>` at alloc1927. (2) just happened here
|
help: and (1) occurred earlier here
--> src\main.rs:33:17
|
33 | P.store(&mut val, Relaxed);
| ^^^^^^^^
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: BACKTRACE (of the first span):
= note: inside closure at src\main.rs:24:17: 24:27
note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace
error: aborting due to 1 previous error
I can’t comment on whether this is intended behavior, but I can minimize your example:
use std::sync::atomic::{AtomicPtr, Ordering::Relaxed};

static P: AtomicPtr<u8> = AtomicPtr::new(core::ptr::null_mut());

fn main() {
    let val: u8 = 0;
    // take reference here
    let r = &val;
    // start thread here
    let t = std::thread::spawn(|| loop {
        match P.load(Relaxed) {
            ptr if ptr.is_null() => std::hint::spin_loop(),
            ptr => {
                unsafe {
                    let _v = *ptr;
                }
                break;
            }
        }
    });
    P.store(r as *const u8 as _, Relaxed);
    t.join().unwrap();
}
Miri is happy with that one, but this variant, which takes the reference only after the thread is started, fails:
use std::sync::atomic::{AtomicPtr, Ordering::Relaxed};

static P: AtomicPtr<u8> = AtomicPtr::new(core::ptr::null_mut());

fn main() {
    let val: u8 = 0;
    // start thread here
    let t = std::thread::spawn(|| loop {
        match P.load(Relaxed) {
            ptr if ptr.is_null() => std::hint::spin_loop(),
            ptr => {
                unsafe {
                    let _v = *ptr;
                }
                break;
            }
        }
    });
    // take reference here
    let r = &val;
    P.store(r as *const u8 as _, Relaxed);
    t.join().unwrap();
}
error: Undefined Behavior: Data race detected between (1) non-atomic write on thread `main` and (2) non-atomic read on thread `<unnamed>` at alloc1507. (2) just happened here
--> src/main.rs:14:30
|
14 | let _v = *ptr;
| ^^^^ Data race detected between (1) non-atomic write on thread `main` and (2) non-atomic read on thread `<unnamed>` at alloc1507. (2) just happened here
|
help: and (1) occurred earlier here
--> src/main.rs:22:13
|
22 | let r = &val;
| ^^^^
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: BACKTRACE (of the first span):
= note: inside closure at src/main.rs:14:30: 14:34
I’m not entirely sure why creating &val counts as a “non-atomic write” (to val).
The plot thickens… apparently, adding an early unused &val alone makes a difference in my minimized example… (I’m not entirely sure what exactly is going on here, and whether or not it’s all intended behavior.)
use std::sync::atomic::{AtomicPtr, Ordering::Relaxed};

static P: AtomicPtr<u8> = AtomicPtr::new(core::ptr::null_mut());

fn main() {
    let val: u8 = 0;
    // already take an (unused) reference here
    let _ = &val;
    // start thread here
    let t = std::thread::spawn(|| loop {
        match P.load(Relaxed) {
            ptr if ptr.is_null() => std::hint::spin_loop(),
            ptr => {
                unsafe {
                    let _v = *ptr;
                }
                break;
            }
        }
    });
    // take the actual reference here
    let r = &val;
    P.store(r as *const u8 as _, Relaxed);
    t.join().unwrap();
}
Regarding your original code, I would just assume that Rust has a rule along the lines of “when &mut …expr… is evaluated, …expr… must be valid for writes at that point”. For soundness purposes this means that a write happens at that point, i.e. after the thread was started, and that write is unsynchronized. E.g. this is accepted (though I would probably even make sure to create the pointer before the thread starts):
use std::sync::atomic::Ordering::*;
use std::sync::atomic::*;
use std::sync::*;

static P: AtomicPtr<u8> = AtomicPtr::new(core::ptr::null_mut());
static MUTEX: Mutex<()> = Mutex::new(());
static CONDVAR: Condvar = Condvar::new();
static IS_FINISHED: AtomicBool = AtomicBool::new(false);

fn main() {
    for _ in 0..10 {
        P.store(core::ptr::null_mut(), Relaxed);
        IS_FINISHED.store(false, Relaxed);
        let mut val: u8 = 0;
        let r = &mut val;
        let _t1 = std::thread::spawn(|| {
            while P.load(Relaxed).is_null() {
                std::hint::spin_loop();
            }
            unsafe {
                let ptr = P.load(Relaxed);
                // Access 2
                *ptr = 127;
            }
            IS_FINISHED.store(true, Release);
            let _g = MUTEX.lock().unwrap();
            CONDVAR.notify_one();
        });
        // Access 1
        P.store(r, Relaxed);
        let mut guard = MUTEX.lock().unwrap();
        while !IS_FINISHED.load(Acquire) {
            guard = CONDVAR.wait(guard).unwrap();
        }
        // Access 3
        assert_eq!(val, 127);
    }
}
but this isn’t
use std::sync::atomic::Ordering::*;
use std::sync::atomic::*;
use std::sync::*;

static P: AtomicPtr<u8> = AtomicPtr::new(core::ptr::null_mut());
static MUTEX: Mutex<()> = Mutex::new(());
static CONDVAR: Condvar = Condvar::new();
static IS_FINISHED: AtomicBool = AtomicBool::new(false);

fn main() {
    for _ in 0..10 {
        P.store(core::ptr::null_mut(), Relaxed);
        IS_FINISHED.store(false, Relaxed);
        let mut val: u8 = 0;
        let r = &mut val;
        let _t1 = std::thread::spawn(|| {
            while P.load(Relaxed).is_null() {
                std::hint::spin_loop();
            }
            unsafe {
                let ptr = P.load(Relaxed);
                // Access 2
                *ptr = 127;
            }
            IS_FINISHED.store(true, Release);
            let _g = MUTEX.lock().unwrap();
            CONDVAR.notify_one();
        });
        let r = &mut *r;
        // Access 1
        P.store(r, Relaxed);
        let mut guard = MUTEX.lock().unwrap();
        while !IS_FINISHED.load(Acquire) {
            guard = CONDVAR.wait(guard).unwrap();
        }
        // Access 3
        assert_eq!(val, 127);
    }
}
At least NLL-wise, &mut LV is a "deep write" of LV, and I assumed from my own tinkering that that was what Miri was talking about when it called &mut val a non-atomic write. But that doesn't explain why &val would also be called a write (in NLL it's a "deep read").
Intuitively, I'd guess the compiler wants to be allowed to initialize val only after the thread was started, if val had otherwise been unused before, and thus treats the &val at the first use site as a write, even though the true write operation is the initialization. I'm not familiar with the formal rules that may or may not validate this interpretation.
Edit: Some minor testing suggests it might be less about “first usage” of a variable and more about “first usage that forces the variable to have a location in memory”.
(see my “minor testing” here)
Like… this fails:
use std::sync::atomic::{AtomicPtr, Ordering::Relaxed};

static P: AtomicPtr<u8> = AtomicPtr::new(core::ptr::null_mut());

fn main() {
    let val: u8 = 0;
    let val2 = val;
    println!("{val2}");
    // start thread here
    let t = std::thread::spawn(|| loop {
        match P.load(Relaxed) {
            ptr if ptr.is_null() => std::hint::spin_loop(),
            ptr => {
                unsafe {
                    let _v = *ptr;
                }
                break;
            }
        }
    });
    // already take an (unused) reference here
    let _ = &val;
    // take the actual reference here
    let r = &val;
    P.store(r as *const u8 as _, Relaxed);
    t.join().unwrap();
}
but this doesn’t
use std::sync::atomic::{AtomicPtr, Ordering::Relaxed};

static P: AtomicPtr<u8> = AtomicPtr::new(core::ptr::null_mut());

fn main() {
    let val: u8 = 0;
    let val2 = *&val; // or e.g. val.clone()
    println!("{val2}");
    // start thread here
    let t = std::thread::spawn(|| loop {
        match P.load(Relaxed) {
            ptr if ptr.is_null() => std::hint::spin_loop(),
            ptr => {
                unsafe {
                    let _v = *ptr;
                }
                break;
            }
        }
    });
    // already take an (unused) reference here
    let _ = &val;
    // take the actual reference here
    let r = &val;
    P.store(r as *const u8 as _, Relaxed);
    t.join().unwrap();
}
Of course, we might also be dealing with some false negatives here; maybe that first usage by reference does not actually/officially guarantee a stable memory position like that, but Miri's implementation works with that approach.
Stacked Borrows has this rule: the creation of a mutable reference triggers a "phantom" write. Tree Borrows, however, is more permissive and only triggers a phantom read (which will still cause your final example to fail).
(What the scopes actually cover isn't notated in the human-consumption-only MIR format.) This doesn't seem like it should have semantic impact; everything w.r.t. the place should be happening in the same causal order in both cases.
Without having dived into Miri, it is likely a false positive caused by places not being allocated into memory until their reference is taken. This is an optimization for most cases (there's a lot of bookkeeping happening for allocations in Miri), but it results in a false positive here. Assuming that's a correct assessment, the fix would be to assign a timestamp corresponding to the local place's initialization when reifying it into an allocation, instead of the timestamp of the reification itself.
The original program here is not a false positive, Miri is right here. &mut val is, conceptually, a write to val, and that write races with the write in the other thread.
However, this version with a raw pointer still shows the same error. So I think @CAD97 is on to something, the "delayed allocation" trick doesn't interact properly with the data race detection. This could be tricky to fix...
The idea is that the compiler should be allowed to insert spurious writes even if you didn't do any explicit write. Having &mut act as a write ensures that this is possible.
As was mentioned above, in Tree Borrows this got relaxed to having &mut x (and &x) just be a read, not a write. This avoids a bunch of UB found in the wild. That gives up on spurious writes but still allows the compiler to insert spurious reads (we set the dereferenceable attribute to inform LLVM about this). The "implicit write on &mut" is somewhat experimental, but the implicit read is almost certainly going to happen, and that is sufficient to make the original program UB.
(Tree Borrows is another experimental aliasing model, a successor of sorts to Stacked Borrows. Where Stacked Borrows probably rejects too many programs, Tree Borrows probably accepts too many.)
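For anyone who wants to compare the two models on these examples themselves: Stacked Borrows is Miri's default, and as far as I know the experimental Tree Borrows model can be opted into via MIRIFLAGS, roughly like this:

```shell
# default: Stacked Borrows
cargo +nightly miri run

# opt in to the experimental Tree Borrows aliasing model instead
MIRIFLAGS="-Zmiri-tree-borrows" cargo +nightly miri run
```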
But why could allowing spurious writes be desirable? Would it help optimizations? Are those optimizations done in practice today, or are they just theoretical?
If it reduces the amount of UB found in the wild while not giving up any very important optimization, it seems that Tree Borrows is right on this one.
(Spurious reads, on the other hand, help enable speculative reads, i.e. reads done in advance whether or not we will actually need them; this seems much more useful.)
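To illustrate what spurious reads buy: in a sketch like the following (pick and pick_speculative are made-up names for illustration), the compiler is allowed to compile the first function as if it were the second, reading through the reference before knowing whether the value is needed, because the reference is dereferenceable:

```rust
// Sketch of the speculative-read transformation discussed above.
fn pick(cond: bool, r: &u32) -> u32 {
    if cond { *r } else { 0 }
}

// What the compiler may turn it into: the read of *r is unconditional,
// i.e. spurious whenever `cond` is false.
fn pick_speculative(cond: bool, r: &u32) -> u32 {
    let v = *r; // hoisted read; legal because &u32 is dereferenceable
    if cond { v } else { 0 }
}

fn main() {
    let x = 42;
    assert_eq!(pick(true, &x), pick_speculative(true, &x));
    assert_eq!(pick(false, &x), pick_speculative(false, &x));
}
```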
…and other code motion along those lines. The most obvious case where doing so is beneficial is when it reduces register pressure.
Doing the write earlier counts as spurious because unknown() could read the memory location and then diverge, at least per TB's rules. The write after would be UB in that case, but it never happens if unknown() doesn't return normally.
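A guess at the shape of the situation being described (the quoted snippet isn't reproduced here; bar and unknown are hypothetical names): the question is whether the store may be hoisted above the opaque call.

```rust
// Hypothetical sketch: may the compiler hoist `*y = 1` above the call?
// If `unknown` diverges (panics, loops forever), the original program
// never performs the write, so a hoisted write would be a spurious one.
fn bar(y: &mut usize, unknown: impl FnOnce()) {
    unknown(); // opaque; might not return normally
    *y = 1;
}

fn main() {
    let mut v = 0usize;
    bar(&mut v, || {});
    assert_eq!(v, 1);
}
```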
fn foo(x: usize, y: &mut usize) {
    for _ in 0..x {
        *y += 1;
    }
}
Allowing spurious writes through &mut allows optimizing that to *y += x, otherwise it has to branch for the case without a write: if x != 0 { *y += x; }
But if the for loop never actually ran the body, that *p = 1 wouldn't have happened, so in order to be allowed to do that you need to be allowed to do spurious writes.
(The read version of that is more commonly applicable, but the write version exists too.)
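Concretely, the loop-to-multiplication rewrite can be checked side by side; foo_opt below is my name for the hypothetical optimized form, whose unconditional write is exactly the spurious write (for x == 0) that the aliasing model has to permit:

```rust
fn foo(x: usize, y: &mut usize) {
    for _ in 0..x {
        *y += 1;
    }
}

// Hypothetical optimized form: writes through `y` even when x == 0,
// which is a spurious write the model must allow for this rewrite.
fn foo_opt(x: usize, y: &mut usize) {
    *y += x;
}

fn main() {
    for x in 0..4 {
        let (mut a, mut b) = (10usize, 10usize);
        foo(x, &mut a);
        foo_opt(x, &mut b);
        assert_eq!(a, b); // both forms agree, including for x == 0
    }
}
```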