Heh. That does remove some of the urgency, so to speak. Does anyone know of any other examples of Rust code that intentionally uses non-atomic loads instead of Relaxed ones [edit: in situations where a race is possible]?
Or even examples of code that does use Relaxed but could potentially go faster with Unordered. This includes loads that, after inlining, are:

- Completely unused
  - Might be rare; I'm not really sure
- Performed repeatedly in a loop, but would be okay to hoist out of the loop (i.e. the value is not expected to change during the loop, or the code doesn't care if it does)
  - Though hoisting is only possible if the compiler can prove there are no aliasing writes within the loop, which is often hard, especially with the noalias woes
- Performed multiple times in succession, when one load would suffice
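To make the hoisting case concrete, here is a hypothetical sketch (not from any real codebase; the names and the `CONFIG` atomic are invented for illustration). The compiler generally cannot hoist the Relaxed load out of the loop itself, since it must assume another thread may store to `CONFIG` mid-loop; doing it by hand expresses "I don't care if the value changes during the loop":

```rust
use std::sync::atomic::{AtomicU32, Ordering::Relaxed};

static CONFIG: AtomicU32 = AtomicU32::new(7);

// One Relaxed load per iteration. Legal to merge in principle, but
// compilers are conservative about doing so.
fn sum_naive(xs: &[u32]) -> u32 {
    xs.iter().map(|x| x ^ CONFIG.load(Relaxed)).sum()
}

// Hand-hoisted: a single load, reused. Roughly the transformation a
// weaker ordering would make easier for the compiler to do itself,
// assuming it can rule out aliasing writes inside the loop.
fn sum_hoisted(xs: &[u32]) -> u32 {
    let cfg = CONFIG.load(Relaxed);
    xs.iter().map(|x| x ^ cfg).sum()
}
```

Both versions agree whenever no other thread writes `CONFIG` during the loop; they can only differ if a concurrent store lands mid-iteration, which is exactly the case the hoisted version declares it doesn't care about.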
None of the above low-likelihood examples appear to be worth the risk of adding a new memory ordering mode that lacks provable theoretical foundations.
```rust
loop {
    let head = self.head.load(Acquire);

    // safety: this is the **only** thread that updates this cell.
    let tail = self.tail.unsync_load();

    if tail.wrapping_sub(head) < self.buffer.len() as u32 {
        // Map the position to a slot index.
        let idx = tail as usize & self.mask;

        // Don't drop the previous value in `buffer[idx]` because
        // it is uninitialized memory.
        self.buffer[idx].as_mut_ptr().write(task);

        // Make the task available
        self.tail.store(tail.wrapping_add(1), Release);

        return;
    }

    // The local buffer is full. Push a batch of work to the global
    // queue.
    match self.push_overflow(task, head, tail, global) {
        Ok(_) => return,
        // Lost the race, try again
        Err(v) => task = v,
    }
}
```
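For reference, the `unsync_load` in that snippet can be sketched as a plain non-atomic read through the atomic's storage. This is an assumption modeled on tokio's wrapper, not a quote of its source; the `UnsyncU32` name is invented here:

```rust
use std::sync::atomic::{AtomicU32, Ordering::Release};

struct UnsyncU32(AtomicU32);

impl UnsyncU32 {
    /// Plain, non-atomic load.
    ///
    /// Safety: the caller must guarantee there are no stores that can
    /// race with this read (e.g. this thread is the only writer).
    unsafe fn unsync_load(&self) -> u32 {
        // `AtomicU32::as_ptr` exposes the underlying storage; a raw
        // read compiles to an ordinary load with no ordering.
        std::ptr::read(self.0.as_ptr())
    }
}
```

The safety comment is doing all the work: with no racing stores there is no data race, so the non-atomic read is sound, which is exactly the point being debated in this thread.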
From the article, I don't think `Self::push_overflow` writes `self.tail`, but it does "include stronger atomic operations [than Acquire]".
Thanks, but I just edited my last post to clarify: I meant non-atomic loads that can race. In that case, if the comment is correct, there are no potentially racing stores, so no UB.