Is this a data race?

Does the following code contain a data race and UB?

use std::mem::transmute;
use std::sync::atomic::{AtomicUsize, Ordering};

static X: AtomicUsize = AtomicUsize::new(0);

// thread 1
X.store(92, Ordering::Relaxed);

let non_atomic_x: &usize = unsafe { transmute::<&AtomicUsize, &usize>(&X) };
assert_eq!(*non_atomic_x, 92);

// thread 2
X.store(92, Ordering::Relaxed);

And, for the fun of it, poll (rule: please give your choice before reading the replies in this topic):

  • Yes
  • No
  • Unsure, but think yes
  • Unsure, but think no


3 Likes

Data race: (according to https://docs.oracle.com/cd/E19205-01/820-0619/geojs/index.html)

A data race occurs when:

  • two or more threads in a single process access the same memory location concurrently, and
  • at least one of the accesses is for writing, and
  • the threads are not using any exclusive locks to control their accesses to that memory.

So, we have

assert_eq!(*non_atomic_x, 92); and X.store(92, Ordering::Relaxed);

Which is

two or more threads in a single process access the same memory location concurrently, and

at least one of the accesses is for writing, and

the threads are not using any exclusive locks to control their accesses to that memory.

  • both are not synchronized

So yes, it is a data race.

4 Likes

I don’t think that explanation is applicable here (it even seems to predate the C11 memory model). For example, I am pretty sure that using a relaxed load would guarantee that the assert does not fire, but it would still be flagged as a data race by the above definition.

I'm basing this on reading "exclusive locks" as extending to atomic orderings as well. So a relaxed load would be fine, but a non-atomic load would not: both memory accesses need to be synchronized in some way (atomics, a mutex, etc.).

3 Likes

I would say it is one because the load is non-atomic.

Why is that?

Not all data races are problems. Some non-blocking synchronization algorithms are full of what some definitions would classify as data races, but they're still correct.

Is there a real scenario behind this? Where you'd want a synchronized write followed by unsynchronized reads, hoping this is ok as long as concurrent writes actually have the same value...

This is true for some definitions, but the C++ and Rust spec-level definition of "data race" is specifically the kind of race that is undefined behavior.

8 Likes

Both the C++ spec and the LLVM Language Reference make it UB for a non-atomic read to race with an atomic write, with no exception for the case where the write is of the same value that was previously stored at that location. So the answer is yes.

That said, I cannot think of any reasonable compiler optimization that could break that code, so you could probably get away with YOLOing it, if you really wanted to. :wink:

11 Likes

I can come up with an utterly insane but spec-compliant atomic implementation that causes the assert to fail:

  • There is a global Mutex<()>.
  • Any atomic access must hold the lock to said mutex.
  • The atomic write first acquires the mutex lock, then (unsynchronized) zeroes the location, then (unsynchronized) writes the desired value, then unlocks the mutex.
  • The unsynchronized read happens between the two "halves" of the atomic write.
    • If the read were synchronized, it could not happen while the writer holds the lock, and thus would be guaranteed to read the correct value.
    • But because the read is not synchronized with the (potentially) concurrent write, it can read any possible value from the memory location, or be a UB data race, depending on your exact memory model (IIUC, rustc/LLVM say it is UB).

(In practice the assert will probably be optimized out.)
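A minimal sketch of that hypothetical implementation (the names `WeirdAtomic`, `store`, and `load` are mine, and this is emphatically not how real atomics work; the two plain writes inside the lock are exactly the window an unsynchronized reader could observe):

```rust
use std::cell::UnsafeCell;
use std::sync::Mutex;

// Hypothetical "atomic" cell: every synchronized access holds the lock,
// and a store goes through an intermediate zero state.
struct WeirdAtomic {
    lock: Mutex<()>,
    value: UnsafeCell<usize>,
}

// Safety: all synchronized accesses are serialized by `lock`.
unsafe impl Sync for WeirdAtomic {}

impl WeirdAtomic {
    fn new(v: usize) -> Self {
        WeirdAtomic { lock: Mutex::new(()), value: UnsafeCell::new(v) }
    }

    fn store(&self, v: usize) {
        let _guard = self.lock.lock().unwrap();
        unsafe {
            // Phase 1: zero the location (a plain, unsynchronized write).
            *self.value.get() = 0;
            // An unsynchronized reader scheduled here observes 0, not `v`.
            // Phase 2: write the real value.
            *self.value.get() = v;
        }
        // Lock released: synchronized readers can only run from here on.
    }

    fn load(&self) -> usize {
        let _guard = self.lock.lock().unwrap();
        unsafe { *self.value.get() }
    }
}
```

A synchronized `load` can never land between the two phases, because it blocks on the same lock; a transmuted non-atomic read is not so constrained.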

1 Like

Is it really meaningful to say something isn't a data race because the same value is written in either case? That doesn't "feel" right.

I don't quite find that utterly insane. It sounds like a valid cache coherency protocol, although I don't think any existing implementation offers (or uses) it. The zeroing of the location which you describe could just as well be another, earlier relaxed write (by thread 2) that is simply resolved at that point, just before writing 92, because it is opportune to do so. Then *non_atomic_x might read garbage entirely. Or the relaxed write only appears atomic because the coherency bus provides a lock, but in reality the memory is written in two separate operations. Then the non-atomic read might just read a torn value.

Well, it's obviously meaningful in the sense that it's a coherent definition: it's perfectly unambiguous when something is or is not a data race according to it (assuming we're already clear on what "read", "write", "thread", etc. mean), and it's been proven implementable and optimizable in practice.

IMO, this is just a case of us humans having intuitions that will never match any precise definition (and I do mean "us"; I certainly share the intuition that it "doesn't feel right"). Racing on "the same value" is merely the most trivial of the special cases that we are inclined to believe "should" be a "benign" data race. But even if we (and LLVM and all of the CPUs) actually endorsed and made well-defined that special case in Rust's (and their) official semantics, next up would be things like "racing on x += 1 can't possibly make x smaller, right?". You have to draw the line somewhere. AFAIK making "same value races" well-defined has no significant practical benefit, and would be hellish for CPUs/compilers to detect, thus hurting everyone's runtime and compile times, so it's outside the line.

3 Likes

There is a real story behind this, but it’s messier. The unsynchronized load happens in asm (where it is just a load).

2 Likes

That's a completely different question, though (and part of the reason many are not eager to embrace inline asm). Because instead you have to ask: is a load instruction on the particular platform on which the asm! runs equivalent to a relaxed load? The answer is very likely yes for register-sized types on common platforms. Inline asm sidesteps the whole abstract-machine-based discussion of UB, or at least is supposed to, which is why it is so hard to reason about and define precisely.

(Edit: this also means you may be able to use the sound atomic.load(Ordering::Relaxed) with the same real-world performance as *transmuted_atomic).
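For illustration (the function name `read_x` is mine), the transmuted read from the OP can be replaced by a relaxed load, which on mainstream platforms compiles down to the same plain load instruction:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

static X: AtomicUsize = AtomicUsize::new(0);

// Defined behavior even while another thread stores to X concurrently;
// on x86-64 and AArch64 this is the same plain load a non-atomic
// read would compile to.
fn read_x() -> usize {
    X.load(Ordering::Relaxed)
}
```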

7 Likes

I also used something very similar to this when mixing some FFI with some self-pointing structs (I say "self-pointing" because pointers are used instead of references; I wanted to distinguish them from self-referential structs). The code looks like this:

// used by FFI
#[repr(C)]
struct C {
    data_ptr: *mut u32,
    // ...
}

extern "C" {
    fn ffi_read_fn(c: *const C) -> u32;
}

// Rust wrapper around C
#[repr(transparent)]
struct Wrapper(C);

impl Wrapper {
    fn process(&self) -> u32 {
        unsafe { ffi_read_fn(&self.0 as *const C) }
    }
}

// same structure as C but data_ptr is atomic
#[repr(C)]
struct Shadow {
    data_ptr: AtomicPtr<u32>,
    // ...
}

// contains both a shadowed C, and the data it points to
struct SmallWrapper {
    shadow: Shadow,
    data: u32,
}


impl std::ops::Deref for SmallWrapper {
    type Target = Wrapper;
    fn deref(&self) -> &Wrapper {
        // the pointer is *mut instead of *const because of type of C,
        // but &Wrapper will only use it for reading
        let data_ptr = &self.data as *const u32 as *mut u32;
        self.shadow.data_ptr.store(data_ptr, Ordering::Relaxed);
        let ptr = &self.shadow as *const Shadow as *const Wrapper;
        unsafe { &*ptr }
    }
}

Rust uses the C++20 memory ordering semantics.

If one evaluation modifies a memory location, and the other reads 
or modifies the same memory location, and if at least one of the 
evaluations is not an atomic operation, the behavior of the program 
is undefined (the program has a data race) unless there exists a 
happens-before relationship between these two evaluations. 

So, is there then a happens-before relationship between the store of 92 and the read?

Happens-before

Regardless of threads, evaluation A happens-before evaluation 
B if any of the following is true:

1) A is sequenced-before B
2) A inter-thread happens before B

which directs us to the definition of sequenced-before:

Sequenced-before

Within the same thread, evaluation A may be sequenced-before 
evaluation B, as described in evaluation order. 

which brings us to Order of evaluation

which says:

A sequence point is a point in the execution sequence where all
side effects from the previous evaluations in the sequence are 
complete, and no side effects of the subsequent evaluations started.

Rules

1) There is a sequence point at the end of each full 
expression (typically, at the semicolon).

My reading of this is that there is no race or UB as far as Thread 1 is concerned with itself: Relaxed only weakens how the operation may be ordered relative to other memory accesses as observed from other threads. Within the same thread of execution, the store is still sequenced-before (and therefore happens-before) the read.
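Thread 1's half of the program in isolation can be sketched like this (relying on `AtomicUsize`'s documented guarantee of having the same size and bit validity as `usize`):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

static X: AtomicUsize = AtomicUsize::new(0);

// With no second thread, the store is sequenced-before the read,
// so there is no data race and the assert cannot fire.
fn thread1() -> usize {
    X.store(92, Ordering::Relaxed);
    unsafe { *(&X as *const AtomicUsize as *const usize) }
}
```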

An unsynchronized write to a value that is read is a race, so Thread 2's write races Thread 1's read. There is no happens-before relation here, so it is technically UB. Probably benign, but technically yolo.

(here's where I stir up the hornet's nest...) races are good, actually.

1 Like

I personally see this as two questions:

  1. If the write on thread 2 mutates the value stored in memory, is the code in the OP a data race?

    To which the answer is yes.

  2. Is writing to memory the very same contents that memory already holds considered a mutation?

    To which I think the answer is yes.

If the memory hardware required a multi-phase write, e.g., "clearing" the memory to a constant state (all 0b0 or all 0b1) and then writing the specific value, then it is possible that a concurrent thread might be able to observe an intermediate state of the write process. IIRC, some recirculating-delay-line memories in 1950s-era computers and some early EEPROM memories had 2-phase writes. However, the system designers would have ensured that the intermediate memory state(s) were unobservable by executing code except in the presence of hardware errors.

In summary, this gives the potential for mutation, but only in the presence of hardware errors.

3 Likes