The RFC that you pointed to essentially went in the opposite direction: it proposed to remove some of the `Acquire`/`Release`/`AcqRel` nuance. Personally, I like this nuance (again, it is very useful as a beginner lint and a review aid), so I would be unhappy about that.
`SeqCst` does add a guarantee on top of `Acquire`/`Release`; it is just that this guarantee is more obscure than most people think and rarely needed in practice.
To explain what `SeqCst` does, let's first step back and look at what guarantees `Relaxed` actually provides us. It is very common for popularizations to summarize the `Relaxed` atomic ordering as "no ordering guarantee", but that is actually incorrect.
At the hardware level, cache-coherent CPUs guarantee that all threads agree on a single order for writes to a given memory location. This means that e.g. this execution is forbidden:
```rust
// Thread 1
a.store(0, Hardware); // This fictitious ordering rigorously follows HW semantics
// Thread 2
a.store(1, Hardware);
// Thread 3
a.load(Hardware); // returns 0
a.load(Hardware); // returns 1, i.e. thread 1's write to a happened before
                  // thread 2's write from thread 3's perspective.
// Thread 4 (violates cache coherence w.r.t. thread 3)
a.load(Hardware); // returns 1
a.load(Hardware); // returns 0, i.e. thread 2's write to a happened before
                  // thread 1's write from thread 4's perspective.
```
What `Relaxed` atomic operations do is replicate the hardware guarantee of cache coherence at the language level. They do so by, among other things, forbidding compiler optimizers from reordering relaxed memory operations on a given variable with respect to each other. This is why LLVM calls this ordering `Monotonic`, which I personally agree is a more descriptive name.
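As a concrete illustration of this per-variable coherence guarantee, here is a small runnable sketch of my own (not from any proposal): one thread stores increasing values into a single atomic with `Relaxed`, and an observer can never see those values go backwards.

```rust
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::thread;

/// Returns true if every observed value of `a` was >= the previous one.
/// Cache coherence (which `Relaxed` exposes at the language level)
/// guarantees this is always the case for a single memory location.
fn observations_are_monotonic() -> bool {
    let a = AtomicUsize::new(0);
    thread::scope(|s| {
        s.spawn(|| {
            for i in 1..=10_000 {
                a.store(i, Relaxed); // writes to `a` in increasing order
            }
        });
        let mut prev = 0;
        let mut monotonic = true;
        for _ in 0..10_000 {
            let v = a.load(Relaxed);
            monotonic &= v >= prev; // a value going backwards would violate coherence
            prev = v;
        }
        monotonic
    })
}
```

Note that nothing here synchronizes the observer with the writer; the guarantee is only about the order of values seen on `a` itself.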
Now, this ordering guarantee is all we need as long as we are only manipulating a single shared memory location. But no such guarantee exists when manipulating multiple memory locations. For example, this execution is allowed:
```rust
// Thread 1
a_1.store(0, Relaxed);
a_2.store(0, Relaxed);
a_1.store(1, Relaxed);
a_2.store(1, Relaxed);
// Thread 2
a_2.load(Relaxed); // returns 1
a_1.load(Relaxed); // returns 0, this is allowed by Relaxed.
```
In practice, it can happen as a result of hardware or compilers reordering the store to `a_1` after the store to `a_2`, or reordering the load from `a_1` before the load from `a_2`.
This is where the other atomic memory orderings come into play. In this particular case, as in all cases where there is only a single writer thread at a time, the reordering can be forbidden with an `Acquire`/`Release` pair if it is undesirable...
```rust
// Thread 1
a_1.store(0, Relaxed);
a_2.store(0, Relaxed);
a_1.store(1, Relaxed);
a_2.store(1, Release);
// Thread 2
a_2.load(Acquire); // returns 1
a_1.load(Relaxed); // returning 0 is not allowed here. If the write of 1 to
                   // a_2 happened, then all previous writes from thread 1
                   // also happened from this thread's perspective.
```
...however, if there are 2+ writer threads and 2+ atomic variables, `Acquire`/`Release` is no longer enough to guarantee that all threads agree on a single write order:
```rust
// Thread 1
a_1.store(0, Relaxed);
a_1.store(1, Release);
a_2.load(Acquire); // May return 0, i.e. a_2.store(1) has not happened
                   // yet from thread 1's perspective.
// Thread 2
a_2.store(0, Relaxed);
a_2.store(1, Release);
a_1.load(Acquire); // May return 0 as well, i.e. a_1.store(1) has not
                   // happened yet from thread 2's perspective.
```
The reason why this can happen is that hardware and compilers are allowed to reorder `Acquire` loads before `Release` stores. From the "acquire is like acquiring a mutex and release is like releasing a mutex" perspective, this is equivalent to moving extra reads and writes into a mutex-protected region, which is fine.
Now, for most synchronization protocols, this limitation of `Acquire`/`Release` does not matter, because synchronization transactions are made visible to other threads through a single atomic store or RMW operation that systematically targets the same atomic variable. As a result, cache coherence on that variable gives you all the total store ordering that you need "for free".
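Here is a sketch of such a single-variable protocol (my own illustrative example): a writer prepares some data and then announces it with one `Release` store to a single flag; any reader that observes the flag with `Acquire` is guaranteed to also see the data.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering::{Acquire, Relaxed, Release}};
use std::thread;

/// Classic message-passing: the whole transaction becomes visible through
/// a single Release store to `ready`, so Acquire/Release is all we need.
fn publish_and_read() -> usize {
    let data = AtomicUsize::new(0);
    let ready = AtomicBool::new(false);
    thread::scope(|s| {
        s.spawn(|| {
            data.store(42, Relaxed);    // prepare the payload
            ready.store(true, Release); // single store publishes the transaction
        });
        while !ready.load(Acquire) {}   // spin until the flag is observed
        data.load(Relaxed)              // guaranteed to return 42
    })
}
```

Since only one atomic variable (`ready`) ever carries the synchronization, cache coherence on that variable is enough to give all threads a consistent view, and `SeqCst` would buy nothing here.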
But a few complicated synchronization protocols whose correctness is hard to prove require manipulating multiple atomic variables per transaction. In this case, `SeqCst` may be needed. What `SeqCst` guarantees is that, given a set of atomic variables `a_i`...

- If all stores to the `a_i` variables are `SeqCst`...
- ...then all `SeqCst` loads from the `a_i` variables agree on a single store order.
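To make this concrete, here is the earlier two-writer example rewritten with `SeqCst` everywhere (a runnable sketch of my own). Because all four operations now belong to a single total order that every thread agrees on, the outcome where both loads return 0 is forbidden:

```rust
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};
use std::thread;

/// Store-buffering litmus test with SeqCst everywhere: since all threads
/// agree on a single order of the stores, at least one thread must
/// observe the other's store of 1, so (0, 0) can never be returned.
fn store_buffering() -> (usize, usize) {
    let a_1 = AtomicUsize::new(0);
    let a_2 = AtomicUsize::new(0);
    thread::scope(|s| {
        let t1 = s.spawn(|| {
            a_1.store(1, SeqCst);
            a_2.load(SeqCst) // may only return 0 if thread 2's load returns 1
        });
        let t2 = s.spawn(|| {
            a_2.store(1, SeqCst);
            a_1.load(SeqCst) // may only return 0 if thread 1's load returns 1
        });
        (t1.join().unwrap(), t2.join().unwrap())
    })
}
```

With `Release`/`Acquire` instead of `SeqCst`, `(0, 0)` would be a legal outcome, as shown above.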
From a hardware/compiler semantics point of view, this is almost the same as using `Release` stores and `Acquire` loads, except that a `SeqCst` load from a variable `a_1` can't be reordered before a `SeqCst` store to a different variable `a_2` within a given thread. On most hardware, enforcing this property requires an expensive memory barrier instruction.
It has been claimed before that this guarantee is not implementable at all on some hardware like Power. Personally, I am not familiar enough with the Power memory model to judge this. All I know is the language-level semantics which `SeqCst` is intended to provide.
I hope this clarifies what extra guarantees `SeqCst` actually provides with respect to `Acquire`/`Release`, and why these guarantees are less frequently needed than most people think.
As for this proposal not matching the C++ memory model: all the nuances of `SeqCst` that I am proposing are strictly stronger than what C++'s `memory_order_seq_cst` provides, by virtue of eliminating an ambiguity about memory ordering without relaxing any other constraint.
Therefore, you can't introduce a synchronization bug by porting an algorithm that uses C++'s `memory_order_seq_cst` to these more specific orderings. At worst, your program will panic because you asked for an impossible memory ordering, which would have been a hidden bug in C++.
Conversely, any program using the proposed `SeqCst(Acq|Rel|AcqRel)` orderings which does not panic due to an incorrect memory ordering has the same meaning if reimplemented using `memory_order_seq_cst` in C++.