Idea: Replace SeqCst with SeqCst(Acq|Rel|AcqRel)

HadrienG · September 30, 2019, 8:23pm

Among the std::sync::atomic::Orderings, SeqCst holds the dubious distinction of being the only ordering whose synchronization guarantees are ambiguous and require extra context to figure out.

That is because it provides Acquire synchronization on loads, Release synchronization on stores, and AcqRel synchronization on Read-Modify-Write (RMW) operations, which are associated with different memory barriers at the hardware and compiler level.

This ambiguity has practical consequences:

Atomics beginner mistakes like asking for Release loads don't get caught, as they would if Acquire/Release/AcqRel orderings were directly requested, when SeqCst is used as a supposedly safe default as recommended by several introductions to atomics.
Proving the correctness of synchronization code based on SeqCst atomic ordering is more cumbersome, because reviewers must reverse-engineer the associated memory barriers by checking the associated atomic operations.
Reworking a synchronization protocol based on SeqCst is even more dangerous than reworking one based on Acquire/Release/AcqRel, as it is possible to silently change memory barrier semantics by switching between different kind of atomic operations.

I think that we could resolve this ambiguity by replacing SeqCst with more specific memory orderings called SeqCstAcq, SeqCstRel and SeqCstAcqRel, which would come with the following semantics:

SeqCstAcq has the semantics of today's SeqCst, but is only accepted on load operations. Therefore, it unambiguously has Acquire semantics.
SeqCstRel has the semantics of today's SeqCst, but is only accepted on store operations. Therefore, it unambiguously has Release semantics.
SeqCstAcqRel has the semantics of today's SeqCst, but is only accepted on RMW operations. Therefore, it unambiguously has AcqRel semantics.

This should be rather easy to implement, but would come at the cost of some ecosystem churn that would only be acceptable during an edition switch (if at all). In exchange, we would get the following benefits:

The language implementation would be able to notify SeqCst atomics users about unrealistic memory barrier expectations (e.g. expecting loads to have Release semantics).
Memory barriers would be more apparent during code review, which would make it easier to review the correctness of SeqCst atomics-based synchronization protocols.
People who go for SeqCst as a safe default would be forced to reconsider whether they truly, positively need the rather obscure multi-atomics store ordering guarantee that it provides

Given that 1/atomics are hard and therefore deserve the best language ergonomics we can give them, 2/only a fraction of Rust codebases use atomics and 3/only a fraction of those use SeqCst ordering and would therefore be affected by this breaking change, I think that the churn might actually be worthwhile.

What do you think?

Ixrec · September 30, 2019, 9:09pm

Semi-related:

I'm no expert on atomics, but I kinda came away from that RFC thread feeling like there's no real consensus on precisely what any of the atomic orderings mean in any language, and trying to "fix" that has the annoyingly significant practical cost that we can no longer say "it's the same as C++". For example, these two threads seem to be disagreeing on whether SeqCst is "just sugar" for using Acq and Rel in all the right places, or if SeqCst adds some other guarantees on top of that, and I have no idea which is right.

HadrienG · October 1, 2019, 7:15am

The RFC that you pointed to essentially went into the opposite direction: it proposed to remove some of the Acquire/Release/AcqRel nuance.

Personally, I like this nuance (again, it is very useful as a beginner lint and a review aid) so I would be unhappy about that.

SeqCst does add a guarantee on top of Acquire/Release, it is just that this guarantee is more obscure than most people think and rarely needed in practice.

To explain what SeqCst does, let's step back first and look at what guarantees Relaxed actually provides us. It is very common for vulgarizations to summarize Relaxed atomic ordering as "no ordering guarantee", but that is actually incorrect.

At the hardware level, cache-coherent CPUs guarantee that all threads agree on a single order for writes to a given memory location. This means that e.g. this execution is forbidden:

// Thread 1
a.store(0, Hardware);  // This fictious ordering rigorously follows HW semantics

// Thread 2
a.store(1, Hardware);

// Thread 3
a.load(Hardware); // returns 0
a.load(Hardware); // returns 1, i.e. thread 1's write to a happened before
                  // thread 2's write from thread 3's perspective.

// Thread 4 (violates cache coherence w.r.t. thread 3)
a.load(Hardware); // returns 1
a.load(Hardware); // returns 0, i.e. thread 2's write to a happened before
                  // thread 1's write from thread 4's perspective.

What Relaxed atomic operations do is to replicate the hardware guarantees of cache coherence at the language level. It does so by, among other things, forbidding compiler optimizers to reorder relaxed memory operations on a given variable with respect to each other. This is why LLVM calls this ordering Monotonic, which I personally agree is a more descriptive name.

Now, this ordering guarantee is all we need as long as we are only manipulating a single shared memory location. But no such guarantee exists when manipulating multiple memory locations. For example, this execution is allowed:

// Thread 1
a_1.store(0, Relaxed);
a_2.store(0, Relaxed);
a_1.store(1, Relaxed);
a_2.store(1, Relaxed);

// Thread 2
a_2.load(Relaxed); // returns 1
a_1.load(Relaxed); // returns 0, this is allowed by Relaxed.

In practice, it can happen as a result of hardware or compilers reordering the store to a_1 after the store to a_2 or reordering the load from a_1 before the load from a_2.

This is where the other atomic memory orderings come into play. In this particular case, as in all cases where there is only a single writer thread at a time, the reordering can be forbidden with an Acquire/Release pair if it is undesirable...

// Thread 1
a_1.store(0, Relaxed);
a_2.store(0, Relaxed);
a_1.store(1, Relaxed);
a_2.store(1, Release);

// Thread 2
a_2.load(Acquire); // returns 1
a_1.load(Relaxed); // returning 0 is not allowed here. If the write of 1 to
                   // a_2 happened, then all previous writes from thread 1
                   // also happened from this thread's perspective.

...however, if there are 2+ writer threads and 2+ atomic variables, Acquire/Release is not enough to guarantee that all threads agree on a single write order anymore:

// Thread 1
a_1.store(0, Relaxed);
a_1.store(1, Release);
a_2.load(Acquire);  // May return 0, i.e. a_2.store(1) has not happened
                    // yet from thread 1's perspective.

// Thread 2
a_2.store(0, Relaxed);
a_2.store(1, Release);
a_1.load(Acquire);  // May return 0 as well, i.e. a_1.store(1) has not
                    // happened yet from thread 2's perspective

The reason why this can happen is that hardware and compilers are allowed to reorder Acquire loads before Release stores. From the "acquire is like acquiring a mutex and release is like releasing a mutex" perspective, this is equivalent to moving extra reads and writes into a mutex-protected region, which is fine.

Now, for most synchronization protocols, this limitation of Acquire/Release does not matter, because synchronization transactions are made visible to other threads in a single atomic store or RMW operation that systematically targets the same atomic variable. As a result, cache coherence of that variable will give you all the total store ordering that you need "for free".

But a few complicated synchronization protocols whose correctness is hard to prove require manipulating multiple atomic variables per transaction. In this case, SeqCst may be needed. What SeqCst guarantees is that, given a set of atomic variables a_i...

If all stores to the a_i variables are SeqCst...
...then all SeqCst loads from the a_i variables agree on a single store order.

From a hardware/compiler semantics point of view, this is almost the same as using Release stores and Acquire loads, except that a SeqCst load to a variable a_1 can't be reordered before a SeqCst store to a different variable a_2 inside a given thread. On most hardware, enforcing this property requires use of an expensive memory barrier instruction.

It has been claimed before that this guarantee is not implementable at all on some hardware like Power. Personally, I am not familiar enough with the Power memory model to judge on this. All I know is the language-level semantics which SeqCst is intended to provide.

I hope this clarifies what extra guarantees SeqCst actually provides with respect to Acquire/Release, and why these guarantees are less frequently needed than most people think.

As for this proposal not matching the C++ memory model, all the nuances of SeqCst that I am proposing are strictly stronger than what C++'s memory_order_seq_cst provides by virtue of eliminating an ambiguity about memory ordering without relaxing any other constraint.

Therefore, you can't introduce a synchronization bug by porting an algorithm that uses C++'s memory_order_seq_cst to these more specific orderings. At worst your program will panic because you asked for an impossible memory ordering, which would be a hidden bug in C++.

Conversely, any program using the proposed SeqCst(Acq|Rel|AcqRel) orderings and which does not panic due to using an incorrect memory ordering has the same meaning if reimplemented using memory_order_seq_cst in C++.

Ixrec · October 1, 2019, 8:08am

HadrienG:

In this particular case, as in all cases where there is only a single writer thread, the reordering can be forbidden with an Acquire/Release pair if it is undesirable...
// Thread 1
a_1.store(0, Relaxed);
a_2.store(0, Relaxed);
a_1.store(1, Relaxed);
a_2.store(1, Release);

// Thread 2
a_2.load(Acquire) // returns 1
a_1.load(Relaxed) // returning 0 is not allowed here. If the write to
                  // a_2 happened, then all previous writes from thread 1
                  // have happened too.
...but if there are 2+ writer threads and 2+ atomic variables, Acquire/Release is not enough to guarantee that all threads agree on a single write order anymore:

Is the a_1.load(Relaxed) supposed to be a_1.load(Release)?

HadrienG · October 1, 2019, 8:08am

There is no such thing as a load(Release). Relaxed is enough here, because...

Compiler & hardware are not allowed to reorder a_1.store(1, ...) after a_2.store(1, Release) in thread 1 due to the Release memory barrier.
Compiler & hardware are not allowed to reorder a_1.load(...) before a_2.load(Acquire) in thread 2 due to the Acquire memory barrier.

From the perspective of the mutex-locking interpretation of Acquire and Release, these operations would be respectively equivalent to moving a store after the end of a mutex-protected region and moving a load before the start of a mutex-protected region.

(Actually, a_1 loads and stores are not even required to be atomic in this case, which is why Acquire and Release can be used for the purpose of implementing mutexes)

RalfJung · October 17, 2019, 8:26am

Do I understand correctly that this proposal just changes the language-level API for interacting with atomics, but these three new orderings you are proposing actually all behave the same as some existing ordering does now? However, intuitively I have no idea what SeqCstAcq would do on the write part of an RMW. If it is consistent with Acquire it would behave relaxed; that I think is terrible API design (we shouldn't implicitly make anything relaxed).

Independent of that, I feel fairly strongly (and someone linked to my RFC about that) that the odd ones out here are Release/Acquire, in the sense that these actually can mean Relaxed when used with RMW operations. So if we deviate from the C++ API, we should do so by changing what we do with release/acquire orderings. SeqCst is totally fine IMO and this proposed change is a step backwards in API clarity.

That is incorrect. We have precise formal models of what the orderings in C++ mean (we don't for LLVM though). There are issues with these models, but that is not what my RFC was about and I think it is also not is being proposed here. These discussions are solely about how we expose these orderings to the user.

I don't see any nuance here, just misleading naming -- as I explained in my RFC.

"Loads synchronize with the stores they read from if both are at least release/acquire" (as in, using the kind of operation together with the ordering) is IMO much easier to explain that having two variants of every ordering (one for load and one for store) and then having to ensure two things at once: (a) we always match up the load-style (acquire) ordering with the load operation and the store-style ordering with the store operation; and (b) now acquire and release synchronize. I think the only effect that is achieved here is adding an indirection and increasing general confusion.

Having "release" and "acquire" together with a clause "but they can only be used on loads and stores" is bad API design. Those restrictions are not even enforced statically, to make things worse. And then things get really bad when we look at RMW operations (like CAS) that have two orderings, but we provide "easier" to use variants that only take a single argument defining both orderings and Release means "do a relaxed load". I find it hard to see any way in which this is not an awful API.

I personally have a very good understanding of release/acquire semantics, but I always have to pause for a second to remember: was it loads that release, or was it stores? I know what the ordering does, but always have a hard time remembering which of the two sides of the same coin I need to put for loads and which for stores. We should be naming the coin, instead of having two arbitrary unrelated names for its two sides.

The much better API would be based on the observation that for loads and stores there are 3 orderings each, and they are perfectly parallel: relaxed, release/acquire, sequentially-consistent. So we should have a 3-element Ordering enum encoding exactly that, then one can see immediately where each operation is on this ladder. This also fixes the problem of not allowing all orderings for all operations (any variant of this enum makes sense for any operation). And when there is an RMW operation, the single-argument variant can be entirely naturally defined by saying "this is like the two-argument variant (the one that takes two orderings; one for the load part and one for the store part) called with the same ordering twice".

So, to summarize, the current API is bad IMO and this proposal makes it worse.

RalfJung · October 18, 2019, 7:54am

To continue a bit on the line of thought of "we shouldn't allow passing orderings to an operation that do not make sense" (aka "let's use static typing"... aka the Rust way ), an alternative to what I proposed above is to just have separate orderings for loads and stores:

// Proposed above
enum Ordering { Relaxed, AcqRel, SeqCst }

// Alternative
enum LoadOrdering { Relaxed, Acquire, SeqCst } // maybe `Consume`, one day
enum StoreOrdering { Relaxed, Release, SeqCst }

This would work fine for loads and stores as well as the 2-ordering vairants of RMWs (RMW = read-modify-write; example for a 2-ordering RMW). But for 1-ordering RMWs like compare_and_swap and all the fetch_* operations, which ordering should they take? Clearly none of our two enums make sense (which is basically just another way to point out what I consider bad about the current API). We could add a new one:

enum RmwOrdering { Relaxed, AcqRel, SeqCst }

Or, if we want to also still support the partially relaxed operations:

enum RmwOrdering { Relaxed, AcqRlx, RlxRel, AcqRel, SeqCst }

Notice how, for example, we currently say Ordering::Acquire for what should really be RmwOrdering::AcqRlx, indicating that the store part of this operation is relaxed.

This 3-enum proposal I think conveys the nuance @HadrienG mentioned above (LoadOrdering::SeqCst vs StoreOrdering::SeqCst) while also avoiding "accidental relaxed" operations. It is significantly more verbose than the 1-enum variant I described above, but also more flexible (e.g. it can support Consume).

So, if we were pre-1.0, I would strongly argue that we should pick one of these two APIs for our atomics, using proper static typing instead of repeating the mistakes made in C++. My guess is that the 3-enum variant would be preferred due to its flexibility.

Alas, we are post-1.0, so we have to figure out how to live with what we got. But I think it helps to have a discussion for what we'd like the API to look like if we could change everything, and then use that to inform how we develop the API we actually have. Adding things like SeqCstAcq adds more variants to Ordering that are not actually allowed for all operations, and that are confusingly named when considering RMW operations, thus moving us further away from a "good" statically typed API in my view.

HadrienG · October 18, 2019, 8:17am

I really like your counter-proposal, and I think I have a way to make it work with backwards compatibility and in a fashion which would also resolve a longstanding gripe of mine with atomic orderings.

The idea would essentially be to part with the idea that atomic::Ordering has to be a Rust enum, and make it a bitflags-style bitfield with associated consts instead. Then you could allocate some of the bits to load barriers, some other to store barriers, and would have extra bits for future extensions like atomic volatile (if we ever manage to make that work with Rust references).

...but I don't have enough time to write this up right now, as I would rather focus on one thing at a time and move the atomics volatile discussion forward first.

atagunov · October 18, 2019, 2:01pm

New methods alongside old ones using one of Ralf's schemes?

LoadOrdering{..}/StoreOrdering{..}/RmwOrdering{..}
LoadOrdering{..}/StoreOrdering{..}
Ordering{Relaxed,AcqRel,SeqCst}

The last 2 are the only ones that seem easy to understand

HadrienG · October 18, 2019, 6:48pm

Because I still had this idea rolling around in my head throughout the day and needed to put a brain dump somewhere, here's a sketch of a possible backwards-compatible bitfield-based Ordering design:

// Yes, I know, you can't derive some of those. Please pardon my laziness.
#[derive(BitAnd, BitOr, BitXor, Clone, Copy, Debug, Eq, PartialEq)]
struct Ordering(u8);

// Here are the basic ordering bits (which aren't published), adding consume for kicks
impl Ordering {
    const RELAXED_BITS: Ordering = Ordering(0b0000);
    const CONSUME_BIT : Ordering = Ordering(0b0001);
    const ACQUIRE_BIT : Ordering = Ordering(0b0010);
    const RELEASE_BIT : Ordering = Ordering(0b0100);
    const SEQCST_BIT  : Ordering = Ordering(0b1000);
}

// Then we can define the combinations which are accepted on loads.
// Anything else will panic at runtime.
// FIXME: Rejecting bad orderings at compile time would be better.
pub mod load {
    use super::Ordering;

    // Notice how well bitfields map to the relationship between orderings
    pub const Relaxed: Ordering = Ordering::RELAXED_BITS;
    pub const Consume: Ordering = Ordering::CONSUME_BIT | Relaxed;
    pub const Acquire: Ordering = Ordering::ACQUIRE_BIT | Consume;
    pub const SeqCst : Ordering = Ordering::SEQCST_BIT  | Acquire;
}

// ...same for stores...
pub mod store {
    use super::Ordering;

    pub const Relaxed: Ordering = Ordering::RELAXED_BITS;
    pub const Release: Ordering = Ordering::RELEASE_BIT | Relaxed;
    pub const SeqCst : Ordering = Ordering::SEQCST_BIT  | Release;
}

// ...and same for read-modify-write ops
pub mod rmw {
    use super::{Ordering, load, store};

    pub const Relaxed: Ordering = load::Relaxed | store::Relaxed;
    pub const AcqRel : Ordering = load::Acquire | store::Release;
    pub const AcqRlx : Ordering = load::Acquire | store::Relaxed;
    pub const RlxRel : Ordering = load::Relaxed | store::Release;
    pub const SeqCst : Ordering = load::SeqCst  | store::SeqCst;
}

// Finally, we can provide backcompat with atomic::Ordering, while
// recommending the newer variants where they clarify intent
impl Ordering {
    pub const Relaxed: Ordering = Ordering::RELAXED_BITS;

    #[deprecated(note = "Please use load::Acquire or rmw::AcqRlx")]
    pub const Acquire: Ordering = load::Acquire;

    #[deprecated(note = "Please use store::Release or rmw::RlxRel")]
    pub const Release: Ordering = store::Release;
    
    pub const AcqRel : Ordering = rmw::AcqRel;

    // We must initially allow this bad bit pattern for backcompat
    //
    // In the long long run (like, future edition), we can remove it and have
    // load, stores and rmws reject it as the ambiguous mess that it is.
    //
    #[deprecated(note = "Please use load::SeqCst, store::SeqCst, or rmw::SeqCst")]
    pub const SeqCst: Ordering = Ordering::SEQCST_BIT;
}

That's all for now, I'll try to come back to this idea once I have more time to dedicate to it.

In particular, one interesting question is whether we can add stronger typing in the form of dedicated LoadOrdering, StoreOrdering and RmwOrdering types, so that passing load::Acquire to a store is rejected at compile time, without breaking backcompat with the current unified Ordering API. I suspect that this is too much to ask, though.

RalfJung · October 19, 2019, 10:50am

Hm, I'll have to mull over this but bitfields seem like an odd choice to me.

I also still strongly reject to your characterization of SeqCst as an "ambiguous mess", and do not agree we should deprecate it. I quite clearly says "do the strongest thing possible for this operation", and there is nothing ambiguous about that IMO.

HadrienG · October 19, 2019, 12:04pm

To summarize the long posts above, my problem with the "do the strongest thing" interpretation is that the strongest thing depends on the operation being carried out, which means that e.g. load(SeqCst) is not the same as fetch_add(0, SeqCst).

I think that memory barriers specifications should be unambiguous and understandable on their own, without extra operation context. Otherwise, we can't detect and report situations where people have unrealistic barrier expectations, such as the infamous load(Release) which every seqlock author has desired someday.

RalfJung · October 19, 2019, 1:38pm

This is a weird criticism. load(Relaxed) is also not the same as fetch_add(0, Relaxed)... because one is a load and one a fetch_add. I know that's now what you meant but I still don't see why one would expect any symmetry here.

The fact that loads read-from stores introduces an inherent asymmetry, to the extend that I find it not helpful to even try and understand memory orderings without considering the operation on which they are applied. Any (graph-based) formal definition of weak memory makes heavy used of the reads-from relation -- so it is inherently impossible to understand them without considering the "operation context".

Moreover, even if you manage to redundantly re-encode the fact whether we are talking about a load or a store into the ordering, you still need more context: an acquire-load only syncs-with a release-store if they are to the same location! So, the "context" certainly matters, not just the ordering itself. (You spoke about "barriers", which is odd. Release/acquire operations on a location can not be explained by barriers or fences.)

And finally, even if we go through with your proposal, the "operation context" still matters: RMW operations are inherently more powerful than loads and/or stores, and their effects cannot be described by just considering the orderings involved. For example, you can effectively implement SC fences by picking a single global location, and doing a release-acquire RMW operation on that location instead of a fence. (This is assuming SC fences are the only SC operation. When adding SC accesses to the mix, things get subtle and we don't really have a clear understanding of all the interactions.) As another example, even relaxed RMWs are totally ordered on the same location and guarantee to "observe" the latest value, unlike relaxed or even acquire loads.

system · January 17, 2020, 1:38pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Why does Rust not permit CAS operations with Release/Acquire ordering?	8	1001	July 2, 2020
New target support compiler	9	1104	March 25, 2019
Trying to understand correctness of synchronization primitives libs	4	734	January 25, 2021
`unordered` as a solution to "Bit-wise reasoning for atomic accesses" Unsafe Code Guidelines	24	3541	December 22, 2024
Async and the Rust memory model language design	4	976	June 3, 2023

Idea: Replace SeqCst with SeqCst(Acq|Rel|AcqRel)

Related topics