Type-safe atomics usage

With atomic::AtomicUsize and the other atomics, I have seen many cases of packing extra information into the value: using a sentinel value (0, -1, MAX, null) to indicate that a static value is not yet initialized, reserving certain bits as status fields, recreating enums with const UNINIT = 0, READY = 1, WAITING = 2, etc. This got me thinking about whether there are ways to express these use cases with the type system, making these approaches more robust.

One particular case where I think we could provide better ergonomics is when you effectively want to treat your atomic like an Option<NonZero*> or Option<NonNull>. Especially if you could do things equivalent to, for example, Option::get_or_insert_with: load a value and, if it is 0/null, update it with a closure (I have seen many ad-hoc implementations of this).

I don't have a concrete proposal yet, but I'm interested to hear people's ideas on this. Is this something that others recognize as well?

Ideally, there would even be a full AtomicCell<T>, for the cases where T can be backed by an AtomicU…. The first naïve idea that comes to mind is to use the size of T and "generically" bound it by the size of the biggest AtomicU… available on the platform.

  • This can be done in nightly through const_evaluatable_checked.

Alas, this is not enough: indeed, the semantics of atomic operations are not well defined for uninitialized bytes / padding bytes. This means that a type such as (u8, u16), which fits in an AtomicU32, has the issue that performing an AtomicU32-based read of it would read the padding byte (wherever it may be), an operation with no defined behavior.

So, we end up down the rabbit hole of FromBytes/AsBytes (on top of the size bound), i.e., a bunch of unsafe marker traits (ideally autogenerated, see the safe-transmute working group effort; else manually implemented).

The whole thing is fuzzy enough that it isn't implemented in the standard library yet, but I could imagine a library crate such as the unsound AtomicCell of ::crossbeam, with the addition of putting the burden of safety onto users of the library through such a marker trait:

#![feature(const_generics)]

mod lib {
    use ::core::marker::PhantomData;
    use ::std::sync::atomic::{self, *};
    
    /// # Safety
    ///     - it must be sound to transmute back and forth between `Self` and `u32` 
    pub
    unsafe trait U32Compat
    :
        Copy +
        // ::zerocopy::AsBytes +
        // ::zerocopy::FromBytes +
    {}
    
    unsafe
    impl U32Compat for u32 {}
    
    pub use ::core;
    pub use ::zerocopy;
    
    #[macro_export]
    macro_rules! safe_impl {(
        $(@for[$($generics:tt)*])?
        $T:ty $(,)?
    ) => (
        // Check the required properties
        const _: () = {
            fn __checks <T> () where
                T : $crate::zerocopy::AsBytes,
                T : $crate::zerocopy::FromBytes,
            {}
            fn __forall<$($($generics)*)?> ()
            {
                let _ = __checks::<$T>;
                let _ = $crate::core::mem::transmute::<$T, $crate::core::primitive::u32>;
            }
        };
        unsafe
        impl<$($($generics)*)?> U32Compat for $T {}
    )}
    
    fn transmute<Src, Dst> (it: Src)
      -> Dst
    where
        Src : U32Compat,
        Dst : U32Compat,
    {
        unsafe {
            // Safety: Src -> u32 -> Dst is safe per `U32Compat` contract
            //         in practice we just cut the middleman.
            ::core::mem::transmute_copy(&it)
        }
    }
    
    #[repr(transparent)]
    pub
    struct AtomicU32Cell<T : U32Compat>(
        AtomicU32,
        PhantomData<T>,
    );
    
    impl<T : U32Compat> AtomicU32Cell<T> {
        pub
        fn new (value: T)
          -> AtomicU32Cell<T>
        {
            Self(AtomicU32::new(transmute(value)), PhantomData)
        }
        
        pub
        fn load (
            self: &'_ AtomicU32Cell<T>,
            ordering: atomic::Ordering,
        ) -> T
        {
            transmute(self.0.load(ordering))
        }
        
        pub
        fn store (
            self: &'_ AtomicU32Cell<T>,
            value: T,
            ordering: atomic::Ordering,
        )
        {
            self.0.store(transmute(value), ordering)
        }
    }
}

and so on for the other atomic sizes.

For enums, you can have a wrapper around them that internally stores an AtomicIsize, for example:

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=ab1b7cc3250c003e33f6cc2a00d99379

I think a better strategy here is to try to get something into the language for easily converting back to a C-like enum from its discriminant value -- that gives you a safe way to use AtomicUsize or whatever.

I would generally object to having atomic anything that isn't a machine word or similar; std::atomic's behavior on non-trivial types is generally regarded as a mistake in C++. It may also be tempting to do RMW operations that can't lower to something like lock xadd or amoadd or whatever; either you'll get it wrong (likely) or write a comparatively inefficient CAS loop that the compiler won't know to optimize into a single RMW intrinsic (less likely).

Atomics should be very thin wrappers over machine instructions (or CAS loop polyfills); more than that is asking for trouble, and you probably don't need it for performance; futex locks are almost always good enough except in tightly constrained scenarios, such as when building primitives like locks, condvars, and counters, or when there is a clear macrobenchmark need.


I happen to have such a something right here!


C++ solved the problem in C++20 by designating that padding bytes are ignored by things like atomic_compare_exchange. A reasonable implementation would be to zero the padding bytes being stored/compared. With assistance from the compiler, Rust could do the same thing. (Edit: This wouldn't work nicely with transmuting &mut T to &AtomicCell<T> / AtomicCell<T>::from_mut, though I presume that wouldn't work anyway, since it would have to round up to a valid integer-type size.)


This is not entirely true: reading padding bytes can be totally fine -- it depends on the type at which you are doing it. Turning them into u32 is the problem (which could be avoided by using MaybeUninit<u32> instead). Comparing them with other data (compare_exchange) is even more of a problem.

AtomicCell actually exists, as part of crossbeam, but it has all these issues.

I would not say they "solved" it, I'd say they attempted to solve it. I have no idea how to turn the C++ semantics into a formal operational model of what happens. In particular note that they have a really wacky wording regarding unions, which I cannot make much sense of.

Also, how are compilers supposed to lower compare_exchange on types like (u16, u8)? There is no hardware instruction that will compare bytes 0, 1, 2 but ignore byte 3.

As you noted, that is not possible, at least not in an API like this, because of get_mut -- note that from_mut is "easy", one can make that function zero all the padding. get_mut is the killer because now the API/compiler does not control all the stores that happen to this location (any write to a &mut anywhere could be a write to an atomic location).

Also "zero the padding" is not possible for unions because there is no way to even figure out which bytes are padding and which are not. That's probably why the C++ spec basically gives up on unions here...


Making a general and generic Atomic type is impossible except on hardware without atomic size restrictions. Not everything that can be done in a single-threaded context can be unambiguously translated into code designed for a multi-threaded context.

  • Are you dealing with types that fit directly into the largest available atomic?
  • Is the type Copy?
  • How many readers and writers?
    • How many threads can access the atomic "simultaneously"?
  • Is a (conditional) lock-free design enough or do you need theoretical wait-freedom?

Crossbeam's AtomicCell is just one of many solutions designed for a specific use-case. I also experimented with my own AtomicCell for a different use-case: It allows arbitrary-sized Copy types, for up to 127 concurrent accesses.

Like I said, std::atomic<WeirdType> was a mistake. =P


Sure, I did not phrase that part well: I meant atomically reading (/storing/swapping) them. Like, is there a compiler backend that can do that? Maybe if we forego all the CAS ones, we could envision having, at least, intrinsics::atomic_{load,store,swap} over MaybeUninit<uN> storage? That's the key thing I am assuming we are missing, but maybe I am wrong and we aren't.

Ah, I see. So for LLVM this boils down to the question of what happens when you atomic_load an i32 of which some bytes are padding. Assuming padding is poison (which AFAIK is the plan), this depends on what "partially poison" loads do -- to my knowledge, the latest plan here is to make them "explode" into a full poison value. We'd have to do the load at type <4 x i8> to get byte-level poison granularity. Is that even possible for atomic loads?

crossbeam's AtomicCell falls back to locking (with a set of seqlocks) when the hardware does not support the given size. So, providing the API consistently for all types and platforms is not impossible, but the performance characteristics differ. (This is somewhat similar to thread_local!.)


There was some discussion in the safe transmute RFC about an Atomic<T> type.

I probably don't understand what the desired outcome is.

Why would (AtomicU8, AtomicU16) not suffice?

There is no way to atomically update both fields of an (AtomicU8, AtomicU16) at the same time. It is possible for another thread to observe that the first AtomicU8 field has been written to, but the second AtomicU16 one has not.

This is why, today, people use an AtomicU32 and manually treat it as an (AtomicU8, AtomicU16).


Ah I see.

Maybe I missed this suggestion but could we make it so that Atomic<T> requires that T have no padding?

Basically, this would make it safe to transmute AtomicUn to types with no padding and only other AtomicUn types?

For example:

AtomicU32 -> (AtomicU8, AtomicU8, AtomicU16)

If that were replaced with a #[repr(C)] tuple struct, IIRC, this is already valid and sound: AtomicUn has the same layout as un, and this also permits from_mut and get_mut (all of which prevents the implementation from providing atomic operations on smaller types through larger ones). Thus, #[repr(C)] struct AtomicList(AtomicU8, AtomicU8, AtomicU16); has the same size as AtomicU32, and the transmute will put the values into those fields according to the byte order.

I brought this kind of transformation, AtomicU32 -> (AtomicU8, AtomicU8, AtomicU16), up in the UCG on Zulip a while back: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines/topic/Converting.20.60.26AtomicFoo.60.20to.20.60.26.5BAtomicU8.3B.20N.5D.60.20and.20using.20it

I gather it's called "mixed-size access", and it is apparently kind of difficult to specify; even when specified, stuff like providing SeqCst is impossible on weaker architectures.

That said, this kind of trick is used in the Linux kernel's locking code (which admittedly uses a different memory model), and it allows for some neat tricks in manual userspace Mutex implementations (well, ones using futex or similar).
