Generic `Atomic<T>`

We've made NonZero generic instead of having multiple distinct NonZero* types. We could do a similar thing to unify the Atomic* types. I drafted out what that would look like; does this seem like a reasonable API shape for std to refactor towards? [playground]

```rust
trait Sealed {}

#[allow(private_bounds)]
#[unstable(feature = "atomic_internals", issue = "none")]
pub unsafe trait AtomicPrimitive: Sized + Copy + Sealed {
    type AlignedForAtomic;
}

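// Implements `Sealed` and `AtomicPrimitive` for a single primitive,
// declaring a more-aligned wrapper type for targets where the atomic
// instruction requires greater alignment than the primitive itself.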
macro impl_atomic_primitive(
    $Inner:ident $(<$T:ident>)? ($Primitive:ty),
    size($size:literal),
    align($align:literal) $(,)?
) {
    impl $(<$T>)? Sealed for $Primitive {}

    #[cfg(target_has_atomic_load_store = $size)]
    const _: () = {
        unsafe impl $(<$T>)? AtomicPrimitive for $Primitive {
            #[cfg(target_has_atomic_equal_alignment = $size)]
            type AlignedForAtomic = $Primitive;
            #[cfg(not(target_has_atomic_equal_alignment = $size))]
            type AlignedForAtomic = $Inner $(<$T>)?;
        }

        #[repr(C, align($align))]
        #[cfg(not(target_has_atomic_equal_alignment = $size))]
        pub struct $Inner $(<$T>)? ($Primitive);
    };
}

impl_atomic_primitive!(AlignedBool(bool), size("8"), align(1));
impl_atomic_primitive!(AlignedI32(i32), size("32"), align(4));
impl_atomic_primitive!(AlignedU32(u32), size("32"), align(4));
// ... snip ...
#[cfg(target_pointer_width = "64")]
impl_atomic_primitive!(AlignedPtr<T>(*mut T), size("ptr"), align(8));

#[repr(C)]
pub struct Atomic<T: AtomicPrimitive>(UnsafeCell<T::AlignedForAtomic>);

macro impl_atomic_fmt($Trait:ident) {
    impl<T> fmt::$Trait for Atomic<T>
    where T: AtomicPrimitive + fmt::$Trait;
}

impl_atomic_fmt!(Debug);
impl_atomic_fmt!(Pointer);

macro impl_atomic_auto_trait {
    (unsafe $Trait:ident) => {
        unsafe impl<T> $Trait for Atomic<T> where T: AtomicPrimitive {}
    },
    ($Trait:ident) => {
        impl<T> $Trait for Atomic<T> where T: AtomicPrimitive {}
    },
}

// Implement auto-traits manually based on `T` to avoid docs showing
// the `AtomicPrimitive::AlignedForAtomic` implementation detail.
impl_atomic_auto_trait!(unsafe Send);
impl_atomic_auto_trait!(unsafe Sync);
impl_atomic_auto_trait!(Unpin);
impl_atomic_auto_trait!(UnwindSafe);
impl_atomic_auto_trait!(RefUnwindSafe);

impl<T> Atomic<T>
where
    T: AtomicPrimitive,
{
    pub const fn new(val: T) -> Self;
    pub const fn into_inner(self) -> T;

    pub const fn as_ptr(&self) -> *mut T;
    pub const unsafe fn from_ptr<'a>(ptr: *mut T) -> &'a Self;

    pub fn get_mut(&mut self) -> &mut T;
    pub fn get_mut_slice(this: &mut [Self]) -> &mut [T];

    pub fn load(&self, order: Ordering) -> T;
    pub fn store(&self, val: T, order: Ordering);
}

impl<T> Atomic<T>
where
    T: AtomicPrimitive<AlignedForAtomic = T>,
{
    pub fn from_mut(val: &mut T) -> &mut Self;
    pub fn from_mut_slice(val: &mut [T]) -> &mut [Self];
}

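// Per-size aliases: targets with only load/store support get the alias
// itself, while the RMW operations (swap, fetch_add, ...) are gated on
// full atomic support for that size.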
macro atomic_int($Atom:ident($Int:ty), size($size:literal)) {
    #[cfg(target_has_atomic_load_store = $size)]
    pub type $Atom = Atomic<$Int>;

    #[cfg(target_has_atomic = $size)]
    impl $Atom {
        pub fn swap(&self, val: $Int, order: Ordering) -> $Int;

        pub fn fetch_add(&self, val: $Int, order: Ordering) -> $Int;
        pub fn fetch_sub(&self, val: $Int, order: Ordering) -> $Int;

        // et al
    }
}

atomic_int!(AtomicI32(i32), size("32"));
atomic_int!(AtomicU32(u32), size("32"));

#[cfg(target_has_atomic_load_store = "8")]
pub type AtomicBool = Atomic<bool>;

#[cfg(target_has_atomic = "8")]
impl AtomicBool {
    pub fn swap(&self, val: bool, order: Ordering) -> bool;

    // et al
}

#[cfg(target_has_atomic_load_store = "ptr")]
pub type AtomicPtr<T> = Atomic<*mut T>;

#[cfg(target_has_atomic = "ptr")]
impl<T> AtomicPtr<T> {
    pub fn swap(&self, val: *mut T, order: Ordering) -> *mut T;

    pub fn fetch_ptr_add(&self, val: usize, order: Ordering) -> *mut T;
    pub fn fetch_ptr_sub(&self, val: usize, order: Ordering) -> *mut T;

    pub fn fetch_byte_add(&self, val: usize, order: Ordering) -> *mut T;
    pub fn fetch_byte_sub(&self, val: usize, order: Ordering) -> *mut T;

    // et al
}
```

I think this sketch moves everything that's currently just duplicated between all of the Atomic* types onto the generic type. fetch_update might be possible to provide only once on the generic, but I wasn't able to figure out a good way to both keep it gated the way it is today and delegate to a non-generic compare_exchange (required because AtomicBool is a special case).
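For illustration, here's a minimal sketch of the ungated version, building on the Atomic<T> above and assuming a hypothetical perma-unstable AtomicOps trait that exposes compare_exchange_weak per primitive; the per-size gating is exactly what this glosses over:

```rust
// Sketch only: `AtomicOps` is a hypothetical helper trait providing
// `compare_exchange_weak` for each supported primitive.
impl<T: AtomicPrimitive + AtomicOps> Atomic<T> {
    pub fn fetch_update<F>(
        &self,
        set_order: Ordering,
        fetch_order: Ordering,
        mut f: F,
    ) -> Result<T, T>
    where
        F: FnMut(T) -> Option<T>,
    {
        // Standard CAS loop: on failure, retry with the observed value.
        let mut prev = self.load(fetch_order);
        while let Some(next) = f(prev) {
            match self.compare_exchange_weak(prev, next, set_order, fetch_order) {
                Ok(x) => return Ok(x),
                Err(next_prev) => prev = next_prev,
            }
        }
        Err(prev)
    }
}
```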

15 Likes

Comparison: the Swift folks have spent a while hammering out what they think the relevant traits/protocols are, though the final form is only just now going to be released in Swift 6.0.

12 Likes

Since you brought up NonZero: How do/would/should the combination of them work?

```rust
type A = NonZero<Atomic<u32>>; // Wrong/Problematic?
type B = Atomic<NonZero<u32>>;
```

Unless I'm mistaken, your example does not allow Atomic<NonZero<u32>> (unless such types are explicitly listed/added), while the current ZeroablePrimitive wouldn't allow NonZero<Atomic<u32>> (which would also be wrong, unless UnsafeCell cannot be 0x00).

Granted, I'm not sure if an atomic, non-zero integer is useful (besides having the compiler ensure it is not zero), but something like Atomic<Option<NonZero<u32>>> could be, and would (as far as I can tell) be sound.
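For reference, a sketch of how that last combination can already be emulated today, relying on the guaranteed niche of Option<NonZero<u32>> (same layout as u32, with 0 meaning None); the wrapper type and method set here are my own invention:

```rust
use core::num::NonZeroU32;
use core::sync::atomic::{AtomicU32, Ordering};

// Option<NonZeroU32> round-trips losslessly through u32: None <-> 0.
pub struct AtomicOptionNonZeroU32(AtomicU32);

impl AtomicOptionNonZeroU32 {
    pub const fn new(v: Option<NonZeroU32>) -> Self {
        Self(AtomicU32::new(match v {
            Some(n) => n.get(),
            None => 0,
        }))
    }

    pub fn load(&self, order: Ordering) -> Option<NonZeroU32> {
        // NonZeroU32::new maps 0 back to None.
        NonZeroU32::new(self.0.load(order))
    }

    pub fn store(&self, v: Option<NonZeroU32>, order: Ordering) {
        self.0.store(v.map_or(0, NonZeroU32::get), order);
    }
}
```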

2 Likes

If this API were adopted and merged, would the intent be to expose it as a public API in stdlib? Or is it intended more as a nice, debugged way to maintain consistency between the implementations of the various Atomic* types, i.e. is it intended primarily as an implementation detail within the bowels of stdlib?

A relevant opsem issue is Packing pointers into double-word width atomics · Issue #517 · rust-lang/unsafe-code-guidelines · GitHub

TLDR: some people want Atomic<(u64, u64)> (or pointers instead of integers, for provenance). Some platforms (such as x86-64) have double-width compare-and-exchange. That led to a discussion of the general case of tuple atomics. Mixed-size atomic accesses are also related and relevant.

I would like to see a design that doesn't preclude such extensions in the future (fancy concurrent algorithms are not just exciting and cool, but quite useful in low-level systems programming). I would personally be fine with not supporting the case where there is padding in the tuple, but other people want that as well.

10 Likes

Ideally it would be possible to use any type small enough to fit the atomic compare-and-exchange of the target platform, though I don't think we can currently say "generic over all types where size <= 64" (or other sizes).
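The closest stable approximation I know of is a post-monomorphization const assertion; a sketch with made-up names:

```rust
use core::marker::PhantomData;
use core::mem::size_of;

// Hypothetical helper: compilation fails (at monomorphization time)
// if `T` is wider than 8 bytes, approximating "size <= 64 bits".
struct AssertFits<T>(PhantomData<T>);

impl<T> AssertFits<T> {
    const OK: () = assert!(size_of::<T>() <= 8, "T is too wide for the target CAS");
}

fn requires_cas_width<T>(_val: T) {
    // Referencing the constant forces its evaluation for this `T`.
    let () = AssertFits::<T>::OK;
}

fn main() {
    requires_cas_width(0u64); // fine
    // requires_cas_width([0u8; 16]); // would fail to compile
}
```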

Regarding Atomic<(u64, u64)>: are you talking about something that compares and exchanges both u64s at once, or a way to compare and exchange them individually (as if it were a (Atomic<u64>, Atomic<u64>))? I assume you mean the former.

4 Likes

What would be useful is an Atomic<(u64, u64)> where the compiler is guaranteed to use the platform equivalent of x86-64's CMPXCHG16B, which does both compares and exchanges in a single atomic operation. This should be equivalent to Atomic<u128>, but with the compiler packing 2x u64 into a u128 for you. The same applies to Atomic<(usize, usize)> on platforms where there's support for a double-usize atomic access.
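That packing is mechanical enough to hand-roll today at half the width (two u32s through the stable AtomicU64), which is roughly what the compiler would be doing for you at full width; a sketch:

```rust
use core::sync::atomic::{AtomicU64, Ordering};

fn pack(a: u32, b: u32) -> u64 {
    ((a as u64) << 32) | b as u64
}

fn unpack(v: u64) -> (u32, u32) {
    ((v >> 32) as u32, v as u32)
}

/// Compare-and-exchange a pair of u32s as one atomic operation.
fn cas_pair(
    atom: &AtomicU64,
    current: (u32, u32),
    new: (u32, u32),
) -> Result<(u32, u32), (u32, u32)> {
    atom.compare_exchange(
        pack(current.0, current.1),
        pack(new.0, new.1),
        Ordering::AcqRel,
        Ordering::Acquire,
    )
    .map(unpack)
    .map_err(unpack)
}
```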

The latter (Atomic<(u64, u64)> being the same as (Atomic<u64>, Atomic<u64>)) doesn't seem particularly useful, because it's trivial to construct given that Atomic<u64> exists.

5 Likes

It would probably need to be Copy as well. Plus, padding being LLVM poison causes issues.

The former (see the GitHub issue that I linked). The other doesn't seem useful at all.

1 Like

The real issue arises when you have a pair of pointers. If you round-trip via integers you lose provenance. With pointers it is no longer a question of convenience but of actual missing functionality.

7 Likes

The ideal end-state for me would be that if Atomic<T> exists and Atomic<TwoTs> exists such that core::mem::size_of::<TwoTs>() >= core::mem::size_of::<T>() * 2, then we also implement at least the load, store, and compare-exchange functionality for Atomic<(T, T)>.

That way, if pointers are 32-bit, and the platform supports both Atomic<&mut T> and Atomic<u64>, we also get atomic compare-exchange for (&mut T, &mut T) (plus other pointer types), and the same for 64-bit pointers and Atomic<u128>. But if the largest atomic on a platform is Atomic<u32>, and pointers are 32 bits or larger, you don't get Atomic<(*mut T, *mut T)> or Atomic<(&mut T, &mut T)>, because the prerequisites aren't met.

Now, there are almost certainly compiler implications for this that I've not thought about - especially in the area of pointer provenance - that make this hard to do well, but having all possible pairs of atomics for at least compare-exchange, load, and store, as well as for single items, would be useful for some algorithms.

5 Likes

Can we generalize it a tiny bit more, and not require the two types to be the same? I have personally written locking primitives (in x86 assembly language :-P) that used cmpxchg16b on, in Rust terms, (&mut WaiterChain, u64).

Atomic<(T, U)> where T and U are both primitive and size_of::<T>() + size_of::<U>() <= size_of::<BiggestSupportedAtomic>() shouldn't be notably harder than Atomic<(T, T)> as you describe.

2 Likes

How does that interact with pointer provenance? It's technically no harder than what I've described, but you're also casting provenance around in the process, which I suspect is a bigger ask than you might expect once you take Strict Provenance into account.

As far as pointer provenance is concerned, we're "just" copying (&T, usize) values around, aren't we? The fact that this particular copy is tagged to emit a hardware atomic operation instead of an ordinary one should not affect the pointer-ness of the &T field of the tuple.

2 Likes

Good point - I managed to confuse myself into thinking you meant being able to swap (&T, usize) with (usize, &T), not swapping two (&T, usize) values over.

If they are not the same size you immediately run into issues with padding. How does compare-and-exchange work when there is padding? Currently, not very well! According to LLVM, padding is poison (completely undefined, not any of the well-known 255 values a byte can take). This allows LLVM to assume those bytes can be anything for the purpose of optimisation, and it doesn't need to be consistent.

2 Likes

Maybe this is a bit crazy, but couldn't we have a way to zero out padding bytes when needed so that they are no longer uninitialized?

In particular, Atomic::from_mut could zero out padding bytes, and unsafely transmuting from *mut T/&mut T to &Atomic<T> would be considered unsound if T's padding bytes (if any) are not zeroed out. Any method that writes to an Atomic (e.g. Atomic::store/Atomic::swap/Atomic::compare_exchange) would also zero out padding bytes before doing so. This would guarantee that any Atomic<T> has the same value for the padding bytes, and thus compare_exchange would work correctly.
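To make that concrete, a sketch of what zeroing the padding could mean for one specific layout (the struct below is my own illustration; note that a later typed copy of the value may make the padding undefined again, which is why this is ultimately an opsem question):

```rust
#[repr(C)]
#[derive(Clone, Copy)]
struct Pair {
    a: u8,  // offset 0
    // one padding byte at offset 1
    b: u16, // offset 2; total size 4, align 2
}

fn zero_padding(p: &mut Pair) {
    let bytes = p as *mut Pair as *mut u8;
    // SAFETY: offset 1 is in bounds of `Pair` and is padding, so
    // writing it cannot corrupt either field.
    unsafe { bytes.add(1).write(0) };
}
```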

FYI a byte can take 256 values (257 if we count undefined!)

3 Likes

Freeze (as in the LLVM sense, not Freeze the unstable trait) was discussed in the opsem issue I linked above. I don't know how viable it would be to say that padding in atomics (and in values being compared with atomics) has to be zeroed. This is really a question for the opsem and compiler people I think.

My bad, it's been a long day. My point was that poison / undef are additional values in the LLVM abstract machine.

1 Like

This is intended to work like NonZero does: consolidate the docs some, and enable use with generic syntax, e.g. with the FFI integer aliases.

This shouldn't. AtomicPrimitive is intended to be perma-unstable, like ZeroablePrimitive is for NonZero.

At the least, it would need to block access to Atomic::get_mut.

The real “fun” is pointer provenance versus the ABA problem.

1 Like

What about Atomic<Enum>?

This is probably off-topic relative to the original proposition of this thread, but it's sometimes needed to store a finite state in an atomic value. The current way of doing this is using an AtomicUsize and some const values. The advantage of enums is that it's impossible to set an invalid value. Plain-old-data enums (enums with no fields) are basically an integer with some values forbidden. So just like AtomicBool is just a fancy AtomicU8, it should be technically possible to have an enum marked with repr(N) stored internally as an Atomic<N>.
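For comparison, the current workaround, sketched with a made-up State enum over AtomicU8 and a checked decode:

```rust
use core::sync::atomic::{AtomicU8, Ordering};

#[repr(u8)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum State {
    Idle = 0,
    Running = 1,
    Done = 2,
}

struct AtomicState(AtomicU8);

impl AtomicState {
    const fn new(s: State) -> Self {
        Self(AtomicU8::new(s as u8))
    }

    fn load(&self, order: Ordering) -> State {
        match self.0.load(order) {
            0 => State::Idle,
            1 => State::Running,
            2 => State::Done,
            // Unreachable as long as `store` below is the only writer.
            _ => unreachable!("invalid State discriminant"),
        }
    }

    fn store(&self, s: State, order: Ordering) {
        self.0.store(s as u8, order);
    }
}
```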

9 Likes

For the use case I'm thinking of -- implementing a mutex, where a double-width CAS increments a waiter count and updates a "tail of the wait queue" pointer as a single atomic operation -- it would be fine if the two types were required to be the same size, and indeed for them to be required to be exactly half of the size of a larger type for which a hardware atomic operation exists. I imagine people would eventually want to be able to do more, but the feature (of allowing the types not to be the same) is still useful with these restrictions.

Concretely, assuming hardware compare-and-swap for 32, 64, and 128 bit quantities, and usize <= 64 bits, support for Atomic<T> for T = (u16, i16), (u32, i32), (u64, i64), and (&U, usize), but not (u32, u16) or (u8, i8), would be sufficient for me.

2 Likes