Pre-RFC: Pinned synchronization primitives

Summary

Create a new module std::sync::pinned for OS synchronization primitives which make use of the Pin dialect instead of relying on boxing.

Motivation

The current synchronization primitives at std::sync box the OS primitives for some OSes, including all Posix-compliant. This is suboptimal as it increases the possibility of cache misses, and adds overhead to creation of these primitives.

With the new Pin type, we can expose the primitives directly and have the functions simply take self: Pin<&Self>.

This gives more flexibility as it is possible to have, for example, Arc<Mutex<T>> without double-boxing the Mutex.

Guide-level explanation

For the following types:

  • Barrier

  • Condvar

  • Mutex<T>

  • RwLock<T>

Pinned versions are available at the module std::sync::pinned. These versions are all !Unpin.

In all methods of these pinned versions a self: Pin<&Self> argument is taken, therefore the type must be constructed with, for example, Arc::pin or Box::pin.

In order to initialize a pinned::Mutex, for example, one of these can be used:

// Directly initialize using ergonomic constructors
let boxed_mutex: Pin<Box<Mutex<T>>> = Mutex::boxed(...);
let arc_mutex: Pin<Arc<Mutex<T>>> = Mutex::arc(...);

// Explicitly initialize after creation with a custom constructor
// Nothing here is unsafe, the library checks for initialization
let mut mutex: Pin<Box<Mutex<T>>> = Box::pin(Mutex::uninit(...));
mutex.as_mut().init();

Reference-level explanation

The std sys_common already has a pretty good infrastructure for implementing this.

  • The explicit initialization is done by first creating the structure using uninit(value: T) -> Self, then calling init(self: Pin<&mut Self>).
  • The library adds the necessary assertions to make this pattern safe. Using before init causes a panic!. In particular, the implementation can use an Option to wrap the OS primitive. The assertions can be placed such that they are inlined, so if one method is used after another, the assertion can be optimized away for the second method.

Drawbacks

  • Adds further complexity to the std library.

Rationale and alternatives

  • Alternative for the explicit initialization: have unsafe uninit(). This can remove assertion overhead but can lead to some hard to detect undefined behaviour, as in POSIX - for example - this would cause mutexes to work as normal until a double lock happens, which would cause undefined behaviour.
  • Replace the primitives with parking_lot. This has been proposed in the past and has its fair share of drawbacks, such as non-trivial space overhead which can be a deal breaker for some applications.

Prior art

Unresolved questions

  • Should Once be moved to pinned for consistency? The current implementation does not box any primitives, however if we were to use pthread_once_t in the future, it would be necessary.

Future possibilities

  • Lints for double boxing (suggesting Pin<Arc<pinned::Mutex<T>>> when using Arc<sync::Mutex<T>>).
3 Likes

How do you intend for Mutex::new(T) to work, or is the intent to only offer Mutex::new_boxed(T), Mutex::new_arc(T), etc?

FWIW, as additional prior art you may be interested in the sync package we've created for the Linux kernel: linux/mod.rs at rust · Rust-for-Linux/linux · GitHub All the locks use Pin, but we haven't found a better solution than making new() unsafe and requiring you to init once you have the lock in the pinned location.

It could use PTHREAD_MUTEX_INITIALIZER, which is just a constant bit pattern and thus necessarily safe to move around before any operations are performed on it.

If/when parking_lot becomes the basis for the synchronization primitives, it will be unnecessary to implement this proposal as parking_lot natively allows moving of all it's types.

Other possible alternatives:

  • Lobby OS vendors (somehow) to guarantee that these primitives are safe to move when not in use (which is of course the only circumstance where Rust would allow moving). After all, this is almost always true in practice. A mutex may have pointers to it in some global data structure while it has waiters, or even when it's locked without waiters, but usually not when it's just sitting there unlocked. As far as I can tell, Windows already guarantees the relevant primitives (or at least some of them) are save to move, e.g. SRWLOCK.

  • Have std switch away from OS native primitives entirely in favor of parking_lot primitives, which don't have this problem and are often more efficient in other ways. This has been proposed and partially implemented before, but has stalled.

    Though, parking_lot primitives have disadvantages such as not supporting priority inheritance.

2 Likes

We discussed exactly this interface in Pin + Arc + Mutex - #5 by burdges - The Rust Programming Language Forum It could be implemented outside std using nightly with #![feature(libstd_sys_internals)] btw.

We cannot re-implement the old types as Pin<Box<T>> with T being new types. There exist platforms where Mutex involves no allocation. See mutex.rs - source and mutex.rs - source

I've no idea why RwLock does not benefit from this optimization yet rwlock.rs - source and maybe Condvar does so. I'd think Barrier should remain implemented in terms of these so that it benefits.


I'd assume the system primitives should always be exposed, even if parking_lot becomes the default. I like the std::sync::pinned module name, maybe the current primitives should live in std::sync::sys, parking_lot should become core::sync::spin, and core::sync::.. should be type aliases, for which the default can later be changed.

1 Like

Thanks for all the responses!

PTHREAD_MUTEX_INITIALIZER sadly can not be used, as locking it twice causes undefined behaviour.

I really forgot about this when writing. I will suggest having a Mutex::uninit constructor and initializing it using init(self: Pin<&mut Self>). Using an uninitialized Mutex should not cause undefined behaviour, just panic. Hopefully on sequential operations the checks for initialization can be optimized away after the first check.

My understanding is that parking_lot has been rejected for standard library because it has non-trivial overhead memory-wise (the static hashmap). Also linking directly to system libraries has some benefits: it the system is updated to use more efficient syscalls, the program can benefit from it without needing to be rebuilt and check for support to the new features of the system.

That said, if this ever becomes the case, the new module could simply be deprecated, or just delegate to the sync types.

(editted) I just missed the:

An unlocked SRW lock with no waiting threads is in its initial state and can be copied, moved, and forgotten

I'm pretty sure this is new and was not documented before, and that's why the std did not use this in the past. I could be wrong tho.

Coincidentally a PR with that optimization of RwLock was submitted yesterday: #84687

std uses the fact that SRW locks can be moved: on Windows the movable mutex is not boxed:

2 Likes

My mistake, I've corrected the post

I have opened the RFC: Pinned synchronization primitives by nicbn · Pull Request #3124 · rust-lang/rfcs · GitHub

What you're describing is essentially Option::unwrap. As for optimizing away operations, that'd be difficult (impossible?) to guarantee, unless you expose the equivalent of Option::unwrap, which returns a guaranteed initialized Mutex.

Yes, it is not the goal to guarantee the optimization, I'm just pointing out that it could be optimized away if the compiler were smart enough to understand that the flag isn't changing - which could be guaranteed because the methods are all taking shared references in succession, without any mutable references being taken inbetween.

That said, I could not create a snippet where this actually works and gets elided. See: Compiler Explorer

In regards to performance, these checks should be optimized at runtime by the branch predictor on most recent CPUs, however it would be way better if the compiler could optimize this away.

What exactly do you mean by "locking it twice"?

Do you mean initializing multiple mutexes sequentially at the same memory address? I just learned that in older versions of the POSIX spec, doing so would be UB because PTHREAD_MUTEX_INITIALIZER is only suitable for "statically allocated" mutexes. But that restriction was apparently lifted in later versions.

Do you mean locking, unlocking, moving, and locking again? That's UB, but that is why locking/unlocking operations would require Pin.

Neither of those is a likely interpretation of "locking it twice", but I have no idea what you mean otherwise. :slight_smile:

I meant locking it recursively, or "relocking".

This table here shows the behaviour for the different POSIX mutex types: The Open Group Base Specifications Issue 7, 2018 edition

The PTHREAD_MUTEX_INITIALIZER creates a mutex with default type, which may cause undefined behaviour (though it seems to be implementation specific, but we still can not rely on it). So we use pthread_mutex_init to create a mutex with normal type, which means that relocking causes a deadlock.

Ugh, I see. And while it appears that every major OS defines PTHREAD_MUTEX_DEFAULT as equal to one of the other constants, apparently glibc decides to be extra clever and actually take advantage of the UB, only if you don't set a type at all. Including if you use PTHREAD_MUTEX_INITIALIZER. So that's not an option.

Part of me wants to suggest using C++20 wait/wake as futex standins and implementing a custom mutex, but I don't actually like that option. There are benefits to being native.

1 Like