Pre-RFC: Pinned synchronization primitives

nicbn · April 29, 2021, 9:15pm

Summary

Create a new module std::sync::pinned for OS synchronization primitives which make use of the Pin dialect instead of relying on boxing.

Motivation

The current synchronization primitives at std::sync box the OS primitives for some OSes, including all Posix-compliant. This is suboptimal as it increases the possibility of cache misses, and adds overhead to creation of these primitives.

With the new Pin type, we can expose the primitives directly and have the functions simply take self: Pin<&Self>.

This gives more flexibility as it is possible to have, for example, Arc<Mutex<T>> without double-boxing the Mutex.

Guide-level explanation

For the following types:

Barrier
Condvar
Mutex<T>
RwLock<T>

Pinned versions are available at the module std::sync::pinned. These versions are all !Unpin.

In all methods of these pinned versions a self: Pin<&Self> argument is taken, therefore the type must be constructed with, for example, Arc::pin or Box::pin.

In order to initialize a pinned::Mutex, for example, one of these can be used:

// Directly initialize using ergonomic constructors
let boxed_mutex: Pin<Box<Mutex<T>>> = Mutex::boxed(...);
let arc_mutex: Pin<Arc<Mutex<T>>> = Mutex::arc(...);

// Explicitly initialize after creation with a custom constructor
// Nothing here is unsafe, the library checks for initialization
let mut mutex: Pin<Box<Mutex<T>>> = Box::pin(Mutex::uninit(...));
mutex.as_mut().init();

Reference-level explanation

The std sys_common already has a pretty good infrastructure for implementing this.

The explicit initialization is done by first creating the structure using uninit(value: T) -> Self, then calling init(self: Pin<&mut Self>).
The library adds the necessary assertions to make this pattern safe. Using before init causes a panic!. In particular, the implementation can use an Option to wrap the OS primitive. The assertions can be placed such that they are inlined, so if one method is used after another, the assertion can be optimized away for the second method.

Drawbacks

Adds further complexity to the std library.

Rationale and alternatives

Alternative for the explicit initialization: have unsafe uninit(). This can remove assertion overhead but can lead to some hard to detect undefined behaviour, as in POSIX - for example - this would cause mutexes to work as normal until a double lock happens, which would cause undefined behaviour.
Replace the primitives with parking_lot. This has been proposed in the past and has its fair share of drawbacks, such as non-trivial space overhead which can be a deal breaker for some applications.

Prior art

System languages such as C and C++ expose the OS primitives with thin wrappers, without the boxing our current implementation uses. This is possible without Pin as these languages have different move semantics, on which types are not moved by default.
Pin + Arc + Mutex - The Rust Programming Language Forum
sync package for the Linux kernel: linux/mod.rs at rust · Rust-for-Linux/linux · GitHub

Unresolved questions

Should Once be moved to pinned for consistency? The current implementation does not box any primitives, however if we were to use pthread_once_t in the future, it would be necessary.

Future possibilities

Lints for double boxing (suggesting Pin<Arc<pinned::Mutex<T>>> when using Arc<sync::Mutex<T>>).

alex · April 29, 2021, 9:30pm

How do you intend for Mutex::new(T) to work, or is the intent to only offer Mutex::new_boxed(T), Mutex::new_arc(T), etc?

FWIW, as additional prior art you may be interested in the sync package we've created for the Linux kernel: linux/mod.rs at rust · Rust-for-Linux/linux · GitHub All the locks use Pin, but we haven't found a better solution than making new() unsafe and requiring you to init once you have the lock in the pinned location.

comex · April 29, 2021, 9:34pm

It could use PTHREAD_MUTEX_INITIALIZER, which is just a constant bit pattern and thus necessarily safe to move around before any operations are performed on it.

bjorn3 · April 29, 2021, 9:39pm

If/when parking_lot becomes the basis for the synchronization primitives, it will be unnecessary to implement this proposal as parking_lot natively allows moving of all it's types.

comex · April 29, 2021, 9:44pm

Other possible alternatives:

Lobby OS vendors (somehow) to guarantee that these primitives are safe to move when not in use (which is of course the only circumstance where Rust would allow moving). After all, this is almost always true in practice. A mutex may have pointers to it in some global data structure while it has waiters, or even when it's locked without waiters, but usually not when it's just sitting there unlocked. As far as I can tell, Windows already guarantees the relevant primitives (or at least some of them) are save to move, e.g. SRWLOCK.
Have std switch away from OS native primitives entirely in favor of parking_lot primitives, which don't have this problem and are often more efficient in other ways. This has been proposed and partially implemented before, but has stalled.

Though, parking_lot primitives have disadvantages such as not supporting priority inheritance.

JeffBurdges · April 29, 2021, 10:55pm

We discussed exactly this interface in Pin + Arc + Mutex - #5 by burdges - The Rust Programming Language Forum It could be implemented outside std using nightly with #![feature(libstd_sys_internals)] btw.

We cannot re-implement the old types as Pin<Box<T>> with T being new types. There exist platforms where Mutex involves no allocation. See mutex.rs - source and mutex.rs - source

I've no idea why RwLock does not benefit from this optimization yet rwlock.rs - source and maybe Condvar does so. I'd think Barrier should remain implemented in terms of these so that it benefits.

I'd assume the system primitives should always be exposed, even if parking_lot becomes the default. I like the std::sync::pinned module name, maybe the current primitives should live in std::sync::sys, parking_lot should become core::sync::spin, and core::sync::.. should be type aliases, for which the default can later be changed.

nicbn · April 29, 2021, 11:04pm

Thanks for all the responses!

PTHREAD_MUTEX_INITIALIZER sadly can not be used, as locking it twice causes undefined behaviour.

I really forgot about this when writing. I will suggest having a Mutex::uninit constructor and initializing it using init(self: Pin<&mut Self>). Using an uninitialized Mutex should not cause undefined behaviour, just panic. Hopefully on sequential operations the checks for initialization can be optimized away after the first check.

My understanding is that parking_lot has been rejected for standard library because it has non-trivial overhead memory-wise (the static hashmap). Also linking directly to system libraries has some benefits: it the system is updated to use more efficient syscalls, the program can benefit from it without needing to be rebuilt and check for support to the new features of the system.

That said, if this ever becomes the case, the new module could simply be deprecated, or just delegate to the sync types.

(editted) I just missed the:

An unlocked SRW lock with no waiting threads is in its initial state and can be copied, moved, and forgotten

I'm pretty sure this is new and was not documented before, and that's why the std did not use this in the past. I could be wrong tho.

CDirkx · April 30, 2021, 9:13am

Coincidentally a PR with that optimization of RwLock was submitted yesterday: #84687

CDirkx · April 30, 2021, 9:18am

std uses the fact that SRW locks can be moved: on Windows the movable mutex is not boxed:

github.com

rust-lang/rust/blob/a45f0d724ed7116ed475fb0d628bc118418e7cbf/library/std/src/sys/windows/mutex.rs#L25-L26


// Windows SRW Locks are movable (while not borrowed).
pub type MovableMutex = Mutex;

nicbn · April 30, 2021, 12:05pm

My mistake, I've corrected the post

nicbn · May 14, 2021, 3:13pm

I have opened the RFC: Pinned synchronization primitives by nicbn · Pull Request #3124 · rust-lang/rfcs · GitHub

phlopsi · May 14, 2021, 8:51pm

What you're describing is essentially Option::unwrap. As for optimizing away operations, that'd be difficult (impossible?) to guarantee, unless you expose the equivalent of Option::unwrap, which returns a guaranteed initialized Mutex.

nicbn · May 15, 2021, 2:46am

Yes, it is not the goal to guarantee the optimization, I'm just pointing out that it could be optimized away if the compiler were smart enough to understand that the flag isn't changing - which could be guaranteed because the methods are all taking shared references in succession, without any mutable references being taken inbetween.

That said, I could not create a snippet where this actually works and gets elided. See: Compiler Explorer

In regards to performance, these checks should be optimized at runtime by the branch predictor on most recent CPUs, however it would be way better if the compiler could optimize this away.

comex · May 15, 2021, 5:36pm

What exactly do you mean by "locking it twice"?

Do you mean initializing multiple mutexes sequentially at the same memory address? I just learned that in older versions of the POSIX spec, doing so would be UB because PTHREAD_MUTEX_INITIALIZER is only suitable for "statically allocated" mutexes. But that restriction was apparently lifted in later versions.

Do you mean locking, unlocking, moving, and locking again? That's UB, but that is why locking/unlocking operations would require Pin.

Neither of those is a likely interpretation of "locking it twice", but I have no idea what you mean otherwise.

nicbn · May 15, 2021, 7:12pm

I meant locking it recursively, or "relocking".

This table here shows the behaviour for the different POSIX mutex types: The Open Group Base Specifications Issue 7, 2018 edition

The PTHREAD_MUTEX_INITIALIZER creates a mutex with default type, which may cause undefined behaviour (though it seems to be implementation specific, but we still can not rely on it). So we use pthread_mutex_init to create a mutex with normal type, which means that relocking causes a deadlock.

comex · May 16, 2021, 2:22am

Ugh, I see. And while it appears that every major OS defines PTHREAD_MUTEX_DEFAULT as equal to one of the other constants, apparently glibc decides to be extra clever and actually take advantage of the UB, only if you don't set a type at all. Including if you use PTHREAD_MUTEX_INITIALIZER. So that's not an option.

Part of me wants to suggest using C++20 wait/wake as futex standins and implementing a custom mutex, but I don't actually like that option. There are benefits to being native.

system · August 14, 2021, 2:22am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Standard library, synchronization primitives, and undefined behavior libs	24	9960	April 7, 2019
[Blog] Safe pinned initialization language design	8	1810	January 4, 2023
Arc and Pin? libs	11	2710	April 29, 2022
A Formal Look at Pinning language design	25	8022	June 16, 2019
Async/Await series language design	30	8023	March 25, 2019