Non-Send Futures

steffahn · December 11, 2023, 12:35pm

I had given this general idea some thought myself after reading the Jane Street post about proposed OCaml features for thread-safety. They include an API they call “capsule”, which is analogous to the thread-safety boundary for memory that physical threads in Rust offer, but separate from physical threads.

This idea thus doesn’t just relate to Futures. It could also apply to better ways of wrapping non-thread-safe data structures, and more. E.g. for Rust, if you have a graph of Rc pointers contained within a data structure, that whole thing could safely be transferred between threads, as long as it’s ensured that all contained Rcs cross threads together.

My most minimal and initial idea of what a “capsule” type in Rust could look like (first and foremost for illustration, and ignoring most of the details and additional features around how the thing is designed for OCaml in the Jane Street post) was

impl<T> Capsule<T> {
    fn new(init: impl FnOnce() -> T + Send) -> Capsule<T>;
    fn with<R: Send>(
        &mut self,
        cb: impl FnOnce(&mut T) -> R + Send
    ) -> R;
}

impl<T> Send for Capsule<T> // !!
impl<T> Sync for Capsule<T> // this one uninteresting with only `&mut self` API

And such a minimal API can be implemented safely, with some additional 'static bounds, using real threads: Rust Playground. The idea then is, of course, that it in principle could truly be implemented way more efficiently by just calling the callbacks in-place, and merely simulating the memory-boundaries of threads via the API.
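For illustration, here is a minimal sketch of what such a thread-backed implementation might look like. It is not necessarily the same as the linked playground: the worker-thread-plus-channel design, the `Job` alias, and the `capsule_demo` helper are assumptions made for this example. The `T` is created on a dedicated worker thread and never leaves it, so the handle is `Send` without any `unsafe`, even when `T` is not.

```rust
use std::cell::Cell;
use std::rc::Rc;
use std::sync::mpsc::{channel, Sender};
use std::thread;

// A type-erased callback that runs on the capsule's worker thread.
// (Hypothetical helper alias, not part of the API sketched above.)
type Job<T> = Box<dyn FnOnce(&mut T) + Send>;

/// Thread-backed capsule: the `T` lives on a dedicated worker thread.
/// Only `Send` callbacks and `Send` results ever cross the boundary.
pub struct Capsule<T: 'static> {
    jobs: Sender<Job<T>>,
}

impl<T: 'static> Capsule<T> {
    pub fn new(init: impl FnOnce() -> T + Send + 'static) -> Self {
        let (jobs, inbox) = channel::<Job<T>>();
        thread::spawn(move || {
            // The value is created, used, and dropped on this thread only.
            let mut value = init();
            for job in inbox {
                job(&mut value);
            }
        });
        Capsule { jobs }
    }

    pub fn with<R: Send + 'static>(
        &mut self,
        cb: impl FnOnce(&mut T) -> R + Send + 'static,
    ) -> R {
        let (reply, result) = channel();
        let job: Job<T> = Box::new(move |value: &mut T| {
            let _ = reply.send(cb(value));
        });
        self.jobs.send(job).expect("capsule worker has exited");
        result.recv().expect("capsule callback panicked")
    }
}

/// Usage sketch: an `Rc` inside a capsule that is moved to another thread.
pub fn capsule_demo() -> (i32, usize) {
    let mut capsule = Capsule::new(|| Rc::new(Cell::new(1)));
    let bumped = capsule.with(|rc| {
        rc.set(rc.get() + 1);
        rc.get()
    });
    // `Rc` is not `Send`, but the capsule handle is.
    let count = thread::spawn(move || capsule.with(|rc| Rc::strong_count(rc)))
        .join()
        .unwrap();
    (bumped, count)
}
```

Every `with` call here costs a round trip through two channels, which is exactly the overhead an in-place implementation would avoid.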

Because of thread_locals as well as the Drop implementation of Mutex’s guard, I arrived at the same conclusion: there’s a big conflict with the current meaning of Send & Sync, and one would ideally want two separate kinds of Send/Sync, which I thought to call CSend/CSync (for Capsule) and TSend/TSync (for Thread). In each case, T: …Sync just relates to the &T: …Send bound, so I’m focusing mostly on just the …Send traits.
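To make the thread_local half of that conflict concrete, here is a small sketch (the names `SHARED`, `require_send`, and `observe` are made up for this example). A closure that captures nothing is `Send + 'static`, yet when called in place it can hand out an `Rc` that aliases the current thread’s thread-local data. A real thread sees its own fresh copy instead, which is what keeps the thread-backed capsule sound; an in-place capsule would not get that separation for free.

```rust
use std::cell::Cell;
use std::rc::Rc;
use std::thread;

thread_local! {
    // Each physical thread gets its own `Rc`.
    static SHARED: Rc<Cell<u32>> = Rc::new(Cell::new(0));
}

// Only accepts closures that satisfy today's `Send + 'static` bounds,
// i.e. the bounds `Capsule::new` puts on `init`.
fn require_send<F>(f: F) -> F
where
    F: FnOnce() -> Rc<Cell<u32>> + Send + 'static,
{
    f
}

/// Returns (value seen on the calling thread, value seen on a new thread).
pub fn observe() -> (u32, u32) {
    // This closure captures nothing, so it is `Send`.
    let init = require_send(|| SHARED.with(Rc::clone));

    // Called in place, it aliases the calling thread's thread-local.
    let alias = init();
    alias.set(7);
    let here = SHARED.with(|rc| rc.get());

    // A real thread has its own, untouched instance.
    let there = thread::spawn(|| SHARED.with(|rc| rc.get()))
        .join()
        .unwrap();

    (here, there)
}
```

If an in-place capsule stored `alias` and was then moved to another thread, that thread and the original one would both hold clones of the same `Rc`, with unsynchronized reference counts.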

The Capsule type above would be written with CSend throughout

impl<T> Capsule<T> {
    fn new(init: impl FnOnce() -> T + CSend) -> Capsule<T>;
    fn with<R: CSend>(
        &mut self,
        cb: impl FnOnce(&mut T) -> R + CSend
    ) -> R;
}

impl<T> CSend for Capsule<T>
impl<T: TSend> TSend for Capsule<T>

The impl<T: TSend> TSend for Capsule<T> is what makes the whole thing sound in the presence of physical-thread-bound APIs like MutexGuard.

Migration strategies seem a huge pain… just pretending legacy Send (which I still call just “Send”) means the same as CSend makes a large amount of pre-existing code unsound. The sound way is to make old Send equal TSend + CSend (with a strong enough trait-synonym feature, or something equivalent, so that current explicit implementations of the trait would still work and result in the respective TSend + CSend implementations), and then allow new non-breaking refinements to existing APIs.

Regarding the implementations, an existing one like

impl<T: Send> Send for Option<T> {}

would thus (at least if it were explicitly written) translate to

impl<T: Send> TSend for Option<T> {}
impl<T: Send> CSend for Option<T> {}

i.e.

impl<T: TSend + CSend> TSend for Option<T> {}
impl<T: TSend + CSend> CSend for Option<T> {}

and the desired new implementations

impl<T: TSend> TSend for Option<T> {}
impl<T: CSend> CSend for Option<T> {}

are no longer a breaking change.

As for some types’ respective bounds: A MutexGuard<i32> would be CSend + CSync + !TSend + TSync of course, and generically for MutexGuard<T>, the traits other than TSend would each just require the same bound on T.

For Rc, one could simply deny CSend while still allowing TSend, as in

// analogous to current `Rc<T>: !Send`
impl<T> !CSend for Rc<T> {}
impl<T> !CSync for Rc<T> {}

// analogous to current `Arc` implementation
impl<T: TSync + TSend> TSend for Rc<T> {}
impl<T: TSync + TSend> TSync for Rc<T> {}

though now that I’m writing this, I’m wondering whether or not the TSend one could even be weakened to just T: TSend, since !CSync implies it cannot be referenced on multiple physical threads at once anyways.
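As a rough way to experiment with these bounds on stable Rust today, the split can be mimicked with ordinary `unsafe` marker traits. Everything in this sketch is hypothetical: real `TSend`/`CSend` would be auto traits with negative impls, whereas here “not implemented” is modeled by simply omitting the impl, and `LegacySend` stands in for the old `Send`.

```rust
use std::rc::Rc;

// Hypothetical stand-ins for the proposed auto traits.
pub unsafe trait TSend {} // may cross physical threads
pub unsafe trait TSync {} // may be shared across physical threads
pub unsafe trait CSend {} // may cross capsule/task boundaries

// Legacy `Send` modeled as the conjunction `TSend + CSend`.
pub trait LegacySend: TSend + CSend {}
impl<T: TSend + CSend> LegacySend for T {}

// A plain data type satisfies everything.
unsafe impl TSend for i32 {}
unsafe impl TSync for i32 {}
unsafe impl CSend for i32 {}

// The refined `Option` impls: each trait only looks at its own bound.
unsafe impl<T: TSend> TSend for Option<T> {}
unsafe impl<T: CSend> CSend for Option<T> {}

// A CSync-limited `Rc`: `TSend` with `Arc`-like bounds,
// and deliberately no `CSend` impl at all.
unsafe impl<T: TSync + TSend> TSend for Rc<T> {}

// These helpers always return `true`; the real check is that a call
// only compiles when the bound holds for `T`.
pub fn is_tsend<T: TSend>() -> bool {
    true
}
pub fn is_csend<T: CSend>() -> bool {
    true
}
pub fn is_legacy_send<T: LegacySend>() -> bool {
    true
}
```

With these definitions, `is_tsend::<Option<Rc<i32>>>()` compiles, while `is_csend::<Option<Rc<i32>>>()` and `is_legacy_send::<Option<Rc<i32>>>()` are rejected by the compiler, matching the intent that such a value may move between physical threads but not across a capsule boundary.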

Interestingly enough, TSend and CSend stand for two orthogonal kinds of memory-boundary: one based on capsules and other thread-like boundaries, like tasks for futures, and one based on physical threads. So I’m noticing that besides … well, let’s call the above a CSync-limited Rc … it’s also feasible to build a TSync-limited version. (Take the above implementations and swap the roles of the T… and C… traits.) I’ve wondered if that’s necessarily only possible by fully duplicating the type and its API, or whether something similar can be achieved by wrapping a CSync-limited Rc somehow to make it, effectively, TSync-limited.

Another very fascinating take-away from the Jane Street blog (which takes a while to understand, given one has to learn their syntax first) is that when you end the life of a capsule, you can just have it “merge” with any other given context. In other words: while Rust’s threading API today requires that in

pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,

the thread’s result of type T must be allowed to be sent between threads, this doesn’t have to be the case if we forego the physical-threads interpretation. A joining thread can simply “inherit” the thread’s local memory (if it weren’t for thread_locals and the like). With CSend-ness taken into account, the new version could look like

pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + TSend + CSend + 'static,
    T: TSend + 'static,

with no requirement of T: CSend. Ultimately, this has the effect that a thread can return some data structure containing Rcs to the thread it joins with, and that should work and be sound.

Of course now thread_local is also a large issue. It works without any trait bounds today, so it would not translate well without breakage. If we were to freshly design Rust today, one could simply say “thread-local data must be CSync”, similar to how global statics must be Sync today. That’s probably the “correct” change, but it would be breaking. Another thing I’ve considered and that might work is an effects system for access to thread_local:

Imagine Rust was single-threaded today, and we wanted to introduce Sync and Send in the first place. Of course single-threaded Rust allows you to place any data in global variables. You cannot mutably borrow them (without a RefCell), but there are no other restrictions. Also, unsafe code commonly relies on the single-threaded nature of execution for soundness.

How to introduce Send/Sync and thread::spawn, anyways? Well… how did we introduce running Rust at compile-time, even though so many operations, including basic things like allocation, aren’t possible at compile-time? With effects, like const fn. (I call const an effect here, even though it’s arguably the taking-away of an effect; but I’ll stay consistent with const here and just call the additional keyword an “effect”.)

Unmarked functions (ordinary fn) still only run single-threaded. The analogue of single-threaded here is “running in the main thread”. Any code that supports being executed in threads other than the main thread would need to be a thread fn. Like const fn, this comes with language restrictions. Notably, in a thread fn, you can no longer access the legacy version of global variables; only the new ones that require T: Sync are available. The same issues as with const fn apply: we still need a solution for effects in traits (e.g. like T: const Eq or const FnOnce() bounds, etc… and const Destruct). But eventually, we can mark everything thread-safe with thread fn, and write thread::spawn as

pub thread fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: thread FnOnce() -> T + Send + 'static + thread Destruct,
    T: Send + 'static + thread Destruct,

where A: thread Destruct means values of type A may be dropped outside of the main thread, and thread FnOnce() -> T can be called outside of the main thread.

Now introducing capsules could in principle work the same. A new effect capsule (naming TBD) would mark code that can be executed in the context of a Capsule::with callback. It cannot access legacy thread_local, but only a new thread_local with a CSync constraint. (Whether or not there are ways to still have these two thread_local versions merged into one is something I’m not sure about yet - probably yes.)

In either case, the obvious downside is that there’s a lot of code to mark then. This same problem applies with const already: it’s slow to mark everything const as such, and it’s manual effort. I believe, for safe code, there can be lints, or even automatic tools, to add const markers to functions that support it, but automation can only go so far, and in any case it would be a deliberate opt-in, also for future-compatibility (this argument of “what if I want to make my function do non-const stuff later!?” is a lot stronger for const, as that’s severely limited compared to non-const). For a capsule effect, even more code could support it than for const, so it would have the unfortunate effect that ultimately, probably most code would be marked with the effect. Editions could help make the opposite the default, but it’s still not an easy problem to solve.

Finally, some more interactions with existing code / other crates, besides the obvious case of thread_local and MutexGuard that currently rely on the “Sync/Send means physical thread” assumption.

Crates like send_wrapper (I believe there was at least one more similar one, too) offer an API that allows moving non-Send data between threads, and ensures safety by guarding access to the data with a check against the current thread’s id. In a world as outlined above, such a type could only remove TSend bounds with such dynamic checks, not CSend ones. With an effects system, the relevant API and all of its users would simply never gain the capsule fn marker.
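For readers unfamiliar with that pattern, here is a minimal sketch of a thread-id-guarded wrapper. It is not the send_wrapper crate’s actual code or API: `ThreadBound`, its `get` method, and the choice to leak (rather than panic) when dropped on a foreign thread are assumptions made for this example.

```rust
use std::mem::ManuallyDrop;
use std::rc::Rc;
use std::thread::{self, ThreadId};

/// Hypothetical wrapper that is `Send` for any `T`, but only hands out
/// access on the physical thread that created it.
pub struct ThreadBound<T> {
    value: ManuallyDrop<T>,
    owner: ThreadId,
}

// SAFETY (for this sketch): every access to `value`, including running
// its destructor, is gated on being on the owning physical thread.
unsafe impl<T> Send for ThreadBound<T> {}

impl<T> ThreadBound<T> {
    pub fn new(value: T) -> Self {
        ThreadBound {
            value: ManuallyDrop::new(value),
            owner: thread::current().id(),
        }
    }

    /// Returns the value only when called on the owning thread.
    pub fn get(&self) -> Option<&T> {
        if thread::current().id() == self.owner {
            Some(&*self.value)
        } else {
            None
        }
    }
}

impl<T> Drop for ThreadBound<T> {
    fn drop(&mut self) {
        if thread::current().id() == self.owner {
            // SAFETY: we are on the owning thread and drop exactly once.
            unsafe { ManuallyDrop::drop(&mut self.value) }
        }
        // On any other thread the value is leaked instead of dropped.
    }
}

/// Returns (accessible on creating thread, accessible on another thread).
pub fn thread_bound_demo() -> (bool, bool) {
    let bound = ThreadBound::new(Rc::new(5));
    let here = bound.get().is_some();
    let there = thread::spawn(move || bound.get().is_some())
        .join()
        .unwrap();
    (here, there)
}
```

The check is entirely about `ThreadId`, i.e. physical threads. It says nothing about which capsule or task is currently running on that thread, which is why such a wrapper could justify dropping a TSend bound but not a CSend one.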

This does demonstrate that even a breaking change to thread_local itself wouldn’t eliminate all problematic API; code can use ThreadIds and Send bounds manually to connect Send with a notion of physical threads.

On a positive note, the currently problematic Ungil of pyo3 could benefit significantly from changes like the introduction of separate CSend vs TSend. Instead of their current unsound solution of using Send (and in the presence of scoped thread-locals, perhaps even their nightly auto-trait approach isn’t sound), it could simply use CSend bounds and require the callback to be capsule FnOnce at the same time.
