I had given this general idea some thought myself after reading the Jane Street post about proposed OCaml features for thread-safety. They include an API they call “capsule”, which is analogous to the thread-safety boundary for memory that physical threads in Rust offer, but separate from physical threads.
This idea thus doesn’t just relate to Futures. It could also apply to ways of wrapping non-thread-safe data structures in a better way, and more. E.g. for Rust, if you have a graph of Rc
pointers contained within a data structure, that whole thing could safely be transferred between threads, as long as it's ensured that all contained Rc
cross threads together.
My most minimal and initial idea of what a “capsule” type in Rust could look like (first and foremost for illustration, and ignoring most of the details and additional features around how the thing is designed for OCaml in the Jane Street post) was
impl<T> Capsule<T> {
    fn new(init: impl FnOnce() -> T + Send) -> Capsule<T>;
    fn with<R: Send>(
        &mut self,
        cb: impl FnOnce(&mut T) -> R + Send
    ) -> R;
}
impl<T> Send for Capsule<T> // !!
impl<T> Sync for Capsule<T> // this one uninteresting with only `&mut self` API
And such a minimal API can be implemented safely, with some additional 'static
bounds, using real threads: Rust Playground. The idea then is, of course, that it in principle could truly be implemented way more efficiently by just calling the callbacks in-place, and merely simulating the memory-boundaries of threads via the API.
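For illustration, here is a minimal sketch of how such a thread-backed version could be put together with std only. It is an assumption about one possible design, not necessarily the code behind the playground link: the Job alias, the field names and capsule_demo are made up for this example. The value lives on a dedicated worker thread and callbacks are shipped to it over a channel, so the compiler derives Send for the capsule without any unsafe impl.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::mpsc;
use std::thread;

// A callback shipped to the capsule's worker thread. `dyn ... + Send` is `Send`
// no matter what `T` is, which is what lets `Capsule<T>` be `Send` for any `T`.
type Job<T> = Box<dyn FnOnce(&mut T) + Send>;

pub struct Capsule<T: 'static> {
    jobs: Option<mpsc::Sender<Job<T>>>,
    worker: Option<thread::JoinHandle<()>>,
}

impl<T: 'static> Capsule<T> {
    pub fn new(init: impl FnOnce() -> T + Send + 'static) -> Self {
        let (jobs, inbox) = mpsc::channel::<Job<T>>();
        // The `T` value is created on the worker thread and never leaves it.
        let worker = thread::spawn(move || {
            let mut value = init();
            for job in inbox {
                job(&mut value);
            }
        });
        Capsule { jobs: Some(jobs), worker: Some(worker) }
    }

    pub fn with<R: Send + 'static>(
        &mut self,
        cb: impl FnOnce(&mut T) -> R + Send + 'static,
    ) -> R {
        let (reply, result) = mpsc::channel();
        let job: Job<T> = Box::new(move |value: &mut T| {
            let _ = reply.send(cb(value));
        });
        let jobs = self.jobs.as_ref().expect("sender is only taken in drop");
        jobs.send(job).expect("capsule worker has exited");
        result.recv().expect("capsule callback panicked")
    }
}

impl<T: 'static> Drop for Capsule<T> {
    fn drop(&mut self) {
        // Closing the channel ends the worker loop; `T` is dropped over there.
        drop(self.jobs.take());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

/// Hypothetical usage: two `Rc`s sharing one allocation cross threads together.
pub fn capsule_demo() -> usize {
    let mut capsule = Capsule::new(|| {
        let shared = Rc::new(RefCell::new(vec![1, 2]));
        (Rc::clone(&shared), shared)
    });
    capsule.with(|(a, _b)| {
        a.borrow_mut().push(3);
    });
    // The capsule is `Send` even though its content is not, so it can move.
    thread::spawn(move || {
        capsule.with(|(_a, b)| {
            let len = b.borrow().len();
            len
        })
    })
    .join()
    .unwrap()
}
```

The price of this version is a channel round trip per call, which is exactly the overhead an in-place implementation would avoid.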
Because of thread_local
s as well as Mutex
’s guard’s Drop
implementation, I arrived at the same conclusion that there’s a big conflict with the current meaning of Send
& Sync
, and one would ideally want two separate kinds of Send
/Sync
which I thought to call CSend
/CSync
(for C
apsule) and TSend
/TSync
(for T
hread). In each case, T: …Sync
just relates to the &T: …Send
bound, so I’m focusing mostly on just …Send
traits.
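The thread_local half of that conflict can be made concrete with today's std. In the sketch below (STASH, leaky_init and leak_demo are hypothetical names), a function that captures nothing, and so trivially satisfies Send + 'static when used as a callback, stashes an Rc clone in a thread-local. Run on a real thread, the clone stays in that thread's own storage. Run in place, the clone ends up reachable from the caller while the original would still sit inside the capsule.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::thread;

thread_local! {
    // No `Send`/`Sync` bound is required for thread-local data today.
    static STASH: RefCell<Option<Rc<i32>>> = RefCell::new(None);
}

/// Creates an `Rc` and smuggles a clone of it out through the thread-local.
fn leaky_init() -> Rc<i32> {
    let rc = Rc::new(7);
    STASH.with(|slot| *slot.borrow_mut() = Some(Rc::clone(&rc)));
    rc
}

/// Whether the *current* thread's stash holds an `Rc`.
fn stash_is_filled() -> bool {
    STASH.with(|slot| slot.borrow().is_some())
}

/// Returns (leak visible to the caller after a real thread ran the callback,
/// leak visible to the caller after running the callback in place).
pub fn leak_demo() -> (bool, bool) {
    // Real thread: the clone lands in the worker's own thread-local storage.
    thread::spawn(|| {
        let _rc = leaky_init();
    })
    .join()
    .unwrap();
    let via_thread = stash_is_filled();

    // In place: the clone lands in the caller's thread-local storage.
    let _inside_capsule = leaky_init();
    let in_place = stash_is_filled();

    (via_thread, in_place)
}
```

With the in-place variant, the caller and the capsule would each hold an Rc to the same allocation, and sending the capsule to another thread would then race on the reference count.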
The Capsule
type above would be written with CSend
throughout
impl<T> Capsule<T> {
    fn new(init: impl FnOnce() -> T + CSend) -> Capsule<T>;
    fn with<R: CSend>(
        &mut self,
        cb: impl FnOnce(&mut T) -> R + CSend
    ) -> R;
}
impl<T> CSend for Capsule<T>
impl<T: TSend> TSend for Capsule<T>
The impl<T: TSend> TSend for Capsule<T>
makes the whole thing sound in the presence of physical-thread-bound APIs like MutexGuard
.
Migration strategies seem a huge pain… just pretending legacy Send
(which I still call just “Send
”) means the same as CSend
makes a large amount of pre-existing code unsound. The sound way is to make old Send
equal TSend + CSend
(with a strong enough trait-synonym feature, or something equivalent, so that current explicit implementations of the trait would still work and result in the respective TSend + CSend
implementation), and then allow new non-breaking refinements to existing APIs.
Regarding the implementations, an existing one like
impl<T: Send> Send for Option<T> {}
would thus (at least if it were explicitly written) translate to
impl<T: Send> TSend for Option<T> {}
impl<T: Send> CSend for Option<T> {}
i.e.
impl<T: TSend + CSend> TSend for Option<T> {}
impl<T: TSend + CSend> CSend for Option<T> {}
and the desired new implementations
impl<T: TSend> TSend for Option<T> {}
impl<T: CSend> CSend for Option<T> {}
are no longer a breaking change.
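To make the bookkeeping concrete, here is a small sketch that models the two proposed traits as ordinary unsafe marker traits (real auto traits would need compiler support) and legacy Send as a blanket synonym. LegacySend, Plain, ThreadOnly and the two needs_* functions are hypothetical names used only for this illustration.

```rust
/// Hypothetical stand-ins for the proposed traits, as plain marker traits.
pub unsafe trait TSend {}
pub unsafe trait CSend {}

/// Legacy `Send`, modeled as a synonym for `TSend + CSend`.
pub trait LegacySend: TSend + CSend {}
impl<T: TSend + CSend> LegacySend for T {}

// The refined impls: each trait only asks for its own counterpart on `T`.
unsafe impl<T: TSend> TSend for Option<T> {}
unsafe impl<T: CSend> CSend for Option<T> {}

/// Toy leaf type that may cross both kinds of boundary.
pub struct Plain;
unsafe impl TSend for Plain {}
unsafe impl CSend for Plain {}

/// Toy leaf type that may cross physical threads but not capsule boundaries.
pub struct ThreadOnly;
unsafe impl TSend for ThreadOnly {}

/// An existing API that keeps its legacy `Send` bound.
pub fn needs_legacy_send<T: LegacySend>(_value: &T) -> &'static str {
    "TSend + CSend"
}

/// An API refined to ask only for what it needs.
pub fn needs_tsend<T: TSend>(_value: &T) -> &'static str {
    "TSend"
}
```

Code that was written against the legacy bound keeps compiling for Some(Plain), while Some(ThreadOnly) is accepted only by the refined API; passing it to needs_legacy_send is rejected at compile time.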
As for some types’ respective bounds: A MutexGuard<i32>
would be CSend + CSync + !TSend + TSync
of course, and generically for MutexGuard<T>
, the non-TSend
traits would each just require the same bound on T
.
For Rc
, one could simply deny CSend
while still allowing TSend
, as in
// analogous to current `Rc<T>: !Send`
impl<T> !CSend for Rc<T> {}
impl<T> !CSync for Rc<T> {}
// analogous to current `Arc` implementation
impl<T: TSync + TSend> TSend for Rc<T> {}
impl<T: TSync + TSend> TSync for Rc<T> {}
though now that I’m writing this, I’m wondering whether or not the TSend
one could even be weakened to just T: TSend
, since !CSync
implies it cannot be referenced on multiple physical threads at once anyways.
Interestingly enough, as TSend
and CSend
stand for two orthogonal kinds of memory boundary (one based on physical threads; the other based on capsules and other thread-like boundaries, like tasks for futures), I’m noticing that it’s also feasible to build … well, let’s call the above a CSync
-limited Rc
… to build a TSync
-limited version. (Take the above implementations and swap the roles of T…
with C…
traits.) I’ve wondered if that’s necessarily only possible by fully duplicating the type and its API, or whether something similar can be achieved by wrapping a CSync
-limited Rc
somehow to make it - effectively - TSync
-limited.
Another very fascinating take-away from the Jane Street blog (which takes a while to understand, given one has to learn their syntax first) is that when you end the life of a capsule, you can just have it “merge” with any other given context. In other words: while Rust’s threading API today requires that in
pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
the result of a thread of type T
must be allowed to be sent between threads, this doesn’t have to be the case if we forgo the physical-threads interpretation. A joining thread can simply “inherit” the joined thread’s local memory (if it weren’t for thread_locals and the like). This means that, with CSend
-ness taken into account, the new version could look like
pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + TSend + CSend + 'static,
    T: TSend + 'static,
with no requirement of T: CSend
. Ultimately, this has the effect that a thread can return some data structure containing Rc
s to the thread it joins with, and that should work and be sound.
Of course now thread_local
is also a large issue. It works without any trait bounds today, so it would not translate well without breakage. If we were to freshly design Rust today, one could simply say “thread-local data must be CSync
”, similar to how global statics must be Sync
today. That’s probably the “correct” change, but it would be breaking. Another thing I’ve considered and that might work is an effects system for access to thread_local
:
Imagine Rust were single-threaded today, and we wanted to introduce Sync
and Send
in the first place. Of course single-threaded Rust allows you to place any data in global variables. You cannot mutably borrow them (without a RefCell
), but there’s no restrictions. Also, unsafe code commonly relies on the single-threaded nature of execution for soundness.
How to introduce Send
/Sync
and thread::spawn
, anyways? Well… how did we introduce running Rust at compile-time, even though so many operations, including basic things like allocation, aren’t possible at compile-time? With effects, like const fn
. (I call const
an effect here, even though it’s arguably the taking-away of an effect; but I’ll stay consistent with const
here and just call the additional keyword an “effect”.)
Unmarked functions (ordinary fn
) still only run single-threaded. The analogy of single-thread here is “running in the main thread”. Any code that supports being executed in other threads than the main thread would need to be a thread fn
. Like const fn
, this comes with language restrictions. Notably, in a thread fn
, you can no longer access the legacy version of global variables, and only the new ones that require T: Sync
are available. The same issues as with const fn
apply, we still need a solution for effects in traits (e.g. like T: const Eq
or const FnOnce()
bounds, etc… and const Destruct
) but eventually, we can mark everything thread-safe with thread fn
, and write thread::spawn
as
pub thread fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: thread FnOnce() -> T + Send + 'static + thread Destruct,
    T: Send + 'static + thread Destruct,
where A: thread Destruct
means values of type A
may be dropped outside of the main thread, and thread FnOnce() -> T
can be called outside of the main thread.
Now introducing capsules could in principle work the same. A new effect capsule
(naming TBD) would mark code that can be executed in the context of a Capsule::with
callback. It cannot access legacy thread_local
, but only a new thread_local
with a CSync
constraint. (Whether or not there are ways to still have these two thread_local
versions merged into one is something I’m not sure about yet - probably yes.)
In either case, the obvious downside is that there’s a lot of code to mark then. This same problem applies with const
already, it’s slow to mark everything const
as such, and it’s manual effort. I believe, for safe code, there can be lints, or even automatic tools, to add const
markers to functions that support it, but automation can only go so far, and in any case it would be a deliberate opt-in, also for future-compatibility (this argument of “what if I want to make my function do non-const
stuff later!?” is a lot stronger for const
, as that’s severely limited compared to non-const
). For a capsule
effect, even more code could support it than for const
, so it would have the unfortunate effect that ultimately, probably most code would be marked with the effect. Editions could help make the opposite the default, but it’s still not an easy problem to solve.
Finally, some more interactions with existing code / other crates, besides the obvious case of thread_local
and MutexGuard
that currently rely on the “Sync
/Send
means physical thread” assumption.
- Crates like send_wrapper (I believe there was at least one more similar one, too) offer an API that allows moving non-Send data between threads, and ensures safety by guarding access to the data with a check against the current thread’s id. In a world as outlined above, such a type could only remove TSend bounds with such dynamic checks, not CSend ones. With an effects system, the relevant API and all of its users would simply never gain the capsule fn marker.

  This does demonstrate that even a breaking change to thread_local itself wouldn’t eliminate all problematic API; code can use ThreadIds and Send bounds manually to connect Send with a notion of physical threads.
- On a positive note, the currently problematic Ungil of pyo3 could benefit significantly from changes like the introduction of separate CSend vs TSend. Instead of their current unsound solution of using Send (and in the presence of scoped thread-locals, perhaps even their nightly auto-trait approach isn’t sound), it could simply use CSend bounds and require the callback to be capsule FnOnce at the same time.