I had given this general idea some thought myself after reading the Jane Street post about proposed OCaml features for thread-safety. They include an API they call “capsule”, which is analogous to the thread-safety boundary for memory that physical threads in Rust offer, but separate from physical threads.
This idea thus doesn’t just relate to Futures. It could also apply to better ways of wrapping non-thread-safe data structures, and more. E.g. for Rust, if you have a graph of `Rc` pointers contained within a data structure, that whole thing could safely be transferred between threads, as long as it’s ensured that all contained `Rc`s cross threads together.
My most minimal and initial idea of what a “capsule” type in Rust could look like (first and foremost for illustration, and ignoring most of the details and additional features around how the thing is designed for OCaml in the Jane Street post) was
```rust
impl<T> Capsule<T> {
    fn new(init: impl FnOnce() -> T + Send) -> Capsule<T>;
    fn with<R: Send>(
        &mut self,
        cb: impl FnOnce(&mut T) -> R + Send,
    ) -> R;
}
impl<T> Send for Capsule<T> // !!
impl<T> Sync for Capsule<T> // this one uninteresting with only `&mut self` API
```
And such a minimal API can be implemented safely, with some additional `'static` bounds, using real threads: Rust Playground. The idea then is, of course, that it could in principle truly be implemented way more efficiently by just calling the callbacks in-place, and merely simulating the memory boundaries of threads via the API.
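To make the “real threads” implementation concrete, here is a minimal, self-contained sketch of my own (not the linked playground code): the value lives on a dedicated worker thread, and `with` ships `Send` closures to it over a channel, so `Capsule<T>` ends up `Send` even when `T` is not.

```rust
use std::sync::mpsc;
use std::thread;

// Jobs are closures shipped to the worker thread that owns the T.
type Job<T> = Box<dyn FnOnce(&mut T) + Send>;

pub struct Capsule<T> {
    jobs: mpsc::Sender<Job<T>>,
}

impl<T: 'static> Capsule<T> {
    pub fn new(init: impl FnOnce() -> T + Send + 'static) -> Self {
        let (tx, rx) = mpsc::channel::<Job<T>>();
        // The value is created on, and never leaves, this worker thread.
        thread::spawn(move || {
            let mut value = init();
            for job in rx {
                job(&mut value);
            }
            // value is dropped here once all Capsule handles are gone
        });
        Capsule { jobs: tx }
    }

    pub fn with<R: Send + 'static>(
        &mut self,
        cb: impl FnOnce(&mut T) -> R + Send + 'static,
    ) -> R {
        let (ret_tx, ret_rx) = mpsc::channel();
        self.jobs
            .send(Box::new(move |value: &mut T| {
                let _ = ret_tx.send(cb(value));
            }))
            .expect("worker thread is alive");
        ret_rx.recv().expect("callback ran")
    }
}

fn main() {
    use std::rc::Rc;
    // Capsule<Rc<i32>> is Send even though Rc<i32> is not:
    let mut capsule = Capsule::new(|| Rc::new(1));
    assert_eq!(capsule.with(|rc| **rc * 2), 2);
    let handle = thread::spawn(move || capsule.with(|rc| **rc));
    assert_eq!(handle.join().unwrap(), 1);
}
```

Note that `Capsule<T>` is automatically `Send` here: its only field is a channel sender for boxed `Send` closures, regardless of `T`.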
Because of `thread_local`s as well as `Mutex`’s guard’s `Drop` implementation, I arrived at the same conclusion that there’s a big conflict with the current meaning of `Send` & `Sync`, and one would ideally want two separate kinds of `Send`/`Sync`, which I thought to call `CSend`/`CSync` (for Capsule) and `TSend`/`TSync` (for Thread). In each case, `T: …Sync` just relates to the `&T: …Send` bound, so I’m focusing mostly on just the `…Send` traits.
The `Capsule` type above would be written with `CSend` throughout:
```rust
impl<T> Capsule<T> {
    fn new(init: impl FnOnce() -> T + CSend) -> Capsule<T>;
    fn with<R: CSend>(
        &mut self,
        cb: impl FnOnce(&mut T) -> R + CSend,
    ) -> R;
}
impl<T> CSend for Capsule<T>
impl<T: TSend> TSend for Capsule<T>
```
The `impl<T: TSend> TSend for Capsule<T>` makes the whole thing sound in the presence of physical-thread-bound APIs like `MutexGuard`.
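As a toy illustration of why that conditional impl matters, one can model `CSend`/`TSend` as ordinary marker traits (a real design would need auto traits; `FakeMutexGuard`, `Plain`, and the helper below are my stand-ins, not proposed API):

```rust
// Hypothetical marker traits, modeled as ordinary traits for illustration.
trait CSend {}
trait TSend {}

struct Capsule<T>(T);

// A capsule may always cross capsule boundaries...
impl<T> CSend for Capsule<T> {}
// ...but may only cross physical threads if its contents may:
impl<T: TSend> TSend for Capsule<T> {}

// Stand-in for MutexGuard: fine within a capsule, but tied to its thread.
struct FakeMutexGuard;
impl CSend for FakeMutexGuard {}

struct Plain;
impl TSend for Plain {}
impl CSend for Plain {}

fn describe<T: TSend>(_: &T) -> &'static str {
    "may move to another physical thread"
}

fn main() {
    println!("{}", describe(&Capsule(Plain))); // ok: contents are TSend
    // describe(&Capsule(FakeMutexGuard)); // error: FakeMutexGuard is not TSend
}
```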
Migration strategies seem a huge pain… just pretending legacy `Send` (which I still call just “`Send`”) means the same as `CSend` makes a large amount of pre-existing code unsound. The sound way is to make old `Send` equal `TSend + CSend` (with a strong enough trait-synonym feature, or something equivalent, so that current explicit implementations of the trait would still work and result in the respective `TSend` and `CSend` implementations), and then allow new non-breaking refinements to existing APIs.
Regarding the implementations, an existing one like

```rust
impl<T: Send> Send for Option<T> {}
```

would thus (at least if it were explicitly written) translate to

```rust
impl<T: Send> TSend for Option<T> {}
impl<T: Send> CSend for Option<T> {}
```

i.e.

```rust
impl<T: TSend + CSend> TSend for Option<T> {}
impl<T: TSend + CSend> CSend for Option<T> {}
```
and the desired new implementations

```rust
impl<T: TSend> TSend for Option<T> {}
impl<T: CSend> CSend for Option<T> {}
```

are no longer a breaking change.
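One direction of the synonym, “`TSend + CSend` satisfies a legacy `Send` bound”, can at least be sketched with a blanket impl over ordinary marker traits. The other direction, making existing `impl Send` items produce both new impls, is exactly what would need a real trait-synonym feature. All names here are illustrative:

```rust
// Ordinary marker traits standing in for the proposed auto traits.
trait TSend {}
trait CSend {}

// Legacy `Send` as a synonym for `TSend + CSend`, via a blanket impl.
trait LegacySend {}
impl<T: TSend + CSend> LegacySend for T {}

struct Plain;
impl TSend for Plain {}
impl CSend for Plain {}

// An old API keeps its `Send`-style bound unchanged...
fn old_api<T: LegacySend>() -> bool { true }
// ...while new APIs can ask for only the part they need.
fn new_api<T: CSend>() -> bool { true }

fn main() {
    // Plain gets LegacySend automatically from the blanket impl.
    assert!(old_api::<Plain>());
    assert!(new_api::<Plain>());
}
```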
As for some types’ respective bounds: A `MutexGuard<i32>` would be `CSend + CSync + !TSend + TSync` of course, and generically for `MutexGuard<T>`, each of the non-`TSend` implementations would just be conditional on the same bound on `T`.
For `Rc`, one could simply deny `CSend` while still allowing `TSend`, as in

```rust
// analogous to current `Rc<T>: !Send`
impl<T> !CSend for Rc<T> {}
impl<T> !CSync for Rc<T> {}
// analogous to current `Arc` implementation
impl<T: TSync + TSend> TSend for Rc<T> {}
impl<T: TSync + TSend> TSync for Rc<T> {}
```

though now that I’m writing this, I’m wondering whether the `TSend` one could even be weakened to just `T: TSend`, since `!CSync` implies it cannot be referenced on multiple physical threads at once anyway.
Interestingly enough, `TSend` and `CSend` stand for two orthogonal kinds of memory boundary: one based on capsules and other thread-like boundaries (like tasks for futures), and one based on physical threads. I’m noticing that it’s also feasible to build… well, let’s call the above a `CSync`-limited `Rc`… to build a `TSync`-limited version. (Take the above implementations and swap the roles of the `T…` and `C…` traits.) I’ve wondered whether that’s necessarily only possible by fully duplicating the type and its API, or whether something similar can be achieved by wrapping a `CSync`-limited `Rc` somehow to make it, effectively, `TSync`-limited.
Another very fascinating take-away from the Jane Street blog (which takes a while to understand, given one has to learn their syntax first) is that when you end the life of a capsule, you can just have it “merge” with any other given context. In other words: while Rust’s threading API today requires that in

```rust
pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
```

the result of a thread of type `T` must be allowed to be sent between threads, if we forego the physical-threads interpretation it doesn’t have to be the case. A joining thread can simply “inherit” the finished thread’s local memory (if it weren’t for thread-locals and the like). Taking `CSend`-ness into account, the new version could look like
```rust
pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + TSend + CSend + 'static,
    T: TSend + 'static,
```

with no requirement of `T: CSend`. Ultimately, this has the effect that a thread can return some data structure containing `Rc`s to the thread it joins with, and that should work and be sound.
Of course now `thread_local` is also a large issue. It works without any trait bounds today, so it would not translate well without breakage. If we were to freshly design Rust today, one could simply say “thread-local data must be `CSync`”, similar to how global statics must be `Sync` today. That’s probably the “correct” change, but it would be breaking. Another thing I’ve considered that might work is an effects system for access to `thread_local`:
Imagine Rust were single-threaded today, and we wanted to introduce `Sync` and `Send` in the first place. Of course single-threaded Rust allows you to place any data in global variables. You cannot mutably borrow them (without a `RefCell`), but there are no restrictions. Also, unsafe code commonly relies on the single-threaded nature of execution for soundness.
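Today’s Rust already exhibits the two regimes this thought experiment contrasts: `static`s must be `Sync`, while `thread_local!` accepts single-thread-only types like `RefCell` with no bound at all. A small sketch:

```rust
use std::cell::RefCell;

// Globals must be Sync:
static GLOBAL: i32 = 1;
// static BAD: RefCell<i32> = RefCell::new(0); // error: RefCell<i32> is not Sync

// Thread-locals have no such bound; each thread gets its own copy:
thread_local! {
    static LOCAL: RefCell<i32> = RefCell::new(0);
}

fn main() {
    LOCAL.with(|l| *l.borrow_mut() += GLOBAL);
    LOCAL.with(|l| assert_eq!(*l.borrow(), 1));
}
```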
How to introduce `Send`/`Sync` and `thread::spawn`, anyway? Well… how did we introduce running Rust at compile time, even though so many operations, including basic things like allocation, aren’t possible at compile time? With effects, like `const fn`. (I call `const` an effect here, even though it’s arguably the taking-away of an effect; but I’ll stay consistent with `const` here and just call the additional keyword an “effect”.)
Unmarked functions (ordinary `fn`) still only run single-threaded. The analogue of “single-threaded” here is “running in the main thread”. Any code that supports being executed in threads other than the main thread would need to be a `thread fn`. Like `const fn`, this comes with language restrictions. Notably, in a `thread fn`, you can no longer access the legacy version of global variables; only the new ones that require `T: Sync` are available. The same issues as with `const fn` apply (we still need a solution for effects in traits, e.g. `T: const Eq` or `const FnOnce()` bounds, and `const Destruct`), but eventually we can mark everything thread-safe with `thread fn`, and write `thread::spawn` as
```rust
pub thread fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: thread FnOnce() -> T + Send + 'static + thread Destruct,
    T: Send + 'static + thread Destruct,
```

where `A: thread Destruct` means values of type `A` may be dropped outside of the main thread, and `thread FnOnce() -> T` means the closure can be called outside of the main thread.
Now introducing capsules could in principle work the same way. A new effect `capsule` (naming TBD) would mark code that can be executed in the context of a `Capsule::with` callback. It cannot access the legacy `thread_local`, only a new `thread_local` with a `CSync` constraint. (Whether there are ways to still have these two `thread_local` versions merged into one is something I’m not sure about yet; probably yes.)
In either case, the obvious downside is that there’s a lot of code to mark. The same problem already applies to `const`: marking everything `const` that could be is slow and takes manual effort. I believe that, for safe code, there can be lints, or even automatic tools, to add `const` markers to functions that support it, but automation can only go so far, and in any case it would be a deliberate opt-in, also for future compatibility (the argument “what if I want to make my function do non-`const` stuff later!?” is a lot stronger for `const`, as that’s severely limited compared to non-`const`). For a `capsule` effect, even more code could support it than for `const`, so it would have the unfortunate consequence that ultimately most code would probably be marked with the effect. Editions could help make the opposite the default, but it’s still not an easy problem to solve.
Finally, some more interactions with existing code / other crates, besides the obvious cases of `thread_local` and `MutexGuard` that currently rely on the “`Sync`/`Send` means physical threads” assumption:
- Crates like send_wrapper (I believe there was at least one more similar one, too) offer an API that allows moving non-`Send` data between threads, and ensure safety by guarding access to the data with a check against the current thread’s id. In a world as outlined above, such a type could only remove `TSend` bounds with such dynamic checks, not `CSend` ones. With an effects system, the relevant API and all of its users would simply never gain the `capsule fn` marker. This does demonstrate that even a breaking change to `thread_local` itself wouldn’t eliminate all problematic APIs; code can use `ThreadId`s and `Send` bounds manually to connect `Send` with a notion of physical threads.
- On a positive note, the currently problematic `Ungil` of `pyo3` could benefit significantly from changes like the introduction of separate `CSend` vs `TSend`. Instead of its current unsound solution of using `Send` (and in the presence of scoped thread-locals, perhaps even the nightly auto-trait approach isn’t sound), it could simply use `CSend` bounds and require the callback to be `capsule FnOnce` at the same time.
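The dynamic-check idea behind crates like send_wrapper can be sketched as follows. This is a simplified illustration, not the crate’s actual API; in particular, the real crate also checks the thread in `Drop` and on deref, which is omitted here:

```rust
use std::thread::{self, ThreadId};

// Wraps a value so it can be *moved* across threads freely, but only
// *accessed* from the thread that created it; violations panic at runtime.
struct DynChecked<T> {
    value: T,
    owner: ThreadId,
}

impl<T> DynChecked<T> {
    fn new(value: T) -> Self {
        DynChecked { value, owner: thread::current().id() }
    }

    fn get(&self) -> &T {
        assert_eq!(
            thread::current().id(),
            self.owner,
            "accessed from a thread other than the owner"
        );
        &self.value
    }
}

// Claimed sound only because every access is guarded by the ThreadId check.
unsafe impl<T> Send for DynChecked<T> {}

fn main() {
    use std::rc::Rc;
    let wrapped = DynChecked::new(Rc::new(5));
    // Carry the non-Send value through another thread without touching it:
    let wrapped = thread::spawn(move || wrapped).join().unwrap();
    assert_eq!(**wrapped.get(), 5); // back on the owning thread: ok
}
```

In the terminology of the post, such a runtime check can only ever justify removing `TSend`-style bounds, since it is precisely a check on the physical thread.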
Oooh, interestingly `Capsule` looks a lot like an experimental crate I wrote a while ago: diplomatic_bag - Rust (docs.rs).
I'm slightly confused by your `TSend` and `CSend`, you say:

> For `Rc`, one could simply deny `CSend` while still allowing `TSend`,

but I don't see how `Rc` could possibly be `TSend`, as if two `Rc`s end up on different threads at the same time you can immediately make UB. More generally, I can't see how something could be `TSend` but not `CSend`: if it can move between threads then why couldn't it move between `Capsule`s?
On the original post, I nearly want to suggest that Tokio could just do:

```rust
trait TaskSend {}
impl<T: Send> TaskSend for T {}
impl<T: TaskSend> TaskSend for Cell<T> {}
```

This doesn't work for specialization reasons, and more importantly isn't very useful because it's not an auto trait and isn't implemented on generators. However, I do think this gives a possible path for question (3), "Is there a backwards-compatible path we could take to make this a reality?":
- Add `TaskSend` (bikesheddable) as an auto trait like `Send`, and have `trait Send: TaskSend`.
- Implement `TaskSend` for currently `!Send` types like `Cell`.
- Libraries like tokio drop their bounds from `Send` to `TaskSend`.

This wouldn't be entirely backwards compatible; things like bounds on associated types couldn't be relaxed easily because someone might be relying on their `Send` nature, but that's always going to be a problem.
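The end state of that path can be sketched with ordinary traits (the real proposal needs auto traits, and `ThreadSend` below is just my stand-in for today's `Send` with the `trait Send: TaskSend` supertrait relation):

```rust
use std::cell::Cell;

// Task-level sendability: may move between tasks on the same OS thread.
trait TaskSend {}
// Thread-level sendability implies task-level, like `trait Send: TaskSend`.
trait ThreadSend: TaskSend {}

struct Plain;
impl TaskSend for Plain {}
impl ThreadSend for Plain {}

// Cell contents never leave the task, so Cell is task-sendable...
impl<T: TaskSend> TaskSend for Cell<T> {}
// ...but it gets no ThreadSend impl.

fn spawn_task<T: TaskSend>(_: T) -> bool { true }
fn spawn_thread<T: ThreadSend>(_: T) -> bool { true }

fn main() {
    assert!(spawn_task(Cell::new(Plain))); // ok under the relaxed bound
    assert!(spawn_thread(Plain));
    // spawn_thread(Cell::new(Plain)); // error: Cell<Plain> is not ThreadSend
}
```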
On question (2) "Is this an important problem in practice to look into?", I don't think I've ever hit this issue in practice, purely because I can always drop down to the thread-safe versions of things. Admittedly those types are marginally slower but not enough so that I've ever worried about it.
Yep. The lack of `CSend` (and `CSync`) already means that clones of the same `Rc` will never end up on different threads simultaneously.
The `Rc<Something>: TSend` however is important to make, say, a `Capsule<Vec<Rc<Something>>>` still implement `Send` (`Send` == `TSend + CSend`). This is only sound if `Something` isn’t a type like `MutexGuard<()>`, hence we track the whole `TSend`-ness all the way through `Rc`, `Vec`, and `Capsule`. The capsule prevents any clones of the `Rc`s from leaking anywhere (accessibly) other than the `Vec` in question, and sending the whole capsule at once to another physical thread is thus ultimately allowed.
So ultimately `TSend` means “can be moved to another physical thread”, but does not imply any real thread-safety. For actually directly sending something between threads, like sending something through an `mpsc::channel`, you usually need `TSend + CSend`. It’s only relevant in contexts where identifying threads matters, e.g. the OS caring which specific thread does something, or some API that looks at the `ThreadId`, or API that involves `thread_local`s.
I’ve just realized that one place where this comes up is async traits. Right now, we need new language features to specify the sendness of futures returned by async methods.
In a world where sendness isn’t required for work-stealing, I think we could safely make all futures `!Send`, without any extra linguistic surface area?
True, although there are futures that are not movable between OS threads. We'd need some mechanism to write traits/bounds that accept `?OsThreadSend` futures, and that problem is identical to the current one with `Send`.
`Rc` isn’t just about simultaneous mutation, though. It uses non-atomic operations, which means you can’t safely increment the reference count on one OS thread and decrement it on another. By contrast, if you have one OS thread that’s multiplexing futures, it should actually work fine. So I think that makes it `CSend` but not `TSend`.
It's a common misconception (or, should I say, oversimplification). Remember that threads routinely migrate across CPU cores. In other words, in certain situations, you can:
The biggest problem here is thread-local storage, which has poor compatibility with the async execution model; Rust currently does not provide any protection in this area.
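The incompatibility is easy to demonstrate with plain threads: code using thread-local state silently observes which physical thread it runs on, so any model that migrates work between threads behind its back changes its behavior. A small sketch:

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    static COUNTER: Cell<u32> = Cell::new(0);
}

// Increments "this thread's" counter; the result depends on the physical thread.
fn bump() -> u32 {
    COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    })
}

fn main() {
    assert_eq!(bump(), 1);
    assert_eq!(bump(), 2);
    // A different OS thread starts from a fresh copy; an async task calling
    // `bump` that migrated between threads mid-way would observe this reset.
    let from_other_thread = thread::spawn(bump).join().unwrap();
    assert_eq!(from_other_thread, 1);
}
```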
That’s true of pretty much any `!Send` type: “if you happen to not expose any of the problem cases, nothing goes wrong”.
> To make `Foo` unsound it is sufficient to add `fn get_a(&self) -> Rc<u32>` (which would be a totally innocuous and normal thing to do if not for the `unsafe impl Send`).

from the response, Controversial opinion: keeping `Rc` across `await` should not make future `!Send` by itself - #5 by trentj - The Rust Programming Language Forum
The task at hand isn’t just to develop a rule that allows safe programs and conservatively disallows unsafe programs. It also has to be a model a human can understand, depend mostly on local reasoning for an efficient debug build, produce diagnostics that are highly likely to point to the correct conflict if a user breaks the rules, and allow for evolution of implementation across versions without breaking API. So having the compiler guess what is and isn’t safe has to be done very carefully.
Which doesn’t mean it’s impossible! Swift is trying to do it right now, using a very conservative escape analysis: SE-0414: Region Based Isolation - Proposal Reviews - Swift Forums. But Send checking has been in Rust a lot longer than it’s been in Swift.
We already have a very good and widely known model. It's called threading. And Rust supports it splendidly. You spawn tasks/threads which may be executed in parallel and may migrate between CPU cores freely. You get safety by carefully guarding boundaries between tasks/threads using `Send`/`Sync`.
I believe that an asynchronous execution model should emulate the threading model as closely as possible (after all, fundamentally they are not that different from each other), which the current Rust async model clearly fails at. The asynchronous execution model allows very nice additional tricks, possible because of the additional scheduling control, but they are just extensions to the base model.
This is one of the reasons why I strongly dislike Rust async and stay as far away from it as possible. Rust got itself into a corner by making `Future` implementers nothing more than a common type, instead of treating task (persistent) stacks specially like we do with thread stacks. In my opinion, we would've been much better off with a stackful (fiber) model together with semi-breaking (i.e. introduced in a new edition) changes to TLS to fix the async incompatibility and, maybe, with additional "async" compilation targets.
Yes, I know that Rust had it pre-1.0, but, in my experience, we can implement it in a much lighter-weight fashion, especially on top of new APIs like `io-uring`. And people do it in production as, for example, can be seen in this comment! At my work we also plan to completely side-step Rust async and use our custom stackful executor (our solution for the TLS problem is, unfortunately, "just carefully review all TLS uses").
Another common counter-argument to the stackful model is embedded (bare-metal) devices. But there is a potential solution for that as well. The compiler in most cases is able to compute exactly how much stack space functions will use, and on embedded devices you often have to track it either way (e.g. using tools like `stack-sizes`). So a future compiler could in theory compute the stack size necessary for a stackful task. There are also potential niceties about being able to build hybrid cooperative/preemptive systems (e.g. higher-priority tasks launched by an interrupt may forcefully preempt tasks with lower priority) not possible in the stackless model, but that's a whole other topic.
Another breakage example is various OS APIs that just mandate that things happen on a particular execution thread, like `pthread_mutex_unlock`. Though I think the turtle those APIs stand on is thread locals again?
This is arguably off-topic, but it's usually not thread locals.
On OSes like Darwin where pthread mutexes use priority inheritance by default, there's a good reason you can't unlock from another thread. With priority inheritance, if a high-priority thread wants the mutex, the OS temporarily boosts the priority of the current owner in the hope they'll finish their work and unlock the mutex sooner. But that doesn't work if the current owner handed off the `MutexGuard` to some random other thread. The OS has to know who is doing the work. (It might be nice if there was a way to tell the OS to switch mutex ownership to a specified other thread, but there is no API for it.)
Most OSes don't enable priority inheritance by default, and in the implementations I've seen, `pthread_mutex_unlock` has an ownership check just as a way to detect API misuse, not because the implementation internals require it.
Linux too has mutexes with PI (though not by default). We use them heavily at work on real-time Linux. The lack of exposure of this functionality in the Rust standard library is annoying, to be honest.
Just a quick remark on effects:
The effect should be "tls" or so. Effects generally say "this code may do something"; absence of effects implies "code definitely does not do something".
"const" is not an effect, it's the absence of an effect.
It's not just TLS, there's also thread IDs.
Basically your argument relies on the Rust concept of threads being entirely "virtual", with no way for code to check "which thread it is in". Then the async runtime can just pretend like each async task is its own thread and it can migrate that between host OS threads. (This can be modeled perfectly in my RustBelt soundness proof.) But this is broken both by Rust threads being tied to TLS and by thread IDs.
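The thread-ID half of that objection is directly observable with std alone; nothing here is hypothetical:

```rust
use std::thread;

fn main() {
    let main_id = thread::current().id();
    // An async runtime pretending each task is its own thread cannot hide
    // this: each physical thread reports its own distinct, stable ThreadId.
    let child_id = thread::spawn(|| thread::current().id()).join().unwrap();
    assert_ne!(main_id, child_id);
    assert_eq!(main_id, thread::current().id()); // stable within one thread
}
```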
If you mean `ThreadId`, aren't those implemented using TLS?
I’m aware, hence also the remark a little further up in my post:

> Just like with `const`, the newly introduced kind of function is the one without the effect, so it’s the one that gets the keyword, so that keyword-less functions keep the same meaning as before, “unrestricted TLS access[1] being allowed”. Negated names in keywords (say, `not_tls`[2]) would be a bit weird. The name is a placeholder anyway. I’m working analogously to `const` also since it’s precedent and familiar to Rust programmers, thus familiar to all readers.
Ah true. The ID is assigned using a global counter but then it is stored using TLS. So I guess a no-tls function could also not fetch the current thread ID.
> I’ll stay consistent with `const` here and just call the additional keyword an “effect”.
I did indeed miss the paragraph you quoted in your reply, sorry for that. However calling "const" an effect is just as misleading. These are technical terms and misuse of technical terms is a great way to cause misunderstandings. (I'm not involved in the effects initiative so I don't know their latest terminology. But if they plan to call "const" an effect then I think that's a mistake.)
IMO `no_tls` is also way more clear than `capsule`. It fits well with other qualifiers that are considered regularly, like `no_unwind`. (But anyway, we don't have to bikeshed this now. I just wanted to reduce the confusion around the term “effect”. We already have plenty of confusion there as things stand.)
While Rust's `ThreadId` is physically disassociated from the OS thread, the OS still provides a concept of thread ID which is fundamentally tied to the actual execution thread. Rust std doesn't expose a way to get the current OS thread ID IIRC (unix can get `pthread_t` from a join handle, but that's definitely an OS thread anyway, even in a world with virtual thread isolation), but linking in extern code which provides that access is straightforward and would currently be implied sound.
(While `const` is the removal of an effect, being generic over `const` is still a generalization over an effect; it's just that the effect is the absence of the `const` restriction. So `const` would still qualify as an “effect keyword”, since it controls an effect.)
Yes, it can be summarized like this. To use your terminology, I think that ideally Rust should define a basic model of entirely "virtual" threads/tasks, and it should be the only model available in `std`. Most code can run in this basic model without any issues. The model then can be extended on an opt-in basis with additional capabilities available for native OS threads or fiber-based asynchronous execution contexts.
I don't see a big problem with `ThreadId`. In an async context it would simply change meaning to `TaskId`. We also probably could somehow emulate "task-local storage", which would be used as "async TLS".
I probably should explicitly highlight that I imply the introduction of new experimental "asynchronous" fiber-based compilation targets, something like `x86_64-unknown-linux-io-uring`. Shoehorning the asynchronous execution model on top of a synchronous `std`, while probably possible, would be a much harder task.
While I do think having `!Send` futures and work stealing at the same time would be awesome, I don't think this alone justifies the amount of breakage required. If we ever plan to have Rust 2.0, I would certainly put this on the list. But for the meantime, I think our best path forward is to drop the `Send` bounds wherever possible.
I think within Rust is the only time I've ever seen these kinds of negated effects discussed, the only previous encounter being a recent post by boats:
https://without.boats/blog/poll-next/

> This is a lead-in to a longer term vision of introducing a “pinned effect.”

Which would be like a `not_move`.