I am wondering, do we have some kind of a story to make this safe? Implementing green-threads "in userland" by manually switching stacks is a well-known systems programming trick. Can Rust express it in a sound way?
Is the only problem that TLS will be shared between coroutines that happen to be scheduled on the same thread? I wonder if it's sufficient to just add an intrinsic like LocalKey::disable() that causes all LocalKey::with() to panic in the current thread (and maybe a with_unchecked() for unsafe access). This seems like it would just cost an extra TLS variable that gets checked on TLS access, and an extra unlikely branch.
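To make the idea concrete, here is a minimal userland sketch of such a guard. Everything in it is hypothetical: `disable_tls` and `with_checked` are made-up names standing in for the proposed `LocalKey::disable()` and a checked `with()`, and a real version would have to live inside std so that every `LocalKey::with` call goes through the check.

```rust
use std::cell::Cell;
use std::thread::LocalKey;

thread_local! {
    // Hypothetical per-thread flag: the "extra TLS variable" mentioned above.
    static TLS_DISABLED: Cell<bool> = Cell::new(false);
    // An ordinary thread-local used only to exercise the guard.
    static COUNTER: Cell<u32> = Cell::new(0);
}

/// Hypothetical stand-in for the proposed `LocalKey::disable()`.
pub fn disable_tls() {
    TLS_DISABLED.with(|d| d.set(true));
}

/// Hypothetical checked accessor: the "extra unlikely branch" on every access.
pub fn with_checked<T: 'static, R>(key: &'static LocalKey<T>, f: impl FnOnce(&T) -> R) -> R {
    if TLS_DISABLED.with(|d| d.get()) {
        panic!("thread-local access after TLS was disabled on this thread");
    }
    key.with(f)
}

fn main() {
    // Before disabling, access works as usual.
    with_checked(&COUNTER, |c| c.set(c.get() + 1));
    assert_eq!(with_checked(&COUNTER, |c| c.get()), 1);

    // After disabling, every checked access on this thread panics.
    disable_tls();
    let result = std::panic::catch_unwind(|| with_checked(&COUNTER, |c| c.get()));
    assert!(result.is_err());
}
```

Note that this only guards accesses that go through the wrapper, which is why the proposal talks about an intrinsic rather than a library function.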
Of course, if you touch TLS outside of Rust you lose, which no one expects to be a problem they have to deal with.
I mean, I suppose you could do something like hang a reference into that TLS off of every other LocalKey's TLS contents, but that also seems expensive in terms of memory. I can't think of a way to do this that doesn't incur some nasty cost even if you never call this function.
(Oh, and this version is potentially self-racy so I think you lose no matter what. =D)
@bjorn3 While it's absolutely difficult, it should be possible to do in a way that cooperates with other libraries in the same address space. There are conventions to follow for how TLS gets allocated, and there's other code out there that manually allocates TLS.
If you're swapping back and forth between different userspace coroutines on the same thread, and the TLS is associated with the thread, you could have two different coroutines access the same TLS in ways that violate borrow-checking rules; for instance, you could borrow something as &mut from TLS, swap coroutines, and then the other coroutine could do its own &mut borrow of the same TLS without knowing about the first borrow.
It might be possible to avoid that by using async functions as the coroutines, and only allowing switches between coroutines when awaited; however, that may or may not fit the desired coroutine model.
std::thread::LocalKey::with only provides immutable access, as it is re-entrant. You only have a problem when you try to send the coroutine to another thread and then exit the original thread. If it stays on the same thread, you just have multiple immutable references. If it goes to another thread but the old thread stays alive, you simply keep accessing the old thread's TLS storage for the duration of the current LocalKey::with call.
If you use interior mutability, like Cell, and swap to a different coroutine, you could get two exclusive borrows at the same time, e.g. if you called Cell::swap on both coroutines, because swapping coroutines could happen at any time (even during Cell::swap).
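For comparison, here is a small sketch of the related RefCell case, where the overlap is at least caught at run time. It is not a real stack switch: the "other coroutine" is modeled as a plain nested call on the same thread, and the names `STATE`, `other_task_can_borrow` and `borrow_conflict_detected` are made up for illustration.

```rust
use std::cell::RefCell;

thread_local! {
    static STATE: RefCell<Vec<u32>> = RefCell::new(Vec::new());
}

/// Models another coroutine on the same thread asking for exclusive access.
fn other_task_can_borrow() -> bool {
    // LocalKey::with is re-entrant, so this nested call is fine by itself.
    STATE.with(|s| s.try_borrow_mut().is_ok())
}

/// Returns true if the second exclusive borrow was refused while the first was held.
fn borrow_conflict_detected() -> bool {
    STATE.with(|s| {
        let mut guard = s.borrow_mut(); // first task holds an exclusive borrow
        guard.push(1);
        // A coroutine switch at this point is modeled by calling into the other task.
        let other_ok = other_task_can_borrow();
        drop(guard);
        !other_ok
    })
}

fn main() {
    // While the first borrow is live, the second one is rejected by RefCell.
    assert!(borrow_conflict_detected());
    // Once the guard is dropped, exclusive access is available again.
    assert!(other_task_can_borrow());
}
```

This only shows what the dynamic check does for same-thread sharing; it says nothing about code that uses `unsafe` or non-Rust TLS.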
Compilers cache the addresses of thread-local variables accessed within the same stack frame. After resumption on a different thread, the cache can be out of date and refer to the TLS of the previous thread, even though the code is now running on a different thread.
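A quick way to see why a cached address goes stale: the same thread-local lives at a different address on each thread. The sketch below only demonstrates that fact, not the caching itself; `SLOT` and `slot_address` are names invented for the example.

```rust
use std::thread;

thread_local! {
    static SLOT: u8 = 0;
}

/// Address of the calling thread's copy of SLOT, as an integer.
fn slot_address() -> usize {
    SLOT.with(|s| s as *const u8 as usize)
}

fn main() {
    let here = slot_address();
    // The main thread is still alive while the spawned thread runs,
    // so both copies exist at once and must have distinct addresses.
    let there = thread::spawn(slot_address).join().unwrap();
    assert_ne!(here, there);
    // Within one thread the address is stable, which is what makes caching tempting.
    assert_eq!(here, slot_address());
}
```

A coroutine that computed `here` before being migrated would keep pointing at the first thread's copy.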
I suspect assumptions made by unsafe and C code could get broken by cooperative multi-threading unless each co-routine has its own TLS and thus its own copy of thread-local variables.
Logical assumptions made by safe code may get broken too if more than a single co-routine has access to a thread-local variable. Suppose, for example, that some library is using this mechanism to pass "hidden" parameters to functions.
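A tiny sketch of that hidden-parameter hazard, in entirely safe code. The interleaving is modeled by calling the steps of two logical tasks in order on one thread; `CURRENT_ID`, `set_hidden_param`, `read_hidden_param` and `interleaved` are hypothetical names.

```rust
use std::cell::Cell;

thread_local! {
    // The "hidden parameter" a library might stash in TLS.
    static CURRENT_ID: Cell<u32> = Cell::new(0);
}

/// First half of a task: set the hidden parameter, then (conceptually) yield.
fn set_hidden_param(id: u32) {
    CURRENT_ID.with(|c| c.set(id));
}

/// Second half of a task: read the hidden parameter back after being resumed.
fn read_hidden_param() -> u32 {
    CURRENT_ID.with(|c| c.get())
}

/// Runs task A and task B interleaved on the current thread.
fn interleaved() -> (u32, u32) {
    set_hidden_param(1); // task A sets its value and yields
    set_hidden_param(2); // task B runs on the same thread and sets its own
    let seen_by_a = read_hidden_param(); // task A resumes
    let seen_by_b = read_hidden_param(); // task B resumes
    (seen_by_a, seen_by_b)
}

fn main() {
    // Task A expected 1 but observes task B's value.
    assert_eq!(interleaved(), (2, 2));
}
```

Nothing here is undefined behavior; the point is only that a purely logical assumption ("nobody else touches my thread-local between these two calls") stops holding.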
I suppose it's the same as for stackless co-routines.
It's nice to be able to schedule a task on any OS-level thread from a pool rather than just on a single fixed OS-level thread.
==
So it seems there are two considerations here:
When a stackful co-routine moves to a different executing OS-level thread (in a pool), it's nice to take thread-locals with it.
When multiple stackful co-routines share the same executing OS-level thread, it's nice if each has its own private copy of thread-local variables. There may be logical assumptions relying on this in both safe and unsafe code and in libraries written in C.
It seems rather appealing to be able to switch TLS at will and a library doing that might be a useful companion to libpthread both within and outside of Rust.
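As a rough illustration of "switching TLS" for a single variable, a scheduler could save and restore the value around each resume. This is only a sketch under strong assumptions: it handles one known thread-local rather than the whole TLS block, it is not panic-safe, and `REQUEST_ID`, `Task` and `resume` are hypothetical names.

```rust
use std::cell::Cell;

thread_local! {
    static REQUEST_ID: Cell<u32> = Cell::new(0);
}

/// A logical task that carries its own private copy of REQUEST_ID.
struct Task {
    saved_id: u32,
}

impl Task {
    /// Installs the task's copy, runs one step, then saves the copy back
    /// and restores whatever value the thread had before.
    fn resume<R>(&mut self, step: impl FnOnce() -> R) -> R {
        let outer = REQUEST_ID.with(|c| c.replace(self.saved_id));
        let out = step();
        self.saved_id = REQUEST_ID.with(|c| c.replace(outer));
        out
    }
}

fn main() {
    let mut a = Task { saved_id: 0 };
    let mut b = Task { saved_id: 0 };

    a.resume(|| REQUEST_ID.with(|c| c.set(1))); // task A sets its value
    b.resume(|| REQUEST_ID.with(|c| c.set(2))); // task B sets its own

    // Each task sees its own value again when resumed.
    assert_eq!(a.resume(|| REQUEST_ID.with(|c| c.get())), 1);
    assert_eq!(b.resume(|| REQUEST_ID.with(|c| c.get())), 2);
    // The thread's own value was never disturbed.
    assert_eq!(REQUEST_ID.with(|c| c.get()), 0);
}
```

Because the saved value travels with the `Task`, the same approach would also carry it along if the task were resumed on another thread, at least for variables the scheduler knows about.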
Since it hasn't been brought up so far: The approach should be fine if the coroutines are not migrated between threads (1:N scheduling instead of M:N scheduling). Is that right?
Might still be interesting for some applications then.