What is the current safety story for library-based stackful coroutines?

One interesting result of the last round of TechEmpower benchmarks is that one of the top slots is occupied by may-minihttp.

may is a stackful coroutine library -- it uses asm tricks to implement cooperative scheduling of green threads.

However, may is not safe to use -- spawning a coroutine is an unsafe operation, because accessing TLS from a coroutine can lead to undefined behavior (and not just errors): may/coroutine_impl.rs at 5e1fe6392360267c092e7c5879ab96e5d68fc28d · Xudong-Huang/may · GitHub.

I am wondering, do we have some kind of a story to make this safe? Implementing green-threads "in userland" by manually switching stacks is a well-known systems programming trick. Can Rust express it in a sound way?

Is the only problem that TLS will be shared between coroutines that happen to be scheduled on the same thread? I wonder if it's sufficient to just add an intrinsic like LocalKey::disable() that causes all LocalKey::with() to panic in the current thread (and maybe a with_unchecked() for unsafe access). This seems like it would just cost an extra TLS variable that gets checked on TLS access, and an extra unlikely branch.

Of course, if you touch TLS outside of Rust you lose, which no one expects to be a problem they have to deal with.
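To make the cost concrete, here is a minimal sketch of that idea written as a free function rather than an intrinsic. The names `TLS_DISABLED`, `disable_tls`, and `with_checked` are hypothetical and not part of std; the point is only that every checked access performs one extra TLS read and one extra branch before the real `LocalKey::with`.

```rust
use std::cell::Cell;
use std::thread::LocalKey;

thread_local! {
    // Hypothetical per-thread switch; nothing like this exists in std.
    static TLS_DISABLED: Cell<bool> = Cell::new(false);

    // Example thread-local used to exercise the checked accessor.
    static COUNTER: Cell<u32> = Cell::new(7);
}

/// Hypothetical stand-in for the proposed `LocalKey::disable()`:
/// after this call, checked access panics on the current thread.
pub fn disable_tls() {
    TLS_DISABLED.with(|flag| flag.set(true));
}

/// Hypothetical checked accessor. It reads the flag first (the extra TLS
/// access and branch), then defers to the ordinary `LocalKey::with`.
pub fn with_checked<T: 'static, R>(
    key: &'static LocalKey<T>,
    f: impl FnOnce(&T) -> R,
) -> R {
    if TLS_DISABLED.with(|flag| flag.get()) {
        panic!("thread-local access is disabled on this thread");
    }
    key.with(f)
}
```

A coroutine runtime would presumably call `disable_tls()` on its worker threads before running any coroutine, and `with_checked(&COUNTER, |c| c.get())` would then panic there while still working on other threads.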

This would be twice as expensive as the already relatively expensive single TLS access.

I mean, I suppose you could do something like hang a reference into that TLS off of every other LocalKey's TLS contents, but that also seems expensive in terms of memory. I can't think of a way to do this that doesn't incur some nasty cost even if you never call this function.

(Oh, and this version is potentially self-racy so I think you lose no matter what. =D)

One option would be to actually allocate TLS for each coroutine, and set up the appropriate register when switching.

That would require intimate knowledge of the libc and libpthread implementations. A quick search for how to set TLS manually turned up the following comment about it being hard: It's hardly well known, but I implemented an SMTP greylisting proxy called Spey ... | Hacker News

@bjorn3 While it's absolutely difficult, it should be possible to do in a way that cooperates with other libraries in the same address space. There are conventions to follow for how TLS gets allocated, and there's other code out there that manually allocates TLS.

Wait, what's the issue of sharing TLS with things running on the same thread? Isn't that just how threads work?

Wait are these coroutines Send or something?

If you're swapping back and forth between different userspace coroutines on the same thread, and the TLS is associated with the thread, you could have two different coroutines access the same TLS in ways that violate borrow-checking rules; for instance, you could borrow something as &mut from TLS, swap coroutines, and then the other coroutine could do its own &mut borrow of the same TLS without knowing about the first borrow.

It might be possible to avoid that by using async functions as the coroutines, and only allowing switches between coroutines when awaited; however, that may or may not fit the desired coroutine model.

std::thread::LocalKey::with only provides immutable access, since it is re-entrant. You only have a problem when you try to send the coroutine to another thread and then exit the original thread. If it stays on the same thread, you just have multiple immutable references. If it goes to another thread but the old thread stays alive, the current LocalKey::with call may simply keep accessing the old thread's TLS storage.
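A small sketch of the same-thread case, using only safe std APIs: `LocalKey::with` hands out `&T`, so mutation has to go through something like `RefCell`, and a second mutable borrow that overlaps the first is rejected at runtime instead of aliasing. The nested `with` call here stands in for a second coroutine running on the same thread; `BUFFER` and the function name are made up for illustration. This says nothing about the cross-thread case.

```rust
use std::cell::RefCell;

thread_local! {
    // Illustrative thread-local; the name is an assumption for this sketch.
    static BUFFER: RefCell<Vec<u32>> = RefCell::new(Vec::new());
}

/// Returns true if a second mutable borrow succeeds while the first one
/// is still alive on the same thread.
pub fn nested_mut_borrow_succeeds() -> bool {
    BUFFER.with(|outer| {
        // First "coroutine" takes a mutable borrow and holds on to it.
        let _first = outer.borrow_mut();

        // Second "coroutine" on the same thread re-enters the same key.
        // `with` only gives it `&RefCell<_>`, and the RefCell's dynamic
        // check sees the outstanding borrow.
        BUFFER.with(|inner| inner.try_borrow_mut().is_ok())
    })
}
```

Here `nested_mut_borrow_succeeds()` returns `false`: the overlap is caught by `RefCell` rather than producing two live `&mut` references.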

If you use interior mutability, like Cell, and swap to a different coroutine, you could get two exclusive borrows at the same time, e.g. if you called Cell::swap in both coroutines, because swapping coroutines could happen at any time (even during Cell::swap).

The library in question uses co-operative scheduling and not pre-emptive scheduling, so coroutine switching only happens at specific yield points.

Compilers cache the addresses of thread-local variables accessed within the same stack frame. After resumption on a different thread, the cache can be out of date and refer to the TLS of the previous thread, even though the code is now running on a different thread.
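For reference, a minimal sketch showing that each thread really does get its own copy of a thread-local at its own address, which is why a cached address becomes wrong once the code resumes elsewhere. It does not reproduce the miscompilation itself (that needs an actual stack switch); `SLOT` and the helper names are illustrative only.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Illustrative thread-local; each thread gets its own instance.
    static SLOT: Cell<u32> = Cell::new(0);
}

/// Address of the calling thread's copy of `SLOT`, as an integer.
pub fn slot_address() -> usize {
    SLOT.with(|slot| slot as *const Cell<u32> as usize)
}

/// Compares the address seen by the current thread with the address seen
/// by a freshly spawned thread while the current thread is still alive.
pub fn addresses_differ_across_threads() -> bool {
    let here = slot_address();
    let there = thread::spawn(slot_address).join().unwrap();
    here != there
}
```

A stack frame that computed `here` once and then migrated to the other thread would keep reading and writing the first thread's slot.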

There are no &mut inside the implementation of Cell::swap, so even with preemptive scheduling this is not a problem I think.

So why are these Send?

I suspect assumptions made by unsafe and C code could get broken by cooperative multi-threading unless each co-routine has its own TLS and thus its own copy of thread-local variables.

Logical assumptions made by safe code may get broken too if more than a single co-routine has access to a thread-local variable. Suppose, for example, that some library is using this mechanism to pass "hidden" parameters to functions.
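A sketch of what such a "hidden parameter" might look like, and how a second task on the same thread would observe it. Everything here (`CURRENT_REQUEST_ID`, `with_request_id`, and so on) is made up for illustration, and the coroutine switch is only simulated by calling task B's code inside task A's scope.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical hidden parameter a library might stash in TLS.
    static CURRENT_REQUEST_ID: Cell<Option<u32>> = Cell::new(None);
}

/// Runs `f` with the hidden parameter set, then restores the old value.
pub fn with_request_id<R>(id: u32, f: impl FnOnce() -> R) -> R {
    let previous = CURRENT_REQUEST_ID.with(|slot| slot.replace(Some(id)));
    let result = f();
    CURRENT_REQUEST_ID.with(|slot| slot.set(previous));
    result
}

/// Reads the hidden parameter, as library code deep in a call stack would.
pub fn current_request_id() -> Option<u32> {
    CURRENT_REQUEST_ID.with(|slot| slot.get())
}

/// Simulates task A yielding inside its scope while task B, which never
/// set a request id, runs on the same thread and reads the parameter.
pub fn task_b_observes_while_a_is_suspended() -> Option<u32> {
    with_request_id(1, || {
        // Imagine task A yields here and the scheduler resumes task B.
        current_request_id()
    })
}
```

Task B sees `Some(1)` even though it never entered a `with_request_id` scope of its own. This is memory safe, but it breaks the library's assumption that the value belongs to the current logical task.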

I suppose it's the same as for stackless co-routines. It's nice to be able to schedule a task on any OS-level thread from a pool rather than just on a single fixed OS-level thread.

==

So it seems there are two considerations here:

  • when a stackful co-routine moves to a different executing OS-level thread (in a pool) it's nice to take thread-locals with it
  • when multiple stackful co-routines share the same executing OS-level thread it's nice if each has its own private copy of thread-local variables - there may be logical assumptions relying on this in both safe and unsafe code and in libraries written in C

It seems rather appealing to be able to switch TLS at will and a library doing that might be a useful companion to libpthread both within and outside of Rust.

Sounds like you want something akin to Lunatic.

We'd rather have actually safe stackful coroutines, instead. But hey, to each their own. ^^

@josh, do you know of any examples of code doing that, off the top of your head?

Since it hasn't been brought up so far: The approach should be fine if the coroutines are not migrated between threads (1:N scheduling instead of M:N scheduling). Is that right?

Might still be interesting for some applications then.

...unless coroutines "contaminating" each other's thread locals is perceived as a problem?