What is the current safety story for library-based stackful coroutines?

One interesting result of the latest round of TechEmpower benchmarks is that one of the top slots is occupied by may-minihttp:

may is a stackful coroutine library -- it uses asm tricks to implement cooperative scheduling of green threads.

However, may is not safe to use -- spawning a coroutine is an unsafe operation, because accessing TLS from a coroutine can lead to undefined behavior (and not just errors): may/coroutine_impl.rs at 5e1fe6392360267c092e7c5879ab96e5d68fc28d · Xudong-Huang/may · GitHub.

I am wondering, do we have some kind of a story to make this safe? Implementing green-threads "in userland" by manually switching stacks is a well-known systems programming trick. Can Rust express it in a sound way?

Is the only problem that TLS will be shared between coroutines that happen to be scheduled on the same thread? I wonder if it's sufficient to just add an intrinsic like LocalKey::disable() that causes all LocalKey::with() to panic in the current thread (and maybe a with_unchecked() for unsafe access). This seems like it would just cost an extra TLS variable that gets checked on TLS access, and an extra unlikely branch.
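To make the cost concrete, here is a minimal userland sketch of that idea. `LocalKey::disable()` does not exist in std; `TLS_DISABLED`, `disable_tls` and `with_checked` are hypothetical names invented for this illustration, and a real intrinsic would have to guard every `LocalKey::with` rather than rely on callers opting in.

```rust
use std::cell::Cell;
use std::thread::LocalKey;

thread_local! {
    // Hypothetical per-thread switch; std has no such flag.
    static TLS_DISABLED: Cell<bool> = Cell::new(false);
    // An ordinary thread-local used to exercise the checked accessor.
    static COUNTER: Cell<u32> = Cell::new(0);
}

/// Hypothetical stand-in for the proposed `LocalKey::disable()`.
fn disable_tls() {
    TLS_DISABLED.with(|d| d.set(true));
}

/// Checked access: panics if TLS has been disabled on the current thread.
fn with_checked<T: 'static, R>(key: &'static LocalKey<T>, f: impl FnOnce(&T) -> R) -> R {
    // This is the extra TLS read plus branch paid on every access.
    if TLS_DISABLED.with(|d| d.get()) {
        panic!("thread-local access is disabled on this thread");
    }
    key.with(f)
}

fn main() {
    // Normal access works while the flag is unset.
    with_checked(&COUNTER, |c| c.set(c.get() + 1));
    assert_eq!(with_checked(&COUNTER, |c| c.get()), 1);

    // On a thread that disabled TLS, the checked accessor panics.
    let result = std::thread::spawn(|| {
        disable_tls();
        with_checked(&COUNTER, |c| c.get())
    })
    .join();
    assert!(result.is_err());
}
```

Note that the check itself is a second thread-local read, so every guarded access touches TLS twice.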

Of course, if you touch TLS outside of Rust you lose, which no one expects to be a problem they have to deal with.

This would be twice as expensive as the already relatively expensive single TLS access.

I mean, I suppose you could do something like hang a reference into that TLS off of every other LocalKey's TLS contents, but that also seems expensive in terms of memory. I can't think of a way to do this that doesn't incur some nasty cost even if you never call this function.

(Oh, and this version is potentially self-racy so I think you lose no matter what. =D)

One option would be to actually allocate TLS for each coroutine, and set up the appropriate register when switching.

That would require intimate knowledge about the libc and libpthreads implementation. A quick search for how to set the TLS manually gave the following comment about it being hard: It's hardly well known, but I implemented an SMTP greylisting proxy called Spey ... | Hacker News

@bjorn3 While it's absolutely difficult, it should be possible to do in a way that cooperates with other libraries in the same address space. There are conventions to follow for how TLS gets allocated, and there's other code out there that manually allocates TLS.

Wait, what's the issue of sharing TLS with things running on the same thread? Isn't that just how threads work?

Wait are these coroutines Send or something?

If you're swapping back and forth between different userspace coroutines on the same thread, and the TLS is associated with the thread, you could have two different coroutines access the same TLS in ways that violate borrow-checking rules; for instance, you could borrow something as &mut from TLS, swap coroutines, and then the other coroutine could do its own &mut borrow of the same TLS without knowing about the first borrow.
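A rough way to see the hazard without a real coroutine library: the sketch below models the context switch as a plain function call on the same thread (an assumption made purely for illustration; `SCRATCH`, `other_coroutine` and `borrow_across_switch` are made-up names). `RefCell` is used so the conflicting borrow is detected at runtime; with an unchecked `&mut` obtained through unsafe code, the same interleaving would silently alias.

```rust
use std::cell::RefCell;

thread_local! {
    static SCRATCH: RefCell<Vec<u8>> = RefCell::new(Vec::new());
}

/// Stand-in for a second coroutine scheduled on the same OS thread.
/// Returns whether it managed to obtain an exclusive borrow.
fn other_coroutine() -> bool {
    SCRATCH.with(|s| s.try_borrow_mut().is_ok())
}

/// Holds an exclusive borrow of the thread-local across a "yield point",
/// modeled here as a direct call into the other coroutine's code.
fn borrow_across_switch() -> bool {
    SCRATCH.with(|s| {
        let mut guard = s.borrow_mut();
        guard.push(1);
        // Pretend we yield here and the scheduler runs the other coroutine.
        let other_succeeded = other_coroutine();
        guard.push(2);
        other_succeeded
    })
}

fn main() {
    // The second borrow is refused while the first is still live.
    assert!(!borrow_across_switch());
    // Once the first borrow has ended, the other coroutine can borrow freely.
    assert!(other_coroutine());
}
```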

It might be possible to avoid that by using async functions as the coroutines, and only allowing switches between coroutines when awaited; however, that may or may not fit the desired coroutine model.

std::thread::LocalKey::with only provides immutable access, as it is re-entrant. You only have a problem when you try to send the coroutine to another thread and then exit the original thread. If it stays on the same thread, you just have multiple immutable references. If it goes to another thread but the old thread stays alive, you may simply access the old thread's TLS storage for the duration of the current LocalKey::with call.
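As a small illustration of the re-entrancy point (a sketch only; `VALUE` and the two helper functions are names made up for this example), nested `with` calls on the same key hand out shared references to the same slot, and any mutation has to go through interior mutability:

```rust
use std::cell::Cell;

thread_local! {
    static VALUE: Cell<u32> = Cell::new(0);
}

/// Two nested `with` calls yield shared references to the same slot.
fn same_slot() -> bool {
    VALUE.with(|a| VALUE.with(|b| std::ptr::eq(a, b)))
}

/// A write through the inner reference is visible through the outer one,
/// because both are `&Cell<u32>` pointing at the same thread-local.
fn write_inner_read_outer() -> u32 {
    VALUE.with(|outer| {
        VALUE.with(|inner| inner.set(42));
        outer.get()
    })
}

fn main() {
    assert!(same_slot());
    assert_eq!(write_inner_read_outer(), 42);
}
```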

If you use interior mutability, like Cell, and swap to a different coroutine, you could get two exclusive borrows at the same time, e.g. if you called Cell::swap in both coroutines, because swapping coroutines could happen at any time (even during Cell::swap).

The library in question uses co-operative scheduling and not pre-emptive scheduling, so coroutine switching only happens at specific yield points.

Compilers cache the addresses of thread-local variables accessed within the same stack frame. After resumption on a different thread, the cache can be out of date and refer to the TLS of the previous thread, even though the code is now running on a different thread.
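The sketch below does not reproduce the miscompilation itself; it only shows the underlying fact that makes a cached address dangerous, namely that the same thread-local resolves to a different address on each live thread. `SLOT` and `slot_addr` are illustrative names.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    static SLOT: Cell<u32> = Cell::new(0);
}

/// Address of this thread's instance of `SLOT`, as an integer.
fn slot_addr() -> usize {
    SLOT.with(|s| s as *const Cell<u32> as usize)
}

fn main() {
    let here = slot_addr();
    let here_again = slot_addr();
    // The spawned thread gets its own instance while ours is still alive.
    let there = thread::spawn(slot_addr).join().unwrap();

    // Stable within one thread, so caching the address looks safe...
    assert_eq!(here, here_again);
    // ...but a different thread sees a different address.
    assert_ne!(here, there);
}
```

A stack frame that computed `here` once and then resumed on the other thread would keep using the first thread's slot.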

There are no &mut inside the implementation of Cell::swap, so even with preemptive scheduling this is not a problem I think.

So why are these Send?