IIUC this is what he does in both his MSVC and LLVM implementations: TLS accesses are not cached across @llvm.experimental.coro.suspend invocations, but I don't know whether the intrinsic handles that or clang does. The current revision of the RFC does not mention this, so it might be clang doing it. I've pinged him on the issue and will let you know once he answers.
> But it seems that what he focuses mostly is stackless coroutines, which cannot be transferred between threads.
I think you have the wrong expectations about this RFC. This RFC proposes primitives for defining functions with suspension points and transforming them into state machines (as well as optimization passes on those). State machines are not the only way to implement coroutines, but they are one of the most efficient ways we know.
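To make the "function with suspension points becomes a state machine" idea concrete, here is a minimal hand-written sketch in Rust. It is not what the RFC's passes emit and it does not use the LLVM intrinsics; `Counter`, `CounterState`, and `resume` are hypothetical names made up for illustration. It only shows the shape of the result: the live local variable moves into a frame, and each call to `resume` runs to the next suspension point.

```rust
// Hypothetical source-level coroutine that this state machine models:
//
//     coroutine counter(limit) {
//         let mut i = 0;
//         while i < limit { yield i; i += 1; }
//     }

/// Where the coroutine is currently parked. The variant payload holds the
/// locals that must survive across the suspension point.
enum CounterState {
    Start,
    Suspended { i: u32 },
    Done,
}

/// The coroutine frame: the captured argument plus the current state.
struct Counter {
    limit: u32,
    state: CounterState,
}

impl Counter {
    fn new(limit: u32) -> Self {
        Counter { limit, state: CounterState::Start }
    }

    /// Runs until the next suspension point.
    /// Returns `Some(v)` for a yielded value and `None` once finished.
    fn resume(&mut self) -> Option<u32> {
        // Restore the local `i` according to where we were suspended.
        let next = match self.state {
            CounterState::Start => 0,
            CounterState::Suspended { i } => i + 1,
            CounterState::Done => return None,
        };
        if next < self.limit {
            // Save the local and suspend.
            self.state = CounterState::Suspended { i: next };
            Some(next)
        } else {
            self.state = CounterState::Done;
            None
        }
    }
}

fn main() {
    let mut c = Counter::new(3);
    while let Some(v) = c.resume() {
        println!("yielded {}", v);
    }
}
```

Nothing here needs its own stack: the whole coroutine is the `Counter` value, which is why this style is cheap.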
Clang uses these primitives to implement the C++ Coroutines Technical Specification, which supports both stackless and stackful coroutines with a combination of language features, library types, and runtime support (including a multi-threaded system scheduler that migrates coroutines between threads).
This RFC is basically the set of primitives that the main author of C++ Coroutines (the specification as well as the MSVC and Clang implementations) thinks would be useful to the whole LLVM community, so that other languages can reuse them to build whatever coroutine semantics they want (not necessarily those of the C++ Coroutines TS).
However, this has obviously only been tested for clang and C++ coroutines, which is why he is asking for feedback. IMO he is only going to get good feedback if people try to reuse the primitives to implement coroutines in other languages and report their findings on the mailing list. The RFC has had no responses.
One of the places where this shows is, for example, the definition of a coroutine stack frame, where the size of the frame must be a constant. What happens when a coroutine has a DST in its stack frame, like a C99 VLA or a Rust DST? Then the size of the stack frame must be dynamic and the coroutine itself becomes a DST, but it might still be possible to avoid any memory allocation at run-time. Since C++ doesn't have DSTs, this is left as future work (mainly because clang does offer VLAs in C++ as an extension, and at some point it might want to allow using VLAs inside coroutines as well). Progress would be faster here if frontends with DSTs gave this a try.
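A rough Rust-level illustration of the difference, assuming a made-up frame layout (`FixedFrame`, `Frame`, and `frame_size` are hypothetical names, not part of the RFC or of any LLVM API). With a fixed-size frame the size is a compile-time constant. Once the frame ends in a DST, its size is only available from a value, yet the frame can still live on the caller's stack without a heap allocation. Note that in this sketch the length is still known at the coercion site; a real VLA would be sized at run-time.

```rust
use std::mem::{size_of, size_of_val};

/// Frame whose locals have a statically known size.
#[allow(dead_code)]
struct FixedFrame {
    resume_point: u8,
    locals: [u8; 16],
}

/// Frame that may end in a dynamically sized tail; `Frame<[u8]>` is a DST.
#[allow(dead_code)]
struct Frame<L: ?Sized> {
    resume_point: u8,
    locals: L,
}

/// The size of a DST frame can only be computed from a reference to it.
fn frame_size(frame: &Frame<[u8]>) -> usize {
    size_of_val(frame)
}

fn main() {
    // Known at compile time, no value needed.
    println!("fixed frame: {} bytes", size_of::<FixedFrame>());

    // Both frames live on the stack; the references are unsized
    // (coerced to `&Frame<[u8]>`) at the call site.
    let small = Frame { resume_point: 0, locals: [0u8; 4] };
    let large = Frame { resume_point: 0, locals: [0u8; 32] };
    println!("small frame: {} bytes", frame_size(&small));
    println!("large frame: {} bytes", frame_size(&large));
}
```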
So @zonyitoo, if you want to give his LLVM fork a try, I think that would be awesome. You might want to contact him first and tell him about it in case he has any hints.