Single-threaded coroutines and sharing data between them


#1

I just wanted to get these ideas down. I don’t expect anyone to implement them – I’m just putting them out there for consideration.

The model is of a bunch of coroutines running in the same thread and sharing a common structure (shared state). This is the actor pattern or similar. The coroutines are used by the main stack code to handle actions that need to span several ‘waits’, whether that be waiting for timers or external events or callbacks or whatever. The main stack code wakes up coroutines as necessary as events arrive (this can be in a library).

By coroutines I mean genuine cooperative coroutines with their own stacks, with yields under program control (i.e. not pre-emptive).

The key to this is the data shared between all the coroutines. Since the coroutines run in the same thread and the yields are well-defined, for the execution time between two yields, the code has exclusive access to the structure shared between the coroutines (the actor state, if you like).

If the compiler kept track of the yield-points, and internally marked any function that might yield as a potential yield-point all the way back through the call stack, then the compiler could allow borrows of a mutable pointer to the shared data for free (i.e. compile-time checked). Effectively the code has a lock on the data between yield-points, for free.

This could be done with runtime checks, but this is such a lightweight model that it seems worth optimising for. Single-threaded cooperative coroutines are very lightweight to start with (just a few instructions to switch stacks at a yield), and with effectively free locking provided by the static analysis, it becomes a very efficient approach to writing code which has to block, but which is more neatly coded in a serial fashion (with loops, nested function calls, etc.).
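For comparison, the runtime-checked variant mentioned above can be sketched in today’s Rust with `Rc<RefCell<...>>` (the `State` type and the step closures here are hypothetical illustrations, not a proposed API): each “coroutine step” takes and releases the mutable borrow within one run-to-yield segment, so the dynamic check never fires. It is exactly this check that the proposed static yield-point analysis would eliminate.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical shared actor state, owned by the one thread that runs
// all the coroutines.
#[derive(Default)]
struct State {
    counter: u32,
}

fn run() -> u32 {
    let shared: Rc<RefCell<State>> = Rc::new(RefCell::new(State::default()));

    // Two "coroutine steps": each runs to completion between yields, taking
    // and releasing the mutable borrow within one step. The RefCell check is
    // what compile-time yield-point analysis would make unnecessary.
    let step_a = {
        let shared = Rc::clone(&shared);
        move || {
            let mut s = shared.borrow_mut(); // "lock" taken
            s.counter += 1;
        } // borrow dropped here, before any yield could occur
    };
    let step_b = {
        let shared = Rc::clone(&shared);
        move || {
            shared.borrow_mut().counter += 10;
        }
    };

    // The main stack wakes steps up as events arrive.
    step_a();
    step_b();
    step_a();

    let result = shared.borrow().counter;
    result
}

fn main() {
    println!("counter = {}", run());
}
```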

The alternative to this is to keep the same information that would be kept on the coroutine stack (loop counters, etc) instead in a structure, and feed events to handlers associated with that structure. But this quickly loses all sense of the flow and structure of the code. Each event handler fractionally advances the state of the structure, but the overall picture is lost.
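To make that contrast concrete, here is a small sketch (all names hypothetical) of the non-coroutine alternative: the position in what would naturally be a loop on a coroutine stack becomes an explicit struct field, and each event handler advances it by one notch.

```rust
// Without coroutines: the implicit position in a loop becomes explicit state.
struct Download {
    next_chunk: usize,   // would just be a loop counter on a coroutine stack
    total_chunks: usize,
    received: usize,     // bytes accumulated so far
}

enum Event {
    ChunkArrived(usize), // payload: chunk size in bytes
}

impl Download {
    // Each handler call fractionally advances the state; the overall
    // "for each chunk { wait; accumulate }" flow is no longer visible.
    fn handle(&mut self, ev: Event) -> bool {
        match ev {
            Event::ChunkArrived(size) => {
                self.received += size;
                self.next_chunk += 1;
                self.next_chunk == self.total_chunks // done?
            }
        }
    }
}

fn run() -> usize {
    let mut d = Download { next_chunk: 0, total_chunks: 3, received: 0 };
    for size in [100, 200, 300] {
        if d.handle(Event::ChunkArrived(size)) {
            break;
        }
    }
    d.received
}

fn main() {
    println!("received = {}", run());
}
```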

This is not about parallel processing or even much about concurrency. Rather it is about rearranging event-handling code structure to make it as clear as possible. Coroutine stacks are used to hold state, and natural serial program structure is used, with the yields hidden in library functions, mostly.

(Context for these ideas is current paid work I’m doing in Lua, where Lua’s single-threaded coroutines have permitted a very clear expression of the code (after having tried other ways of expressing it). However in Lua I have to mentally keep a track of where yields happen, and where in the code it is safe to make state changes without risking races. In Rust it could be done a lot better.)

Requirements to implement this model (at maximum efficiency):

  • Library support for creating/cleaning up coroutine stacks, switching to another stack (yield)
  • Compiler support for temporarily borrowing shared state between yield-points with no runtime checks, which requires tracking yield-points

#2

I’ve implemented something exactly like this in Ruby (https://celluloid.io)

Having mutable state shared among coroutines that suspend when the actor is waiting on a message can lead to some confusing behaviors. Specifically, depending on message ordering, state may or may not get mutated. Code “appears” blocking, but may be mutating state behind your back.

It’s basically the same problem as shared mutable state flying around in event loops à la Node.js, with “magical” yield points where your program gets suspended and resumed, as opposed to explicit promises, callbacks, or functions. It makes writing code easier, but incredibly magical and confusing, IMO.

I feel this is ok in Ruby, since Ruby is already mutable state hell, but I think this approach doesn’t fit well in a language like Rust.


#3

The difference is that in Rust you could make it a lot safer. If you are making invalid assumptions and trying to run an iterator for shared data across a yield-point (for example) the compiler would forbid it and you’d get a nice error message.

It’s understood that the coroutines are logically running in parallel and things will happen behind your back in the shared state, but only at well-defined points in the code, which the compiler can verify.

I suppose if you are thinking of a set of changes that need to be applied atomically to the shared state, but they are accidentally spread out in the code across a yield point, then yes, that could be confusing, and this proposal wouldn’t protect against that.

However if getting a lock on the shared structure was explicit (or more verbose), then perhaps the bracketing would be clearer and the risk of programmer error in this kind of spread-out atomic operation less. The lock would still just be logical and compile down to nothing through the static yield-point analysis.

To me, Rust’s style of static checking could fit this pattern very well, verifying at compile-time the static checking that I’m already doing in my head when writing this kind of code.


#4

I think shared mutable state whose ownership is constantly flying around between coroutines (which are being suspended and resumed in arbitrary order based on external events) would both be hard to shoehorn into Rust’s lifetime system and also in general has properties I dislike, because it makes modeling state confusing.

How do you even envision this working? Would the value be moved each time there’s a context switch between coroutines? How would that even fit into Rust’s lifetime system?

“However if getting a lock on the shared structure was explicit (or more verbose)”

How would that even work? Every piece of state would have to have a “lock”, and that lock would have to be known by the system scheduling/resuming coroutines so if a coroutine tries to acquire a “lock” on the data and it’s already held, the coroutine is suspended?

This seems to be all of the pain of building multithreaded programs in languages like C/C++/Java without any of the benefits.

I think before this proposal could even be considered, you’d need a lot less handwavy description of the ownership of the shared mutable state that’s flying around.

That said, I have sunk some four years of my life into a system like this, which is arguably the most mature system that works this way (Celluloid has been downloaded over 7 million times), and in my opinion it goes against the grain of the way Rust models state in concurrent programs.


#5

I think perhaps I am not explaining myself clearly enough.

It is exactly the same as the existing Rust borrow checker. For example if a mutable reference is used by an iterator, then it can’t also be used for something else. Each yield point is effectively ‘something else’, because other (unknown) code runs at that point until the coroutine resumes, and that other code might need the shared state, so that would conflict. All I am suggesting is extending Rust’s borrowing rules with some knowledge of coroutines and yield points. (By yield point, I mean either a yield (i.e. a switch to another stack), or a call to some function that might yield, even indirectly.)
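The existing rule being extended here can be seen in a few lines of plain Rust: while the iterator’s borrow of `totals` is alive, no other use of `totals` is allowed. Under the proposal, a yield-point inside the loop body would be rejected for exactly the same reason (the names below are illustrative only).

```rust
fn run() -> i32 {
    let mut totals = vec![1, 2, 3];
    let mut sum = 0;
    for x in totals.iter() {
        sum += *x;
        // totals.push(*x); // ERROR: `totals` is already borrowed by the
        //                  // iterator. A yield-point here would conflict in
        //                  // the same way: the "other code" that runs during
        //                  // the yield might need `totals` too.
    }
    sum
}

fn main() {
    println!("sum = {}", run());
}
```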

Rust’s existing borrowing rules are effectively a lock which exists only logically in the compiler, and results in no lock/unlock code in the resulting executable. Similarly this proposal could be seen as coroutines asking for a lock on the shared state, and then releasing the lock before the next yield, but actually because no other coroutine can possibly run in that thread between one yield and the next, the compiler can prove that no-one else could contend on that lock so no lock/unlock code is emitted. Like the existing borrowing rules, where there is a risk of a lock contention, the compiler refuses to compile the code in question and gives an error message. So if I try to keep an iterator alive over a yield point, the compiler would reject that.

So there is never any contention on the ‘lock’, because the compiler doesn’t let a ‘lock’ be held over a yield point. There is no runtime overhead. No runtime checks are required.

Also, there is no runtime system required. The coroutines can manage themselves. They are effectively just a memory allocation, and can be managed like other memory allocations, although cleanup requires some knowledge of the stack data structure. (This is required if the coroutine is never resumed and the reference keeping it alive is dropped.) Yields always go to some specific other coroutine, often with a payload. There is no scheduler.

Coding in assembly this is all quite a natural thing to do. Creating a stack is trivial, as is switching to/from it. So when I say this is lightweight, I know that from the assembly level. It all gets conceptually weighed down by the abstractions built on top of it, but if we don’t attempt pre-emption, and we don’t attempt goroutines, it is all very very simple and natural.

However, details of syntax and how best to express this to the compiler, I’m not totally sure about.

Perhaps we need to be able to create a structure with a CoroutineShareable trait. References to a structure with this trait follow the special rules regarding borrows over yield-points. Otherwise passing data to a created coroutine is like passing to another thread (i.e. move or copy). But a reference to a CoroutineShareable structure could be passed unchanged to a coroutine and in that way mutation is possible from either the main stack or a coroutine stack, but with no borrows allowed over yield-points.

That is perhaps the simplest expression of it, but to make the grouping of operations on the shared data more explicit and avoid the spread-out atomic operation risk mentioned above, perhaps some additional syntax could be devised, e.g. some kind of start/end bracketing which won’t compile if there is a yield-point within the bracketed region.
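One way such bracketing might look, sketched with a runtime check standing in for the proposed compile-time one (the `Shareable` type and `with` method are hypothetical): access to the shared data is scoped to a closure, and since the closure receives only `&mut T` and no yield handle, no yield-point can occur inside the bracket.

```rust
use std::cell::RefCell;

// Hypothetical bracketing API: the critical section is the closure body.
// In the proposal the compiler would refuse to compile a yield-point inside
// the bracket, and the RefCell check would compile down to nothing.
struct Shareable<T>(RefCell<T>);

impl<T> Shareable<T> {
    fn new(value: T) -> Self {
        Shareable(RefCell::new(value))
    }

    // Explicit start/end bracketing of access to the shared state.
    fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        let mut guard = self.0.borrow_mut(); // "lock" begins
        f(&mut *guard)
        // "lock" ends when guard drops here
    }
}

fn run() -> i32 {
    let state = Shareable::new(0);
    state.with(|s| *s += 5);
    state.with(|s| *s *= 2);
    state.with(|s| *s)
}

fn main() {
    println!("state = {}", run());
}
```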


#6

Even if you had CoroutineShareable data, I think the semantics are a bit wonky. You run into a situation where any call can potentially suspend the coroutine and mutate data behind your back. Unless you build in a “locking” system as you propose… but now you have coroutines which are blocked on external events requesting exclusive access to data, so within a single-threaded event loop you now have the potential for internal activities to deadlock on each other.

I guess my opinion is that even if the borrow checker were extended to be aware of this in a way that actually makes sense, the result is probably not the best way to compose concurrent systems.

This is kind of harking back to the libgreen model of concurrency which, likewise, had wonky semantics and also required Rust to build in an abstraction layer around what in the system you propose would be a coroutine suspend point.


#7

I guess what I’m saying is, so long as the suspend/resume points are always explicit, it can work. But it relies on always being aware of where this is happening.


#8

This is what I mean by yield-point tracking. Either the yield instruction has to be built-in or the library routine that does it needs to be marked specially for the compiler to recognise. Then any function which contains a call to yield is marked as a ‘yield-risk’, and also any functions which call it, etc. So it is a bit like pure/impure functions except in terms of yielding. The compiler has to track this internally.

The alternative would be to mark it explicitly in a function definition, and refuse to compile a function not marked as a “yield-risk” that calls yield or a “yield-risk” function.

Either way, in a segment of code the compiler will know where the yield-points are so can do the special borrowing analysis. So it is not that any call can change stuff behind your back, just certain ones.
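The propagation described above is a simple transitive closure over the call graph. As a sketch (the call graph here is a made-up toy, not any real analysis pass), a function is a “yield-risk” if it is `yield` itself or calls, directly or indirectly, a function that is:

```rust
use std::collections::{HashMap, HashSet};

// Transitive "yield-risk" marking over a toy call graph: iterate to a
// fixed point, marking any function that calls a marked function.
fn yield_risks(calls: &HashMap<&str, Vec<&str>>) -> HashSet<String> {
    let mut risky: HashSet<String> = HashSet::new();
    risky.insert("yield".to_string()); // the built-in / specially marked routine
    loop {
        let mut changed = false;
        for (f, callees) in calls {
            if !risky.contains(*f) && callees.iter().any(|c| risky.contains(*c)) {
                risky.insert((*f).to_string());
                changed = true;
            }
        }
        if !changed {
            return risky;
        }
    }
}

fn main() {
    let mut calls = HashMap::new();
    calls.insert("wait_timer", vec!["yield"]);    // calls yield directly
    calls.insert("download", vec!["wait_timer"]); // indirect yield-risk
    calls.insert("checksum", vec![]);             // never yields
    let risky = yield_risks(&calls);
    // The special borrowing analysis only applies at calls to risky functions.
    println!("{} yield-risky functions", risky.len());
}
```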

To me it is not like green threads. It is just a way of reorganizing sequential event-handling code over several stacks to make the flow easier to understand.

I can see that you are still not convinced, but thanks for responding and giving me the opportunity to explain the idea more clearly.


#9

I’m just thinking whether this could fit Rust’s base tree-like ownership/borrowing model, i.e. no backrefs. Ownership of coroutines can be tree-like. Also resume/yield pairs can be tree-like with only a temporary backref whilst a resume call is in progress. But passing data on the resume and returning data on the yield is harder for me to visualise. It is not yet clear to me whether this would fit Rust’s borrow-checking. The yield may come from a place lower down on the called coroutine’s stack, or higher up, or on another branch. I think it would need prototyping.

The model I have for running coroutines is that of resume/yield pairs. A ‘higher-up’ coroutine X makes a resume call for ‘lower-down’ coroutine Y. It passes control to the top of Y’s stack, passing a temporary backref to X to allow Y to return control when it next yields. Y runs until a yield at which point it passes control back to X. So that segment of execution of Y runs in the context of the resume call made in X.

So all that is tree-like and well under control (in terms of the ref back to X for example), but I’m unsure about the handling of the arguments and return values of the resume call. (The arguments to resume in X become the return values of the yield in Y, and the arguments of the following yield in Y become the return values of the resume in X.) At this point my head explodes.
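Since stable Rust has no stackful coroutines to demonstrate this with, the value-passing convention can at least be simulated with an explicit step object (a hand-rolled stand-in, not a real coroutine): `resume(arg)` delivers `arg` as the return value of Y’s last yield, runs one segment of Y, and returns the argument of Y’s next yield.

```rust
// A stand-in for coroutine Y: each call to `resume` is one resume/yield
// pair. X's argument to resume becomes Y's input; the argument of Y's
// following "yield" becomes resume's return value in X.
struct Accumulator {
    total: i32,
}

impl Accumulator {
    fn resume(&mut self, arg: i32) -> i32 {
        self.total += arg; // the segment of Y between two yields
        self.total         // argument of Y's next yield, returned to X
    }
}

fn run() -> i32 {
    // X drives Y via resume/yield pairs, with a payload each way.
    let mut y = Accumulator { total: 0 };
    let mut last = 0;
    for v in [3, 4, 5] {
        last = y.resume(v);
    }
    last
}

fn main() {
    println!("last yield = {}", run());
}
```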

Since I see this as equivalent to certain types of event handling without coroutines, perhaps I need to try writing both versions in Rust and see how borrowing works in the non-coroutine version and whether that can be applied to the coroutine version. (Also perhaps I should try another project in Rust first to get enough experience of the current idiom.) So I’ll come back to this if it gets any clearer to me.