How were stack sizes determined for green threads?


#1

There are echoes from a long time ago that mention that concurrent Rust had a hot split problem, that segmented stacks were unwieldy, and that in general procs were hitting pretty serious limits in throughput because stack management was the most intractable part of m:n threading.

My question is for those of you that were around and cognizant of these decisions: for the longest duration of green threads’ existence, how were stack sizes determined? What was the implementation strategy for individual procs’ stacks?

Couldn’t asynchronous processes have a max stack size statically determined? So that “growing” is never an issue?


#2

I can’t remember how the old segmented stack algorithm worked, but stack sizes can only be statically determined for acyclic call graphs.

Go has had very similar trashing issues with its implementation, which they resolved by rewriting their garbage collector to be fully precise, allowing them to essentially realloc stacks and adjust all the pointers. That’s not a feasible option in a non-managed language like Rust.


#3

EDIT: I’m sorry, you are right. This does not work for cyclic call graphs.

ORIGINAL: I wrote this on reddit, but it’s relevant so I’ll post it here, too:

"… I was thinking about the need for segmented stacks and it occurred to me that if you make the careful restriction that 1) procs be 100% cooperative and 2) all cooperation and inversion of control happen inside of the lexical scope, then it is very simple to allocate a stack only for the bindings that need to persist during a scheduler yield, as well as being able to compile the proc to a very simple state machine. Let me explain.

Consider the following theoretical asynchronous function:

proc do_it(r: Receiver<i32>) {
    let a = 2;
    let b = 5;
    let c = do_more(a, b);
    if condition() {
    }
    else loop {
        let d = r.recv();
        some_fun(d);
    }
}

I imagine that under “old Rust”, any one of do_more, condition, r.recv(), and some_fun could have triggered a scheduler yield! Obviously there is almost no cheap way to allocate stacks in advance and so segmented stacks are needed. However, imagine that all yields had to be lexical:

proc do_it(r: AsyncReceiver<i32>) {
    let a = 2;
    let b = 5;
    let c = do_more(a, b);
    if condition() {
        ;
    }
    else loop {
        match r.try_recv() {
            Ok(d) => { some_fun(d); yield; },
            Empty => { yield; continue; },
            Closed => { break; }
    }
}

Not only do you have a finer control over when each green thread suspends (in this case, it’s in the loop after a single receive), but we can determine the max size of the persistent stack we need: about 26 bytes, sans alignment. We know this because the functions that are called can’t possibly trigger a yield, so their results can be stored on the worker thread’s stack, while bindings like a, b, c, and r, and the procedure’s state, can be stored in the procedure struct itself. I am of course assuming a compilation strategy that turns do_it into its own state machine, much like C#'s async/await or the stateful crate/plugin for Rust. …"