So I've been wondering about something. From time to time, it is very convenient in the compiler to use recursive algorithms -- trees tend to lend themselves to that, after all. Usually it's better to structure such algorithms with an explicit queue or stack, but that can sometimes be quite awkward, particularly since the borrow checker generally favors real recursion. For those scenarios, I've been thinking that perhaps we ought to integrate support for something like the stacker crate, by @alexcrichton:
stacker basically lets you insert manual stack growth checks and switch over to a larger stack if you would exceed the one you have. An example where this would be useful is MIR construction, which currently uses recursion to track scoping. This leads to stack overflow on very large examples like #29466.
The downsides of using stacker that I can see are:
Currently, if you have switched to another stack segment, and you panic, it aborts, rather than shutting down cleanly. This can be fixed with catch_panic, iiuc.
We'd have to port this bit of assembler hackery over to new platforms in order to run the compiler. Seems doable.
It can be less memory efficient to use the program stack in this fashion, since you have to keep around the whole stack frame, whereas if you use an explicit stack you'll often pop up just the data you really need. Using stacker might encourage us down this road.
Iām with Manish on this one. Mostly because the call stack keeps both instruction pointer, saved registers, and the state being recursed on on the stack, whereas using an explicit stack can often get away with leaner data because no return information needs to be stored, and the āaddressā is often inferrable from the rest of the state.
Also while borrowck may favor recursion, LLVMās optimization passes are predominately geared towards imperative code, AFAIK.
a library in stdlib could provide the functionality.
This is analogous to how setjmp is provided as a library function in C. Or if you prefer, call-with-current-continuation in Scheme. It simply may not require much or any primitive language change. One might want nicer sugar for the feature, but we have yield reserved so that seems okay.
(Caveat: I have not given any consideration to how the
borrow-checking rules could be impacted by such a feature. I am
assuming it's a solvable problem at worst.)
and a library on crates could use the intrinsics to implement coroutines.
I don't think LLVM optimizations are really relevant here. I'm not really talking about small passes that can be readily encoded using a queue, more about the structure of complex passes like type-checking, MIR construction, etc, which naturally want to follow various tree structures implicit in the code. Using recursion allows us to (among other things) easily return values to callers and so forth, which can make the code significantly more readable.
Basically I guess the question (for me) is whether you want a pre-traversal or something more like a post-traversal. That is, if you have work to do on the way back up the tree, it is significantly more painful to encode using a heap-based data structure.
This sounds very similar to the problem found in rustc, and my thoughts would be:
The non-recursive version is definitely not as readable or easy to debug than the recursive version, itās just more difficult to reason about sometimes. This is somewhat contained as resolution is only a small part of Cargo, whereas typeck/MIR construction are pretty big portions of rustc.
The easy solution of ārun on a big stackā never ran into a problem.
Hacks to manually reduce stack usage basically donāt work.
Overall the winnings earned vs time spent ratio is much better in my opinion to leave the code as-is and figure out a way to get more stack. In the end this may not always be enough, but it worked well enough for Cargo and I suspect itās all thatās really needed for rustc.
What is the problem with just increasing the stack segment size? The memory usage should be the same compared to āallocating more stackā on the heap. Is changing the stack size difficult on some platforms?
This sort of roughly corresponds with my feeling. It's clear that using an explicit stack is more efficient. But often this explicit stack comes at a price: harder to read code, effort to refactor, etc.
Anyway, I think the best thing is that I will do some experiments both ways. I want to see if there is an easy way to isolate out the part of MIR construction that is overflowing just now and use an explicit stack. However, even if I do that, it will just mean that the code is "no worse" than the rest of the compiler, most of which uses recursive algorithms. I assume that if you write a torture test with a deeply nested match, for example, you will eventually blow the stack in typeck and elsewhere. MIR happens to behave differently here because it uses recursion to model the scopes, and each let introduces a new scope (most other parts of the compiler handle lets more imperatively).