Tail-call elimination: the third rail. I figure there should be a warning sign on this topic; it’s been beaten to death so many times that I hardly dare revive it again. But please: forget about Rust, the language, for a second, and think about stack frames.
Consider a long sequence of tail calls: f() -> g() -> h() -> j() -> k()… There are two possibilities: either the sequence repeats or it doesn’t (Law of the Excluded Middle; On Interpretation, Aristotle, ca. 350 BC). But if the sequence is long enough that TCE actually matters, it obviously repeats, because the last time I wrote 6174 different functions that all individually tail-called each other was… never. Nobody writes code like that. Okay, this isn’t quite rigorous, but: at some point the sequence repeats and calls the same function over and over, and that function (and its descendants) tail-call at most a finite set of functions a finite number of times, like this: a() -> b() -> a() -> c() -> b() -> a()… We call this repeating set the “state-machine”.
In the C calling convention, whoever started this chain has to figure out how to end it, which means that the calling function is going to deallocate the stack space that this chain is using when it finishes. In principle this is hard, because the caller doesn’t know how much stack space the final callee needs. But it does know enough for us to do TCE if all of the functions use the same amount of stack space to store their arguments, which is an optimization currently implemented by both gcc and llvm as “sibling-call optimization”.
Now, each function in the state-machine requires a certain amount of stack space for its arguments, which means one of them requires the largest stack-argument-size, because any finite set has a maximum. It is therefore possible for all of the other functions in the state-machine to also use this amount of space, which is only a very slight overallocation in any case, if only the caller knew in advance what sort of party it was really attending (the functional kind, with immutable glowsticks and pattern-matching ecstasy or something, just bear with me here dammit). As before, if all functions in the chain use the same amount of stack space (for the function signature), llvm can perform tail-call elimination.
So consider a couple of functions like this:
fn a(foo: i32) -> i32 { if foo > 5 { b(foo, 1) } else { 6 } }
fn b(foo: i32, bar: i32) -> i32 { foo + bar }
fn main(/* oh crap, I forgot the signature of main */) {
    // do some stuff
    a(sna);
    // do some other stuff
}
We can imagine the stack like this:
(in a)
main | foo >
(in b)
a | foo | bar>
Clearly, if we tail-call here, main() doesn’t know what to do. But we can also write it like this:
(in a)
main | 0x00 | foo>
(in b)
main | foo | bar>
Now even if main() is uniquely responsible for deallocating everything, it doesn’t matter; main() was going to pop two slots off the stack whether or not a() had tail-called b().
We can achieve this effect by padding the signature of any function which wants to make a tail-call, and we can calculate it all at compile-time: if a function returns with a become statement, we can calculate the tree of its possible tail-call targets (a() becomes b() becomes c() or d(), e.g.), and pad the signature to the size of the largest callee. Relatively simple code can solve this problem for some made-up functions.
But what if a larger function wants to become smaller? For the most part, this is trivial: the becomee will just put its args in the top slots and ignore the extra space, which is automatically freed by the original caller whenever the tail-call chain ends. So if b(foo, bar) becomes a(foo), then:
(in b)
main | foo | bar >
(in a)
main | foo (ignored) | foo >
a() doesn’t need to know that it has too much stack space, and main() will have no problem freeing it anyway. Thus, tail-calls work with cdecl, whether we’re going to become bigger or smaller. With this method, we restrict the use of become to functions that are compiled in the same compilation unit as the caller (due to the requisite signature padding), and we expect the compiler to do some extra work for us to export a meaningful .h file:
int a(char __pad[4], int foo);
int b(int foo, int bar);
But wait, you said, LLVM only does sibling-call optimization if the function signatures have matching types!
This objection is valid. In principle, types are a lie: one could simply replace the signature of any affected function with a u8[] array and recover the original types inside the function without doing any actual computation. In practice, this is probably not feasible. On the other hand, it’s certainly conceivable that LLVM could be convinced to add the relevant functionality, since it’s not really costing anything.
But please don’t think I’m saying that tail-call elimination should be implemented, much less by 1.0, or even that it’s feasible to implement (a decision that should be made by someone who has experience with the Rust codebase, i.e. not me). I’m only saying that it can be implemented, which is really the interesting technical question, to me anyway. Also, I might have missed something. ^,~