Pre-RFC: explicit proper tail calls


#1

This is a pre-RFC for explicit proper tail calls via become.

My motivation for this is that during work on a Scheme VM in Rust, I learned that tail calls are the best way to implement jump tables, but Rust doesn’t guarantee proper tail calls, so this would crash in debug mode.


Introduction

Many languages support proper tail calls, which means that a call that does nothing after the call and returns the return value of the callee consumes no stack space. These are mostly functional languages, such as Scheme, ML variants, and Haskell. Other languages that support proper tail calls include ECMAscript 6 (though not supported by most implementations) and Lua. Perl supports an explicit tail call (via the syntax goto &sub).

This proposal is like Perl, in that tail calls are explicit.

Motivation

Proper tail calls allow for iteration to be expressed as tail-recursive loops. They also eliminate the stack space penalty of functions that simply call another function with slightly different arguments. Finally, they allow jump tables to be expressed as arrays of functions, which can speed up interpreter loops.

Syntax

A proper tail call is written using the become keyword. This keyword must precede a function or method call or an overloaded operator invocation.

Examples

/// Infinite loop
pub fn infinite_loop() -> ! {
   become infinite_loop()
}

/// Iterate over a slice recursively
pub fn recursive_slice_iteration(x: &[u8]) {
   do_something(x[0]);
   become(recursive_slice_iteration(x[1..]))
}

Semantics

A become statement is typed like a return statement. That is, it is considered to diverge, and the argument of become must have the same type as that of the function calling become.

The execution and lifetime semantics of a become statement are as if the following occurred:

  1. The lifetimes of all objects that are not being passed by value to the callee end, as if their variables went out of scope. Destructors are called for objects that implement Drop. Note: this is an observable change in behavior (other than reduction in stack usage) if the destructor has observable side-effects.
  2. All objects that are being passed to the callee by value are moved into temporary variables, even if they implement Copy. Their original variables go out of scope.
  3. Execution is transferred to the callee, with the temporary variables as arguments.

This implies that:

  • it is forbidden for a tail-called function to be have a reference to the immediate caller’s stack frame (lifetime error).
  • It is permitted to pass arguments by value to a tail-called function, even if they are local variables and even if they have destructors.

There are additional restrictions on tail-called functions. These are not necessary for soundness, but are a consequence of the implementation details of Rust.

  • A tail-called function must not be extern (except for extern "rust" or extern "rust-intrinsic"), since tail-calls are incompatible with the standard C calling convention.
  • If tail-called function is in a crate to which the calling crate will be dynamically linked, the call must not be made through the PLT, due to LLVM limitations. I suspect this is because the linker-generated trampolines will mishandle the code. Therefore, the compiler will generate code as if the GCC option -fno-plt was given (which is a good idea anyway). This may require “mangling” the assembler generated by LLVM to rewrite @function to @object (it did for GHC). This is a robust transformation and is used in GHC’s production-quality LLVM backend.
  • Tail calls are not supported on MIPS and System Z (at least per the LLVM docs). This is a bug and will be fixed by patches to LLVM. Tail calls are also not supported on GPU backends; this may be harder to fix (I have no idea).

Implementation

Tail calls map directly to the corresponding support in LLVM. Rust uses the fastcc calling convention, which supports tail calls. If Rust changes calling conventions, tail calls would need to be implemented for the new calling convention.

An optional part of this includes switching to -fno-plt code generation. I prefer this solution, since it is faster and avoids PLT-related security bugs, but it is not mandatory except for actual tail calls.

Alternative approaches

It would be possible to forbid any local variables that implement Drop from being in scope (except those passed to the callee by value) when a tail-call is made. This would avoid a silent semantics change when become is replaced by return (or visa versa), but would likely require a large number of manual drop calls.


#2

Why is this necessary? I would prefer become to be a statement, similar to return, that can appear anywhere and be followed by a semicolon. E.g., I would like this to work:

pub fn some_func(data: DataType, processor: Option<SomeType>) -> u32 {
    if let Some(p) = processor {
        become p.some_method(data);
    } else {
        become default_func(data);
    }
}

This seems very similar to an early return from the compiler’s point of view: things that are being retained (return value or arguments) are stored in the proper location, all other values are dropped, and control leaves the function. Thus, it would seem very odd to me if one could perform an early return but not an early become.

I also don’t see why this is required. As long as the calling convention has the callee pop arguments, it should be possible to prepare the stack and jump to a dynamic symbol just the same as if that symbol were linked statically.

It might be worth noting that supporting explicit tail calls would probably require sticking with a callee pops arguments convention of some sort, since the caller wouldn’t know how much stack to pop if the callee tail called a function taking a different set of arguments.

Overall, :+1: I’d really like to see this.


#3

The limitation regarding dynamic linking is an LLVM restriction. From LLVM docs:

On x86-64 when generating GOT/PIC code only module-local calls (visibility = hidden or protected) are supported. On ppc32/64 GOT/PIC only module-local calls (visibility = hidden or protected) are supported.

I updated the pre-RFC to note that this restriction, what I belive to be the reason for this restriction, and how to work around it.

Also the same page suggest that MIPS and System Z do not support tail calls.

Fixing these would require improvements to LLVM.

I agree that become should be a statement like return.


#4

As a polyglot developer, I don’t keep up with all the internals, so I’d appreciate some additional context, even if it is through links to previous relevant discussions.

In the Introduction, can you summarise Rust’s current use of tail calls? You say Rust doesn’t guarantee proper tail calls, but are there any circumstances in which Rust does use tail calls?

Is this a proposal to add tail calls for additional circumstances, or is the proposal that it should be possible to indicate that Rust must use a tail call?

What should be the behaviour if code contains a become statement, but tail-calls are not supported on a platform?

As an Alternative Approach, would it be possible for the Rust compiler to be more aggressive about using tail calls for returns that meet particular criteria?

Thanks, Jon.


#5

It’s a way to tell the compiler that it must use tail calls, plus some destructor magic to make this more feasible. Anything it does with destructors, though, can be mimicked in current Rust code using explicit drop calls.


#6

Rust already produces tail calls in certain cases. However, this is not done in debug mode, so one cannot rely on it for correctness. Furthermore, tail calls are not emitted when any destructors would need to be called after the tail call. Therefore, they are fragile with regards to refactoring.

The compiler could be more aggressive about tail calls. However, this would hurt stack traces and would still be fragile with regards to destructors. Under this proposal, become changes when destructors run in such a way that their mere presence does not break tail calls.


#7

Has the use of a continuation passing style trampoline been considered? I did a little searching but turns out anything containing both “rust” and “trampoline” is a difficult query :wink:

I wonder if something like http://home.pipeline.com/~hbaker1/CheneyMTA.html could be done leveraging lifetimes rather than GC


#8

This is a very bad idea for Rust. CPS-style trampolines make very poor use of CPU caches, and impose massive overheads on FFI.


#9

Thanks for the insight! Good to know


#10

How does this compare to the existing tail call RFC? It looks like we’re duplicating some of the discussion from there.


#11

Firstly, that RFC (from the little I read of it) seems to restrict users to passing primitive types, which is a severe limitation. This averts that by passing other types by value into the callee.

Also, this pre-RFC specifies rules for borrow-checking tail calls to ensure soundness.

Finally, my proposal allows for values with destructors to be live in the caller at the time the tail call is made, by changing the moment when the destructors run. This is a huge ergonomics improvement.


#12

I assume this would be emitted to LLVM musttail? If so, here are more docs I haven’t seen in this thread: http://llvm.org/docs/LangRef.html#call-instruction

Disjointed thoughts:

  • LLVM mentions “(possible bitcasted)”. Is there any value in allowing become SomethingThatGivesI32() as u32;? (I suppose not doing it now doesn’t hurt, since it can be loosened later.)
  • LLVM always says “the caller and callee prototypes must match”, though pointers need only match in address space. Does that mean the some_func above would actually get rejected due to differing arguments? Does that mean this would almost always be dealing in borrows, rather than owned things (so a number match would be sufficient instead of a type match )?
  • It’d be a shame to lose the usual implicit return syntax. What does this (trivial) function look like if forced to tail? fn fact_a(n: i32, a: i32) -> i32 { if n == 0 { a } else { fact_a(n-1, n*a) } } I assume I can’t add the become before the if since it probably can only take a function. Do I need to explicitly return a; if I use become in the else? Crazy idea: else { become(fact_a)(n-1, n*a) }. I suppose that would mean return borrow(foo)(); if you wanted to return
  • I’m very curious whether all the different drop logic is needed. Would it actually be common to have an argument or local that both needs drop and is neither consumed before the become or moved into the become? (Especially given all the restrictions this will probably have, at least at first.)

#13

Do I need to explicitly return a; if I use become in the else?

Probably not. become statements would diverge, and divergent statements can coerce to anything else. That function could be written like this:

fn fact_a(n: i32, a: i32) -> i32 { if n == 0 { a } else { become fact_a(n-1, n*a) } }

#14

Oh, thanks. I’d totally misunderstood how return actually worked.

Hmm, I guess I made a false implication from this note in the book that’s right before a return-with-a-semicolon example:

but with a semicolon, it would return () instead

Interesting that “!” and “!;” have the same type, but “i32” and “i32;” don’t. (And it’s not that Never infects everything after it; it seems “!” and “!;()” don’t have the same type either.) That’s a bit odd, but I guess not something that can be fixed now.


#15

Yeah, Never infects the entire statement or expression, but not everything after it. It’s a bit weird, but it’s added to allow C programmers a bit of leeway.


#16

This would be emitted to LLVM musttail where possible, or to tail otherwise. LLVM does actually emit a tail call for tail in most cases; the remaining cases would become bugs that should eventually be fixed.

I do think that allowing a cast (either as or std::mem::transmute – the latter only in unsafe code, obviously, and only where compilable) of the return value would be valuable, but it can wait.

I think that become should be allowed almost anywhere that return currently is. become on an expression tries to tail-call the last operator in the expression. (While somewhat unusual, this can be useful due to operator overloading.). become on an if or match statement behaves as if propagated into each arm of the statement.

Your example would be written as:

fn fact_a(n: i32, a: i32) -> i32 {
    if n == 0 { a } else { become fact_a(n - 1, n * a) }
}

I am not sure whether this code should be accepted.

fn fact_a(n: i32, a: i32) -> i32 {
    become if n == 0 { a } else { fact_a(n - 1, n * a) }
}

The main problem with accepting it is figuring out where the locals of fact_a get dropped.

The reason the drop logic is needed is code like this:

fn id<a, b>(x: a, y: &mut b) -> a {
    x
}

fn foo<a>(x: Vec<a>, y: bool) -> Vec<a> {
    if y {
        x
    } else {
        let bar = ();
        let baz = Box::new(());
        become id(x, &mut bar)
    }
}

Here:

  • passing x to id is fine, since ownership of x is transferred to id.
  • Having baz in scope is okay, since it is not used after the tail call is made. But it cannot be dropped after id returns, since foo's stack frame got clobbered. So it must be dropped before jumping to id.
  • Passing &mut bar is not okay – it refers to an address in foo's stack frame, which was overwritten by id's.

If the become had been omitted, this would have been valid code, since bar would last until foo returned. But if we declared that all locals were dropped before the become, then x would have been destroyed before being passed to id.