I don't like an attribute for something that changes its semantics this significantly. It changes the drop order of local variables, so it really is a different thing from a return, even if you have infinite stack space.
And even if you don't have any drop glue in the current stack frame, become still results in all current locals/temporaries being deallocated before becoming the called function. This can even block optimizations in exclusively safe code, and is why TCO is difficult: the locals' storage must be disjoint from any other stack allocated objects. Thus LLVM must not do TCO unless it can prove that no pointer to a function local ever escapes, and therefore the program can never observe that the new stack frame overlaps the current one (e.g. via ptr comparison).
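A minimal sketch of the drop-order difference (the become version is commented out since the syntax is unstable; Guard and helper are just stand-ins):

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        println!("guard dropped");
    }
}

fn helper() -> i32 {
    println!("in helper");
    42
}

fn with_return() -> i32 {
    let _g = Guard;
    // "in helper" prints first: `_g` lives until this frame unwinds.
    return helper();
}

// fn with_become() -> i32 {
//     let _g = Guard;
//     // "guard dropped" would print first: the frame is torn down
//     // before the call, which is what lets it be reused.
//     become helper();
// }

fn main() {
    with_return();
}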
The RFC is much more restricted than I thought:
A further restriction is on the function signature of the caller and callee. The stack frame layout is based on the calling convention, arguments, as well as return types (the function signature in short). As the stack frame is to be reused it needs to be similar enough for both functions. This requires that the function signature and calling convention of the calling and called function need to match exactly.
With this restriction the verb become makes a bit more sense.
But I find this overly restrictive. Many mutually-recursive patterns with more than one function won't satisfy this.
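For example, a pair of mutually tail-calling state functions (hypothetical names) whose signatures differ would be rejected under the exact-match rule, even though every call is in tail position:

fn state_a(n: u32) -> u32 {
    if n == 0 {
        0
    } else {
        // In tail position, but `state_b` has a different signature, so
        // `become state_b(n - 1, true)` would be rejected as the RFC is
        // currently written.
        state_b(n - 1, true)
    }
}

fn state_b(n: u32, flag: bool) -> u32 {
    if flag {
        state_a(n)
    } else {
        n
    }
}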
I think I must be missing something obvious here. A function's local storage is always disjoint from other stack frames, no? And isn't it already UB for a function's locals to escape the lifetime of said function?
I was a bit confused about the need for matching signatures as well. Really, as long as the return types match, that should be all that is needed. (Probably same async/non-async as well though? Haven't really thought through how or if TCO works with async.)
However reading the RFC it seems to indicate a lot of the restrictions are to make the initial implementation easier and the scope smaller. Those restrictions could presumably be lifted down the line.
One limitation as I understand it from reading the GH issue is that the extern "Rust" calling convention uses caller cleanup. That would indeed be more restrictive, and the limitations make sense. I think you would need to match both the number of arguments in registers and the size of the area containing spilled arguments on the stack in that case. And I have no idea who is responsible for calling drop in that case. That might be really gnarly. Callee cleanup is simpler for tail recursion purposes, for sure.
(I guess the main advantage of caller cleanup would be that it is required for C-style varargs functions? Neither seems safer than the other when you have mangled symbols. Without that, caller cleanup is probably a bit safer in the face of dynamic linking, though mismatching signatures can still cause a lot of screwups regardless.)
"Escape" means that a pointer was stored somewhere. If you call f(&local)
and f
is opaque to the optimizer (i.e. it's not inlined), optimizations must conservatively assume the pointer may have been stashed somewhere and future opaque calls may inspect that pointer. As long as they don't do any accesses through said pointer, lifetimes do not matter.
If your function ends with return g(…), then the allocated object for local is still live during the call to g (your function hasn't returned yet), and every other allocated object must not have an address which overlaps local. This prevents TCO, which would reuse the stack space that local is stored in. If your function ends with become g(…), however, then the allocated object for local is deallocated before any locals of g are allocated, thus permitting TCO to occur, even if f is a fully opaque arbitrary function.
An identical function signature (or an ABI-compatible one) is required for TCO to be possible in many/most modern calling conventions (other than tailcc, which is designed for TCO).
Caller cleanup is also slightly faster overall since you can get away with less stack manipulation (reusing argument stack space between/over multiple calls).
I got nerd-sniped into a deep dive, first into the Wikipedia page, followed by the LLVM lang ref. There do seem to be a few more specialised calling conventions that support this: ghccc, swifttailcc and cc 11 (HiPE).
Strange that there is no Rust calling convention on that page. What is the difference between extern "C" and extern "Rust" then, and where is it handled?
extern "Rust"
lowers to using the "C"
ABI. As of current, IIRC, the difference is that we splat types with "Scalar" and "ScalarPair" ABI (โค2 pointer-sized bits of data) into scalar arguments at the C ABI level, whereas the C ABI always passes every user-defined type (i.e. non-scalar) by reference. There might also be a difference on whether our by-ref arguments are caller-copied or callee-copied โ IIRC ours are caller-copied (so callee can always just use by-ref args in the place passed), but I don't recall how the C ABI is. It may differ per target, even; the Rust ABI just lowers to use a regular pointer at the LLVM level, not byref
.
TL;DR: rustc turns extern "Rust" signatures into extern "C" for codegen in some unspecified way.
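A rough illustration of the claimed difference (the exact lowering is unstable and target-dependent; the comments just restate the description above, not a guarantee):

// Two pointer-sized fields: a "ScalarPair" in rustc's layout terms.
struct Pair {
    a: usize,
    b: usize,
}

// Default (unstable, unspecified) Rust ABI: per the description above,
// `Pair` would be splatted into two scalar arguments during codegen.
fn sum_rust(p: Pair) -> usize {
    p.a + p.b
}

// `extern "C"`: the struct instead follows the target's C ABI rules for
// aggregates, which on some targets means passing it indirectly.
#[repr(C)]
pub struct CPair {
    pub a: usize,
    pub b: usize,
}

pub extern "C" fn sum_c(p: CPair) -> usize {
    p.a + p.b
}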
If we want to invest in e.g. a custom Result ABI we'll probably need a custom rustcc instead of using ccc. Unless we can reuse the swiftcc support, perhaps.
That is an interesting idea. The question is whether it would be worth it performance-wise vs maintenance, and whether there are any other interesting optimisations that could be done apart from Result. It would probably take a lot of experimentation to figure this out, and I guess support needs to be added to core LLVM C++ code (i.e. you can't define the ABI directly via whatever bindings Rust uses for LLVM). Another question is what to do about alternative backends. As I understand it, Cranelift at least wants to be ABI-compatible with LLVM (so you can mix and match per crate).
If Rust wants to do this, it should probably be done before there are too many official alternative backends with ABI compatibility ambitions (in order to not make it too complex).
Let me see if I get this straight. Silly example for demonstration purposes:
fn evil() -> i32 {
    let x = 0;
    f(&x);
    become g();
}
One way for f to smuggle the local is to convert the &i32 into a *const i32 and stuff it in a static somewhere, perhaps an AtomicPtr<i32>. If f is not trivially inlinable, LLVM has no easy way to tell if g inspects the smuggled value, so it cannot perform TCO.
But in Rust this requires writing a questionable unsafe block in at least one location. So presumably we could hint to LLVM that it is safe to perform TCO? I am interested in counterexamples, as I know my example is oversimplified.
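Roughly, the smuggling scenario would look like this (hypothetical bodies for f and g):

use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

static SMUGGLED: AtomicPtr<i32> = AtomicPtr::new(ptr::null_mut());

// `f` stashes the pointer in a global; nothing unsafe yet.
fn f(x: &i32) {
    SMUGGLED.store(x as *const i32 as *mut i32, Ordering::Release);
}

// If `g` is opaque to the optimizer, it cannot rule out a body like this,
// which reads through the smuggled pointer. With `return` that read would
// still hit live storage; with `become` the storage is already gone, so
// the read is UB and the optimizer doesn't have to account for it.
fn g() -> i32 {
    let p = SMUGGLED.load(Ordering::Acquire);
    if p.is_null() { 0 } else { unsafe { *p } }
}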
About musttail, from the LLVM Language Reference Manual (LLVM 20.0.0git documentation):
In addition, if the calling convention is not swifttailcc or tailcc:
- All ABI-impacting function attributes, such as sret, byval, inreg, returned, and inalloca, must match.
- The caller and callee prototypes must match. Pointer types of parameters or return types may differ in pointee type, but not in address space.
(So it's possible that we could allow some mismatches, like transparent types and &-vs-&mut-vs-NonNull, but that's not needed for MVP.)
Another part of the experiment is, IIRC, an extern "rust-tail" ABI which would be tailcc in LLVM and thus have much looser restrictions on which signatures are allowed.
Attributes on expressions are not stable, and AFAIK not on any track to stabilization. This is not only restrictive, but would also lead to pretty dumb errors:
fn foo() {
    // Ok
    #[tail] return;
}

fn bar() {
    // error[E0658]
    #[tail] return
}

fn quux() {
    match () {
        // error[E0658]
        () => #[tail] return,
    }
}
Adding a random brainstorming suggestion that doesn't require a new keyword:
fn foo(x: f64) -> f64 {
    prepare();
    continue in sin(x)
}
It's worth noting that become is already a reserved keyword.
For what it's worth, as a random user of Rust, I feel like it is fine for it to be an attribute, because, well… I don't know the rules of drop order of local variables anyway! If I'm writing code that depends on drop order of local variables, you can bet I'll be dropping them explicitly. I would go so far as to say that anyone who's relying on drop order would be best to write explicit drops, for code readability.
The implicitness of drops makes me think of them similarly to optimizations: "so what if the behavior of my code can be changed invisibly, it's only about stuff I hopefully wasn't relying on anyway"
It's less about the order between the variables and more about what's available in the tail call.
You do things like
let x = ...;
return foo(&x)
all the time, and know that's fine.
But if you
let x = ...;
become foo(&x)
instead you'll get an error, so the difference is very much in your face -- even if you're not trying to do anything tricky.
Hmm, I see; although I still don't find it particularly surprising for an attribute to cause a compile error, considering things like #[derive] giving an error if you attach them to something that isn't suitable for them, for any number of reasons. This example doesn't bother me because there's nothing "tricky" here: I just get an immediate error that presumably will explain exactly what the problem is; there isn't a change to the meaning of my code that could cause me to make a mistake.
The only "tricky" thing I've thought of is if you were using something like a mutex guard "manually" (i.e. for controlling access to something that's not inside it) and were depending on it to still be locked during the tail call (but that's the "depending on drop order" that I was thinking of in my last post; you're better off using coding practices that would make that mistake impossible anyway).
Attributes can already change semantics more significantly. For example:
#[cfg(feature = "abc")]
return 5;
Compile-time errors are the best kind of errors.
The difference is that attributes like #[cfg] and proc macro attributes are limited to changing the behavior of the code item or statement which they decorate. become changes behavior "outside" of the return, at least using a usual understanding of how setting the result value of the function and exiting the function scope(s) compose together. Dropping any in-scope locals isn't done as part of return; it's part of the scope exit, which is "outside of" the return.
You essentially can't[1] even desugar the drop order effects of become to surface level code at all, because there's no capability to end the storage lifetime of impl Copy arguments before exiting the function.
[1]: You might be able to with an inner function, but I'm not in the right mindspace to figure out if that's ever observably different in subtle ways. Plus optimization will probably suffer from going through the pseudo trampoline.
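For what it's worth, the inner-function idea might look roughly like this (a sketch only: it approximates the drop order, but doesn't reuse the frame and still grows the stack, so it isn't a faithful desugaring of become):

fn bar(arg: u32) -> u32 {
    arg + 1
}

fn foo(n: u32) -> u32 {
    // All of the "real" body lives in an inner function so that its
    // locals are dropped when it returns...
    fn body(n: u32) -> u32 {
        let x = n * 2;
        x
    }
    let arg = body(n);
    // ...and only then, with those locals gone, do we make the call that
    // `become bar(arg)` would have made.
    bar(arg)
}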
Presently, we make no guarantee at all about when a program has a stack overflow. That doesn't make all Rust programs "useless". So I think you are exaggerating here.
We can easily say that even with become/tail calls, we still make no guarantee about stack usage, but you are making it a lot easier for the compiler to reduce stack usage, so it's much more likely to work out in your favor -- and arguably it is a compiler bug if the stack explodes from tail calls, even if it's "just" a quality-of-implementation bug, not a soundness/fail-to-uphold-hard-semantic-guarantee bug.