I don't like an attribute for something that changes its semantics this significantly. It changes the drop order of local variables, so it really is a different thing from a return, even if you have infinite stack space.
And even if you don't have any drop glue in the current stack frame, become still results in all current locals/temporaries being deallocated before becoming the called function. This can even block optimizations in exclusively safe code, and is why TCO is difficult: the locals' storage must be disjoint from any other stack allocated objects. Thus LLVM must not do TCO unless it can prove that no pointer to a function local ever escapes, and therefore the program can never observe that the new stack frame overlaps the current one (e.g. via ptr comparison).
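A minimal sketch of the drop-order difference (the become version is commented out since the syntax is unstable; Guard and helper are just stand-ins):

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        println!("guard dropped");
    }
}

fn helper() -> i32 {
    println!("in helper");
    42
}

fn with_return() -> i32 {
    let _g = Guard;
    // "in helper" prints first: `_g` lives until this frame unwinds.
    return helper();
}

// fn with_become() -> i32 {
//     let _g = Guard;
//     // "guard dropped" would print first: the frame is torn down
//     // before the call, which is what lets it be reused.
//     become helper();
// }

fn main() {
    with_return();
}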
The RFC is much more restricted than I thought:
A further restriction is on the function signature of the caller and callee. The stack frame layout is based on the calling convention, arguments, as well as return types (the function signature in short). As the stack frame is to be reused it needs to be similar enough for both functions. This requires that the function signature and calling convention of the calling and called function need to match exactly.
With this restriction the verb become makes a bit more sense.
But I find this overly restrictive. Many mutually-recursive patterns with more than one function won't satisfy this.
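For example, a pair of mutually tail-calling state functions (hypothetical names) whose signatures differ would be rejected under the exact-match rule, even though every call is in tail position:

fn state_a(n: u32) -> u32 {
    if n == 0 {
        0
    } else {
        // In tail position, but `state_b` has a different signature, so
        // `become state_b(n - 1, true)` would be rejected as the RFC is
        // currently written.
        state_b(n - 1, true)
    }
}

fn state_b(n: u32, flag: bool) -> u32 {
    if flag {
        state_a(n)
    } else {
        n
    }
}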
I think I must be missing something obvious here. A function's local storage is always disjoint from other stack frames, no? And isn't it already UB for a function's locals to escape the lifetime of said function?
I was a bit confused about the need for matching signatures as well. Really, as long as the return types match, that should be all that is needed. (Probably same async/non-async as well though? Haven't really thought through how or if TCO works with async.)
However reading the RFC it seems to indicate a lot of the restrictions are to make the initial implementation easier and the scope smaller. Those restrictions could presumably be lifted down the line.
One limitation as I understand it from reading the GH issue is that the extern "Rust" calling convention uses caller cleanup. That would indeed be more restrictive, and the limitations make sense. I think you would need to match both the number of arguments in registers and the size of the area containing spilled arguments on the stack in that case. And I have no idea who is responsible for calling drop in that case. That might be really gnarly. Callee cleanup is simpler for tail recursion purposes, for sure.
(I guess the main advantage of caller cleanup would be that it is required for C-style varargs functions? Neither seems safer than the other when you have mangled symbols. Without that, caller cleanup is probably a bit safer in the face of dynamic linking, though mismatching signatures can still cause a lot of screwups regardless.)
"Escape" means that a pointer was stored somewhere. If you call f(&local)
and f
is opaque to the optimizer (i.e. it's not inlined), optimizations must conservatively assume the pointer may have been stashed somewhere and future opaque calls may inspect that pointer. As long as they don't do any accesses through said pointer, lifetimes do not matter.
If your function ends with return g(…), then the allocated object for local is still live during the call to g (your function hasn't returned yet), and every other allocated object must not have an address which overlaps local. This prevents TCO, which would reuse the stack space that local is stored in. If your function ends with become g(…), however, then the allocated object for local is deallocated before any locals of g are allocated, thus permitting TCO to occur, even if f is a fully opaque arbitrary function.
An identical function signature (or an ABI-compatible one) is required for TCO to be possible in many/most modern calling conventions (other than tailcc, which is designed for TCO).
Caller cleanup is also slightly faster overall since you can get away with less stack manipulation (reusing argument stack space between/over multiple calls).
I got nerd-sniped into a deep dive, first into the Wikipedia page, followed by the LLVM lang ref. There do seem to be a few more specialised calling conventions that support this: ghccc, swifttailcc and cc 11 (HiPE).
Strange that there is no Rust calling convention on that page. What is the difference between extern "C" and extern "Rust" then, and where is it handled?
extern "Rust"
lowers to using the "C"
ABI. As of current, IIRC, the difference is that we splat types with "Scalar" and "ScalarPair" ABI (โค2 pointer-sized bits of data) into scalar arguments at the C ABI level, whereas the C ABI always passes every user-defined type (i.e. non-scalar) by reference. There might also be a difference on whether our by-ref arguments are caller-copied or callee-copied โ IIRC ours are caller-copied (so callee can always just use by-ref args in the place passed), but I don't recall how the C ABI is. It may differ per target, even; the Rust ABI just lowers to use a regular pointer at the LLVM level, not byref
.
TL;DR: rustc turns extern "Rust" signatures into extern "C" for codegen in some unspecified way.
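A rough illustration of the claimed difference (the exact lowering is unstable and target-dependent; the comments just restate the description above, not a guarantee):

// Two pointer-sized fields: a "ScalarPair" in rustc's layout terms.
struct Pair {
    a: usize,
    b: usize,
}

// Default (unstable, unspecified) Rust ABI: per the description above,
// `Pair` would be splatted into two scalar arguments during codegen.
fn sum_rust(p: Pair) -> usize {
    p.a + p.b
}

// `extern "C"`: the struct instead follows the target's C ABI rules for
// aggregates, which on some targets means passing it indirectly.
#[repr(C)]
pub struct CPair {
    pub a: usize,
    pub b: usize,
}

pub extern "C" fn sum_c(p: CPair) -> usize {
    p.a + p.b
}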
If we want to invest in e.g. a custom Result ABI we'll probably need a custom rustcc instead of using ccc. Unless we can reuse the swiftcc support, perhaps.
That is an interesting idea. The question is whether it would be worth it performance-wise vs maintenance, and whether there are any other interesting optimisations that could be done apart from Result. It would probably take a lot of experimentation to figure this out, and I guess support needs to be added to core LLVM C++ code (i.e. you can't define the ABI directly via whatever bindings Rust uses for LLVM). Another question is what to do about alternative backends. As I understand it, Cranelift at least wants to be ABI-compatible with LLVM (so you can mix and match per crate).
If Rust wants to do this, it should probably be done before there are too many official alternative backends with ABI compatibility ambitions (in order to not make it too complex).
Let me see if I get this straight. Silly example for demonstration purposes:
fn evil() -> i32 {
    let x = 0;
    f(&x);
    become g();
}
One way for f to smuggle the local is to convert the &i32 into a *const i32 and stuff it in a static somewhere, perhaps an AtomicPtr<i32>. If f is not trivially inlinable, LLVM has no easy way to tell if g inspects the smuggled value, so it cannot perform TCO.
But in Rust this requires writing a questionable unsafe block in at least one location. So presumably we could hint to LLVM that it is safe to perform TCO? I am interested in counterexamples, as I know my example is oversimplified.
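Roughly, the smuggling scenario would look like this (hypothetical bodies for f and g):

use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

static SMUGGLED: AtomicPtr<i32> = AtomicPtr::new(ptr::null_mut());

// `f` stashes the pointer in a global; nothing unsafe yet.
fn f(x: &i32) {
    SMUGGLED.store(x as *const i32 as *mut i32, Ordering::Release);
}

// If `g` is opaque to the optimizer, it cannot rule out a body like this,
// which reads through the smuggled pointer. With `return` that read would
// still hit live storage; with `become` the storage is already gone, so
// the read is UB and the optimizer doesn't have to account for it.
fn g() -> i32 {
    let p = SMUGGLED.load(Ordering::Acquire);
    if p.is_null() { 0 } else { unsafe { *p } }
}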
About musttail, from the LLVM Language Reference Manual (LLVM 20.0.0git documentation):
In addition, if the calling convention is not swifttailcc or tailcc:
- All ABI-impacting function attributes, such as sret, byval, inreg, returned, and inalloca, must match.
- The caller and callee prototypes must match. Pointer types of parameters or return types may differ in pointee type, but not in address space.
(So it's possible that we could allow some mismatches, like transparent types and &-vs-&mut-vs-NonNull, but that's not needed for MVP.)
Another part of the experiment is, IIRC, an extern "rust-tail" ABI which would be tailcc in LLVM and thus have much looser restrictions on which signatures are allowed.
Attributes on expressions are not stable, and AFAIK not on any track to stabilization. This is not only restrictive, but would also lead to pretty dumb errors:
fn foo() {
    // Ok
    #[tail] return;
}

fn bar() {
    // error[E0658]
    #[tail] return
}

fn quux() {
    match () {
        // error[E0658]
        () => #[tail] return,
    }
}
Adding a random brainstorming suggestion that doesn't require a new keyword:
fn foo(x: f64) -> f64 {
    prepare();
    continue in sin(x)
}
It's worth noting that become is already a reserved keyword.
For what it's worth, as a random user of Rust, I feel like it is fine for it to be an attribute, because, well… I don't know the rules of drop order of local variables anyway! If I'm writing code that depends on drop order of local variables, you can bet I'll be dropping them explicitly. I would go so far as to say that anyone who's relying on drop order would be best to write explicit drops, for code readability.
The implicitness of drops makes me think of them similarly to optimizations: "so what if the behavior of my code can be changed invisibly, it's only about stuff I hopefully wasn't relying on anyway"
It's less about the order between the variables and more about what's available in the tail call.
You do things like
let x = ...;
return foo(&x)
all the time, and know that's fine.
But if you
let x = ...;
become foo(&x)
instead you'll get an error, so the difference is very much in your face -- even if you're not trying to do anything tricky.
Hmm, I see; although I still don't find it particularly surprising for an attribute to cause a compile error, considering things like #[derive] giving an error if you attach them to something that isn't suitable for them, for any number of reasons. This example doesn't bother me because there's nothing "tricky" here: I just get an immediate error that presumably will explain exactly what the problem is; there isn't a change to the meaning of my code that could cause me to make a mistake.
The only "tricky" thing I've thought of is if you were using something like a mutex guard "manually" (i.e. for controlling access to something that's not inside it) and were depending on it to still be locked during the tail call (but that's the "depending on drop order" that I was thinking of in my last post; you're better off using coding practices that would make that mistake impossible anyway).
Attributes can already change semantics more significantly. For example:
#[cfg(feature = "abc")]
return 5;
Compile-time errors are the best kind of errors.
The difference is that attributes like #[cfg] and proc macro attributes are limited to changing the behavior of the code item or statement which they decorate. become changes behavior "outside" of the return, at least using a usual understanding of how setting the result value of the function and exiting the function scope(s) compose together. Dropping any in-scope locals isn't done as part of return; it's part of the scope exit, which is "outside of" the return.
You essentially can't[1] even desugar the drop order effects of become to surface level code at all, because there's no capability to end the storage lifetime of impl Copy arguments before exiting the function.
[1]: You might be able to with an inner function, but I'm not in the right mindspace to figure out if that's ever observably different in subtle ways. Plus optimization will probably suffer from going through the pseudo trampoline.
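For what it's worth, the inner-function idea might look roughly like this (a sketch only: it approximates the drop order, but doesn't reuse the frame and still grows the stack, so it isn't a faithful desugaring of become):

fn bar(arg: u32) -> u32 {
    arg + 1
}

fn foo(n: u32) -> u32 {
    // All of the "real" body lives in an inner function so that its
    // locals are dropped when it returns...
    fn body(n: u32) -> u32 {
        let x = n * 2;
        x
    }
    let arg = body(n);
    // ...and only then, with those locals gone, do we make the call that
    // `become bar(arg)` would have made.
    bar(arg)
}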
Presently, we make no guarantee at all about when a program has a stack overflow. That doesn't make all Rust programs "useless". So I think you are exaggerating here.
We can easily say that even with become/tail calls, we still make no guarantee about stack usage, but you are making it a lot easier for the compiler to reduce stack usage, so it's much more likely to work out in your favor -- and arguably it is a compiler bug if the stack explodes from tail calls, even if it's "just" a quality-of-implementation bug, not a soundness/fail-to-uphold-hard-semantic-guarantee bug.