I have thought about this rough category of problems a lot, so going to use this as an opportunity to get all of my thoughts in one place.
The semantics of Rust are pretty clear that arguments are passed by-value. At least that's my opinion and afaik also the opinion of the rest of UCG. Specifically, this means the code below must print two different values:
fn callee(b: Big) {
println!("In callee: {:p}", &b);
}
let b = Big::new();
println!("In caller: {:p}", &big);
callee(b);
If we were to "transparently" pass parameters by reference, we might print the same value both times, which would be a miscompilation.
What this means is that our ABI must have a copy in either the callee or the call site. Applied to the above example, either the callsite callee(b)
must be responsible for making a copy, or we must make a copy at the entry of the callee
function.
Where we make this copy has an effect on what we can optimize. If copy is made in the callee, optimizations in the caller cannot be used to remove the copy, which might lead to a missed optimization (and vice versa). We can come up with examples where either choice leads to a missed optimization.
Copying in the Callee
fn callee(b: Big) {
println!("In callee: {:p}", &b);
}
callee(B::new());
Here we must make a copy in the callee because the call site might look like the first example above.
Copying in the Caller
fn callee(_: Big) {}
let b = Big::new();
println!("In caller: {:p}", &big);
callee(b);
Here we required to make a copy at the call site because the callee might look like the first example above.
The second article above is basically claiming that copying in the callee is better than copying at the call site. The article does acknowledge in the update that this can also be a pessimization in some cases, but still claims that copying in the callee is better. That claim appears to me to be entirely unsubstantiated - it might be true, but I have no idea.
The problem is even more interesting than this though. We can also come up with an example like this:
fn inner(_: Big) {}
fn outer(b: Big) {
println!("In outer: {:p}", &b);
inner(b);
}
outer(Big::new());
The ideal number of copies to make in this example is zero. There is no real need to ever copy. However, if we consistently adhere to either of the above ABIs, this is unachievable. We must always make a copy in outer
, either for the inner(b)
call site or for the argument.
Fortunately, ABIs are not that important in Rust anyway. The vast majority of function calls in Rust can be statically resolved and we can pass additional metadata about the function. The only cases where we can't do this are function pointers, dynamic dispatch, and linking strategies that do not allow us to pass metadata (dylibs, stable rlibs, etc.). None of these are terribly common.
What this means is that we should instead look to infer the right ABI on a per-function basis. Essentially, consider a functions like inner
or outer
above. The optimizer would begin by assuming that the callee (ie the function being optimized) is responsible for making the copy, and try to elide the copy. If it fails to do this, the responsibility to make the copy is punted to the call site, which will also get a chance to elide the copy.
By doing this, we can achieve zero copies in the most recent example above. The ABI for inner
would be such that the callee has to make the copy (which is immediately elided because it's unnecessary) and the ABI for outer
would be such that the call site has to make the copy (which is also elided due to being unnecessary).
The article mentions the possibility of multiple entry points for a function depending on whether the copy needs to happen or not. Unfortunately I don't know of any codegen backend that has support for functions with multiple entry points. I had the same idea a while back though, and fortunately there's a trick we can use to implement this: Tail calls. Specifically, imagine that we decide that by ABI, copies must always be made in the callee, and consider a function like this:
fn foo(b: Big) {
println!("{:p}", &b);
}
Making the copy in the callee explicit, we'd codegen that function like this:
fn foo(b: &Big) {
let real_b = *b;
println!("{:p}", &real_b);
}
Alternatively, we could do this:
fn foo(b: &Big) {
let real_b = *b;
foo_inner(&real_b);
}
fn foo_inner(real_b: &Big) {
println!("{:p}", real_b);
}
The foo_inner
call can be codegened as a tail call, ie just a jmp. This means we do not have the cost of an extra function call. In effect, the combination of foo
and foo_inner
is a function with multiple entry points - one at foo
and one at foo_inner
.
The caller is now free to decide whether to call foo
or foo_inner
directly, if it decides that it does not care about the function call.
Interestingly, this "extra function with a tail call" strategy is actually nearly isomorphic to the "ABI inference" thing I suggested above. Specifically, imagine two minor changes: 1. We only generate the "wrapper function" if it is actually necessary, ie the copy can't be optimized out in the callee. 2. We slap #[inline(always)]
on the wrapper function.
By always inlining the wrapper, we effectively move the obligation to perform the copy to the call site (which now has the copy in the inlined body). The call site is now free to optimize out the copy (the copy). This is exactly equivalent to what would have happened if we had inferred a "callee is responsible" ABI.