It's actually enabling allocation elision which takes extra effort; LLVM's default is of course to respect that functions do what they do. The global allocator symbols (__rust_alloc, __rust_dealloc, __rust_alloc_zeroed, and __rust_realloc) are specially handled to be annotated as an alloc family in LLVM.
The specific relevant LLVM function attributes are "alloc-family"="__rust_alloc" (identifies which family the function belongs to), allockind("alloc,aligned,uninitialized") (identifies which function of the family it is and the initialization state of the returned memory), and allocsize(0) (says that the returned pointer, when non-null, points to at least as many bytes as argument 0 requests).
Source comments also say that the Rust fork of LLVM 14 and earlier is patched to recognize the symbols and optimize them like malloc etc., but I think later LLVM versions just use the function annotations now.
... actually, from a quick test, it looks like we might not put those annotations on the symbols when they're defined locally via #[global_allocator]? I'm divining behavior by looking at post-optimization LLVM IR, so it might be all sorts of misleading, and it absolutely shouldn't be relied upon. They definitely get added when using an unknown and/or known-default #[global_allocator], though. You'd need to ask someone with T-compiler knowledge for better information here. I think this changed somewhat in 1.71 (notably, __rust_no_alloc_shim_is_unstable).
I've yet to see Rust/LLVM actually replace a Box allocation with a stack allocation. It's unfortunately quite a bit more difficult than you'd initially hope because of our panic/unwinding semantics (lack of a forward progress guarantee) and the call to the dynamic handle_alloc_error handler. Stack space is somewhat limited, and LLVM often seems loath to relocate memory manipulated by pointer (likely due to C++ish address sensitivity concerns).
The theoretically simplest case which doesn't even check allocation failure, just unsafely assumes it was successful, still performs an actual allocation. Using a safe box instead optimizes almost identically, just with an extra check for null and conditional call to handle_alloc_error; no unwinding pads are needed, as this example is carefully controlled to be known not to unwind except by handle_alloc_error.
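The example referred to above isn't reproduced in this text, so here is one hypothetical shape such a case might take, as a sketch only. The function names are mine, and using std::hint::black_box to make the pointer escape is my assumption about how to keep the allocation from being elided outright. What the optimizer actually does with this will vary by compiler version and flags.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::hint::black_box;

/// Hypothetical "simplest case": allocate, assume success, use, free.
/// This is unsound if the allocation fails; it is for inspecting codegen only.
pub fn unchecked_roundtrip(value: i32) -> i32 {
    let layout = Layout::new::<i32>();
    unsafe {
        let ptr = alloc(layout) as *mut i32;
        // ASSUMPTION (unchecked): `ptr` is non-null.
        ptr.write(value);
        // Let the pointer escape so the allocation cannot simply be deleted.
        let _ = black_box(&mut *ptr);
        let out = ptr.read();
        dealloc(ptr as *mut u8, layout);
        out
    }
}

/// The safe counterpart: Box checks for null and calls handle_alloc_error.
pub fn boxed_roundtrip(value: i32) -> i32 {
    let mut boxed = Box::new(value);
    let _ = black_box(&mut *boxed);
    *boxed
}
```

Comparing the optimized output of the two functions (for example with --emit=llvm-ir) is how one would check whether a stack slot ever replaces the heap allocation.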
That being said, I should reiterate that Rust/LLVM absolutely can and does completely elide allocations. It'll even do it when leaking success-checked allocations of more than the entire address space.
It's solely replacement that I haven't actually observed being done.
A middle ground semantic that might be more agreeable than "Global allocations are arbitrarily replaceable" might be "Global allocations are removable, but if observed, they were provided by actually calling the allocator." This keeps most (if not all) of the optimizations that we currently do in practice, but is significantly more difficult to (specify and) provide.
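To make "removable" concrete, here is a minimal sketch of a counting allocator. The Counting type, the ALLOC_CALLS counter, and boxed_sum are hypothetical names for illustration. Under either semantic the sum must be 4, but how many allocator calls the counter sees is exactly the thing that is allowed to vary, so don't expect a particular count.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that counts alloc calls.
pub struct Counting;

pub static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: Counting = Counting;

/// Returns the sum and the number of allocator calls seen while computing it.
/// The count may be 0 (both boxes removed), 2, or more if other threads
/// allocate concurrently; it depends on optimization level and compiler version.
pub fn boxed_sum() -> (i32, usize) {
    let before = ALLOC_CALLS.load(Ordering::Relaxed);
    let a = Box::new(2);
    let b = Box::new(2);
    let sum = *a + *b;
    drop((a, b));
    let after = ALLOC_CALLS.load(Ordering::Relaxed);
    (sum, after - before)
}
```

Under the "removable" wording, any count the program does observe corresponds to real calls into Counting; an arbitrarily replaceable allocation wouldn't promise even that.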
The huge one is actually eliding allocations when constant folding code. Any concrete example will look silly, since it could just not use a box, but the reason inlining is the optimization is that it cuts through abstractions to expose exactly that kind of silly no-op code for other peephole optimizations to clean up.
A canonical example would be something like { let a = Box::new(2); let b = Box::new(2); *a + *b }. If allocations are removable, this can be folded down to just a constant 4. If allocations aren't removable, the best you would be able to do becomes, at a low level, roughly
{
    let a = __rust_alloc(4, 4);
    if a.is_null() { handle_alloc_error(4, 4); }
    let b = __rust_alloc(4, 4);
    if b.is_null() { handle_alloc_error(4, 4); }
    __rust_dealloc(b, 4, 4);
    __rust_dealloc(a, 4, 4);
    4
}
and this is also assuming that handle_alloc_error doesn't ever unwind (perhaps a reasonable restriction, but not one which the compiler currently exploits).
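For anyone who wants to compare the two forms in a compiler explorer, here is a sketch of both using the public std::alloc API rather than the internal __rust_* symbols. The function names sum_boxed and sum_lowered are hypothetical, and sum_lowered makes the same assumption as above that handle_alloc_error doesn't unwind (if it did, a would leak here).

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// The high-level form: two boxes whose contents are immediately summed.
pub fn sum_boxed() -> i32 {
    let a = Box::new(2);
    let b = Box::new(2);
    *a + *b
}

/// A hand-lowered form that always performs both allocations.
pub fn sum_lowered() -> i32 {
    let layout = Layout::new::<i32>();
    unsafe {
        let a = alloc(layout) as *mut i32;
        if a.is_null() {
            handle_alloc_error(layout);
        }
        a.write(2);

        let b = alloc(layout) as *mut i32;
        if b.is_null() {
            // ASSUMPTION: this call does not unwind; otherwise `a` leaks.
            handle_alloc_error(layout);
        }
        b.write(2);

        let sum = a.read() + b.read();
        // Free in reverse order of allocation, as drop order would.
        dealloc(b as *mut u8, layout);
        dealloc(a as *mut u8, layout);
        sum
    }
}
```

Both return 4; the interesting part is whether any allocator calls survive in the optimized output of each.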
"Zero overhead abstraction" exists because the optimizer is able to strip out all of the extra unnecessary ceremony introduced. Opaque arbitrary external functions (e.g. allocation and handle_alloc_error) are the perfect barrier to code stripping/folding optimizations.
One optimization which we don't do enough of yet, but which is enabled by replaceable allocation, is in-place initialization. Allowing Box::new(T::new()) to construct T in place requires turning
create T on stack (maybe unwind)
create heap allocation
maybe handle alloc error
move T from stack to heap
into
create heap allocation
if (alloc error) {
    create T on stack (maybe unwind)
    handle alloc error
} else {
    (on unwind: delete heap allocation)
    create T directly to heap (maybe unwind)
}
This requires inserting an allocation that might not otherwise exist. LLVM doesn't like doing this for obvious reasons, but the current draft opsem (allocation is not an observable event) does permit this transformation. It's certainly not perfect (stack space still needs to be available in case the allocation fails), but it is a known desirable.
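The reordered sequence above can be written out by hand as a sketch. box_with and Guard are hypothetical names, not a std API, and whether the value is really constructed directly in the heap slot (rather than on the stack and then moved) is still up to the optimizer; the code only fixes the order of the steps.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Hypothetical helper: allocate first, then run the constructor.
pub fn box_with<T>(make: impl FnOnce() -> T) -> Box<T> {
    let layout = Layout::new::<T>();
    if layout.size() == 0 {
        // Zero-sized types need no heap allocation; defer to Box::new.
        return Box::new(make());
    }

    /// Frees the raw allocation if `make` unwinds before the value is stored.
    struct Guard {
        ptr: *mut u8,
        layout: Layout,
    }
    impl Drop for Guard {
        fn drop(&mut self) {
            unsafe { dealloc(self.ptr, self.layout) }
        }
    }

    unsafe {
        let raw = alloc(layout);
        if raw.is_null() {
            // Keep the original ordering: the constructor runs (and may
            // unwind) before the allocation error is reported.
            let _value = make();
            handle_alloc_error(layout);
        }
        let guard = Guard { ptr: raw, layout };
        // On unwind from `make`, `guard` deletes the heap allocation.
        (raw as *mut T).write(make());
        std::mem::forget(guard);
        Box::from_raw(raw as *mut T)
    }
}
```

For example, box_with(|| [0u8; 4096]) gives the compiler the chance to fill the heap buffer directly. Note that this version observably allocates even when the constructor panics, which is precisely the inserted allocation discussed above.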