One more thing, related to the question of how much benefit can actually be derived from compiler optimizations depending on various aspects of the memory model.
Something about Rust that's long bothered me is that you sometimes end up using "useless" types like &i32 or &&T or even &Box<T> or &&[T]. Sometimes this happens because of generics, other times because of encapsulation (i.e. you don't really have &Box<T> but &U, where U is an opaque struct whose private data currently consists of a Box). By useless I mean that unless (a) you care about pointer identity or (b) you're using the kind of unsafe mutate-behind-the-compiler's-back shenanigans alluded to in the blog post, there is no runtime benefit to using a pointer rather than passing a copy of the underlying data. So it would be nice if the compiler could automatically transform the former to the latter. Sometimes this can already happen as a side effect of inlining, but not always. Even with mutable references, in some cases it would be advantageous to transform f(&mut x) into x = f(x), keeping the value in registers.
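To make the two shapes concrete, here is a minimal sketch of the by-reference and by-value forms side by side. The function names are made up for illustration, and nothing here claims the compiler performs this rewrite today; it only shows that the two forms compute the same thing when nobody cares about the address.

```rust
// By-reference form: the callee receives a pointer to the caller's i32.
fn bump_in_place(x: &mut i32) {
    *x = *x * 2 + 1;
}

// By-value form of the same operation: the value is passed in and handed
// back, so in principle it never needs a memory location at all.
fn bump_by_value(x: i32) -> i32 {
    x * 2 + 1
}

// The same idea for a shared reference to a small Copy type.
fn is_even_ref(n: &i32) -> bool {
    *n % 2 == 0
}

fn is_even_val(n: i32) -> bool {
    n % 2 == 0
}
```

At the call site, bump_in_place(&mut x) and x = bump_by_value(x) leave x in the same state; the hoped-for optimization is the compiler picking the second form by itself.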
In Niko's example, being able to defer the load until later in the function would be nice, but eliminating the load altogether would be nicer. Indeed, in practice there would often be little or no actual benefit to doing the former, depending on how much other register pressure existed at various points. In other cases, such as if something later in the function did its own load of *v and the compiler could merge the two, there'd be a clearer win, but some candidate memory model rules would allow that merge without allowing the original transformation. Fully de-pointerizing v only works if the value behind the reference is forbidden to change behind the compiler's back at any point during the reference's existence - at least during its actual runtime lifetime - even long before any code actually loads from it. (It can't work at all for cells, but the compiler already special-cases those.) And I think it's at least possible that it could provide a significant practical win.
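As a rough illustration of the merged-load case and the cell exception (the function names and the callback parameter are invented for this sketch, not taken from Niko's example):

```rust
use std::cell::Cell;

// With a plain shared reference, safe code in the callback cannot change *v,
// so the two loads could in principle be merged into one, or replaced by a
// by-value argument.
fn sum_twice(v: &i32, between: impl FnOnce()) -> i32 {
    let first = *v;
    between();
    let second = *v;
    first + second
}

// With a Cell, the callback may legally mutate the contents through another
// shared reference, so both loads really have to happen.
fn sum_twice_cell(v: &Cell<i32>, between: impl FnOnce()) -> i32 {
    let first = v.get();
    between();
    let second = v.get();
    first + second
}
```

Calling sum_twice_cell(&c, || c.set(7)) on a cell holding 5 sees 5 and then 7, which is exactly the behavior that rules out handing the callee a copy.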
But wait, what about pointer identity? If you have the value, you could always make a new temporary pointer if required, but you can't get the original pointer. I once thought perhaps references shouldn't have pointer identity guarantees for this reason... but that ship has sailed; those guarantees are unambiguously provided by stable Rust. And there are other things that don't work without the original pointer, like a fn foo(t: &T) -> usize that's secretly implemented as { t as *const T as usize } - can't use a new temporary if it escapes.
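A small sketch of why a fresh temporary is not a substitute. Here addr_of plays the role of the foo above and same_object is a hypothetical helper built on std::ptr::eq:

```rust
// Leaks the address of its argument, like the `foo` in the text.
fn addr_of<T>(t: &T) -> usize {
    t as *const T as usize
}

// Compares the addresses of the two referents, not their values.
fn same_object<T>(a: &T, b: &T) -> bool {
    std::ptr::eq(a, b)
}
```

Given let x = 42; let y = x;, the values are equal but same_object(&x, &y) is false, so a callee that was silently handed a copy would observe a different address than the caller's.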
Well, one possibility is providing both the pointer and the value, which is expensive but might still be advantageous in some cases. But more useful is basing it on interprocedural optimization. All you need is for the compiler to calculate a flag per reference function argument that means "I never cast this to a raw pointer, pass it along to another function without the flag, or do anything else unusual, so just give me the value".
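To show roughly what computing such a flag could look like, here is a toy sketch over a made-up model in which every function takes a single reference argument and its body is summarized as a list of uses. The Use enum and value_only_flags are hypothetical and say nothing about how rustc is actually structured; the sketch assumes the flag is computed optimistically, so a function that only forwards the reference to itself keeps it.

```rust
// What a function body does with its single reference argument (toy model).
enum Use {
    Load,          // reads through the reference
    CastToRaw,     // `t as *const T`: the address escapes
    PassTo(usize), // forwards the reference to the function with this index
}

// Returns, per function, whether it could be given the value instead of
// the pointer. Callee indices are assumed to be in range.
fn value_only_flags(bodies: &[Vec<Use>]) -> Vec<bool> {
    // Start optimistic, then knock flags down until nothing changes.
    let mut flags = vec![true; bodies.len()];
    let mut changed = true;
    while changed {
        changed = false;
        for (i, body) in bodies.iter().enumerate() {
            let ok = body.iter().all(|u| match u {
                Use::Load => true,
                Use::CastToRaw => false,
                // Forwarding is fine only if the callee also has the flag.
                Use::PassTo(callee) => flags[*callee],
            });
            if flags[i] && !ok {
                flags[i] = false;
                changed = true;
            }
        }
    }
    flags
}
```

The fixpoint loop is what makes the flag transitive: a function that merely forwards its argument to one that casts it to a raw pointer loses the flag too.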
Significantly, this can even work across crates due to Rust's compilation model: the flag could be stored in crate metadata. C can't even do this across source files in a single project without LTO, and it fundamentally can't do it across dynamically linked libraries because it's expected that the library implementation can change without recompiling clients. In the future Rust might want to make similar guarantees about a stable ABI, but there are so many performance caveats w.r.t. inlining and generics that I expect stable-ABI items or crates will be explicitly marked somehow, preserving the ability to optimize in the common case.
By the way, I was wondering whether similar flags could be used to replace memory model assumptions: to accept a need for interprocedural optimization (yuck, I know, but UB is also yuck), but make it relatively ubiquitous. But I think this doesn't really work (even without de-pointerization), since with a conservative memory model it's hard to predict what kinds of operations could be dangerous in the first place. Also, it's probably too brittle for core optimizations: trait objects and function pointers break all such static reasoning, for example.