As I described in the comparison with LLVM linked above, Cretonne’s intermediate representation attempts to span the same abstraction levels that LLVM spans with 4 different representations. To achieve this, Cretonne uses a basic data structure consisting of an SSA data flow graph with a simple type system, and a layout of instructions in extended basic blocks with few constraints. This fabric is then populated with instructions from a quite large instruction set which spans many abstraction levels. At a given abstraction level, only a subset of the instructions are used, but many instructions (like basic arithmetic) are used at multiple abstraction levels.
Sharing a fabric and common instructions minimizes the memory and allocation bandwidth needed to switch abstraction levels since most instructions can simply stay the same. Incidentally, the concept of an IR fabric is also the origin of the Cretonne name.
WebAssembly subset
WebAssembly instructions can be mapped directly to a subset of the Cretonne instructions. This subset is sandbox-safe and has no undefined behavior at all. A WebAssembly load
instruction is mapped to a Cretonne heap_load
instruction which includes a bounds check:
v10 = heap_load.i16 v1, 4
This instruction will either return the i16
value loaded from v1+4
in the wasm heap, or it will produce a trap if the address is out of bounds. The trap is not optional or undefined like it is in C.
Cretonne internally transforms this instruction to a lower abstraction level:
v2 = heap_addr v1, 6
v10 = load.i16 v2, 4
Here, heap_addr
performs a bounds check and translates the wasm heap offset in v1
to a native address in v2
. The load
instruction corresponds to a native load instruction, and is not part of the sandbox-safe subset. (Neither is heap_addr
since it leaks the location of the wasm heap). The heap_addr
instruction is then further expanded to code that checks that the range [v1;v1+6[
are valid heap offsets and so on.
Unsafe optimizations
In the expanded code, should we delete the load.i16
instruction if the loaded value is unused? This is the kind of optimization that can lead to surprises, as Chris Lattner describes. Two things to consider:
- It would be perfectly safe to delete the
load.i16
instruction because the heap_addr
bounds check is right there, and it is guaranteed to never return an address that would trap.
- Cretonne is a low-level code generator, so it expects that obviously redundant code has already been optimized out by higher-level optimizers. Therefore, the benefits of deleting loads like this are very small.
Initially, Cretonne won’t have optimizations like this, but I agree with the strategy @brson lays out above of gradually introducing annotations and transformations that enable higher-level optimizations. For example, load instructions could get a notrap
flag which asserts that the address is known to not cause a trap:
v2 = heap_addr v1, 6
v10 = load.i16 v2, 4, notrap
The notrap
flag would give the optimizer permission to delete an unused load. When Cretonne is used as a backend for the Rust compiler, presumably load
instructions would be injected directly, not via a safe head_load
version as wasm does. You then have the option of adding the notrap
flag or not, and the choice affects the tradeoff between performance and predictable behavior when unsafe Rust code violates its constraints.
This is not fundamentally different from the various annotations supported by LLVM. The important difference is that we don’t start out with the whole set of C-derived undefined behaviors, but that we can introduce annotations and optimizations gradually and evaluate their tradeoffs as we go.
Stretching the fabric
I think Cretonne’s IR fabric could support mid-level optimizations like those performed on LLVM IR, even if that is not initially in scope for the project. The larger type system in LLVM doesn’t inform the optimizers much, except perhaps in the ability to distinguish pointers from integers. Cretonne could easily add a single address
type if needed.
As @eddyb mentioned above, Cretonne uses integer indexing and side-tables via the EntityRef
trait and the EntityMap
data structure. This means that it is also possibly to add things like use-lists if that is deemed necessary for the higher-level optimizations. The low-level passes don’t need them, so they can simply be discarded before code generation.
I imagine mid-level optimizations could be implemented in a separate crate from the low-level Cretonne code generator, sharing the IR fabric. There are still use cases for Cretonne that don’t need this type of optimization.
The inliner loop
Should inlining happen on MIR or the mid-level Cretonne IL? Two points to consider are:
- Inlining heuristics are usually driven by code size estimates of the callee function. This works best if the callee has already been optimized as much as possible, after its own calls have been inlined. This can be achieved by traversing the call graph in a bottom-up order, inlining and optimizing before moving on to the next function. In languages like Rust and C++, multiple levels of function calls can collapse into almost no code, making it very difficult to estimate code size without preceding optimizations.
- Optimizations following inlining can expose more opportunities for inlining because indirect calls are turned into direct calls by constant propagation. For Rust, this applies to method calls on trait objects. It is beneficial to retry inlining following optimizations that discover direct calls.
Both of these considerations imply that inlining works best if it is followed by optimizations on the same intermediate representation. In particular, it is not optimal to perform inlining on rustc’s MIR and then immediately lower to the next IR.
If Rust has cases where inlining is required for correctness, as the #[inline(always)]
attribute is hinting, it could make sense to inline those calls on MIR and let the following passes deal with heuristic-driven inlining where the purpose is performance.