As I described in the comparison with LLVM linked above, Cretonne's intermediate representation attempts to span the same abstraction levels that LLVM spans with 4 different representations. To achieve this, Cretonne uses a basic data structure consisting of an SSA data flow graph with a simple type system, and a layout of instructions in extended basic blocks with few constraints. This fabric is then populated with instructions from a quite large instruction set which spans many abstraction levels. At a given abstraction level, only a subset of the instructions are used, but many instructions (like basic arithmetic) are used at multiple abstraction levels.
Sharing a fabric and common instructions minimizes the memory and allocation bandwidth needed to switch abstraction levels since most instructions can simply stay the same. Incidentally, the concept of an IR fabric is also the origin of the Cretonne name.
WebAssembly instructions can be mapped directly to a subset of the Cretonne instructions. This subset is sandbox-safe and has no undefined behavior at all. A WebAssembly
load instruction is mapped to a Cretonne
heap_load instruction which includes a bounds check:
v10 = heap_load.i16 v1, 4
This instruction will either return the
i16 value loaded from
v1+4 in the wasm heap, or it will produce a trap if the address is out of bounds. The trap is not optional or undefined like it is in C.
Cretonne internally transforms this instruction to a lower abstraction level:
v2 = heap_addr v1, 6
v10 = load.i16 v2, 4
heap_addr performs a bounds check and translates the wasm heap offset in
v1 to a native address in
load instruction corresponds to a native load instruction, and is not part of the sandbox-safe subset. (Neither is
heap_addr since it leaks the location of the wasm heap). The
heap_addr instruction is then further expanded to code that checks that the range
[v1;v1+6[ are valid heap offsets and so on.
In the expanded code, should we delete the
load.i16 instruction if the loaded value is unused? This is the kind of optimization that can lead to surprises, as Chris Lattner describes. Two things to consider:
- It would be perfectly safe to delete the
load.i16 instruction because the
heap_addr bounds check is right there, and it is guaranteed to never return an address that would trap.
- Cretonne is a low-level code generator, so it expects that obviously redundant code has already been optimized out by higher-level optimizers. Therefore, the benefits of deleting loads like this are very small.
Initially, Cretonne won't have optimizations like this, but I agree with the strategy @brson lays out above of gradually introducing annotations and transformations that enable higher-level optimizations. For example, load instructions could get a
notrap flag which asserts that the address is known to not cause a trap:
v2 = heap_addr v1, 6
v10 = load.i16 v2, 4, notrap
notrap flag would give the optimizer permission to delete an unused load. When Cretonne is used as a backend for the Rust compiler, presumably
load instructions would be injected directly, not via a safe
head_load version as wasm does. You then have the option of adding the
notrap flag or not, and the choice affects the tradeoff between performance and predictable behavior when unsafe Rust code violates its constraints.
This is not fundamentally different from the various annotations supported by LLVM. The important difference is that we don't start out with the whole set of C-derived undefined behaviors, but that we can introduce annotations and optimizations gradually and evaluate their tradeoffs as we go.
Stretching the fabric
I think Cretonne's IR fabric could support mid-level optimizations like those performed on LLVM IR, even if that is not initially in scope for the project. The larger type system in LLVM doesn't inform the optimizers much, except perhaps in the ability to distinguish pointers from integers. Cretonne could easily add a single
address type if needed.
As @eddyb mentioned above, Cretonne uses integer indexing and side-tables via the
EntityRef trait and the
EntityMap data structure. This means that it is also possibly to add things like use-lists if that is deemed necessary for the higher-level optimizations. The low-level passes don't need them, so they can simply be discarded before code generation.
I imagine mid-level optimizations could be implemented in a separate crate from the low-level Cretonne code generator, sharing the IR fabric. There are still use cases for Cretonne that don't need this type of optimization.
The inliner loop
Should inlining happen on MIR or the mid-level Cretonne IL? Two points to consider are:
- Inlining heuristics are usually driven by code size estimates of the callee function. This works best if the callee has already been optimized as much as possible, after its own calls have been inlined. This can be achieved by traversing the call graph in a bottom-up order, inlining and optimizing before moving on to the next function. In languages like Rust and C++, multiple levels of function calls can collapse into almost no code, making it very difficult to estimate code size without preceding optimizations.
- Optimizations following inlining can expose more opportunities for inlining because indirect calls are turned into direct calls by constant propagation. For Rust, this applies to method calls on trait objects. It is beneficial to retry inlining following optimizations that discover direct calls.
Both of these considerations imply that inlining works best if it is followed by optimizations on the same intermediate representation. In particular, it is not optimal to perform inlining on rustc's MIR and then immediately lower to the next IR.
If Rust has cases where inlining is required for correctness, as the
#[inline(always)] attribute is hinting, it could make sense to inline those calls on MIR and let the following passes deal with heuristic-driven inlining where the purpose is performance.