Global Mir Pass

LLVM has the concepts of ModulePass and FunctionPass.

bool runOnModule(Module &M)
bool runOnFunction(Function &F)

The FunctionPass is similar to the MIRPass: it works at the function level. The ModulePass has a global view: it sees all globals and functions.

LLVM is building a new inliner as a module pass with ML and PGO. It will be more flexible in the order in which it can inline functions. There is also a new function specialization module pass.

I am wondering whether there is interest in a GlobalMIRPass. Similar to LLVM's ModulePass, it has a global view. It would enable new optimizations on MIR.

I assume that's only true for everything in a codegen unit? And even with codegen-units=1 it'd still just be a single crate?

ModulePass is an LLVM term. A module is similar to a translation unit in C or C++.

For Rust, I would imagine that a GlobalMIRPass sees a single crate.

If you have a global variable:

const FOO: i32 = 5;

With a GlobalMIRPass, you can replace all loads of FOO with 5.

In complex cases, you could prove that a global is only used by one function.
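To make the substitution idea concrete, here is a minimal sketch on a toy IR. Everything in it (`Operand`, `ToyBody`, `propagate_consts`) is a hypothetical stand-in invented for illustration and has nothing to do with rustc's real MIR types; it only shows what "see every body in the crate and rewrite loads of a known constant" could look like.

```rust
use std::collections::HashMap;

/// Toy operand: either a load from a named global or an immediate value.
/// (Hypothetical type, not part of rustc.)
#[derive(Debug, Clone, PartialEq)]
enum Operand {
    LoadGlobal(String),
    Imm(i32),
}

/// Toy function body: a flat list of the operands it reads.
#[derive(Debug, Clone, PartialEq)]
struct ToyBody {
    operands: Vec<Operand>,
}

/// Crate-wide pass: replace every load of a known constant with its value.
/// Returns the number of loads that were replaced.
fn propagate_consts(bodies: &mut [ToyBody], consts: &HashMap<String, i32>) -> usize {
    let mut replaced = 0;
    for body in bodies.iter_mut() {
        for op in body.operands.iter_mut() {
            // Look up the constant first, then overwrite the operand.
            let known = match &*op {
                Operand::LoadGlobal(name) => consts.get(name.as_str()).copied(),
                Operand::Imm(_) => None,
            };
            if let Some(value) = known {
                *op = Operand::Imm(value);
                replaced += 1;
            }
        }
    }
    replaced
}
```

Loads of globals that are not in the `consts` map are left untouched, so the pass is only as strong as the set of values it can prove constant.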

I don't think so. It is pretty much impossible to have global passes without breaking incremental compilation or making it utterly ineffective. MIR passes exist mostly to make the backend churn through less IR which in turn improves compilation speed. Breaking incremental compilation is completely counter to that.

Could it be restricted to non-incremental builds? Since that's the default for release builds, it could still be useful.

I believe there are two independent factors to consider:

  • I believe IPO on MIR can be more precise and powerful than the generic LLVM optimisations.
  • If global MIR passes break incremental builds, then they are doubly useful for release mode.

Independent of this, GlobalMIRPass could still track what changed.

Note that you can still use incremental compilation for release builds, though it does necessarily preclude some optimizations.

Depending on what exactly (cg unit) global passes see, they are at least possible to incrementalize with careful recording of inputs to only rerun on changed input.

E.g. the global const load substitution could be structured so that it reruns on all code when the const changes but otherwise only reruns on changed code (and is incrementalized salsa-style, so that unchanged output doesn't trigger reruns of later derivations).
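A rough sketch of that input-recording idea, under heavy assumptions: the pass reads exactly two inputs per function (a fingerprint of the body and the value of one const), and `Inputs` and `PassCache` are hypothetical names made up here. This is not how rustc's query system or salsa is implemented; it only illustrates "rerun everything when the const changes, otherwise only rerun changed functions".

```rust
use std::collections::HashMap;

/// The inputs the toy pass reads for one function: a fingerprint of the
/// function body and the value of the single const it depends on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Inputs {
    body_fingerprint: u64,
    const_value: i32,
}

/// Remembers the inputs seen on the previous run, keyed by function name.
#[derive(Default)]
struct PassCache {
    seen: HashMap<String, Inputs>,
}

impl PassCache {
    /// Returns the names of functions whose recorded inputs changed (or that
    /// are new) and records the current inputs for the next run.
    fn functions_to_rerun(&mut self, current: &[(String, u64)], const_value: i32) -> Vec<String> {
        let mut rerun = Vec::new();
        for (name, body_fingerprint) in current {
            let inputs = Inputs {
                body_fingerprint: *body_fingerprint,
                const_value,
            };
            if self.seen.get(name) != Some(&inputs) {
                rerun.push(name.clone());
                self.seen.insert(name.clone(), inputs);
            }
        }
        rerun
    }
}
```

Editing one body invalidates only that function, while changing the const invalidates every function that recorded it as an input. The sketch ignores removed functions and the second half of the salsa-style scheme (skipping downstream work when a rerun produces identical output).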

But this example is a poor one; uses of a const can (and I think may already sometimes) be inlined with procedure-local analysis because the value is const-known already.

Did I pique enough interest for a GlobalMirPass?

The MirPass works on a Body. The rustc dev guide says that it works on Mir.

  • What is the type that a GlobalMirPass operates on?
  • How would a global pass be integrated into the pass pipeline?

Tasks:

  • I would create a GlobalMirPass trait inspired by the MirPass.
  • I would write a small no-op IPO pass for reference.
  • I would integrate the no-op pass into the pass pipeline.
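As a starting point for discussion, the three tasks could be sketched roughly like this. All names below (`ToyBody`, `ToyCrate`, `ToyMirPass`, `ToyGlobalMirPass`, `NoOpIpo`, `run_pipeline`) are hypothetical and self-contained; they deliberately do not use rustc's actual `MirPass`, `Body`, or `TyCtxt`, whose exact signatures I am not assuming here. The sketch also assumes that "global" means "one crate" and that global passes run before the per-function ones, which is only one possible ordering.

```rust
/// Stand-in for a function body; NOT rustc's `Body` type.
#[derive(Debug, Clone, PartialEq)]
struct ToyBody {
    name: String,
    statements: Vec<String>,
}

/// Stand-in for everything a global pass can see (assumed: a single crate).
#[derive(Debug, Clone, PartialEq)]
struct ToyCrate {
    bodies: Vec<ToyBody>,
}

/// Hypothetical per-function pass, loosely modelled on the idea of a MirPass.
trait ToyMirPass {
    fn name(&self) -> &'static str;
    fn run_pass(&self, body: &mut ToyBody);
}

/// Hypothetical crate-wide pass: it receives all bodies at once.
trait ToyGlobalMirPass {
    fn name(&self) -> &'static str;
    fn run_pass(&self, krate: &mut ToyCrate);
}

/// Reference pass that deliberately changes nothing.
struct NoOpIpo;

impl ToyGlobalMirPass for NoOpIpo {
    fn name(&self) -> &'static str {
        "NoOpIpo"
    }
    fn run_pass(&self, _krate: &mut ToyCrate) {}
}

/// One possible pipeline: global passes first, then local passes per body.
/// Returns the names of the passes in the order they ran.
fn run_pipeline(
    krate: &mut ToyCrate,
    global: &[&dyn ToyGlobalMirPass],
    local: &[&dyn ToyMirPass],
) -> Vec<&'static str> {
    let mut log = Vec::new();
    for pass in global {
        pass.run_pass(krate);
        log.push(pass.name());
    }
    for body in krate.bodies.iter_mut() {
        for pass in local {
            pass.run_pass(body);
            log.push(pass.name());
        }
    }
    log
}
```

The open design question is what replaces `ToyCrate` in the real compiler, since a pass that borrows every body mutably at once interacts with how and when bodies are computed and cached.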

Any help would be appreciated.

Does that need a global pass, though? It sounds basically identical to inlining, which already works today.

I think it would be helpful to provide some specific examples of where a global MIR optimization pass would be useful. Writing and maintaining correct MIR optimizations is complicated and, at least for right now, the additional complexity is really only worthwhile if they provide improved compiler throughput, better code quality or some combination thereof.

It sounds like you're interested in improving the quality of optimized code. Are there specific cases where LLVM currently generates low quality code? What optimizations would need to be implemented to improve them and why do the optimizations need to be global?

I am mostly interested in optimisation for the release mode. I am not the guy writing the IPOs. I want to enable people to write IPOs.

My main motivation is that optimisations (IPO or not) on MIR are more precise and powerful than generic LLVM optimisations. In LLVM, function parameters are passed either by value or as a pointer. In Rust, you have by value, borrowed, mutably borrowed, and maybe raw pointers. This distinction only becomes valuable with global optimisations.

LLVM also has metadata which allows encoding some of those properties into the IR. rustc will generate aliasing metadata for references in recent versions (godbolt).

This information is given to LLVM (e.g. for its IPO). Specifically, the current translation is (using the eventually-default opaque pointers mode) [godbolt]

MIR | LLVM IR
by_val(_1: i32) | @by_val(i32 %_1)
by_ref(_1: &i32) | @by_ref(ptr noalias noundef readonly align 4 dereferenceable(4) %_1)
by_ref_mut(_1: &mut i32) | @by_ref_mut(ptr noalias noundef align 4 dereferenceable(4) %_1)
by_ref_cell(_1: &UnsafeCell<i32>) | @by_ref_cell(ptr noundef align 4 dereferenceable(0) %_1) [1]
by_pin(_1: &(i32, PhantomPinned)) | @by_pin(ptr noalias noundef readonly align 4 dereferenceable(4) %_1)
by_pin_mut(_1: &mut (i32, PhantomPinned)) | @by_pin_mut(ptr noundef align 4 dereferenceable(4) %_1)
by_pin_cell(_1: &(UnsafeCell<i32>, PhantomPinned)) | @by_pin_cell(ptr noundef align 4 dereferenceable(0) %_1) [1]
by_raw_ptr(_1: *mut i32) | @by_raw_ptr(ptr %_1)
by_nonnull_ptr(_1: NonNull<i32>) | @by_nonnull_ptr(ptr noundef nonnull %_1)
by_nonscalar_val(_1: Data) | @by_nonscalar_val(ptr noalias nocapture noundef dereferenceable(N) %_1)

I may have missed some combinations that translate to different LLVM IR attributes. This mapping may change in the future, and tighter bounds (e.g. readnone and a lot of function-level attributes) can be used/inferred if a function body is present (I used extern to suppress this).

[1] changed to !dereferenceable(0) in rust-lang/rust#98017

That's great, but the difference between a local and global inliner should be obvious. I still believe that IPO on MIR could help. Getting the infrastructure into rustc should not take much effort. Then we can decide whether the new passes have an impact on performance.

The thing is... it's not. Inlining itself is a highly local decision.

There is a split between top-down inlining (which avoids losing semantic guarantees that transformations would otherwise discard) and bottom-up inlining (the inlined chunk is smaller and thus a better candidate for inlining; this includes discovered consts), but both are still localized decisions rather than global decisions.

I think the disconnect may be that from within procedure X, asking "is called procedure Y suitable for inlining" is a local decision and not a global one. You could get slightly better results by doing loop_until_fixpoint { inlining_pass; all_other_optimization; }, but this still doesn't require structuring as IPO.
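The `loop_until_fixpoint` structure can be sketched on a toy instruction list. The `Instr` enum, the two passes, and the round limit are all hypothetical and only illustrate the control flow; in particular, the "inliner" here assumes a single tiny callee whose whole body is `2 + 3`.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    Const(i32),
    AddConsts(i32, i32),
    /// Call to a tiny helper whose whole body is assumed to be `2 + 3`.
    CallSmall,
}

/// A pass mutates the body and reports whether it changed anything.
type Pass = fn(&mut Vec<Instr>) -> bool;

/// Toy inliner: replaces each call to the helper with the helper's body.
fn inlining_pass(body: &mut Vec<Instr>) -> bool {
    let mut changed = false;
    for instr in body.iter_mut() {
        if *instr == Instr::CallSmall {
            *instr = Instr::AddConsts(2, 3);
            changed = true;
        }
    }
    changed
}

/// Toy "all other optimization": folds additions of two constants.
fn const_fold_pass(body: &mut Vec<Instr>) -> bool {
    let mut changed = false;
    for instr in body.iter_mut() {
        if let Instr::AddConsts(a, b) = *instr {
            *instr = Instr::Const(a + b);
            changed = true;
        }
    }
    changed
}

/// Runs `passes` in order, repeatedly, until a full round changes nothing
/// or `max_rounds` is reached. Returns the number of rounds executed.
fn loop_until_fixpoint(body: &mut Vec<Instr>, passes: &[Pass], max_rounds: usize) -> usize {
    let mut rounds = 0;
    while rounds < max_rounds {
        rounds += 1;
        let mut changed = false;
        for pass in passes {
            changed |= pass(body);
        }
        if !changed {
            break;
        }
    }
    rounds
}
```

Each pass still looks at one body at a time; the loop only decides when to stop, which is why this structure does not by itself require an interprocedural pass.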

Note that even e.g. global value numbering typically isn't an IPO; the "global" there refers to working across multiple SSA basic blocks, not being interprocedural.

I agree that GVN is not IPO. I disagree about the inliner. Have a look at my first post. LLVM is designing a new global inliner with ML and PGO. You need to know the call graph, the sizes of the functions, and maybe the hot and cold functions.

More interesting are probably global constant propagation or global dead code elimination.
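For global dead code elimination, the core of the analysis is reachability over the call graph, which needs the whole-crate view. A minimal sketch, assuming a toy call graph keyed by function name and a hand-picked root set (both hypothetical; a real pass would also have to treat exported items, address-taken functions, and similar cases as roots):

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns every function reachable from `roots` in a toy call graph
/// (caller name -> callee names), using a simple worklist traversal.
fn reachable_functions<'a>(
    call_graph: &BTreeMap<&'a str, Vec<&'a str>>,
    roots: &[&'a str],
) -> BTreeSet<&'a str> {
    let mut reachable = BTreeSet::new();
    let mut worklist: Vec<&'a str> = roots.to_vec();
    while let Some(name) = worklist.pop() {
        // `insert` returns false if the function was already visited.
        if reachable.insert(name) {
            if let Some(callees) = call_graph.get(name) {
                worklist.extend(callees.iter().copied());
            }
        }
    }
    reachable
}

/// Functions defined in the graph that no root can reach: removal candidates.
fn dead_functions<'a>(
    call_graph: &BTreeMap<&'a str, Vec<&'a str>>,
    roots: &[&'a str],
) -> Vec<&'a str> {
    let live = reachable_functions(call_graph, roots);
    call_graph
        .keys()
        .copied()
        .filter(|name| !live.contains(name))
        .collect()
}
```

Note that a function with callers can still be dead if all of its callers are dead, which a per-function pass cannot see.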