So, once PR #28748 lands, MIR construction will be enabled unconditionally. Now seems like a great time to talk about how we can expedite and manage the transition from the existing AST to MIR. The way I see it, there are a few primary questions:
- Organization
- When should we construct MIR, and how should we store it?
- What passes should be ported to MIR?
- How to port trans, in particular?
- How to do dynamic drop?
Organization
As you can see, there is a lot of independent work below. I plan to move these work items into checkboxes on the MIR tracking issue. I will try to work my way through them, but others are welcome to sign up for anything they’d like to work on.
When to construct MIR and where do we store it?
I think we ought to construct MIR as late as possible, but we will have to construct a MIR for each fn and store it. I had at some point hoped that we could perhaps build the MIR for one function at a time and throw it away in between fns, but that doesn’t work with monomorphization in trans I fear.
The other question is whether MIR must co-exist with HIR/AST. I would also love to be able to throw away the HIR at some point, but I think that’s probably not on the short term roadmap. Certainly we can’t do that until we’ve ported all the passes that currently use HIR. But it’s a reason to reshuffle things to try and group the ones that use MIR at the end.
What passes should be ported to MIR?
This is the more interesting question. I went briefly through the passes in the compiler and this is what I found:
- Better suited to HIR:
  - lint (also uses AST) – because it cares about the precise syntax used by user etc
  - typeck and predecessors – because we can’t build MIR w/o it
  - check_const – because MIR doesn’t really deal w/ constants, but see below
  - stability – because it is sensitive to the precise path that was used to reach various items
  - privacy – because it is sensitive to the precise path that was used to reach various items
- Better suited to MIR:
  - rvalue checking – because identifying rvalues is trivial in MIR
  - dead-code – because MIR is simpler to traverse and it is more evident when something is being used
  - reachability – as above
  - match checking – because I think we do something basically equivalent during MIR construction; see below
  - effect – this is trickier, because the scope of unsafe regions is not preserved in MIR right now; however, identifying derefs and casts is very easy, so I feel like the code would be more robust if impl’d on MIR. Therefore, I think we should add information to track unsafe regions to MIR.
  - intrinsicck – because it is easier to identify calls to transmute
  - trans – but of course
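To make the effect and intrinsicck entries a bit more concrete, here is a rough sketch over a toy, flattened statement list. Everything in it is hypothetical: `Op`, `Stmt`, `check_effects`, and `find_transmutes` are invented for illustration and are not rustc's MIR types, and the `in_unsafe` field stands in for the unsafe-region information that would have to be added to MIR. Only raw-pointer derefs are treated as unsafe operations here, to keep it short.

```rust
/// A toy stand-in for MIR operations (hypothetical, not rustc's real types).
#[derive(Debug, Clone, PartialEq)]
enum Op {
    /// Dereference of a raw pointer held in the named temporary.
    DerefRaw(String),
    /// Call to a function identified by its full path.
    Call(String),
    /// Anything else; irrelevant to these two checks.
    Other,
}

#[derive(Debug, Clone, PartialEq)]
struct Stmt {
    op: Op,
    /// Assumed extra information: was this statement lowered from code
    /// inside an unsafe block?
    in_unsafe: bool,
}

/// Effect check: indices of raw-pointer derefs that sit outside any unsafe region.
fn check_effects(body: &[Stmt]) -> Vec<usize> {
    body.iter()
        .enumerate()
        .filter(|(_, s)| !s.in_unsafe && matches!(&s.op, Op::DerefRaw(_)))
        .map(|(i, _)| i)
        .collect()
}

/// Intrinsic check, first half: indices of every call to transmute.
/// A real pass would go on to compare the argument and return type sizes.
fn find_transmutes(body: &[Stmt]) -> Vec<usize> {
    body.iter()
        .enumerate()
        .filter(|(_, s)| matches!(&s.op, Op::Call(p) if p.as_str() == "std::mem::transmute"))
        .map(|(i, _)| i)
        .collect()
}
```

The point of the sketch is that both checks become a single linear scan once every operation is explicit, with no need to recurse through nested expressions.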
There were two cases listed above that are worth calling out and discussing a bit more:
check_const. This pass today has many functions:
- it checks that the initializers for statics and consts are constant expressions, as well as some other contexts
- if the comments are to be trusted, checks that their types meet various criteria
- identifies constant expressions appearing throughout the code that are candidates for hoisting
For the time being, MIR doesn’t really know much about const or static items and constant initializers, so I imagine that this should continue to operate on HIR. (For example, the values of those initializers are still stored in HIR expressions.)
However, I would prefer to fold the third function in that list (identifying constant expressions that are candidates for hoisting) into a MIR constant folding pass.
check_match. This checks for exhaustiveness and unused cases. I think this work could be done during MIR construction as well. The basic idea is that when we are generating code for a match, we are also (in effect) enumerating all the possible cases. If the match is not exhaustive, this will manifest as reaching a point where we have no remaining arms, but there are still uncovered cases. Similarly, unreachable patterns manifest as patterns that are never matched.
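As a rough illustration of that idea, the sketch below "lowers" a match over a simple enum while tracking which variants are still uncovered. It is a toy under heavy assumptions: `Pat`, `MatchReport`, and `lower_match` are hypothetical names, variants are identified by index only, and there are no nested patterns, guards, or bindings, all of which real match lowering has to handle.

```rust
use std::collections::BTreeSet;

/// A toy pattern: either one specific enum variant (by index) or `_`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Pat {
    Variant(usize),
    Wildcard,
}

/// What falls out of lowering as a side effect (hypothetical report type).
#[derive(Debug, PartialEq)]
struct MatchReport {
    /// Arms that can never be taken because earlier arms already cover them.
    unreachable_arms: Vec<usize>,
    /// Variants left over once the arms run out; non-empty means non-exhaustive.
    uncovered_variants: Vec<usize>,
}

/// Walks the arms in order, as code generation for the match would, and
/// records which cases each arm actually handles.
fn lower_match(variant_count: usize, arms: &[Pat]) -> MatchReport {
    let mut remaining: BTreeSet<usize> = (0..variant_count).collect();
    let mut unreachable_arms = Vec::new();

    for (i, pat) in arms.iter().enumerate() {
        // Does this arm handle at least one case that is still open?
        let covers_something = match *pat {
            Pat::Variant(v) => remaining.remove(&v),
            Pat::Wildcard => {
                let any_left = !remaining.is_empty();
                remaining.clear();
                any_left
            }
        };
        if !covers_something {
            unreachable_arms.push(i);
        }
    }

    MatchReport {
        unreachable_arms,
        uncovered_variants: remaining.into_iter().collect(),
    }
}
```

For example, with three variants and arms for variants 0 and 1 only, the walk runs out of arms with variant 2 still in `remaining`, which is exactly the "no remaining arms, but uncovered cases" situation described above.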
How to port trans?
For most of the passes above, I think we can just port them wholesale. Trans is a bit more complex, since there are so many parts to it: optimizations, debuginfo, etc. I think what might make sense is to work on getting a “spike” through where we add an option -Z mir-trans and work on getting some simple tests working. That can be landed. We can then incrementally improve it until all things work. We will also want to do tests to try and measure performance of the generated code.
This might be a good time to build up an “Are We Fast Yet?”-like infrastructure for measuring performance of the generated code. I’m very interested in suggestions for “real world” benchmarks that we ought to test, as well as microbenchmarks.
How to do dynamic drop?
I’d like to try and articulate my plan for dynamic drop separately, but I think it could be done very well on MIR. Basically the MIR now inserts drops conservatively. The idea would be to go through and refine those drops to only drop the things that need to be dropped. We can also make drop take a boolean flag parameter (which is sometimes the constant true) to represent the tracking flags. I figure we will do this after the safety checks have been done.
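A minimal sketch of what such a refinement could look like, on a toy structured program rather than a real control-flow graph. All names here (`State`, `Stmt`, `Refined`, `refine_drops`) are hypothetical, and the analysis is a simple three-state initialization tracker that I am assuming for illustration; it is not the actual plan or rustc's implementation.

```rust
use std::collections::BTreeMap;

/// What we know about a variable at a given program point.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Uninit, // definitely moved out of, or never initialized
    Init,   // definitely holds a value
    Maybe,  // differs between incoming paths
}

/// A toy structured program with conservatively inserted drops.
#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    Init(&'static str),
    Move(&'static str),
    If(Vec<Stmt>, Vec<Stmt>),
    Drop(&'static str),
}

/// The decision made for each conservative drop.
#[derive(Debug, Clone, PartialEq)]
enum Refined {
    Removed(&'static str),         // nothing to drop, delete the statement
    DropConstTrue(&'static str),   // always drop: flag is the constant true
    DropDynamicFlag(&'static str), // drop only if a runtime flag says so
}

type Env = BTreeMap<&'static str, State>;

fn walk(body: &[Stmt], env: &mut Env, out: &mut Vec<Refined>) {
    for stmt in body {
        match stmt {
            Stmt::Init(v) => { env.insert(*v, State::Init); }
            Stmt::Move(v) => { env.insert(*v, State::Uninit); }
            Stmt::If(then_b, else_b) => {
                let (mut t, mut e) = (env.clone(), env.clone());
                walk(then_b, &mut t, out);
                walk(else_b, &mut e, out);
                // Join: agree on a state, or fall back to Maybe.
                let keys: Vec<&'static str> = t.keys().chain(e.keys()).copied().collect();
                for k in keys {
                    let a = t.get(&k).copied().unwrap_or(State::Uninit);
                    let b = e.get(&k).copied().unwrap_or(State::Uninit);
                    env.insert(k, if a == b { a } else { State::Maybe });
                }
            }
            Stmt::Drop(v) => {
                let state = env.get(v).copied().unwrap_or(State::Uninit);
                out.push(match state {
                    State::Uninit => Refined::Removed(*v),
                    State::Init => Refined::DropConstTrue(*v),
                    State::Maybe => Refined::DropDynamicFlag(*v),
                });
                env.insert(*v, State::Uninit);
            }
        }
    }
}

/// Returns one decision per drop statement, in the order they are encountered.
fn refine_drops(body: &[Stmt]) -> Vec<Refined> {
    let (mut env, mut out) = (Env::new(), Vec::new());
    walk(body, &mut env, &mut out);
    out
}
```

In this toy, a variable moved on only one branch of an `If` reaches its drop in the `Maybe` state and gets a dynamic flag, a variable that is definitely moved has its drop removed, and everything else keeps an unconditional drop.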
Thoughts?