Transitioning to MIR


#1

So, once PR #28748 lands, MIR construction will be enabled unconditionally. Now seems like a great time to talk about how we can expedite and manage the transition from the existing AST to MIR. The way I see it, there are a few primary questions:

  1. Organization
  2. When should we construct MIR, and where should we store it?
  3. What passes should be ported to MIR?
  4. How to port trans, in particular?
  5. How to do dynamic drop?

Organization

As you can see there is lots of independent work below. I plan to move these work items into checkboxes on the MIR tracking issue. I will try to work my way through, but others can sign up if there is something you’d like to work on.

When to construct MIR and where do we store it?

I think we ought to construct MIR as late as possible, but we will have to construct a MIR for each fn and store it. I had at some point hoped that we could perhaps build the MIR for one function at a time and throw it away in between fns, but that doesn’t work with monomorphization in trans I fear.

The other question is whether MIR must co-exist with HIR/AST. I would also love to be able to throw away the HIR at some point, but I think that’s probably not on the short term roadmap. Certainly we can’t do that until we’ve ported all the passes that currently use HIR. But it’s a reason to reshuffle things to try and group the ones that use MIR at the end.

What passes should be ported to MIR?

This is the more interesting question. I went briefly through the passes in the compiler and this is what I found:

  • Better suited to HIR:
    • lint (also uses AST) – because it cares about the precise syntax used by user etc
    • typeck and predecessors – because we can’t build MIR w/o it :)
    • check_const – because MIR doesn’t really deal w/ constants, but see below
    • stability – because it is sensitive to the precise path that was used to reach various items
    • privacy – because it is sensitive to the precise path that was used to reach various items
  • Better suited to MIR:
    • rvalue checking – because identifying rvalues is trivial in MIR
    • dead-code – because MIR is simpler to traverse and it is more evident when something is being used
    • reachability – as above
    • match checking – because I think we do something basically equivalent during MIR construction; see below
    • effect – this is trickier, because the scope of unsafe regions is not preserved in MIR right now; however, identifying derefs and casts is very easy, so I feel like the code would be more robust if impl’d on MIR. Therefore, I think we should add information to track unsafe regions to MIR.
    • intrinsicck – because it is easier to identify calls to transmute
    • trans – but of course
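
To illustrate why checks like effect and intrinsicck get simpler on a flattened representation, here is a minimal sketch. The `Statement`, `BasicBlock`, `Body`, and `Finding` types are hypothetical toy stand-ins invented for this example, not rustc's actual MIR definitions; the point is only that once derefs and calls are explicit statements, finding them is a flat loop with no expression-tree recursion.

```rust
// Toy, hypothetical MIR-like structures; NOT rustc's real data types.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    RawPtr,
    Ref,
}

#[allow(dead_code)]
#[derive(Debug)]
enum Statement {
    // dest = *src, where `src_ty` is the type of the pointer being dereferenced
    Deref { dest: usize, src: usize, src_ty: Ty },
    // dest = callee(args...)
    Call { dest: usize, callee: String, args: Vec<usize> },
    // dest = constant
    Const { dest: usize, value: i64 },
}

struct BasicBlock {
    statements: Vec<Statement>,
}

struct Body {
    blocks: Vec<BasicBlock>,
}

#[derive(Debug, PartialEq)]
enum Finding {
    RawDeref { block: usize, stmt: usize },
    TransmuteCall { block: usize, stmt: usize },
}

// One flat pass over every statement: no nested expressions to walk.
fn scan(body: &Body) -> Vec<Finding> {
    let mut out = Vec::new();
    for (b, block) in body.blocks.iter().enumerate() {
        for (s, stmt) in block.statements.iter().enumerate() {
            match stmt {
                Statement::Deref { src_ty: Ty::RawPtr, .. } => {
                    out.push(Finding::RawDeref { block: b, stmt: s })
                }
                Statement::Call { callee, .. } if callee.as_str() == "transmute" => {
                    out.push(Finding::TransmuteCall { block: b, stmt: s })
                }
                _ => {}
            }
        }
    }
    out
}

// A made-up body with one raw-pointer deref and one transmute call.
fn sample_body() -> Body {
    Body {
        blocks: vec![
            BasicBlock {
                statements: vec![
                    Statement::Const { dest: 0, value: 7 },
                    Statement::Deref { dest: 1, src: 2, src_ty: Ty::RawPtr },
                    Statement::Deref { dest: 3, src: 4, src_ty: Ty::Ref },
                ],
            },
            BasicBlock {
                statements: vec![
                    Statement::Call { dest: 5, callee: "transmute".to_string(), args: vec![1] },
                    Statement::Call { dest: 6, callee: "println".to_string(), args: vec![5] },
                ],
            },
        ],
    }
}

fn main() {
    for finding in scan(&sample_body()) {
        println!("{:?}", finding);
    }
}
```

Note that this sketch carries no information about unsafe regions, which is exactly the gap mentioned in the effect bullet: the scan can say where a raw deref happens, but not whether it was inside an unsafe block.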

There were two cases listed above that are worth calling out and discussing a bit more:

check_const. This pass today has many functions:

  1. it checks that the initializers for statics and consts are constant expressions, as well as some other contexts
  2. if the comments are to be trusted, checks that their types meet various criteria
  3. identifies constant expressions appearing throughout the code that are candidates for hoisting

For the time being, MIR doesn’t really know much about const or static items and constant initializers, so I imagine that this should continue to operate on HIR. (For example, the values of those initializers are still stored in HIR expressions.)

However, I would prefer to fold function #3 (identifying hoisting candidates) into a MIR constant folding pass.

check_match. This checks for exhaustiveness and unused cases. I think this work could be done during MIR construction as well. The basic idea is that when we are generating code for a match, we are also (in effect) enumerating all the possible cases. If the match is not exhaustive, this will manifest as reaching a point where we have no remaining arms, but there are still uncovered cases. Similarly, unreachable patterns manifest as patterns that are never matched.
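
A rough sketch of that idea, under heavy simplifying assumptions: patterns are only a single level deep (a variant name or a wildcard), and `Pat`, `MatchReport`, and `lower_match` are hypothetical names made up for this example. While picking a target arm for every variant (the "switch" we would emit), the uncovered cases and the never-selected arms fall out as by-products.

```rust
// Toy single-level patterns; real patterns nest and bind, which this ignores.
#[derive(Debug, Clone, PartialEq)]
enum Pat {
    Variant(&'static str),
    Wildcard,
}

#[derive(Debug, PartialEq)]
struct MatchReport {
    // (variant, index of the arm it jumps to): the switch we would generate
    targets: Vec<(&'static str, usize)>,
    // variants for which no arm remained: the match is not exhaustive
    uncovered: Vec<&'static str>,
    // arms that no variant ever selected: unreachable patterns
    unreachable_arms: Vec<usize>,
}

fn lower_match(variants: &[&'static str], arms: &[Pat]) -> MatchReport {
    let mut targets = Vec::new();
    let mut uncovered = Vec::new();
    let mut used = vec![false; arms.len()];

    for &v in variants {
        // First arm that accepts this variant wins, as in source order.
        let hit = arms.iter().position(|p| match p {
            Pat::Variant(name) => *name == v,
            Pat::Wildcard => true,
        });
        match hit {
            Some(i) => {
                used[i] = true;
                targets.push((v, i));
            }
            None => uncovered.push(v),
        }
    }

    let unreachable_arms = used
        .iter()
        .enumerate()
        .filter(|&(_, &u)| !u)
        .map(|(i, _)| i)
        .collect();

    MatchReport { targets, uncovered, unreachable_arms }
}

fn main() {
    let variants = ["A", "B", "C"];
    let arms = [Pat::Variant("A"), Pat::Variant("A"), Pat::Variant("B")];
    println!("{:?}", lower_match(&variants, &arms));
}
```

In the `main` above, variant `C` ends up uncovered and the duplicate `A` arm at index 1 is never selected, so both diagnostics come from the same traversal that builds the jump targets.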

How to port trans?

For most of the passes above, I think we can just port them wholesale. Trans is a bit more complex, since there are so many parts to it: optimizations, debuginfo, etc. I think what might make sense is to work on getting a “spike” through where we add an option -Z mir-trans and work on getting some simple tests working. That can be landed. We can then incrementally improve it until all things work. We will also want to do tests to try and measure performance of the generated code.

This might be a good time to build up an “Are We Fast Yet?”-like infrastructure for measuring performance of the generated code. I’m very interested in suggestions for “real world” benchmarks that we ought to test, as well as microbenchmarks.

How to do dynamic drop?

I’d like to try and articulate my plan for dynamic drop separately, but I think it could be done very well on MIR. Basically the MIR now inserts drops conservatively. The idea would be to go through and refine those drops to only drop the things that need to be dropped. We can also make drop take a boolean flag parameter (which is sometimes the constant true) to represent the tracking flags. I figure we will do this after the safety checks have been done.
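
To make the flag idea a bit more concrete, here is a hand-written approximation in ordinary Rust, not compiler output. `Noisy`, `consume`, `source_level`, `with_drop_flag`, and `run` are all hypothetical names for this sketch. `source_level` is what the user writes (a value moved on only one path); `with_drop_flag` spells out by hand what a flag-guarded drop at the end of the scope could look like, using `ManuallyDrop` so nothing is dropped implicitly.

```rust
use std::cell::RefCell;
use std::mem::ManuallyDrop;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

// A value that records when it is dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn consume(n: Noisy) {
    n.log.borrow_mut().push(format!("consume {}", n.name));
    // `n` is dropped here, at the end of `consume`.
}

// What the user writes: `x` is moved on only one path.
fn source_level(cond: bool, log: &Log) {
    let x = Noisy { name: "x", log: log.clone() };
    if cond {
        consume(x);
    }
    log.borrow_mut().push("end of scope".to_string());
    // The compiler must drop `x` here only if it was not moved above.
}

// Hand-written approximation of the same function with an explicit flag.
fn with_drop_flag(cond: bool, log: &Log) {
    let mut x = ManuallyDrop::new(Noisy { name: "x", log: log.clone() });
    let mut x_live = true; // the tracking flag
    if cond {
        // Moving out of `x` clears the flag.
        // SAFETY: `x` is live here and is never touched again once the flag is false.
        let moved = unsafe { ManuallyDrop::take(&mut x) };
        x_live = false;
        consume(moved);
    }
    log.borrow_mut().push("end of scope".to_string());
    // The conservative drop, refined into "drop(x) if flag".
    if x_live {
        // SAFETY: the flag guarantees `x` has not been moved out or dropped.
        unsafe { ManuallyDrop::drop(&mut x) };
    }
}

// Runs one variant and returns the recorded event log.
fn run(f: fn(bool, &Log), cond: bool) -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    f(cond, &log);
    let out = log.borrow().clone();
    out
}

fn main() {
    for cond in [true, false] {
        println!("cond={}: {:?}", cond, run(with_drop_flag, cond));
    }
}
```

When a value is never moved on any path, the flag is the constant true and the guard disappears; when it is always moved, the drop can be removed entirely. Only the mixed case needs a runtime flag.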

Thoughts?


#2

I would like privacy checks to be triggered by resolve and typeck and use the HIR map to a greater extent.
That would also allow us to use it in typeck to skip private fields and methods during autoderef.


#3

Makes sense to me, though it seems a bit orthogonal to MIR.


#4

Right, what I mean is that by the time typeck completes, there are no more privacy checks to be done.
Possibly the same can be done for stability?


#5

This reminds me of another bit of MIR work that I would like to do, which is to “re-type-check” the MIR as a sanity check. This should be much simpler than before since there would be no inference to be done.
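
As a sketch of why this can be simple: if every local already carries a declared type, checking a statement is just computing the rvalue's type and comparing, with no unification or inference variables. The tiny IR below (`Ty`, `Rvalue`, `Assign`, `Body`) is a hypothetical toy for illustration and far smaller than real MIR.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Int,
    Bool,
}

#[allow(dead_code)]
#[derive(Debug)]
enum Rvalue {
    ConstInt(i64),
    ConstBool(bool),
    Use(usize),        // copy of another local
    Add(usize, usize), // Int + Int -> Int
    Lt(usize, usize),  // Int < Int -> Bool
}

#[derive(Debug)]
struct Assign {
    dest: usize,
    rvalue: Rvalue,
}

struct Body {
    local_tys: Vec<Ty>, // declared type of every local, fixed up front
    statements: Vec<Assign>,
}

fn local_ty(body: &Body, i: usize) -> Result<Ty, String> {
    body.local_tys
        .get(i)
        .copied()
        .ok_or_else(|| format!("unknown local _{}", i))
}

fn expect_ints(body: &Body, a: usize, b: usize) -> Result<(), String> {
    for i in [a, b] {
        if local_ty(body, i)? != Ty::Int {
            return Err(format!("operand _{} must be Int", i));
        }
    }
    Ok(())
}

// Types are computed bottom-up from declared local types; nothing is inferred.
fn rvalue_ty(body: &Body, rv: &Rvalue) -> Result<Ty, String> {
    match *rv {
        Rvalue::ConstInt(_) => Ok(Ty::Int),
        Rvalue::ConstBool(_) => Ok(Ty::Bool),
        Rvalue::Use(a) => local_ty(body, a),
        Rvalue::Add(a, b) => {
            expect_ints(body, a, b)?;
            Ok(Ty::Int)
        }
        Rvalue::Lt(a, b) => {
            expect_ints(body, a, b)?;
            Ok(Ty::Bool)
        }
    }
}

fn check(body: &Body) -> Result<(), String> {
    for (i, stmt) in body.statements.iter().enumerate() {
        let actual = rvalue_ty(body, &stmt.rvalue)?;
        let declared = local_ty(body, stmt.dest)?;
        if actual != declared {
            return Err(format!(
                "statement {}: _{} is declared {:?} but assigned {:?}",
                i, stmt.dest, declared, actual
            ));
        }
    }
    Ok(())
}

// _0 = 1; _1 = _0 + _0; _2 = _1 < _0
fn well_typed() -> Body {
    Body {
        local_tys: vec![Ty::Int, Ty::Int, Ty::Bool],
        statements: vec![
            Assign { dest: 0, rvalue: Rvalue::ConstInt(1) },
            Assign { dest: 1, rvalue: Rvalue::Add(0, 0) },
            Assign { dest: 2, rvalue: Rvalue::Lt(1, 0) },
        ],
    }
}

// Same locals, but _2 (a Bool) is assigned the result of an addition.
fn ill_typed() -> Body {
    Body {
        local_tys: vec![Ty::Int, Ty::Int, Ty::Bool],
        statements: vec![
            Assign { dest: 0, rvalue: Rvalue::ConstInt(1) },
            Assign { dest: 2, rvalue: Rvalue::Add(0, 0) },
        ],
    }
}

fn main() {
    println!("{:?}", check(&well_typed()));
    println!("{:?}", check(&ill_typed()));
}
```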


#6

rvalue checking – because identifying rvalues is trivial in MIR

Absolutely

dead-code – because MIR is simpler to traverse and it is more evident when something is being used
reachability – as above

Nice, but these are pretty stable (so low value).

match checking – because I think we do something basically equivalent during MIR construction; see below

The algorithms are somewhat different, and patterns will be cleaned-up in the HIR work - may be better to separate them.

effect – this is trickier, because the scope of unsafe regions is not preserved in MIR right now; however, identifying derefs and casts is very easy, so I feel like the code would be more robust if impl’d on MIR. Therefore, I think we should add information to track unsafe regions to MIR.

This is a rather trivial pass, and it rather depends on lexical nesting, so I am not sure.

intrinsicck – because it is easier to identify calls to transmute

The utility of this pass is dubious because of associated types - we should probably make it a lint (but we need ABI-accessing lints for that).

trans

of course


#7

You mean, because we don’t know what monomorphizations we will need before trans? We could add a separate monomorphization pass. Although, in the MIR world, such a pass would be an MIR pass, so I guess that doesn’t work out. It’s possible we might want such a pass for other reasons, though; for example, it would probably produce better diagnostics.


#8

If we tie the code generation to this check, it has the advantage that they will not get out of sync. But I’ll take a stab at it perhaps and see how well it works. I’ve not made up my mind. The main challenge is constructing a human-readable counter-example, but that seems fairly straightforward as well – when we reach the point in the algorithm where we detect a missing case, we have a path and a test outcome (e.g., enum variant).


#9

Yes, that’s what I mean. It is plausible that we could just do monomorphization expansion as a pre-pass and make a complete list of what we will need, but I’m not sure it’s worth the trouble.
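
For what it's worth, such a pre-pass would essentially be a worklist over (function, type arguments) pairs. The following is only a sketch of that shape, with hypothetical toy types (`Ty`, `Call`, `FnDef`, `Instance`) and with types reduced to flat names, so it sidesteps everything that makes the real problem hard (traits, nested generics, closures).

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

#[derive(Debug, Clone)]
enum Ty {
    Concrete(&'static str), // e.g. "u32"
    Param(usize),           // the caller's i-th type parameter
}

struct Call {
    callee: &'static str,
    type_args: Vec<Ty>,
}

struct FnDef {
    calls: Vec<Call>,
}

// A monomorphic instance: function name plus concrete type arguments.
type Instance = (&'static str, Vec<&'static str>);

fn collect(fns: &HashMap<&'static str, FnDef>, root: &'static str) -> BTreeSet<Instance> {
    let mut seen: BTreeSet<Instance> = BTreeSet::new();
    let mut work: VecDeque<Instance> = VecDeque::new();
    work.push_back((root, Vec::new()));

    while let Some((name, args)) = work.pop_front() {
        // Skip instances we have already expanded.
        if !seen.insert((name, args.clone())) {
            continue;
        }
        let def = match fns.get(name) {
            Some(d) => d,
            None => continue, // unknown/external function: nothing to expand
        };
        for call in &def.calls {
            // Substitute the caller's concrete arguments for its parameters.
            // (Panics on an out-of-range Param index; fine for a toy.)
            let substituted: Vec<&'static str> = call
                .type_args
                .iter()
                .map(|t| match t {
                    Ty::Concrete(c) => *c,
                    Ty::Param(i) => args[*i],
                })
                .collect();
            work.push_back((call.callee, substituted));
        }
    }
    seen
}

// main calls id::<u32> and pair::<u32, bool>; pair::<A, B> calls id::<A> and id::<B>.
fn sample_program() -> HashMap<&'static str, FnDef> {
    let mut fns = HashMap::new();
    fns.insert(
        "main",
        FnDef {
            calls: vec![
                Call { callee: "id", type_args: vec![Ty::Concrete("u32")] },
                Call {
                    callee: "pair",
                    type_args: vec![Ty::Concrete("u32"), Ty::Concrete("bool")],
                },
            ],
        },
    );
    fns.insert(
        "pair",
        FnDef {
            calls: vec![
                Call { callee: "id", type_args: vec![Ty::Param(0)] },
                Call { callee: "id", type_args: vec![Ty::Param(1)] },
            ],
        },
    );
    fns.insert("id", FnDef { calls: vec![] });
    fns
}

fn main() {
    for instance in collect(&sample_program(), "main") {
        println!("{:?}", instance);
    }
}
```

The resulting set is the "complete list of what we will need"; whether building it up front is worth the trouble is the open question above.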


#10

The main advantage to doing it on the MIR is the fact that all derefs are clean and explicit. It might actually be nicer to just run it on the HAIR that is used as input to the MIR, which also shares this property (eventually, I’d like to have typeck produce actual HAIR).


#11

The main advantage to doing it on the MIR is the fact that all derefs are clean and explicit. It might actually be nicer to just run it on the HAIR that is used as input to the MIR, which also shares this property (eventually, I’d like to have typeck produce actual HAIR).

Derefs? Only raw pointer derefs are dangerous, and these are explicit. Method calls are checked via the method map. Otherwise, we only need to use expr_ty_adjusted (we don’t - https://github.com/rust-lang/rust/issues/28776 - but the potential for screw-ups is endless anyway).


#12

Ideally, I think they should support autoderef. I’m not really sure why they don’t except…that they don’t. (Though at some point the fact that it would have been very hard to enforce safety in that case was a factor, but the current setup makes it easier to do.)


#13

Woah woah woah; are you seriously suggesting raw pointers auto-deref?

Keep in mind they have several methods.


#14

Ideally, I think they should support autoderef.

That would be very scary - even within an unsafe block, I prefer to keep raw pointer operations explicit.


#15

Yes, I was. You don’t want that? I’m surprised. I find it so annoying to write code like (*x).foo.

True. So does Rc, though (for better or worse).

Well, I don’t agree, but it doesn’t really matter. If I ever make a concrete proposal, we can argue about it there. I was mostly just trying to argue that we want to be as clear as possible about identifying unsafe operations. I do agree that the lexical scoping of unsafe blocks is not a good fit for MIR, but clearly identifying what operations take place IS a good fit.

In any case, * is no longer the only potentially unsafe operator. Per RFC 1240, taking the address of a field of a packed struct is also unsafe. To make this check work properly for match statements seems like it will require something like the EUV, which I expect to go away in favor of MIR.


#16

Rc has no methods outside of trait impls: https://doc.rust-lang.org/nightly/std/rc/struct.Rc.html


#17

Huh. OK, I stand corrected. :) I actually checked rustdoc before writing that though, not sure what I was looking at. Maybe a trait impl.


#18

Or perhaps Arc: http://doc.rust-lang.org/std/sync/struct.Arc.html#method.downgrade though I see those are unstable, which seems good.


#19

To make this check work properly for match statements seems like it will require something like the EUV, which I expect to go away in favor of MIR.

We really should just make our HIR pattern story better, and then we could use just the HIR (+ visit_pat).

Anyway, I don’t want the MIR to be involved with lexical scoping. Unsafety checking is a lexical operation.


#20

Perhaps. Do you have a more specific proposal in mind? Figuring out just what is being borrowed does require at least a bit of calculation over the HIR, as mem_categorization does today.

I understand this feeling and I agree that in general lexical scoping is not a good fit for the MIR. That said, MIR DOES do a good job of exposing the kind of low-level details effect checking wants to see.

Overall my feeling is that effect checking lies somewhere in the middle. So we can either stretch HIR down, and do more safety analysis on HIR that I might prefer to do, or bend MIR a bit up, and track either lexical scopes or the unsafety setting. I guess the right answer will depend on just what HIR winds up looking like.