MIR Specification


#1

Hello,

  • Is there any MIR specification out there? No!
  • Is it possible to get a MIR textual output out of rustc?

Why?

I have seen more than once how desirable a GCC backend for rustc would be, and I agree. As a GCC contributor I am happy to make a start on it; however, my Rust-fu is at a minimum. With this I hope to learn more about Rust and help the community.

I have twice in the past shown interest in helping but got no feedback, so my intention here is to be proactive. I think, given the current compiler design as I understand it, that we need to pick up Rust at MIR and convert MIR into GCC's internal representation. This seems to be the feasible way to do it. However, I first need to understand in detail what MIR is and get a feel for it.

So, let me rephrase my initial questions:

  • where are the MIR structures inside rustc and where does the conversion to LLVM IR happen?
  • is MIR changing on a regular basis?
  • is it possible to get rustc (if not done yet) to output MIR in textual form and read it back again?

Any thoughts on this, suggestions, or offers to mentor on the Rust side would be very helpful.


#2

The definition of MIR is in rustc::mir. The conversion to LLVM IR happens in what has been known as rustc_trans for many years, though it will soon be renamed rustc_codegen_llvm or something like that, reflecting that it’s becoming one of multiple backends.

Speaking of which, rustc is in the process of being reorganized to more easily support multiple backends, but it’s not done yet. These changes will be required for a high-quality GCC backend (or any non-LLVM backend) so for the time being you may notice various important bits (e.g., layout computation, until https://github.com/rust-lang/rust/issues/45226 is fixed) being in LLVM-specific code. I hope it’s far enough along that you won’t be outright blocked from compiling non-trivial code.

Some aspects of MIR are currently in flux, but mostly in areas that don’t concern backends (more related to analyses like borrow checking). However, MIR is 110% an implementation detail of rustc, so it’s certainly possible for any part of it to change at a moment’s notice.

Having rustc output MIR in textual form and read it back in would be extremely controversial, to say the least. It doesn’t help that (AFAIK) nobody has described a good use case for this. Using a textual form does not provide any more stability than using the rustc libraries. Furthermore, MIR is not standalone: most importantly, it reuses the entire trait and type system of Rust but does not have facilities for defining types and traits.

If your intent is to be able to work with MIR without rebuilding rustc every time you change something, note that there’s a better solution for that: link to the rustc libraries to build a custom driver, like miri or rlsl.

I highly recommend joining the #rustc IRC channel and asking questions. I’ve learned a lot just from lurking there and occasionally asking for clarifications. While I can’t commit to full-on mentoring, I’m more than happy to answer questions when I’m around, and in my experience the other folks in there are as well :)


#3

I know very little about rustc internals, but if a GCC backend does materialize, I’ll certainly be using it :)


#4

MIR seems to me to be a good candidate for eventual promotion from implementation-detail to stable, specified interface language. This would be useful not only for supporting different toolchain backends (as mentioned already), but for interpreters (wouldn’t it be good if MIRI had a stable interface? Wouldn’t a Rust Jupyter kernel be great?) and even for facilitating the development of new languages with certain Rust-like behaviors. (MIRI + simple, limited syntax = new scripting language with ownership semantics and guaranteed data-race safety, I think?)


#5

As we will soon have an LTS (Rust 2018) version, I believe this is a good time to freeze MIR for this version at least, and also provide a comprehensive language specification. This will allow third-party development while letting mainstream development continue the way it does today.


#6

Is the Rust 2018 edition intended to be LTS? That’s a big commitment and I thought it was explicitly not intended to be LTS.


#7

Not necessarily. There is an RFC for an LTS release, however.


#8

Why would it be controversial? LLVM has been doing it for its own IR for ages. OP didn’t assert that its purpose would be stability. A human-readable, first-class, serializable form would be very useful and good for compiler developers’ convenience.

@OP: please note that the top-level Mir struct already implements Debug, RustcEncodable and RustcDecodable. You may find that this fact could solve at least part of your problem.


#9

Couldn’t agree more. Furthermore, a standard binary format would also be VERY useful for backend developers.


#10

I’m not sure in what contexts you want to extract such output, but if you are on the nightly compiler and are willing to utilize a (very) unstable feature, you can use -Z dump-mir=<function name>.

E.g. -Z dump-mir=main will dump the MIR for the fn main function and for every block of code associated with compiling fn main (e.g. const-promoted expressions, closure bodies, etc.), before and after each MIR transformation pass, into a subdirectory named mir_dump/. Many files are generated, using a filename like rustc.main-{{closure}}.000-004.UniformArrayMoveOut.after.mir (the general format is something like rustc.<function-name>[-<kind>].<pass-number>.<pass-name>.(before|after).mir).
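
If you end up with a directory full of these dumps and want to sort or filter them, the file name pattern above can be picked apart mechanically. Below is a minimal sketch, assuming the pattern is exactly as described in this post (it is unstable and may well differ between compiler versions). The DumpName struct and the parse_dump_name function are hypothetical names made up for this illustration; they are not part of rustc.

```rust
/// Parts of a `-Z dump-mir` file name, following the pattern quoted in the
/// post. Hypothetical helper type, not something rustc provides.
#[derive(Debug, PartialEq)]
struct DumpName<'a> {
    function: &'a str,
    kind: Option<&'a str>,
    pass_number: &'a str,
    pass_name: &'a str,
    after: bool,
}

/// Parses names shaped like
/// `rustc.<function-name>[-<kind>].<pass-number>.<pass-name>.(before|after).mir`.
/// Returns `None` for anything that does not fit that assumed shape.
fn parse_dump_name(file_name: &str) -> Option<DumpName<'_>> {
    // Split from the right so that only the last four dots are treated as
    // separators; whatever remains on the left is the `rustc.<item>` part.
    let mut parts = file_name.rsplitn(5, '.');
    if parts.next()? != "mir" {
        return None;
    }
    let after = match parts.next()? {
        "after" => true,
        "before" => false,
        _ => return None,
    };
    let pass_name = parts.next()?;
    let pass_number = parts.next()?;
    let item = parts.next()?.strip_prefix("rustc.")?;
    // Assumption: the first '-' separates the function name from the kind
    // (e.g. `{{closure}}`).
    let (function, kind) = match item.split_once('-') {
        Some((f, k)) => (f, Some(k)),
        None => (item, None),
    };
    Some(DumpName { function, kind, pass_number, pass_name, after })
}
```

Feeding it the example name from above should yield function "main", kind "{{closure}}", pass number "000-004", pass name "UniformArrayMoveOut", and after = true.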


#11

(However, we have no facility to read back in the textual MIR dumps… I have often privately mused about trying to provide some sort of asm.js-like feature where one would write MIR input expressed via a subset of Rust, but I’ve not gotten past the musing stage of such thinking…)


#12

Relatedly, when I was working on some MIR codegen changes and trying to figure out what my MIR should look like, it would have been much easier if the Rust compiler had been able to skip the Rust->MIR phase and read a MIR “source” instead. I could then have tried the changes as MIR “source” changes, rather than first figuring out how to generate that MIR only to realize it doesn’t work. Especially considering how long it takes to build the compiler.


#13

Right. This is what I thought. The compiler should really be a “staged” compiler:

Rust -> HIR -> MIR -> LIR? -> LLVM IR -> Machine code

We should be able to stop at any of the stages above, have the output collected and reviewed/adjusted/corrected, then continue with the later steps.
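
To make the idea a bit more concrete, here is a rough sketch of what "start at one stage, stop at another" could look like as a data model. Everything here is hypothetical: the Stage enum, the PIPELINE table and the plan function are names invented for this post, and rustc exposes nothing like this. The Lir stage is included only because it appears (with a question mark) in the pipeline above.

```rust
/// Stages of the pipeline sketched in the post, in lowering order.
/// The derived ordering follows the order of declaration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Stage {
    Source,
    Hir,
    Mir,
    Lir, // speculative stage, marked "LIR?" in the post
    LlvmIr,
    MachineCode,
}

const PIPELINE: [Stage; 6] = [
    Stage::Source,
    Stage::Hir,
    Stage::Mir,
    Stage::Lir,
    Stage::LlvmIr,
    Stage::MachineCode,
];

/// Lists the lowering steps, as (from, to) pairs, needed to get from `start`
/// to `stop`. Returns `None` if `stop` comes before `start`, since the
/// pipeline only runs in one direction.
fn plan(start: Stage, stop: Stage) -> Option<Vec<(Stage, Stage)>> {
    if stop < start {
        return None;
    }
    Some(
        PIPELINE
            .windows(2)
            .map(|w| (w[0], w[1]))
            .filter(|&(from, to)| from >= start && to <= stop)
            .collect(),
    )
}
```

With this model, plan(Stage::Source, Stage::Mir) gives the two steps needed to stop at MIR, and plan(Stage::Mir, Stage::MachineCode) gives the remaining steps to resume from a (possibly hand-edited) MIR. The hard part, of course, is the read-back side, which this sketch does not touch.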