Testing MIR optimizations


#1

So @scottcarr was asking me how to test the work he’s doing on MIR optimizations. It’s clear that we need some kind of harness. Any ideas on just how to do it? One thought is to combine the ability to dump MIR into a text file with something like LLVM’s FileCheck, as I think @nagisa once suggested. This seems… potentailly fine, but maybe there is a better way to go about it. (Anybody up for building such a thing? I’d be happy to work with mentoring, though I’m currently on vacation and thus working very limited hours.)

(Of course, adding more performance-oriented benchmarks would also be a good thing, as well as building up a better test runner.)


#2

To me, writing and checking the tests in MIR is the quickest path to getting up and running.

It will be a maintenance job to update the tests whenever some aspect of the MIR changes, but I don’t see any way to avoid that. If we write the tests in rust, then that just means a wider category of rustc changes break the tests (HIR generation, HIR->MIR trans, etc).

We would need (to add?) some option to rustc that takes MIR, runs some transformation, and outputs MIR. Then check the output MIR with FileCheck against our expectation.

There is definitely potential for some smarter scheme, but I don’t know what it is :slight_smile:


#3

Hmm, this does seem best, but also much harder than what I originally thought, which would start with rust code, and check (some subset of) the resulting MIR. I am not sure I want to add a MIR parser and so forth to rustc … at least not yet.


#4

Maintaining serialization code with lossless roundtrips is really annoying and error-prone, unless you specifically use a library that uses a strict format to generate both directions from one specification. This is true even if the data is just a bunch of atoms in a JSON file, and much worse with a custom human-readable format which contains as much information as MIR.

Writing the tests in Rusts would not only avoid this work, but I would argue that it results in better tests as well. While it does mean the tests also cover the entire first half of the compilation pipeline, anything which takes in a whole module, runs a whole optimization process on it, and then inspects the resulting module, is already an integration test. Increasing the scope of such integration tests means more work, yes, but also more confidence about the application as a whole. More concretely, it provides some assurance that the optimization actually fires on real MIR generated from real Rust code — and ultimately that’s what we care about.

LLVM is in a very different situation: It’s its own independent project, it powers a lot of quite different frontends, and it already has stable and round-tripping serialization. For LLVM it makes sense to write tests that go from IR to IR. MIR is rustc-only, so different tradeoffs apply.

However, to make such tests more self-documenting, and to make it easier to tell why a test failed (a change in the earlier passes, or in the pass being tested), the test format should include a dump of the MIR before the pass under test ran, and perhaps the list of passes to run beforehand should be specified. Something like (MIR from play.rlo, hence the <anon>):

#![crate_type="lib"]
pub fn foo(x: i32) -> i32 {
    x
}
--- pre: SimplifyCfg SomeOtherPassRunningBefore ---
fn foo(arg0: i32) -> i32 {
    scope 1 {
        let var0: i32;                   // "x" in scope 1 at <anon>:2:12: 2:13
    }
    let mut tmp0: i32;

    bb0: {
        var0 = arg0;                     // scope 0 at <anon>:2:12: 2:13
        tmp0 = var0;                     // scope 1 at <anon>:3:5: 3:6
        return = tmp0;                   // scope 1 at <anon>:3:5: 3:6
        goto -> bb1;                     // scope 1 at <anon>:2:1: 4:2
    }

    bb1: {
        return;                          // scope 1 at <anon>:2:1: 4:2
    }
}
--- post: MoveForwarding ---
fn foo(arg0: i32) -> i32 {
    scope 1 {
        let var0: i32;                   // "x" in scope 1 at <anon>:2:12: 2:13
    }

    bb0: {
        var0 = arg0;                     // scope 0 at <anon>:2:12: 2:13
        return = var0;                   // scope 1 at <anon>:3:5: 3:6
        goto -> bb1;                     // scope 1 at <anon>:2:1: 4:2
    }

    bb1: {
        return;                          // scope 1 at <anon>:2:1: 4:2
    }
}

This probably requires some normalization, like the UI tests.


#5

There isn’t currently a mechanism to selectively run MIR passes, right?

If we’re going to use FileCheck, we should follow the convention of embedding the command needed to produce the test output in the test (which eliminates the need to specify the passes).

IMHO, the “dump of the MIR before” should be a separate test. The rust -> original MIR code should have it’s own test :wink:

EDIT: or maybe you can have multiple tests in the same FileCheck test? That’d probably be better.

By the same logic, maybe we ought to compile all the way down to LLVM IR? Maybe that will be easier since FileCheck is presumably designed to test LLVM IR?


#6

Yes, good point. (Edit: What I mean is, this seems like a good idea even if we don’t literally use the FileCheck tool)

I am torn. On the one hand, yes, it tests a completely different thing. On the other hand, the test is meaningless if the MIR fed into the pass isn’t as expected. If during test execution the “pre” MIR isn’t as expected, then that’s a test failure, but of a different category than unexpected “post” MIR. As an analogy, consider a run-pass test that doesn’t even compile vs. one that compiles and runs but fails in an assert!(). Specifying the “pre” MIR doesn’t serve to test the HIR->MIR mapping and the previous passes, it just serves to disambiguate when a test failure is due to unrelated changes in the previous stages of the compiler and when it’s due to the pass under test.

In the workflow I imagine, one would simply copy&paste the output of rustc -Z dump-mir into a file to create a new test. As with UI tests, a change in the output of -Z dump-mir (whether pre or post) would raise an alarm and a human should compare the new output with the previous output and, if the result is sensible, update the file storing the expected MIR. Perhaps one doesn’t even need the “pre” MIR then — it shouldn’t make a big difference for maintenance though. If the generated MIR changes in a way that’s unrelated to the test case, then the output MIR will probably also change, simply because it’s unrelated to the effect of the optimization pass.

I am much less sure about this. The LLVM IR will probably always have much more lines than the MIR, so there’s more line noise as well. Moreover, LLVM is a lot bigger and a lot more complex and not fully under our control so it’s more likely that the resulting IR randomly changes in irrelevant ways. There are already codegen tests that inspect the LLVM IR, but they look like this:

// CHECK-LABEL: @add
#[no_mangle]
pub fn add(x: f32, y: f32) -> f32 {
// CHECK: fadd float
// CHECK-NOT fast
    x + y
}

It may be possible to write similar tests for MIR. The search for specific sub strings might solve the problem of tests needing attention whenever unrelated details change. But I don’t know how effective these tests are — it seems they get touched very rarely, which indicates stability, but at a glance they don’t seem to cover a lot of details either.


#7

I suspect we’d be better off just reproducing FileCheck, tbh. It’s not like it’s all that complex, and it would help with portability and so forth. Basically it just seems easier to write the rust code than to write the grungy annoying code to actually shell out and invoke Filecheck – unless there are some prior tests we can just copy from that are using it via compiletest?


#8

Yes. Please don’t use non-rust tools unless there’s no other way.


#9

I started a PR: https://github.com/rust-lang/rust/pull/34715

For building the tests and dumping the MIR I’m just using the existing compiletest infrastructure.

For the comparison, that’s still TBD.


#10

I have one small test working. You can view it here: https://github.com/scottcarr/rust/blob/mir-test/src/test/mir-opt/simplify_if.rs

The test format is a slightly modified version of @rkruppe’s suggestion. The test format is:

(arbitrary rust code)
// END RUST SOURCE
// START $file_name_of_some_mir_dump_0
//  $expected_line_0
// ...
// $expected_line_N
// END $file_name_of_some_mir_dump_0
// ...
// START $file_name_of_some_mir_dump_N
//  $expected_line_0
// ...
// $expected_line_N
// END $file_name_of_some_mir_dump_N

All the test information is in comments so the test is runnable.

For each $file_name, compiletest expects [$expected_line_0, …, $expected_line_N] to appear in the dumped MIR in order. Currently it allows other non-matched lines before, after and in-between. I am curious if people think that is too permissive? Maybe non-matched lines in-between shouldn’t be allowed?

Lines match ignoring whitespace, and the prefix “//” is removed of course. It also currently strips trailing comments – partly because the full file path in “scope comments” is unpredictable and partly because tidy complains about the lines being too long.

compiletest handles dumping the MIR before and after every pass for you. The test writer only has to specify the file names of the dumped files (not the full path to the file) and what lines to expect. I added an option to rustc that tells it to dump the mir into some directly (rather then always dumping to the current directory).

I think this is a good start at least. It’s not clear to me that FileCheck’s more advanced features would be that useful to us (ex: CHECK-NOT, CHECK-DAG, etc). Especially considering we don’t have tests that use those features.


#11

I think that the most important part of MIR optimization checking is actually to test MIR that we do not know how to generate from Rust code. Checking optimizations is nice and all, but optimization heuristics can change and cause churn. OTOH, we want to ensure the optimizations are sound with all MIR, not only MIR we can figure out how to generate.


#12

How would we do that?


#13

That’s a good point. Nailing down “all MIR” may be a bit tricky — no use in making up testcases that break invariants which are consciously maintained in the compiler — but in any case, real code might ultimately result in weird MIR that’s hard to generate from a short and sweet test case.

However, that insight doesn’t solve the practical obstacles (needing to have a MIR parser and keeping it correct and up-to-date) mentioned before. Do you have ideas on that?


#14

My intent is that we would test edge cases, even if we don’t know how to write Rust code that reaches them directly.

Yeah, this requires a MIR parser. Hopefully, we won’t have too many MIR tests for that to matter. I think FileCheck-style verification on the pretty-printed output is fine.


#15

It seems to me that we probably (eventually) want both the ability to write Rust tests (as @scottcarr has done) and to directly input MIR. The latter might make sense to wait until MIR is a touch more stable, since there is a certain amount of infrastructure that will be required.


#16

GHC has a flag called -dcore-lint which runs the Core typechecker after each optimization pass.

I thought that Rust has a similar flag for MIR, but I can’t find it. If it doesn’t, one would be helpful.

Once lifetime checks are complete, any further optimizations should maintain lifetime soundness, so running a lifetime checker on the MIR should also help catch bugs.


#17

GHC has a flag called -dcore-lint which runs the Core typechecker after each optimization pass. I thought that Rust has a similar flag for MIR, but I can’t find it. If it doesn’t, one would be helpful.

Currently Rust runs type checker(TypeckMir) unconditionally once in the middle.


#18

Have there been thoughts on finishing the lattice-based dataflow framework for MIR?