Enabling MIR by default

nikomatsakis · June 6, 2016, 1:47pm

So PR 34906 proposes to enable MIR by default, at least on nightly builds. As the comment says, the correctness problems that we know of are addressed, as are the known compile-time and performance regressions. The goal then is to gain experience on Nightly and, if we encounter problems that we cannot fix before the next release cycle, to fallback to the older backend.

I’d like to get feedback from from folks on the core and tools team as well as to what criteria they would like to see satisfied before enabling MIR, even in this provisional manner. So cc @alexcrichton, @nrc, @aturon, @brson, at minimum, as well as @eddyb, @pnkfelix, @nagisa, @arielb1, @dotdash (no way to cc teams in discuss, right?).

It seems to me that there are two criteria on which we might evaluate our readiness: correctness and performance. We’ve used crater extensively for correctness but that has known blind spots.

When it comes to performance, I think there has been less active investigation. Anyway, originally I wanted to take some time to run more careful benchmarks. I did a bit of work attempting to create a “rust benchmark suite” based on the benchmarks listed in https://github.com/rust-lang/rust/issues/31265, however it’s not ready yet. It should be possible though to just run some of those benchmarks in a “one-off” fashion locally, and I will do that at least as a sanity check. It would also be worth running some servo code (cc @pcwalton – suggestions?)

alexcrichton · June 6, 2016, 2:13pm

I’m pretty excited by this and am quite enthusiastic about the idea of turning on MIR by default soon. I feel like MIR has seen a lot of attention both in terms of perf fixes recently as well as bug fixes, and the makes me confident that we can polish it off as issues come up. I definitely wouldn’t expect this to be an absolutely smooth transition over the first week or so (as there’s likely to be new bugs), but it seems like the compiler team is doing a great job of cleaning them out quickly.

To me it seems ok to leave MIR as nightly-only for awhile and we may want it to miss the next beta train if it’s not ready. The risk here is that if we skip the beta train for MIR (e.g. rust 1.11 nightly has MIR by default but rust 1.11 beta does not) then we may have regressions in old trans which aren’t detected until later. The risk seems low, however, as it seems we haven’t found an old trans bug in awhile (even with MIR trans making quite a bit of progress).

The other risk to me is that this can have a major impact on the nightly-only community, perhaps blocking upgrades for a long time. The major user here is Servo, but this also ecompasses a large portion of the serde ecosystem. I don’t really know of a great way to reduce the risk here, but we can do simple things like:

Land a patch in Servo with RUSTFLAGS=-Z orbit
Compile a large Piston project (or such) with -Z orbit (maybe even something like Diesel here as well)
Land a patch to Cargo with -Z orbit by default, while this should be tested by cargotest in the main repo I doubt that propagates the right flags so it’d be good to just double check.

If we don’t do that then the risk is still somewhat small in the sense that it’s just nightly and we can always revert, but that would hopefully make the transition go much more smoothly.

Also really awesome work to everyone working on MIR! The wins we’re starting to see are quite impressive even this early on!

nikomatsakis · June 6, 2016, 2:16pm

Yeah I was wondering about this, and kind of came to the same conclusion. Anyway it seems like a risk we'll have to run at some point.

dotdash · June 6, 2016, 2:27pm

I generally support the idea of switching over rather sooner than later, but there is one thing about which I’m not sure whether we should do before or after the switch, and that’s reducing the number of temporaries. I’m honestly not sure whether the situation improved since the dev-sprint, but temporaries (used to) increase stack usage quite considerably, so some programs might run into stack overflow quite a bit earlier than before when using a non-optimized build. So this might be a problem when there are large-ish arrays on the stack.

Other than that, @alexcrichton’s suggestions seem pretty good to me.

nagisa · June 6, 2016, 3:01pm

For optimised builds with the static drops PR in place there should be no difference anymore (i.e. the LLVM’s alias analysis can figure that out itself). As far as debug builds go… we might want to selectively and unconditionally enable LLVM’s alias optimisation regardless of optimisation level as a temporary measure if necessary (i.e. until I get around getting my own dflow + alias optimisation up and running)?

I’m strongly in favour of enabling this by default regardless of one-off performance concerns, because, as https://github.com/rust-lang/rust/issues/33828 proves, we should be as fast in runtime and faster at compiling in the common case.

nikomatsakis · June 6, 2016, 3:58pm

I did some performance measuring. Specifically I ran the rust-doom benchmark and a Rust raytracer. I wanted to do a few more but this is all I got running so far. =) In these results, MIR is always faster by a little bit:

           | MIR                 | No MIR              | % faster | 
doom 1     | 511,960,319 ns/iter | 536,614,163 ns/iter | 4.6% 
doom 2     | 634,623,332 ns/iter | 674,688,434 ns/iter | 5.9%
raytracer* | 3.38s               | 3.416s              | 1%

* average of 3 runs, all of which were very very close

(I’ll be pushing these benchmarks to a repo soon.)

bstrie · June 6, 2016, 8:07pm

I love the idea of having the Servo team try compiling with orbit to tease out any perf regressions.

@nikomatsakis, are these numbers compilation times or runtime benchmarks? It would be great to have both.

djc · June 6, 2016, 8:11pm

The blog post wrote:

Once we’re able to run Crater with MIR enabled and see no regressions across the entire crates.io ecosystem, we’ll turn it on by default (or, you’ll forgive a terrible (wonderful) pun, launch MIR into orbit).

Has that already happened?

leonardo · June 6, 2016, 8:47pm

From some of my experiments (on 64 bit Windows), I’ve seen that using -Z obit increases compilation times (15%? 20%?), increases binary size, and reduces the run time a little. I don’t give exact numbers because they vary for different programs.

eddyb · June 6, 2016, 9:25pm

Sadly, the crater run found some regressions https://gist.github.com/nikomatsakis/88ce89ed06ef7f7f19bfd1e221d7f7ec - working through them atm.

nikomatsakis · June 7, 2016, 12:22am

Those were runtime benchmarks. I can get compilation time.

nikomatsakis · June 7, 2016, 12:23am

Yes, although we just did a re-run and found some new problems (some of which @eddyb has already fixed today). Mostly what we're discussing here is what else to require beyond crater.

nikomatsakis · June 7, 2016, 12:24am

Interesting. I wonder if windows behaves differently in this respect. One thing that we may have to address before we land is the stack usage, which is currently not as well optimized with MIR as it is with normal trans.

sanxiyn · June 7, 2016, 12:55am

Probably obvious, but one risk is breaking Windows. MIR trans makes significant changes to Windows-only code paths, so there will be MIR+Windows only bugs. Apparently one such bug is already reported: https://github.com/rust-lang/rust/issues/34119

Since crater runs on Linux, crater can’t catch MIR+Windows only bugs. On the other hand, I have no useful idea how to address this. On the third hand, probably seeing what breaks and fixing them ASAP is enough.

aturon · June 7, 2016, 3:23am

@alexcrichton has a nascent crater-like script that I believe can be used pretty easily on other platforms. Definitely seems worthwhile!

leonardo · June 7, 2016, 7:29am

This is a little example program (solves Euler Problem 102):

extern crate smallvec;

fn contains_origin(a: (i32, i32), b: (i32, i32), c: (i32, i32)) -> bool {
    let area = a.0 * (b.1 - c.1) + b.0 * (c.1 - a.1) + c.0 * (a.1 - b.1);
    let sign = area.signum();
    let s = (c.0 * a.1 - a.0 * c.1) * sign;
    let t = (a.0 * b.1 - b.0 * a.1) * sign;
    s > 0 && t > 0 && area.abs() > s + t
}

fn main() {
    use std::fs::File;
    use std::io::{BufReader, BufRead};
    use smallvec::SmallVec;

    let e102 =
        BufReader::new(File::open("p102_triangles4096.txt").unwrap())
        .lines()
        .map(|line| {
            let r = line
                    .unwrap()
                    .split(',')
                    .map(|p| p.parse().unwrap())
                    .collect::<SmallVec<[_; 6]>>();

            contains_origin((r[0], r[1]),(r[2], r[3]),(r[4], r[5]))
        })
        .filter(|&c| c)
        .count();

    println!("{}", e102);
}

The cargo.toml contains:

[dependencies]
smallvec = "0.1.7"

[profile.release]
opt-level = 3
debug = false
rpath = false
lto = true
debug-assertions = false
codegen-units = 1

If I compile it with: cargo rustc --release --bin e102 -- -C target-cpu=native

The compilation time is about 3.6 seconds, the binary size is 312 kb, the run-time is about 1.00 seconds.

If I compile it with: cargo rustc --release --bin e102 -- -C target-cpu=native -Z orbit

The compilation time is about 3.9 seconds, the binary size is 314 kb, the run-time is about 1.08 seconds.

The p102_triangles4096.txt file is the concatenation of 4096 copies of the p102_triangles.txt file in the Euler problem 102:

https://projecteuler.net/project/resources/p102_triangles.txt https://projecteuler.net/problem=102

So in this case with -Z orbit the compilation time is bigger, the binary is bigger and the run-time is bigger.

eddyb · June 7, 2016, 7:56am

MIR has “more correct” drop behavior when unwinding, as opposed to the status quo, which could easy account for 2kB (likely a lot of it in libstd itself).

The runtime is interesting, although LTO might be obscuring some details.
Are those timings deterministic? IO being involved makes me a bit worried.
You could try BufReader::new(include_bytes!("p102_triangles4096.txt")), I think that may work.

dotdash · June 7, 2016, 8:46am

Just to make sure, is this with a recent nightly that contains non-zeroing drop?

leonardo · June 7, 2016, 9:13am

I am using rustc 1.11.0-nightly (12238b984 2016-06-04)

leonardo · June 7, 2016, 9:21am

I have an efficient PC, with SSD, the timings are sufficiently deterministic.

Now I've tried with just include_bytes!("p102_triangles4096.txt") (wihtout BufReader), the timings are 1.07-1.08 seconds with -Z orbit, and 0.98-0.99 without. About 8-10% slower with -Z orbit.

I have larger programs that with -Z orbit compile 20-30% slower and have binaries about 5-10% bigger.

Topic		Replies	Views
Testing MIR optimizations compiler	18	4296	March 25, 2019
MIR progress report -- when to open a PR? compiler	6	2804	March 25, 2019
Plan to test parallel rustc announcements	30	4552	March 19, 2020
[MIR] compiler-plugins for custom mir-passes compiler	11	5349	March 25, 2019
Measuring compiler performance tools and infrastructure	4	1563	March 25, 2019

Enabling MIR by default

Related topics