RFC: --emit check?

It's always bugged me that --emit metadata produces different output depending on what else is being emitted - that is, check builds vs pipelined builds.

I've run into this in a very concrete way while trying to implement manual, brute-force pipelining to increase build parallelism for a non-Cargo build system: separately invoking rustc --emit metadata to produce inputs for dependent libraries, and rustc --emit link for the final link.

(Yes, this is strictly less efficient than a pipelined build as the frontend work is duplicated. I have the limitation that I can't get intermediate results until the action is complete, but the problem I'm solving is that there isn't enough parallelism available to fill all the cores so the duplicate work shouldn't matter.)
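To make the two-invocation scheme concrete, here is a rough sketch of how a build system might construct the separate rustc command lines. The helper name `rustc_cmd`, the file names, and the `out` directory layout are all hypothetical; only the rustc flags themselves (`--emit`, `--extern`, `--crate-name`, `--crate-type`, `--out-dir`) are real.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a rustc invocation for one library crate.
/// `emit` is "metadata" for the early, dependency-unblocking action and
/// "link" for the full build. `externs` maps crate names to the artifacts
/// produced by earlier actions. (Hypothetical helper, not from the post.)
fn rustc_cmd(
    src: &Path,
    crate_name: &str,
    emit: &str,
    externs: &[(&str, &Path)],
    out_dir: &Path,
) -> Command {
    let mut cmd = Command::new("rustc");
    cmd.arg("--crate-type=lib")
        .arg("--crate-name")
        .arg(crate_name)
        .arg(format!("--emit={}", emit))
        .arg("--out-dir")
        .arg(out_dir);
    // Each dependency is passed as `--extern name=path`.
    for (name, artifact) in externs {
        cmd.arg("--extern")
            .arg(format!("{}={}", name, artifact.display()));
    }
    cmd.arg(src);
    cmd
}
```

The same crate would get two independent actions, one with `emit = "metadata"` and one with `emit = "link"`, each run with `.status()` once its inputs are ready. The frontend work is repeated in both, which is the duplication mentioned above.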

This nearly works except that rustc crashes with:

error: internal compiler error: compiler/rustc_mir/src/monomorphize/collector.rs:826:9: no MIR available for DefId(18:3 ~ bar[8787]::bar)

I can work around this with -Zalways-encode-mir=yes, but of course that's unstable.

So back to the original question: what if we had --emit check to generate minimal metadata suitable for check builds, and made --emit metadata unconditionally produce full MIR-enabled metadata?

This would be backwards compatible: old invokers that don't know about check would take a perf hit from the fuller metadata, but lose no functionality. Forwards compatibility would require cargo check to detect whether rustc supports --emit check (perhaps streamlined by adding rustc --print emits?).
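As a sketch of what that detection could look like on the invoker side: neither `--emit check` nor `rustc --print emits` exists today, and the one-kind-per-line output format assumed here is invented purely for illustration.

```rust
/// Decide which emit kind a check-style driver should request.
///
/// `print_emits` is the stdout of the *hypothetical* `rustc --print emits`
/// (assumed to list one emit kind per line), or `None` if the probe failed,
/// e.g. because the rustc in use predates the flag.
fn check_emit_kind(print_emits: Option<&str>) -> &'static str {
    let supports_check = print_emits
        .map(|out| out.lines().any(|line| line.trim() == "check"))
        .unwrap_or(false);
    if supports_check {
        "check"
    } else {
        // Fall back to today's behaviour: correct, just potentially slower
        // if `metadata` always carries full MIR.
        "metadata"
    }
}
```

Falling back to `metadata` when the probe fails is what keeps old and new toolchains interoperable: the worst case is extra work, never a missing artifact.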

And it would also restore the property that artifacts produced by --emit are independent of the other artifacts, so that invoking rustc once or multiple times is just a performance difference, not a functional one.

Or is there some other solution I've overlooked? Another option would be to keep the existing behaviour and stabilize -Zalways-encode-mir, but I feel that's less clean (though definitely simpler to implement).

Quick prototype: GitHub - jsgf/rust at emit-check

Does anyone know the performance impact of always-encode-mir?

The comment on should_encode_mir says:

/// Computing, optimizing and encoding the MIR is a relatively expensive operation.
/// We want to avoid this work when not required. Therefore:
/// - we only compute `mir_for_ctfe` on items with const-eval semantics;
/// - we skip `optimized_mir` for check runs.

but I haven't measured it myself.

It's possible that if the artifacts from cargo check can be reused for a later cargo build (i.e. unblock building dependents without needing to use rustc pipelining at all) then the cost is amortized, but I suspect that cargo check latency is more important, and there are many more invocations of cargo check than cargo build.

I'm doing a comparison of cargo check on cargo itself - the difference seems pretty small:

Benchmark #1: rm -rf target/debug; ./target/release/cargo -Zrustc-check check
  Time (mean ± σ):     19.190 s ±  0.275 s    [User: 74.783 s, System: 9.683 s]
  Range (min … max):   18.761 s … 19.756 s    10 runs
 
Benchmark #2: rm -rf target/debug; ./target/release/cargo  check
  Time (mean ± σ):     20.806 s ±  0.206 s    [User: 75.121 s, System: 8.611 s]
  Range (min … max):   20.451 s … 21.098 s    10 runs
 
Summary
  'rm -rf target/debug; ./target/release/cargo -Zrustc-check check' ran
    1.08 ± 0.02 times faster than 'rm -rf target/debug; ./target/release/cargo  check'

It's not nothing, but hardly 2x faster or anything. Only one example; I'm sure there are some more extreme cases.
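For reference, the arithmetic behind that summary line, using the mean wall-clock times from the two benchmarks above. The function names are just for illustration.

```rust
/// Mean wall-clock times in seconds, copied from the benchmark output above.
const WITH_EMIT_CHECK_S: f64 = 19.190; // cargo -Zrustc-check check
const BASELINE_S: f64 = 20.806; // plain cargo check

/// How many times faster the candidate is than the baseline (> 1.0 is faster).
fn speedup(baseline_s: f64, candidate_s: f64) -> f64 {
    baseline_s / candidate_s
}

/// Fraction of the baseline wall-clock time that the candidate saves.
fn fraction_saved(baseline_s: f64, candidate_s: f64) -> f64 {
    (baseline_s - candidate_s) / baseline_s
}
```

With these numbers, `speedup(BASELINE_S, WITH_EMIT_CHECK_S)` comes out at about 1.08 and `fraction_saved(BASELINE_S, WITH_EMIT_CHECK_S)` at just under 8% of wall time, while the reported user CPU times (74.783 s vs 75.121 s) differ by less than 0.5%.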

These are the corresponding cargo changes: GitHub - jsgf/cargo at rustc-emit-check