Problem running rustc-perf on local machine

bonega · September 26, 2021, 6:25pm

I am doing some experimentation with discriminants and want to run a lot of perf-runs without bothering people in a PR.

The problem is that I get wildly different results on my machine than on https://perf.rust-lang.org/

I have an Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz, nothing unusual.

My results have low variance between runs on my local machine.

Basically my workflow is:

./x.py build --stage 2
cd ~/git/rustc-perf
./target/release/collector bench_local ~/git/rust/build/x86_64-unknown-linux-gnu/stage2/bin/rustc <COMMIT_HASH>

The results are different enough that they are worthless for evaluating any changes. Any ideas?

CAD97 · September 26, 2021, 9:21pm

Any time you're measuring perf: absolute measurements don't really matter, what matters is relative changes in measurements.

(instcount measurements flatten this rule somewhat, but I think it still applies.)

So if your local measurements have the same "shape" as the reference measurements (that is, relative measurements between benchmarks are roughly the same), I wouldn't worry about the absolute measurements being different locally. Instead, capture your reference baseline locally, then compare against that.
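As a rough illustration of comparing against a locally captured baseline, the sketch below computes the percent change per benchmark between two local runs. The benchmark names and instruction counts are made-up placeholders, not real rustc-perf output, and `relative_change_percent` is a hypothetical helper, not part of the collector.

```rust
/// Percent change of `candidate` relative to `baseline`.
/// Negative means the candidate executed fewer instructions.
/// Hypothetical helper for illustration; not part of rustc-perf.
fn relative_change_percent(baseline: u64, candidate: u64) -> f64 {
    (candidate as f64 - baseline as f64) / baseline as f64 * 100.0
}

fn main() {
    // Placeholder data: (benchmark, baseline instructions, candidate instructions).
    // Both columns are assumed to come from runs on the same local machine.
    let runs: [(&str, u64, u64); 2] = [
        ("bench-a", 1_000_000, 990_000),
        ("bench-b", 2_000_000, 2_010_000),
    ];
    for &(name, baseline, candidate) in runs.iter() {
        println!("{}: {:+.2}%", name, relative_change_percent(baseline, candidate));
    }
}
```

The point is only that both numbers in each pair come from the same machine and the same build configuration, so the absolute offset from perf.rust-lang.org cancels out.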

bonega · September 27, 2021, 8:11am

Thanks for your answer. The problem is that the relative comparison is different.

I am comparing how many benchmarks show an improved instruction-count metric for UPSTREAM_COMMIT vs MY_CHANGED_COMMIT.

I will try and do a reference run for an existing perf-run and report back.
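The comparison described above, counting how many benchmarks improved between two commits, could be sketched roughly as follows. This is only an illustration: the instruction counts are invented, the 0.2% noise threshold is an arbitrary assumption, and `classify` and `count_improved` are hypothetical names that do not exist in rustc-perf.

```rust
/// Outcome of comparing one benchmark between two commits.
#[derive(Debug, PartialEq)]
enum Outcome {
    Improved,
    Regressed,
    Unchanged,
}

/// Classify a (baseline, candidate) instruction-count pair.
/// Changes smaller than `threshold_percent` in either direction count as noise.
fn classify(baseline: u64, candidate: u64, threshold_percent: f64) -> Outcome {
    let change = (candidate as f64 - baseline as f64) / baseline as f64 * 100.0;
    if change <= -threshold_percent {
        Outcome::Improved
    } else if change >= threshold_percent {
        Outcome::Regressed
    } else {
        Outcome::Unchanged
    }
}

/// Count the pairs whose instruction count dropped by at least the threshold.
fn count_improved(pairs: &[(u64, u64)], threshold_percent: f64) -> usize {
    pairs
        .iter()
        .filter(|&&(b, c)| classify(b, c, threshold_percent) == Outcome::Improved)
        .count()
}

fn main() {
    // Placeholder (upstream, changed) instruction counts, one pair per benchmark.
    let pairs: [(u64, u64); 3] = [
        (1_000_000, 980_000),
        (1_000_000, 1_000_500),
        (1_000_000, 1_030_000),
    ];
    // 0.2 is an assumed noise threshold in percent, chosen only for this example.
    println!("improved: {} of {}", count_improved(&pairs, 0.2), pairs.len());
}
```

Running the same classification on the local results and on the perf.rust-lang.org results for the same pair of commits would show whether the two environments disagree on direction or only on magnitude.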

Mark_Simulacrum · September 27, 2021, 12:46pm

perf.rust-lang.org benchmarks builds produced by rustc's CI, specifically the x86_64-unknown-linux-gnu target. For that target, our CI currently does several things that a local build may not match (and likely won't by default); I think this is a mostly complete list:

- PGO for rustc
- ThinLTO + PGO for LLVM (if you use download-ci-llvm = true on x86_64-unknown-linux-gnu, you likely get most of the benefits here)
- std is built with codegen-units=1

All of these will definitely make perf's instruction counts and absolute numbers differ from what you see locally. I wouldn't try to reproduce the above locally -- local benchmarking, particularly e.g. with cachegrind (which is less sensitive to environmental differences and noise), should give a fairly decent proxy for what you'll see as a relative change on perf. This is not always true -- for example, PGO can mean that your loop/condition reordering or whatever was already applied by LLVM -- but in the general case, locally you should be able to reproduce results fairly well. If you can't, we may not be able to do anything but we'd like to hear about it -- feel free to drop by #t-compiler/performance on Zulip and ask questions if something isn't working as you expect.

bonega · September 29, 2021, 7:17pm

Thanks for your great answer. I will repeat the local benchmark for my next PR and see if I have something concrete to report.

