Not a compiler team member, but I've already been bitten once by the poor accuracy of perf. Currently, a 20% difference in memory consumption (max-rss) and a 10-15% difference in cycles are within the margin of error (this data comes from a pull request that changed nothing). That is too high. At the very least, perf
should do three runs and report an error estimate (min, max, mean, delta) or similar.
Otherwise we end up making performance decisions based on noise.
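The error estimate I have in mind could be as simple as the following sketch. This is not perf's actual code, just an illustration of the min/max/mean/delta summary over repeated runs; the cycle counts are made up:

```rust
/// Summary statistics over several runs of the same benchmark, so a
/// reviewer can judge whether a difference exceeds run-to-run noise.
#[derive(Debug)]
struct Summary {
    min: u64,
    max: u64,
    mean: u64,
    delta: u64, // max - min: the observed spread across runs
}

fn summarize(runs: &[u64]) -> Summary {
    let min = *runs.iter().min().expect("need at least one run");
    let max = *runs.iter().max().expect("need at least one run");
    let mean = runs.iter().sum::<u64>() / runs.len() as u64;
    Summary { min, max, mean, delta: max - min }
}

fn main() {
    // Three hypothetical `cycles` measurements of an identical build.
    let runs = [1_000_000, 1_100_000, 1_050_000];
    let s = summarize(&runs);
    println!("min={} max={} mean={} delta={}", s.min, s.max, s.mean, s.delta);
    // Any reported regression smaller than `delta` is indistinguishable
    // from noise and should not drive a decision.
}
```

Even just surfacing `delta` next to each reported difference would make it obvious when a "regression" is within the noise floor.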
Ideally, perf would also include benchmarks that go beyond the compiler itself and cover other applications: Servo, Tokio servers, and standard HPC benchmarks ported to Rust (SPEC, Graph 500, HPC Challenge, ...). Otherwise we might end up with a language that is great for writing fast compilers, but not so great for everything else.