Measuring compiler performance

nikomatsakis · March 16, 2017, 8:58pm

So, sadly, the perf.rust-lang.org website has been broken for quite some time (basically since rustbuild landed). @Mark_Simulacrum has done some awesome work (which they can describe) building up a new system that uses the pre-built binaries produced by Travis. This offers the promise of very precise info about which PR triggered a regression.

I’d like to spark a discussion on two topics:

1. What are the minimum steps we can take to get some kind of results available again?
   - I really dislike having no measurements
   - Is it just a matter of needing to get the old server up and going again, or what?
2. How should we structure our test suite?

@Mark_Simulacrum and I have had quite a few conversations on the second point and it seems like a good idea to get broader input.

My current take is that there are roughly four kinds of measurements I would like:

1. Compilation times that target very specific parts of the compiler and workflows
   - this includes incremental flows, i.e., build from scratch, apply diff, etc
2. Regression tests for known performance issues (kind of the same)
3. “Real-world” tests that correspond to frozen versions of actual crates
   - this includes incremental flows, i.e., build from scratch, apply diff, etc
4. Performance of generated code
   - not currently measured at all; obviously somewhat different from compilation time, but perhaps can share infrastructure
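
To make the categories above concrete, here is a minimal sketch of how a suite could tag each benchmark with one of the four kinds and describe an incremental flow as a list of steps. All type names, the patch file name, and the sample entries are hypothetical, made up for illustration; this is not how the existing benchmarks are organized.

```rust
use std::collections::BTreeMap;

/// The four kinds of measurement discussed in the post.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Category {
    Targeted,   // exercises a specific part of the compiler or a workflow
    Regression, // guards against a known performance issue
    RealWorld,  // frozen version of an actual crate
    Runtime,    // performance of the generated code
}

/// One step of a benchmark flow (hypothetical model).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Step {
    CleanBuild,
    ApplyDiff(String), // name of a patch file; purely illustrative
    Rebuild,
}

#[derive(Debug, Clone)]
struct Benchmark {
    name: String,
    category: Category,
    steps: Vec<Step>,
}

impl Benchmark {
    /// A flow counts as incremental if it applies at least one diff.
    fn is_incremental(&self) -> bool {
        self.steps.iter().any(|s| matches!(s, Step::ApplyDiff(_)))
    }
}

/// Group benchmark names by category so results can be shown separately.
fn group_by_category(benches: &[Benchmark]) -> BTreeMap<Category, Vec<&str>> {
    let mut groups: BTreeMap<Category, Vec<&str>> = BTreeMap::new();
    for b in benches {
        groups.entry(b.category).or_default().push(b.name.as_str());
    }
    groups
}

/// Sample entries; only "regex" is a crate named in this thread.
fn sample_suite() -> Vec<Benchmark> {
    vec![
        Benchmark {
            name: "regex".to_string(),
            category: Category::RealWorld,
            steps: vec![
                Step::CleanBuild,
                Step::ApplyDiff("0001-edit.patch".to_string()),
                Step::Rebuild,
            ],
        },
        Benchmark {
            name: "deep-trait-nesting".to_string(), // hypothetical test case
            category: Category::Regression,
            steps: vec![Step::CleanBuild],
        },
        Benchmark {
            name: "match-heavy-fn".to_string(), // hypothetical test case
            category: Category::Targeted,
            steps: vec![Step::CleanBuild],
        },
        Benchmark {
            name: "sort-loop".to_string(), // hypothetical runtime benchmark
            category: Category::Runtime,
            steps: vec![Step::CleanBuild],
        },
    ]
}

fn main() {
    let suite = sample_suite();
    for (category, names) in group_by_category(&suite) {
        println!("{:?}: {:?}", category, names);
    }
    for b in suite.iter().filter(|b| b.is_incremental()) {
        println!("{} has an incremental flow with {} steps", b.name, b.steps.len());
    }
}
```

Keeping the category on each entry would let a results page report the four kinds separately instead of mixing them in one list.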

Our current set of compilation-time benchmarks has grown somewhat organically and includes a smattering of the above categories. Perhaps we should carefully review them?

I made a brief stab at a set of runtime benchmarks as well, but that never quite got off the ground. Some more suggestions for entries there would be helpful.

Thoughts?

leonardo · March 16, 2017, 9:09pm

Those benchmarks also need to include the peak memory (RAM) used during compilation and the size of the resulting binaries.

80-90% of the time when I update the Nightly compiler (64-bit Windows; I update it daily when possible), I see the binaries grow a little. Some months ago the binaries shrank hugely in one update, and we’re still very far from regaining that lost binary weight, so it’s not a big problem.
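
As a rough illustration of tracking binary size between two compiler versions, the sketch below compares the sizes of two already-built binaries and reports the relative change. The function names and the command-line usage are hypothetical, and peak memory is left out because the Rust standard library has no portable way to read it.

```rust
use std::{env, fs, io, path::Path};

/// Size of a file in bytes, read from filesystem metadata.
fn binary_size(path: &Path) -> io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}

/// Relative size change from `old` to `new`, in percent.
/// Returns `None` when `old` is zero, since the ratio is undefined.
fn size_change_percent(old: u64, new: u64) -> Option<f64> {
    if old == 0 {
        None
    } else {
        Some((new as f64 - old as f64) * 100.0 / old as f64)
    }
}

fn main() -> io::Result<()> {
    // Hypothetical usage: pass the same program built by two different nightlies.
    let args: Vec<String> = env::args().skip(1).collect();
    if args.len() != 2 {
        eprintln!("usage: size-diff <old-binary> <new-binary>");
        return Ok(());
    }
    let old = binary_size(Path::new(&args[0]))?;
    let new = binary_size(Path::new(&args[1]))?;
    match size_change_percent(old, new) {
        Some(pct) => println!("{} -> {} bytes ({:+.2}%)", old, new, pct),
        None => println!("{} -> {} bytes (old binary is empty)", old, new),
    }
    Ok(())
}
```

Recording this number per nightly alongside compile time would make both gradual growth and a one-off large shrink visible in the same history.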

michaelwoerister · March 17, 2017, 1:23pm

I think there is value in measuring all four of those categories.
There should be some pruning of current test cases; for example, we have two versions of regex in there.
The visual presentation of performance numbers could be a lot better. More clearly separating the above categories would probably go a long way towards addressing concerns about mixing them.

One more thing to keep in mind is that Travis nightlies have debug assertions and LLVM assertions turned on, I think. This skews measurements a bit.

retep998 · June 12, 2017, 10:15pm

I’m still holding on to the hope that winapi gets added to the set of crates being measured.

nikomatsakis · March 25, 2019, 8:28am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
