Help Needed: corpus for measuring runtime performance of generated code


Hey, maintainer here. I think it’s great that you’re using Criterion for this. I’m taking a bit of a break from active development for the moment, but I’d be happy to assist with this work from the side. Let me know if there’s anything I can help with.

> integrate perf record into criterion runners (this may take a bit of effort) so that perf.rlo can display more granular metrics than just runtime

This seems like it would be useful for others as well. I don’t know much about perf specifically; can that be done in-process, or would it require running the benchmark in a sub-process?
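
One way to prototype the sub-process route (if that turns out to be what’s needed) would be a small driver along these lines; the bench binary path is a placeholder, nothing here is wired into Criterion’s reporting, and it only produces a perf.data file to inspect by hand:

```rust
use std::process::Command;

fn main() -> std::io::Result<()> {
    // Placeholder: path to a bench binary that `cargo bench` has already built.
    let bench_binary = "target/release/deps/my_bench-0123456789abcdef";

    // Run the benchmark as a child process under `perf record`, writing the
    // profile to perf.data for later inspection with `perf report`.
    let status = Command::new("perf")
        .args(&["record", "-o", "perf.data", "--"])
        .arg(bench_binary)
        .arg("--bench") // flag that cargo normally passes to bench harnesses
        .status()?;

    assert!(status.success(), "benchmark run under perf failed");
    Ok(())
}
```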

> sort out a stable JSON format (I don’t know if the criterion authors expect to keep the current files they write as a stable format)

The JSON formats are deliberately unspecified and open to change, unfortunately. They’ve already changed at least once since 0.1.0. It might be possible to support this use-case more cleanly by allowing the user to provide a custom report that would receive the data directly. There’s already something like that internally, but the API isn’t really ready for public use.
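
To make the idea concrete, here is a very rough sketch of what a user-supplied report hook might look like; the `Report` trait and `MeasurementData` struct below are invented for illustration and deliberately don’t mirror Criterion’s internal types.

```rust
// Hypothetical shapes, for illustration only; Criterion's real report
// machinery is not public and may end up looking quite different.
pub struct MeasurementData {
    pub benchmark_id: String,
    pub mean_ns: f64,
    pub std_dev_ns: f64,
}

pub trait Report {
    // Called once per benchmark with the final estimates.
    fn measurement_complete(&self, data: &MeasurementData);
}

// A collector that forwards results to perf.rlo instead of (or in
// addition to) Criterion's own data files.
struct PerfRloReport;

impl Report for PerfRloReport {
    fn measurement_complete(&self, data: &MeasurementData) {
        // In a real collector this would POST to the perf.rlo ingestion
        // endpoint; printing stands in for that here.
        println!(
            "{}: mean {:.1} ns (± {:.1} ns)",
            data.benchmark_id, data.mean_ns, data.std_dev_ns
        );
    }
}
```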

> make criterion able to write data files and reports to a configurable directory

This one should be pretty easy; I think all of the necessary code is already there. I just didn’t want to stabilize an API for it until it was needed.
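
As a sketch of the shape such an API could take (the `output_directory` builder method here is assumed for illustration, not an existing Criterion API):

```rust
use std::path::Path;
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn fib(n: u64) -> u64 {
    if n < 2 { n } else { fib(n - 1) + fib(n - 2) }
}

// Assumed builder method: redirect data files and reports away from the
// default target/criterion location, e.g. onto a collector's data volume.
fn configured() -> Criterion {
    Criterion::default().output_directory(Path::new("/srv/perf-rlo/criterion-data"))
}

fn bench_fib(c: &mut Criterion) {
    c.bench_function("fib 20", |b| b.iter(|| fib(black_box(20))));
}

criterion_group! {
    name = benches;
    config = configured();
    targets = bench_fib
}
criterion_main!(benches);
```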


ping @dikaiosune, I’m quite lost as to the status. Should we open a tracking issue here? We’re hitting yet another case where it’d be really nice to have some insight into the effects of various changes on runtime performance. In this case it’s @michaelwoerister’s clever PR to reuse generics from dependencies, which offers a tidy win on compile time but comes at the cost of doing less inlining; for now we are limiting it to debug builds, but …


Sorry for the radio silence! Work’s been a bit intense.

There’s a bit more work to do before I think it would be a good idea to start collecting metrics on perf.rlo (and I still don’t have a clear idea of how to make the data more navigable). In the meantime, it would be pretty straightforward to clone the repository onto a benchmark machine (bare metal preferred) and run it. I would recommend the following:

  1. set a rustup override in the benchmark directory pointing at the “base” toolchain (before the changes)
  2. run cargo bench and save the results
  3. switch the rustup override to a custom toolchain built from the PR
  4. run cargo bench again; criterion should tell you on the command line if there’s a measurable difference compared to the base run (a rough driver that scripts these steps is sketched below)
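
For anyone who’d rather script that comparison than run it by hand, here is a rough driver. It assumes the two toolchains are already installed (e.g. linked via `rustup toolchain link`) under the placeholder names `base` and `pr`, and that the benchmark checkout path is filled in; it also uses `rustup run` rather than a persistent override.

```rust
use std::process::Command;

/// Run `cargo bench` in the benchmark checkout under the given rustup
/// toolchain, using `rustup run` instead of setting an override.
fn bench_with(toolchain: &str) {
    let status = Command::new("rustup")
        .args(&["run", toolchain, "cargo", "bench"])
        .current_dir("/path/to/benchmark/repo") // placeholder
        .status()
        .expect("failed to spawn rustup");
    assert!(status.success(), "cargo bench failed on toolchain `{}`", toolchain);
}

fn main() {
    // First pass records Criterion's baseline measurements...
    bench_with("base");
    // ...and the second pass is compared against them; Criterion reports any
    // measurable difference on the command line.
    bench_with("pr");
}
```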

I haven’t run the entire suite myself yet, so I don’t know how long it’ll take, but I’d estimate a couple of hours. Happy to help here or on IRC if someone wants to get this set up.


I don’t think we can afford a couple of hours of runtime on the current perf collector, so before we take that step we’d need to consider finding another dedicated server or narrowing down the set of benchmarks we run.


I’ll try to make progress on some PRs to criterion in the near future.

@Mark_Simulacrum I’ll try setting up a physical box; I have a spare with pretty solid performance.