Pre-RFC: Stabilize `#[bench]`, `Bencher` and `black_box`


I wonder if we could future-proof the Bencher type to be more abstract and use the already-working trick of overriding the test crate to provide alternate benching strategies. At least if we do start thinking about stabilizing Bencher we should see how much of its interface we can get away with locking down. The current interface seems quite specific, and I don’t have much conception myself of requirements for generic benchmarking.


I’m not keen on stabilizing more hard-coded annotations in rustc. Sure they make things easier now, but they will amount to extra unneeded complexity that never goes away once we have a beautiful custom test harness solution.

It’s already awkward that #[test] and #[bench] are always defined regardless of what crates one links and how one is compiling; that means we can’t backwards-compatibly define them solely in some library. It would be most elegant to make all testing stuff (runtime and macros) come from a single multi-phase test crate, or a test and test-macros pair of crates.


The plugins-based approach has the advantages that test harnesses are not treated in a special way by the compiler, and that you can use multiple test harnesses per crate.


In theory, sure. But it’s quite hypothetical at the moment since there’s not a concrete design, particularly for plugin-based benchmarking.


I personally feel that we should take @brson’s initial proposal and move forward with that, stabilizing Bencher in std::test (I’d be ok eliding black_box, it often isn’t necessary due to how the closure works).

I agree with @nikomatsakis that the high-order bit here is that the benchmarking support we stabilize needs to feel built-in. That means you shouldn’t have to edit Cargo.toml or add extern crate annotations to get it working. Even reaching for Bencher is a great stretch over how #[test] works.

I’ve got lots of reservations about what Bencher actually does, but the interface is so minimal today that I think we can easily stabilize it and then enhance it over time with more bells and whistles in a backwards-compatible way.
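
To make "so minimal" concrete, here is a rough sketch of the shape of that interface as it could look on stable Rust. `SketchBencher`, its fields, and `bench_sum` are hypothetical names made up for illustration; this is not libtest's implementation, only a stand-in with the same `iter` signature, and it assumes a toolchain where `std::hint::black_box` is available (it was stabilized well after this thread). It also shows why an explicit `black_box` is often not needed: the closure's return value is already passed through it by the harness.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for `test::Bencher` (illustration only).
pub struct SketchBencher {
    iterations: u32,
    elapsed: Duration,
}

impl SketchBencher {
    /// `iterations` is fixed up front here; a real harness would pick it adaptively.
    pub fn new(iterations: u32) -> Self {
        SketchBencher { iterations, elapsed: Duration::ZERO }
    }

    /// Same shape as `Bencher::iter`: run the closure repeatedly and time the loop.
    pub fn iter<T, F>(&mut self, mut inner: F)
    where
        F: FnMut() -> T,
    {
        let start = Instant::now();
        for _ in 0..self.iterations {
            // Best-effort hint so the optimizer does not discard the closure's result.
            black_box(inner());
        }
        self.elapsed = start.elapsed();
    }

    /// Average wall-clock nanoseconds per call of the closure.
    pub fn ns_per_iter(&self) -> u128 {
        self.elapsed.as_nanos() / u128::from(self.iterations.max(1))
    }
}

/// A benchmark body written against the sketch, mirroring a `#[bench]` function.
pub fn bench_sum(b: &mut SketchBencher) {
    b.iter(|| (0..1000u64).sum::<u64>());
}
```

Anything beyond `iter` (statistics, throughput reporting, warm-up) is exactly the kind of "bells and whistles" that could be layered on later without changing how benchmark bodies are written.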

I should also emphasize that replacing the test crate with your own is super unstable. This relies on the exact interface that libtest in-tree has today, as the compiler will generate structures that libtest has. If you’re relying on --extern test=... shadowing the built-in test crate, I would consider that a bug in the compiler (we didn’t properly gate it) and I would also want to reserve the right to break it. The precise structures in libtest change over time, and there’s no reason they should be de facto stable.


I filed a bug about swapping the test crate:


IMO #[bench] and others should not be stabilized. It doesn’t make sense to add “magic” to rustc vs. getting it moved out to a crate. I would expect that there should be a way to get something working in a crate with macros 2.0 which should be coming soonish.


I do agree that the long-term path forward is to lean hard on macros 2.0 for custom test frameworks. But I believe that making macro-based custom benchmarks work via macros 2.0 will still require redesign within the Rust test harness (there is nowhere currently to macroize the reporting features of #[bench]) in order to properly integrate with cargo test. Also ISTR that last time I asked nrc about macros 2.0 I came away with the impression that the needs of custom test frameworks were not fully accounted for yet. In other words, custom test frameworks are not going to automatically fall out of macros 2.0 - there is additional work to do to make it fit together.


I was under the impression that black_box(_) was the thing keeping us from stabilizing. Otherwise bencher is a mostly good enough solution building on stable macros 1.0. Swapping those out with a proc_attr_macro seems within reach.


One thing that would make things a bit more ergonomic while waiting for this to be stabilised is if it was possible to do #[cfg(bench)] or similar, akin to #[cfg(test)], which would make the annotated block only be compiled when running cargo bench. (Alternatively, a way to use cfg to detect whether nightly is in use, but that would presumably be more complex.) This would make it easier to add benchmarks for internal things, like rustc has. Conveniently, the current compiler simply ignores a block annotated with #[cfg(bench)].
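
A rough sketch of what that could look like in a library, assuming a `bench` cfg that nothing sets by default (it would have to be passed explicitly, e.g. with `--cfg bench` through RUSTFLAGS on a nightly toolchain). `checksum` and `is_valid` are hypothetical functions used only for illustration. Without the cfg the `benches` module is stripped before name resolution, so the file still builds on stable; newer toolchains may warn about the unknown cfg name.

```rust
// With `--cfg bench` on nightly, the crate root would additionally need:
//     #![cfg_attr(bench, feature(test))]

/// Internal helper that is deliberately not exported (hypothetical example).
fn checksum(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(u32::from(b)))
}

/// The only public entry point; `checksum` stays private.
pub fn is_valid(data: &[u8], expected: u32) -> bool {
    checksum(data) == expected
}

// Only compiled when the (assumed) `bench` cfg is set; ignored otherwise.
#[cfg(bench)]
mod benches {
    extern crate test;
    use self::test::Bencher;

    #[bench]
    fn bench_checksum(b: &mut Bencher) {
        let data = vec![7u8; 1024];
        // A child module can reach the private function through `super`.
        b.iter(|| super::checksum(&data));
    }
}
```

Because the benchmark lives inside the crate, it can call `checksum` directly without making it `pub`.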


If you are using cargo anyways, it’s advisable to put benchmarks in the benches folder and switch to nightly before benchmarking.


Can’t you just use the same nightly that corresponds to the stable version? There are many reasons to use a Nightly compiler for development. clippy is one, although I guess you’re supposed to use the latest nightly with it.


That came up as a question yesterday in a course I gave: do we release a nightly compiler directly corresponding to a stable compiler?


This requires the functions that are benchmarked to be exported publicly though. My suggestion for a #[cfg(bench)] was to make it easier to benchmark internal functions that are not meant to be exported while we are waiting for benchmarking on stable.


AFAIK nightlies always come from the master branch, so the closest would be the nightly just before beta is branched off. This won’t exactly match the eventual stable release, as there are often additional patches that get backported from master to beta in the process.

If you must match the stable compiler exactly, you can cheat: Setting RUSTC_BOOTSTRAP=1 will enable the same unstable features as nightly would. This is unsupported, of course, only meant for building rustc.


That was what I thought, but I think it makes sense to cut them.

I know about that trick, but you really can’t recommend that for production use.


Well, I wouldn’t recommend nightly for production use either, so… :shrug:

/me hopes for bench stabilization


We do regularly recommend nightly for use around development, such as for things like rustfmt and clippy, and your benchmarks are not your production program. This is a source of insecurity.


It is not wise to run benchmarks on nightly if you run production on stable. Both the production binaries and the benchmarks should be compiled with the same compiler; otherwise we further risk benchmark results not being aligned with production (even more so than they already are).


Totally agree — that’s why the crate bencher exists, so that it’s possible to measure and validate performance fixes for stable releases. Now it would be great if someone had the time to make a better benchmark runner for stable… :slightly_smiling_face: