Pre-RFC: Stabilize `#[bench]`, `Bencher` and `black_box`


One thing I’d love to see with both tests and benches is easier parameterization. Macros are a solution, but are limited in expressiveness.
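To make the macro approach concrete, here is a minimal sketch of what parameterizing tests with `macro_rules!` tends to look like today. The function `double` and the macro `double_cases` are hypothetical names invented for illustration; nothing here comes from libtest itself.

```rust
// Hypothetical function under test; not taken from the thread.
fn double(x: u32) -> u32 {
    x * 2
}

// Expands to one #[test] function per `name: (input, expected)` entry.
macro_rules! double_cases {
    ($($name:ident: ($input:expr, $expected:expr),)*) => {
        $(
            #[test]
            fn $name() {
                assert_eq!(double($input), $expected);
            }
        )*
    };
}

double_cases! {
    double_zero: (0, 0),
    double_one: (1, 2),
    double_large: (1_000, 2_000),
}
```

This works, but every case needs a hand-written name and the parameter list is fixed at macro-expansion time, which is roughly the expressiveness limit mentioned above.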

Past, present, and future for Rust testing

I have seen related issues benchmarking hashmap changes and I think it could be related to this


Another unrelated point: The current benchmark runner is underequipped for micro benchmarks.

We should be looking at JMH which is probably the gold standard in microbenchmark harnesses.


While I am in favor of aiming to improve the benchmarking harness, I think we should focus on getting something stable and usable first, as long as we feel there is room to expand it. Taking a quick glance at the JMH examples, it seems like there would be plenty of room to do so – either by further configuring #[bench], offering new methods on the Bencher callback, or extending the binary to support alternate execution modes.


Stabilizing this is likely a measurable step towards reducing reliance on nightly (which IS a good thing judging by the 2017 roadmap). The current #[bench] and friends are simple enough that they could easily be customizable/expanded in the future.


Since procedural macros are finally slowly getting stable, alternative test harnesses or benchmarkers should probably simply be procedural macros.

In other words if you want to use the default bencher you use #[bench], and if you want to use a third party library you add a dependency to awesome_library and you use #[awesome_library_bench].

Past, present, and future for Rust testing

Last time Alex and I worked through a custom test framework design, we landed on punting test definitions entirely to procedural macros, likely compiling down to today’s #[test] fns. For benchmarks there would possibly need to be some extensions to the test crate’s APIs since it itself is responsible for running today’s benchmarks and isn’t extensible to other approaches.


Or perhaps

use awesome_library::bench; // shadow the default bench macro 

#[bench]
fn foo() { ... }

though I’ve not thought that through at all =)


I wonder if we could future-proof the Bencher type to be more abstract and use the already-working trick of overriding the test crate to provide alternate benching strategies. At least if we do start thinking about stabilizing Bencher we should see how much of its interface we can get away with locking down. The current interface seems quite specific, and I don’t have much conception myself of requirements for generic benchmarking.


I’m not keen on stabilizing more hard-coded annotations in rustc. Sure they make things easier now, but they will amount to extra unneeded complexity that never goes away once we have a beautiful custom test harness solution.

It’s already awkward that #[test] and #[bench] are always defined regardless of what crates one links and how one is compiling: that means we can’t backwards-compatibly define them solely in some library. For me, it would be most elegant to make all the testing stuff (runtime and macros) come from a single multi-phase test crate, or from a test and test-macros pair of crates.


The plugin-based approach has the advantages that test harnesses are not treated in a special way by the compiler, and that you can use multiple test harnesses per crate.


In theory, sure. But it’s quite hypothetical at the moment since there’s no concrete design, particularly for plugin-based benchmarking.


I personally feel that we should take @brson’s initial proposal and move forward with that, stabilizing Bencher in std::test (I’d be ok eliding black_box; it often isn’t necessary due to how the closure works).

I agree with @nikomatsakis that the high-order bit here is that the benchmarking support we stabilize needs to feel built-in. That means you shouldn’t have to edit Cargo.toml or add extern crate annotations to get it working. Even reaching for Bencher is a great stretch over how #[test] works.

I’ve got lots of reservations about what Bencher actually does, but the interface is so minimal today that I think we can easily stabilize it and then enhance it over time with more bells and whistles in a backwards-compatible way.
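For anyone following along, here is a rough sketch of how small that interface is and why the closure often makes an explicit black_box redundant. `MiniBencher` and `sum_to` are hypothetical stand-ins written for this post, not the libtest implementation; the real Bencher chooses iteration counts adaptively and reports statistics. The sketch assumes `std::hint::black_box`, which is available on recent stable toolchains.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical stand-in for libtest's Bencher (illustration only).
struct MiniBencher {
    iterations: u32,
    elapsed: Duration,
}

impl MiniBencher {
    fn new(iterations: u32) -> Self {
        MiniBencher { iterations, elapsed: Duration::ZERO }
    }

    // Runs `routine` a fixed number of times and records the total time.
    // Each return value goes through black_box, so the optimizer cannot
    // discard the work as unused even if the caller never wraps it.
    fn iter<T, F: FnMut() -> T>(&mut self, mut routine: F) {
        let start = Instant::now();
        for _ in 0..self.iterations {
            black_box(routine());
        }
        self.elapsed = start.elapsed();
    }
}

// Hypothetical workload to measure.
fn sum_to(n: u64) -> u64 {
    (1..=n).sum()
}

// Example driver: benchmarks `sum_to` and returns the total elapsed time.
fn run_demo() -> Duration {
    let mut b = MiniBencher::new(1_000);
    // black_box on the input keeps the argument from being constant-folded.
    b.iter(|| sum_to(black_box(100)));
    b.elapsed
}
```

Calling `run_demo()` from `main` and printing the result gives a crude timing. Because the closure's return value is already consumed inside `iter`, the caller only needs black_box for inputs, if at all.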

I should also emphasize that replacing the test crate with your own is super unstable. It relies on the exact interface that in-tree libtest has today, because the compiler generates the structures that libtest defines. If you’re relying on --extern test=... shadowing the built-in test crate, I would consider it a bug in the compiler that we didn’t properly gate it, and I would want to reserve the right to break it. The precise structures in libtest change over time, and there’s no reason they should be de facto stable.


I filed a bug about swapping the test crate:


IMO #[bench] and others should not be stabilized. It doesn’t make sense to add “magic” to rustc vs. getting it moved out to a crate. I would expect that there should be a way to get something working in a crate with macros 2.0, which should be coming soonish.


I do agree that the long-term path forward is to lean hard on macros 2.0 for custom test frameworks. But I believe that making macro-based custom benchmarks work via macros 2.0 will still require redesign within the Rust test harness (there is nowhere currently to macroize the reporting features of #[bench]) in order to properly integrate with cargo test. Also ISTR that last time I asked nrc about macros 2.0 I came away with the impression that the needs of custom test frameworks were not fully accounted for yet. In other words, custom test frameworks are not going to automatically fall out of macros 2.0 - there is additional work to do to make it fit together.


I was under the impression that black_box(_) was the thing keeping us from stabilizing. Otherwise bencher is a mostly good enough solution building on stable macros 1.0. Swapping those out with a proc_attr_macro seems within reach.


One thing that would make things a bit more ergonomic while waiting for this to be stabilised is if it were possible to write #[cfg(bench)] or similar, akin to #[cfg(test)], so that the annotated block is only compiled when running cargo bench. (Alternatively, a way to use cfg to detect nightly would work, but that would presumably be more complex.) This would make it easier to add benchmarks for internal things, like rustc has. Conveniently, the current compiler simply ignores a block annotated with #[cfg(bench)].
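A small sketch of what that could look like, assuming a `bench` cfg that nothing sets today. The names `benches`, `internal_helper` and `bench_enabled` are hypothetical. Newer toolchains may emit a warning about the unknown cfg name, but the code still compiles and the gated module is skipped.

```rust
// `bench` is the hypothetical cfg proposed above; no current tool sets it,
// so this module is dropped during compilation.
#[cfg(bench)]
mod benches {
    // #[bench] functions for crate-internal items would live here.
    pub fn marker() -> &'static str {
        "compiled with cfg(bench)"
    }
}

// Hypothetical crate-internal function that a benchmark would exercise.
fn internal_helper(x: u32) -> u32 {
    x + 1
}

// Reports whether the hypothetical `bench` cfg was active at compile time.
fn bench_enabled() -> bool {
    cfg!(bench)
}
```

With today's compiler `bench_enabled()` returns false and the `benches` module never reaches type checking, which is what makes the annotation harmless on stable.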


If you are using cargo anyways, it’s advisable to put benchmarks in the benches folder and switch to nightly before benchmarking.


Can’t you just use the same nightly that corresponds to the stable version? There are many reasons to use a Nightly compiler for development. clippy is one, although I guess you’re supposed to use the latest nightly with it.