Pre-RFC: Stabilize `#[bench]`, `Bencher` and `black_box`


I was under the impression that black_box(_) was the thing keeping us from stabilizing. Otherwise bencher is a mostly good-enough solution built on stable macros 1.0. Swapping those out for a proc-macro attribute seems within reach.


One thing that would make things a bit more ergonomic while waiting for this to be stabilised is the ability to write #[cfg(bench)] or similar, akin to #[cfg(test)], so that the annotated block is only compiled when running cargo bench. (Alternatively, a way to use cfg to detect whether we are on nightly, but that would presumably be more complex.) This would make it easier to add benchmarks for internal things, like rustc has. Conveniently, the current compiler never sets a bench cfg, so annotating something with #[cfg(bench)] today simply compiles the annotated block out.
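A minimal sketch of how this could read, assuming a hypothetical `bench` cfg name; since no current compiler sets that cfg, the annotated module compiles away today:

```rust
// Sketch of the proposed #[cfg(bench)], by analogy with #[cfg(test)].
// The `bench` cfg name is hypothetical; today's compilers never set it,
// so this module is simply compiled out.
#[cfg(bench)]
mod benches {
    // Internal-only benchmarks would live here, next to the code they test.
}

fn main() {
    // cfg! evaluates to false because no compiler currently sets `bench`.
    println!("bench cfg set: {}", cfg!(bench));
}
```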


If you are using cargo anyway, it’s advisable to put benchmarks in the benches folder and switch to nightly before benchmarking.
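For the stable route, the benches-folder setup is just a Cargo bench target that opts out of the default harness; a sketch, assuming the bencher crate as the runner:

```toml
# Cargo.toml (sketch): a stable-Rust benchmark lives in benches/maps.rs and
# uses the `bencher` crate instead of the unstable libtest harness.
[dev-dependencies]
bencher = "0.1"

[[bench]]
name = "maps"     # runs benches/maps.rs via `cargo bench`
harness = false   # don't use libtest's #[bench] harness
```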


Can’t you just use the nightly that corresponds to the stable version? There are many reasons to use a nightly compiler for development; clippy is one, although I guess you’re supposed to use the latest nightly with it.


That came up as a question yesterday in a course I gave: do we release a nightly compiler directly corresponding to a stable compiler?


This requires the benchmarked functions to be exported publicly, though. My suggestion of a #[cfg(bench)] was to make it easier to benchmark internal functions that are not meant to be exported while we wait for benchmarking on stable.


AFAIK nightlies always come from the master branch, so the closest would be the nightly just before beta is branched off. This won’t exactly match the eventual stable release, as additional patches often get backported from master to beta in the process.

If you must match the stable compiler exactly, you can cheat: setting RUSTC_BOOTSTRAP=1 will enable the same unstable features as nightly would. This is unsupported, of course, and only meant for building rustc itself.


That was what I thought, but I think it makes sense to cut them.

I know about that trick, but you really can’t recommend that for production use.


Well, I wouldn’t recommend nightly for production use either, so… :shrug:

/me hopes for bench stabilization


We do regularly recommend nightly for use during development, for things such as rustfmt and clippy, and your benchmarks are not your production program. This is a source of insecurity.


It is not wise to run benchmarks on nightly if you run production on stable. Both the production binaries and the benchmarks should be compiled with the same compiler; otherwise we further risk benchmark results not being aligned with production (even more so than they already are).


Totally agree, and that’s why the bencher crate exists: to make it possible to measure and validate performance fixes on stable releases. Now it would be great if someone had the time to make a better benchmark runner for stable… :slightly_smiling_face:
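For illustration, a sketch of the core loop any such stable runner needs: time a closure over a fixed number of iterations and report nanoseconds per iteration. The `bench` helper name is made up here; real runners like the bencher crate additionally calibrate the iteration count and do statistics:

```rust
use std::time::Instant;

// Minimal sketch of a stable-Rust benchmark loop (illustrative only).
fn bench<F: FnMut()>(name: &str, iters: u32, mut f: F) {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    let ns_per_iter = start.elapsed().as_nanos() / u128::from(iters);
    println!("{} ... {} ns/iter", name, ns_per_iter);
}

fn main() {
    let v: Vec<u64> = (0..1_000).collect();
    bench("sum_1000", 10_000, || {
        // A real runner would pass this result through black_box so the
        // optimizer cannot delete the computation entirely.
        let _sum: u64 = v.iter().sum();
    });
}
```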


Is there an RFC/repo/thread for discussing what should end up in the standard library?


But black_box isn’t even the important part. Benchmarks should avoid relying on it, as it also prevents some desirable optimizations. Most benchmarks can get by with just the black_box invocation that the bencher library already applies to the value returned by the closure passed to Bencher::iter.
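As an illustration, here is that pattern in isolation, using `std::hint::black_box` (where the function eventually stabilized); `run_once` is a made-up name for this sketch, not the Bencher API:

```rust
use std::hint::black_box;

// The pattern a runner applies internally: black_box the value the
// benchmark closure returns, so the computation stays observable to the
// optimizer and is not deleted as dead code.
fn run_once<R, F: FnMut() -> R>(mut f: F) -> R {
    black_box(f())
}

fn main() {
    let v: Vec<u32> = (1..=100).collect();
    // Returning the result from the closure is usually all a benchmark
    // needs; no explicit black_box calls inside the body.
    let sum = run_once(|| v.iter().sum::<u32>());
    println!("{}", sum);
}
```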


I mostly agree. Alas, “most benchmarks” is not all benchmarks, and it often takes some detailed analysis to find out which optimizations were applied and whether they are in the spirit of the benchmark. Sometimes it is easier to test a theory by applying black_box in the same location in multiple benchmarks and observing the relative performance change.


There are a few things missing from Bencher that are essential for a bare-bones micro-benchmarking library.

  • the ability to tell the bencher that the closure that is about to run involves N iterations (or ops). This makes the output of the benchmark much easier to compare. A motivating example is benchmarking a sort function: we want to benchmark sorting arrays that fit in the L1, L2, and L3 caches, but what we want to report is how much time we spend sorting per element. That is a much more meaningful number than how long it takes to sort the array that fits in the L1 cache.

  • any kind of memory usage stats:

    • the first and easiest number to track is the number of allocations per iteration
    • another number is peak memory usage

NOTE: Unlike CPU time, which can be normalized by dividing by the number of ops, I don’t have a good suggestion for doing the same for memory usage unless we let the bencher know the number of elements.

  • value and type parametrization. Right now this is done through macros in a very inside-out way. An example is here: A lot of this would be easier if the bencher (or #[bench] attribute) allowed parametrization. Parametrization can take two forms:
    • type parametrization: I want to run the same benchmark for HashMap, MyHashMap, ThisOtherHashMap.
    • value parametrization: I want to run the same benchmark for the combination of N different populations and K different element sizes.

It would be nice if type and value parametrizations could be composed: I want to run the same benchmark for the combination of N populations and K different element sizes across L different implementations of the container and D different distributions (uniform, zipf, whatever).

Some of the above can be implemented on top of the current API with enough macros, but I think these features are so important for writing benchmarks that they warrant first-class support for ease of use and ergonomics.
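A sketch of the first bullet (normalizing by ops per iteration); `bench_per_op` is a hypothetical helper for illustration, not a proposed Bencher method:

```rust
use std::time::Instant;

// Divide total time by iterations * ops so benchmarks over different
// input sizes report a comparable ns-per-element number.
fn bench_per_op<F: FnMut()>(name: &str, iters: u32, ops: usize, mut f: F) {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    let ns_per_op = start.elapsed().as_nanos() / (u128::from(iters) * ops as u128);
    println!("{}/size={} ... {} ns/op", name, ops, ns_per_op);
}

fn main() {
    for size in [1_000usize, 10_000] {
        // Reverse-sorted input so the sort has real work to do.
        let v: Vec<u32> = (0..size as u32).rev().collect();
        bench_per_op("sort", 20, size, || {
            let mut c = v.clone();
            c.sort();
        });
    }
}
```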

Sample code as a TL;DR and a starting point for discussion:

// size becomes a runtime param, ballast is a constant
#[bench(size = (1000..1_000_000).step_by(1000), ballast = [0, 4, 8, 16, 32, 64])]
fn lookup_hit(b: &mut Bencher, size: usize, ballast: usize) {
  let m = generate_map(size, [0u8; ballast]);
  let keys: Vec<_> = m.keys().cloned().collect();
  // Cycle through the keys, restarting once the iterator is exhausted.
  let mut i = keys.iter().cycle();
  b.iter(|| m.get(i.next().unwrap()));
}
The above should generate proper names for the benchmark:

lookup_hit/size=1000/ballast=0  ...  Xns
lookup_hit/size=1000/ballast=4  ...  Xns
lookup_hit/size=1000/ballast=8  ...  Xns
lookup_hit/size=2000/ballast=0  ...  Xns

Benchmarking sort:

#[bench(size = [1000, 10_000, 1_000_000])]
fn sort(b: &mut Bencher, size: usize) {
  let v: Vec<u32> = (0..size).map(|_| rng().gen()).collect();
  // Explicitly tell the bencher this iteration involves `size` ops.
  b.iter_n(size, || {
    let mut c = v.clone();
    c.sort();
  });
}

My Rust-fu is not powerful enough for type parametrization. My ideas so far involve macro-like invocations; perhaps there is a way to do this without macros?
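One macro-free direction, as a sketch: route the benchmark body through a small trait and instantiate a generic runner once per container type. All names here are made up for illustration, not a std proposal:

```rust
use std::collections::{BTreeMap, HashMap};

// A hypothetical trait abstracting over the containers under test.
trait MapUnderTest: Default {
    const NAME: &'static str;
    fn insert(&mut self, k: u64, v: u64);
    fn get(&self, k: &u64) -> Option<&u64>;
}

impl MapUnderTest for HashMap<u64, u64> {
    const NAME: &'static str = "HashMap";
    fn insert(&mut self, k: u64, v: u64) { HashMap::insert(self, k, v); }
    fn get(&self, k: &u64) -> Option<&u64> { HashMap::get(self, k) }
}

impl MapUnderTest for BTreeMap<u64, u64> {
    const NAME: &'static str = "BTreeMap";
    fn insert(&mut self, k: u64, v: u64) { BTreeMap::insert(self, k, v); }
    fn get(&self, k: &u64) -> Option<&u64> { BTreeMap::get(self, k) }
}

// The generic "benchmark": one body, instantiated for every implementation.
fn lookup_hit<M: MapUnderTest>(size: u64) -> usize {
    let mut m = M::default();
    for k in 0..size {
        m.insert(k, k);
    }
    (0..size).filter(|k| m.get(k).is_some()).count()
}

fn main() {
    println!("lookup_hit/{}/size=1000: {} hits",
        <HashMap<u64, u64> as MapUnderTest>::NAME,
        lookup_hit::<HashMap<u64, u64>>(1_000));
    println!("lookup_hit/{}/size=1000: {} hits",
        <BTreeMap<u64, u64> as MapUnderTest>::NAME,
        lookup_hit::<BTreeMap<u64, u64>>(1_000));
}
```

A real attribute could expand to exactly this kind of generic instantiation, which is what the macro-based approaches simulate today.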

Past, present, and future for Rust testing

Opened an RFC


I was wondering, should black_box really be part of test/bench?

black_box can be (and is) useful for other things; wouldn’t it be a better fit for core::mem (or somewhere else in core)?


Yeah, @nagisa suggested a move to core on the RFC thread. I never thought about it before, but it seems perfectly sensible to me.


Oh great, I missed that one. Thanks :slight_smile: