Pre-RFC: Stabilize `#[bench]`, `Bencher` and `black_box`

There are a few things missing from `Bencher` that are essential even for a bare-bones micro-benchmarking library.

  • ability to tell the bencher that the closure it runs involves N iterations (or ops). This makes benchmark output much easier to compare. A motivating example is benchmarking a sort function: we want to benchmark sorting arrays that fit in the L1, L2, and L3 caches, but what we want to report is the time spent sorting per element. That is a much more meaningful number than the total time it takes to sort the array that fits in the L1 cache (see the sort example below).

  • any kind of memory usage stats:

    • the first and easiest number to track is the number of allocations per iteration (a counting-allocator sketch of how this can be approximated today appears a bit further down)
    • another useful number is peak memory usage

NOTE: Unlike CPU time, which can be normalized by dividing by the number of ops, I don’t have a good suggestion for how to do the same for memory usage, unless we let the bencher know the number of elements.

  • value and type parametrization. Right now this is done through macros in a very inside-out way; an example is here: https://github.com/jonhoo/ordsearch/blob/master/src/lib.rs (a sketch of that workaround follows this list). A lot of this would be easier if the bencher (or the `#[bench]` attribute) allowed parametrization. Parametrization can take two forms:
    • type parametrization: I want to run the same benchmark for `HashMap`, `MyHashMap`, `ThisOtherHashMap`.
    • value parametrization: I want to run the same benchmark for the combination of N different populations and K different element sizes.

It would also be nice if type and value parametrizations could be composed: I want to run the same benchmark for the combination of N populations and K different element sizes across L different container implementations and D different distributions (uniform, Zipf, whatever).
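For reference, here is roughly what the macro workaround looks like today. This is a simplified sketch, not the actual ordsearch code; `generate_map` is the same hypothetical helper used in the samples below, and the names are illustrative.

// Today's workaround, roughly: a macro stamps out one #[bench] fn per
// parameter value, so the parameter space lives in the invocation list
// rather than on the benchmark itself.
macro_rules! bench_lookup_hit {
  ($name:ident, $size:expr) => {
    #[bench]
    fn $name(b: &mut Bencher) {
      let m = generate_map($size, [0u8; 0]);
      b.iter(|| m.get(&0))
    }
  };
}

bench_lookup_hit!(lookup_hit_1_000, 1_000);
bench_lookup_hit!(lookup_hit_10_000, 10_000);
bench_lookup_hit!(lookup_hit_1_000_000, 1_000_000);

Every additional parameter axis multiplies the invocation list, and composing two axes means nesting macros; that is the inside-out structure an attribute-based form would avoid.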

Some of the above can be implemented on top of the current API with enough macros, but I think these features are so important when writing benchmarks that they warrant first-class support for ease of use and ergonomics.
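As a proof of concept for the allocation count specifically, something close to it already works on stable Rust with a counting global allocator. A minimal sketch (not a proposal for how `Bencher` should implement it):

use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts every heap allocation made by the process.
static ALLOCS: AtomicUsize = AtomicUsize::new(0);

struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
  unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
    ALLOCS.fetch_add(1, Ordering::Relaxed);
    System.alloc(layout)
  }
  unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
    System.dealloc(ptr, layout)
  }
}

#[global_allocator]
static A: CountingAlloc = CountingAlloc;

// The bencher could snapshot ALLOCS before and after the timed closure
// and report the delta divided by the iteration count.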

Sample code as a TL;DR and a starting point for discussion:


// size becomes a runtime param; the ballast values must be constants
// since ballast is used as an array length.
#[bench(size=(1000..1_000_000).step_by(1000), ballast=[0, 4, 8, 16, 32, 64])]
fn lookup_hit(b: &mut Bencher, size: usize, ballast: usize) {
  let m = generate_map(size, [0u8; ballast]);
  // Look keys up in a random order, cycling through the shuffled key list.
  let mut keys: Vec<_> = m.keys().cloned().collect();
  rng().shuffle(keys.as_mut_slice());
  let mut iter = keys.iter().cycle();
  b.iter(|| m.get(iter.next().unwrap()))
}

The above should generate descriptive names for each benchmark instance:

lookup_hit/size=1000/ballast=0  ...  Xns
lookup_hit/size=1000/ballast=4  ...  Xns
lookup_hit/size=1000/ballast=8  ...  Xns
...
lookup_hit/size=2000/ballast=0  ...  Xns
...
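A nice side effect: the test harness already filters benchmarks by name substring, so these generated names would presumably let you select a single slice of the parameter space from the command line:

cargo bench lookup_hit/size=2000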

Benchmarking sort:

#[bench(size=[1000, 10_000, 1_000_000])]
fn sort(b: &mut Bencher, size: usize) {
  let v: Vec<u32> = (0..size).map(|_| rng().gen::<u32>()).collect();
  // Explicitly tell the bencher that each iteration involves `size` ops,
  // so time is reported per element.
  b.iter_n(size, || {
    // Clone inside the closure so every iteration sorts unsorted input;
    // returning the sorted vector keeps the optimizer from eliding the sort.
    let mut v = v.clone();
    v.sort();
    v
  });
}

My Rust-fu is not powerful enough for type parametrization. My ideas so far involve macro-like invocations; perhaps there is a way to do this without macros? One hypothetical sketch is below.
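One direction, just to make the discussion concrete: write the benchmark generically over a trait and let the attribute instantiate it once per listed type. None of this syntax exists today, and the `Lookup` trait and `container` key are invented for this sketch:

use std::collections::{BTreeMap, HashMap};

// Hypothetical trait that each container under test would implement.
trait Lookup {
  fn build(size: usize) -> Self;
  fn get(&self, k: &u64) -> Option<&u64>;
}

// Hypothetical syntax: the attribute instantiates the benchmark once per
// listed type, composing with value parameters as before and yielding
// names like lookup_hit_generic/HashMap/size=1000.
#[bench(container=[HashMap<u64, u64>, BTreeMap<u64, u64>], size=[1000, 10_000])]
fn lookup_hit_generic<M: Lookup>(b: &mut Bencher, size: usize) {
  let m = M::build(size);
  b.iter(|| m.get(&0))
}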
