Is 6 the right B for BTtree? how to benchmark?


#1

The docs for BTreeMap state

However in the future we would like to further explore choosing the optimal search strategy based on the choice of B, and possibly other factors.

I’d like to start some of that discussion.

Currently B = 6 and we use linear search. In the Python world good performance has been found with 100 < B < 10000 and binary search. (and depth limited to 2)

I’d like to do some benchmarking to ground the decision on what should be the default. But benchmarking Binary search is not always straightforward. I.E. I am jumping in somewhat over my head.

(Note, also this link)

What evidence do the libs team want to see? What pitfalls do I need to watch out for in benchmarking? Am I redoing work someone else is doing?


#2

I believe this was fairly heavily benchmarked when configurability of the B value was removed before 1.0 and it turned out to not really matter much if at all. I can’t find that PR right now but it would be worth tracking down for background.


#3

This is the PR where const B: usize = 6; was added.

I have not found benchmarks of changing the size of B, nor changing to binary or adaptive search. Links of course welcome.


#4

I have been doing some research. https://schani.wordpress.com/2010/04/30/linear-vs-binary-search/ is releavent.


#5

There is an interesting variation of binary search, where you lay out elements of the array in a cache friendly manner. That is, if text book binary search will look at the middle element first, and than at .25 and .75 quantiles (which may be far away cache wise), this variation will look on the first and second or third element: http://bannalia.blogspot.ru/2015/06/cache-friendly-binary-search.html


#6

The old with_b constructor used a runtime-level B, there was some doubt whether it would be compatible with future extensions of BTreeMap (generic parameter-selected B?).


#7

Yes, The move from a runtime-level B to a constant was important for many performance improvements. Definitely a win! I hope we can get with_b back when we get type level numbers, but we can have that conversation when they arrive. In the meantime I just wonder if 6 is the best choice, and on what evidence we can decide.

I love how open rust is to enthusiastic respectful new contributors. I’d love to help do the benchmarking to decide these things. I just want to make sure that I am being respectful of the team’s time, by bringing up evidence that they will find useful.