Even though it's the easiest part, I struggled days to implement a prototype linking the root to the first leaf node only. Performance:
>cargo benchcmp old new --threshold 5
name old ns/iter new ns/iter diff ns/iter diff % speedup
btree::map::find_rand_100 18 17 -1 -5.56% x 1.06
btree::map::find_seq_10_000 40 42 2 5.00% x 0.95
btree::map::first_and_last_0 31 19 -12 -38.71% x 1.63
btree::map::first_and_last_100 43 35 -8 -18.60% x 1.23
btree::map::first_and_last_10k 75 41 -34 -45.33% x 1.83
btree::map::insert_rand_10_000 43 46 3 6.98% x 0.93
btree::map::iter_mut_1000 3,936 3,483 -453 -11.51% x 1.13
btree::map::iter_mut_100000 473,845 439,215 -34,630 -7.31% x 1.08
btree::map::iter_mut_20 84 58 -26 -30.95% x 1.45
btree::map::range_unbounded_unbounded 2 3 1 50.00% x 0.67
btree::set::clone_100 1,874 2,031 157 8.38% x 0.92
btree::set::clone_100_and_clear 1,882 1,984 102 5.42% x 0.95
btree::set::clone_100_and_into_iter 1,866 1,960 94 5.04% x 0.95
btree::set::clone_10k 199,110 210,662 11,552 5.80% x 0.95
btree::set::clone_10k_and_clear 198,142 212,531 14,389 7.26% x 0.93
btree::set::clone_10k_and_drain_all 290,860 307,580 16,720 5.75% x 0.95
btree::set::difference_random_100_vs_10k 2,706 2,472 -234 -8.65% x 1.09
btree::set::intersection_100_neg_vs_100_pos 26 24 -2 -7.69% x 1.08
btree::set::intersection_100_neg_vs_10k_pos 35 30 -5 -14.29% x 1.17
btree::set::intersection_100_pos_vs_100_neg 27 24 -3 -11.11% x 1.12
btree::set::intersection_100_pos_vs_10k_neg 31 29 -2 -6.45% x 1.07
btree::set::intersection_10k_neg_vs_100_pos 30 28 -2 -6.67% x 1.07
btree::set::intersection_10k_neg_vs_10k_pos 32 30 -2 -6.25% x 1.07
btree::set::intersection_10k_pos_vs_100_neg 30 28 -2 -6.67% x 1.07
btree::set::intersection_10k_pos_vs_10k_neg 33 30 -3 -9.09% x 1.10
btree::set::intersection_random_100_vs_10k 2,465 2,281 -184 -7.46% x 1.08
What microbenchmarking chooses to say this time, mostly corresponds to theory: first_and_last_* benchmarks become less hindered by map size. The neg/pos intersection benchmarks show the same because they basically do the same thing (i.e. get min and max of the set with positive ints and of the set with negative ints, compare and quickly conclude the intersection must be empty).
I'll try to extend this to the last leaf node, but being a moving target it might run away from me. An alternative, not great idea I had was that the common BTreeMap iterators (which are DoubleEndedIterator). don't need to precalculate front and back iterators, they can decide on demand which side to kick off from and see when they reach either the other end or the other iterator if both sides were used. That way, and with the first leaf node stored, if you iterate forward, you don't waste anything. But the extra complexity and asymmetry scare me.
To avoid upping the memory footprint of empty maps, the leaf node pointer(s) could be stored in a special kind of root node. I think that comes down to keeping the existing LeafNode
at height 0, InternalNode
only at height 1 (i.e. one level above the leaves) and, once the tree grows bigger, introducing a new node type with leaf edge pointers in addition to edges.