BTreeMap constant-time first/last iterators

How can you, or anyone, create a derivative alternate version and benchmark it against the current version without forking the current version?

FYI, Rust policy is to have a minimal std library, with many derivative alternatives for disjoint specialized needs. You can search both forums for discussion of that topic; it comes up again and again from people with experience in ecosystems with large standard libraries.

1 Like

This isn't very specialized - it is literally the behavior of the default tree map in the STL. You're overreacting to a minor (and very normal) change, acting like it's something that is unheard of or useless to 90% of people, when it's actually better default behavior. This isn't a large change, and it doesn't increase the interface size. You're blowing this way out of proportion.

3 Likes

This will be my last post on this topic.

There is no way that the existing std-library version of any routine will be changed without a trial implementation of the proposed updated version and comparative benchmarking across a sample of use cases. That is the process even for the infrequent fixes to discovered UB; it certainly will be the process for any less-essential changes.

2 Likes

We already did this for HashMap. In October 2018 some people wrote an alternative implementation of the hash map as an external crate, called hashbrown. They made an excellent performance comparison with proper benchmarks, so everyone could agree it was reasonable to include it in the stdlib. And now std::collections::HashMap depends on that crate.
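
For anyone curious, the crate was usable as a drop-in replacement long before it landed in std; a minimal sketch, assuming a hashbrown dependency in Cargo.toml:

use hashbrown::HashMap;

fn main() {
    // Same API surface as std::collections::HashMap, different internals
    // (SwissTable-style open addressing instead of the old robin hood map).
    let mut counts: HashMap<&str, u32> = HashMap::new();
    *counts.entry("btree").or_insert(0) += 1;
    assert_eq!(counts.get("btree"), Some(&1));
}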

5 Likes

Did you try to implement my proposal? I don't think it will perform "horrible", although there are ways to improve it.

Anyway, if you want ultimate performance, you should write your own customized data structure. The B-tree is a very customizable data structure with many variations. But std cannot cover them all, because each variation has different advantages and disadvantages depending on the problem size, the access pattern (what proportion of operations are inserts?), and the computational architecture. I don't know what exactly your requirements are.
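
To illustrate one knob std can't expose: a forkable B-tree crate could make the branching factor a const generic so callers tune node size to their access pattern (std's BTreeMap hard-codes B = 6 internally). A hypothetical sketch, with made-up names:

// B is the minimum node degree; CAPACITY = 2 * B - 1 keys per node,
// mirroring the constants in liballoc's btree module.
pub struct Node<K, V, const B: usize> {
    keys: Vec<K>,                   // at most 2 * B - 1 keys
    vals: Vec<V>,
    edges: Vec<Box<Node<K, V, B>>>, // empty for leaf nodes
}

pub struct TunableBTreeMap<K, V, const B: usize = 6> {
    root: Option<Box<Node<K, V, B>>>,
    length: usize,
}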

Choosing the right data structure is more important than minor tuning afterwards. Another data structure which can be used, depending on your usage, is a double-ended priority queue. I don't have information about practical implementations of it, but a heap typically performs more than 2x better than BSTs.
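
For cheap min/max access alone, std's BinaryHeap already gives O(1) peeks; a small sketch (though, as a later reply points out, a heap doesn't help when you need ordered iteration):

use std::cmp::Reverse;
use std::collections::BinaryHeap;

fn main() {
    // Max-heap: peek() returns the maximum in O(1).
    let max_heap: BinaryHeap<u32> = [3, 1, 4, 1, 5].into_iter().collect();
    assert_eq!(max_heap.peek(), Some(&5));

    // Wrapping elements in Reverse flips the order into a min-heap.
    let min_heap: BinaryHeap<Reverse<u32>> =
        [3, 1, 4, 1, 5].into_iter().map(Reverse).collect();
    assert_eq!(min_heap.peek(), Some(&Reverse(1)));
}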

How did you find that line offensive? I don't see it.

Folks, there's no need to say that std cannot have these sorts of changes.

The right way forward here is to make a PR with a performance analysis. That will take time and careful work to do right, and will require driving consensus. I think anyone should be encouraged to do this. I don't think there's much more to discuss here. Either someone does it or they don't.

Let's please avoid further meta discussion. Everyone should chill a bit here. Before responding, consider whether you're meaningfully advancing the discussion forward.

19 Likes

This entire conversation is about iteration, and you cannot iterate over a heap in sorted order. I know this stuff pretty well, so I know the trade-offs I'm making.

We need a Rust data-structures project outside libstd that (1) designs a trait hierarchy for data structures, analogous to the iterator trait hierarchy, and (2) builds data structures with (2a) more performance-tuning options, and (2b) a more easily forkable code base for people who need even more niche optimizations.
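
A minimal sketch of what (1) could look like, with hypothetical trait names (this is an illustration, not a proposed API):

// Root trait, analogous to Iterator at the bottom of its hierarchy.
pub trait Collection {
    type Item;
    fn len(&self) -> usize;
    fn is_empty(&self) -> bool {
        self.len() == 0
    }
}

// Refinement for ordered containers, where constant-time first/last
// access (the subject of this thread) would naturally live.
pub trait SortedCollection: Collection {
    fn first(&self) -> Option<&Self::Item>;
    fn last(&self) -> Option<&Self::Item>;
}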

Although not a blocker, we'd want trait delegation here, both because users would commonly delegate data-structure traits, and because we expect inherent traits to arise from delegation.

(See Libcollections traits: a standard interface for all collections and Collection Traits, Take 2 for previous discussions on collection traits.)

1 Like

Even though it's the easiest part, I struggled for days to implement a prototype linking the root to the first leaf node only. Performance:

>cargo benchcmp old new --threshold 5
 name                                         old ns/iter  new ns/iter  diff ns/iter   diff %  speedup
 btree::map::find_rand_100                    18           17                     -1   -5.56%   x 1.06
 btree::map::find_seq_10_000                  40           42                      2    5.00%   x 0.95
 btree::map::first_and_last_0                 31           19                    -12  -38.71%   x 1.63
 btree::map::first_and_last_100               43           35                     -8  -18.60%   x 1.23
 btree::map::first_and_last_10k               75           41                    -34  -45.33%   x 1.83
 btree::map::insert_rand_10_000               43           46                      3    6.98%   x 0.93
 btree::map::iter_mut_1000                    3,936        3,483                -453  -11.51%   x 1.13
 btree::map::iter_mut_100000                  473,845      439,215           -34,630   -7.31%   x 1.08
 btree::map::iter_mut_20                      84           58                    -26  -30.95%   x 1.45
 btree::map::range_unbounded_unbounded        2            3                       1   50.00%   x 0.67
 btree::set::clone_100                        1,874        2,031                 157    8.38%   x 0.92
 btree::set::clone_100_and_clear              1,882        1,984                 102    5.42%   x 0.95
 btree::set::clone_100_and_into_iter          1,866        1,960                  94    5.04%   x 0.95
 btree::set::clone_10k                        199,110      210,662            11,552    5.80%   x 0.95
 btree::set::clone_10k_and_clear              198,142      212,531            14,389    7.26%   x 0.93
 btree::set::clone_10k_and_drain_all          290,860      307,580            16,720    5.75%   x 0.95
 btree::set::difference_random_100_vs_10k     2,706        2,472                -234   -8.65%   x 1.09
 btree::set::intersection_100_neg_vs_100_pos  26           24                     -2   -7.69%   x 1.08
 btree::set::intersection_100_neg_vs_10k_pos  35           30                     -5  -14.29%   x 1.17
 btree::set::intersection_100_pos_vs_100_neg  27           24                     -3  -11.11%   x 1.12
 btree::set::intersection_100_pos_vs_10k_neg  31           29                     -2   -6.45%   x 1.07
 btree::set::intersection_10k_neg_vs_100_pos  30           28                     -2   -6.67%   x 1.07
 btree::set::intersection_10k_neg_vs_10k_pos  32           30                     -2   -6.25%   x 1.07
 btree::set::intersection_10k_pos_vs_100_neg  30           28                     -2   -6.67%   x 1.07
 btree::set::intersection_10k_pos_vs_10k_neg  33           30                     -3   -9.09%   x 1.10
 btree::set::intersection_random_100_vs_10k   2,465        2,281                -184   -7.46%   x 1.08

What microbenchmarking chooses to say this time mostly corresponds to theory: the first_and_last_* benchmarks become less hindered by map size. The neg/pos intersection benchmarks show the same effect because they basically do the same thing (i.e. get the min and max of the set with positive ints and of the set with negative ints, compare them, and quickly conclude that the intersection must be empty).
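
Roughly, the shortcut those benchmarks exercise looks like this (sketched with iterators for the min/max, since cheap first/last access is exactly what this thread is about):

use std::collections::BTreeSet;

// If every element of `neg` is below every element of `pos`, the
// intersection is empty; cached leaf pointers make each endpoint
// lookup O(1) instead of a descent from the root.
fn provably_disjoint(neg: &BTreeSet<i64>, pos: &BTreeSet<i64>) -> bool {
    match (neg.iter().next_back(), pos.iter().next()) {
        (Some(neg_max), Some(pos_min)) => neg_max < pos_min,
        _ => true, // an empty set intersects nothing
    }
}

fn main() {
    let neg: BTreeSet<i64> = (-100..0).collect();
    let pos: BTreeSet<i64> = (1..=100).collect();
    assert!(provably_disjoint(&neg, &pos));
}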

I'll try to extend this to the last leaf node, but being a moving target it might run away from me. An alternative, not-great idea I had: the common BTreeMap iterators (which are DoubleEndedIterator) don't need to precalculate front and back handles; they could decide on demand which side to kick off from, and detect when they reach either the other end, or the other handle if both sides were used. That way, with the first leaf node stored, forward iteration wastes nothing. But the extra complexity and asymmetry scare me.

To avoid upping the memory footprint of empty maps, the leaf node pointer(s) could be stored in a special kind of root node. I think that comes down to keeping the existing LeafNode at height 0, InternalNode only at height 1 (i.e. one level above the leaves), and, once the tree grows bigger, introducing a new node type with leaf-edge pointers in addition to edges.
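
A hedged sketch of that layout idea, using stand-in node types rather than the real liballoc internals:

// Stand-ins for the existing node types.
struct LeafNode<K, V> {
    keys: Vec<K>,
    vals: Vec<V>,
}

struct InternalNode<K, V> {
    data: LeafNode<K, V>,   // keys/vals, as in the leaf layout
    edges: Vec<Node<K, V>>, // children one level down
}

enum Node<K, V> {
    Leaf(Box<LeafNode<K, V>>),
    Internal(Box<InternalNode<K, V>>),
}

// Only a grown tree pays for the cached leaf-edge pointers; an empty
// map or a single-leaf map keeps the existing footprint.
enum Root<K, V> {
    Leaf(Box<LeafNode<K, V>>),
    Internal {
        node: Box<InternalNode<K, V>>,
        first_leaf: *const LeafNode<K, V>,
        last_leaf: *const LeafNode<K, V>,
    },
}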

7 Likes

Google's btree stores the first element in the parent pointer of the root node. However, I think we can defer implementing such optimizations for now.

2 Likes

I guess nothing more recent? I suppose even doing Entry optimally for HashMap remains hard enough, à la https://github.com/rust-lang/rfcs/pull/1769 and [Pre-RFC] Abandonning Morals In The Name Of Performance: The Raw Entry API, so traits might only help when you've got lots of code that does not require optimization.

As an aside, an optimal entries (plural) interface for a cuckoo table looks nothing like the entry interface for a hash map. Insertion into a cuckoo table has only four-ish placement options, so you handle indexes into the table which may change during the actual insertion:

use std::cell::RefCell;

pub struct Entries<'table, KO, V, H: CuckooHasher> {
    // RefCell lets several pending index handles share the table borrow.
    table: RefCell<&'table mut CuckooTable<KO, V, H>>,
}
pub struct IndexFetch<'table, KO, V, H: CuckooHasher> {
    entries: Entries<'table, KO, V, H>,
    index: H::Index,
    key: H::Key,
}
pub struct IndexInsert<'table, KO, V, H: CuckooHasher> {
    fetch: IndexFetch<'table, KO, V, H>,
    key_owned: KO,
}

As for owned vs. borrowed key issues, cuckoo tables should have an owned key that appears inside the table, and a separate ephemeral aka borrowed key that appears only during hashing, and does not even appear in the cuckoo table's type.
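
One way to get that property, sketched with the same hypothetical names as above: take the borrowed key as a generic parameter of the hashing method, so it never shows up in the table's type:

use std::hash::Hash;

pub trait CuckooHasher {
    // Handle to the four-ish candidate slots for a key.
    type Index;
    // The hashed/owned key form stored alongside values in the table.
    type Key;

    // The ephemeral, borrowed key is only a parameter of this method,
    // so callers can hash with &str, &[u8], etc. without the table's
    // type mentioning any of them.
    fn hash_key<Q: Hash + ?Sized>(&self, borrowed: &Q) -> Self::Key;
}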
