BTreeMap constant time first/last iterators

jnordwick · April 7, 2020, 7:12am

One of the common operations for sorted maps/sets is looking at first or last few elements. C++ std::map has constant time complexity for begin() and end(). While BTreeMap might be better in many ways I cannot even use it because getting the first and last iterator is 2*log(n). And this isn't something easily fixed by making a wrapper with my needs because on new inserts to front or back of the collection, I would need to do another log(n) probe to get the first/last iter.

As it stands, two log(n) probes to get a front of tree iterator is way too expensive for my use. Can a constant time version be made?

(This isn't unique btreemap, i've also had re-implement my down doubly linked list because of similar issues where the idea sounds nice, but it just lacks some basic functionality to make it useful in more complex cases.)

jschievink · April 7, 2020, 10:33am

Nightly has first_key_value and last_key_value for this, but I think they're O(log n) (how would a constant-time version be implemented?). This PR improves them a bit and adds pop_first and pop_last methods.

RalfJung · April 7, 2020, 12:38pm

Potentially it could always keep pointers to the first/last element around? Updating those also has a cost, though...

Cc @ssomers

Amanieu · April 7, 2020, 12:41pm

Actually due to the way B-Trees are structured, the pointer to the first node never needs to be updated. The pointer to the last node will need to be updated when splitting and merging nodes however.

ssomers · April 7, 2020, 1:09pm

That PR just changes looks, not performance.

jnordwick · April 7, 2020, 3:28pm

The actual iterators are needed, not just the first value. And I am using nightly so I've already seen that, and it is still 2*log n to get an iterator. And I don't want to pop anything.

jnordwick · April 7, 2020, 3:35pm

Updating those also has a cost, though...

That mostly properly predicted branch (even if not) on every insert and removal is a hell of a lot cheaper than all the cache misses running down both side of the tree.

jschievink · April 7, 2020, 4:02pm

Don't forget the size increase of 2 * size_of::<usize>() for every BTreeMap and BTreeSet though

jnordwick · April 7, 2020, 4:44pm

I'd consider that more than acceptable - the gain is constant time iterator creation, min, max, left most, right most, etc.

Iterator creation is horribly expensive in the current implementation. With 10k elements, even wtih an avg branching factor of 5, you're expecting 10+ cache misses to create an iterator.

RalfJung · April 7, 2020, 9:46pm

You would consider that acceptable, yes -- but I am sure there are usecases where duplicating the size of a BTreeMap would be a disaster, but the tiny cost of 10 cache misses every now and then is entirely irrelevant.

jnordwick · April 7, 2020, 11:08pm

You would literally be adding to pointers to the top level structures, not to every node. That is dwarfed by the size of the data held by those structures. Far more people will be impacted by a 2*log(n) iterator creation that 16 more byes on a one time hit.

If they are that memory constrained they aren't going to be using btreemap (especially when there isnt any way to control the allocation strategy).

This argument is a huge reach.

kornel · April 7, 2020, 11:44pm

I'm having hard time imagining such case. It would require lots of mostly empty BTreeMaps, and then my worry would be not that it would take too much memory overall (these are just a couple of pointers), but that it would not fit in the cache — causing more cache misses. Cache misses are the major cost to worry about.

Tom-Phinney · April 8, 2020, 12:23am

So @jnordwick, why not create a modified version of BTreeMap that does what you want and benchmark it in various scenarios? Without any hard comparative data this discussion is really just hypothetical.

steffahn · April 8, 2020, 12:24am

I’m guessing a better comparison might be between:

the 2*log_5(n/11) [i.e. practically always a TINY constant] time overhead that you might only encounter in the occasional moments when you create an iterator, where in common use cases this overhead is dwarfed by the time that the actual iteration over a decent part of the whole structure takes

vs.

a repeated constant time overhead in potentially almost every operation that updates the map to keep those two pointers up to date.

I mean, I don’t know the data structure too well, the best approach would probably be to implement a different version and just benchmark. If there’s no noticeable performance regressions anywhere then, sure, maybe it is a good idea to change the implementation. Ah, and now @Tom-Phinney said the exact same thing before I finished xD

jnordwick · April 8, 2020, 5:31am

First of all, that is log time overhead, not constant and when you 10k-100k elements, it matters. but second, cache misses are how you evaluate high performance, latency sensitive run time costs. Cache hit rates are your deciding factor. You would rather do hundreds to thousands of useless operations than make a trip to memory. This it the reason bteemap was chosen over red-black or avl trees. And this is also the reason for rust going with a quadratic probe hash table. You're basically arguing against the decision criteria used for choosing these version the data structures.

Literally, cache is king.

A dozen cache misses vs. one (maybe be 2 at times) comparisons that are mostly going to be properly predicted (so 1 cycle plus the instruction slot) is such a no-brainier trade off.

Iterators are the largest reason you use a tree over a hash - you need ordering and some for of traveral (prev and next values).

I don't have the rust skills to implement this at this time. I tried to read the bteemap code and it is incredibly complex rust code (the btree algorithms aren't that complex, just all the rust-ism is beyond my level.

PoignardAzur · April 8, 2020, 8:57am

I'm not saying you're wrong, but this is a little more confrontational than necessary.

pcpthm · April 8, 2020, 11:16am

This is not what OP is asking but I think there is a way to improve the time complexity in OP's use-case by adding data structure without modifying BTreeMap itself.

For the simplicity, I only describe one-way implementation (first only). It can be extended to two-way but it will be slightly more complex.

You maintain a sorted array of length P in addition to the main B-tree data structure where P is a constant P ∊ Θ(log N) that can be tuned. For a sorted sequence of n elements, the array always contains the prefix of sequence of size min(n, P) and the remaining suffix of is maintained by the B-tree.

It can be implemented so the time complexity is O(log N) for insert/remove/find, and iterating first k elements runs in O(k) time (in particular, the first element can be accessed in O(1) time). It believe it is also fast in practice because it is just an array access. Implementation: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=2c63d0fa5dc188fc5f0e8ea0650e7227. Probably it can be improved by using SIMD instructions or some other techniques.

Tom-Phinney · April 8, 2020, 3:02pm

To paraphrase, you haven't found a crate in the Rust ecosystem that offers a version of BTreeMap that meets your needs, and you yourself don't have the needed skill with Rust to create such a derivative version. Perhaps a query for help on the Rust Users Forum might locate someone able to make the changes you want, resulting in a crates.io contribution with a new variant of BTreeMap.

jnordwick · April 8, 2020, 3:28pm

resulting in a crates.io contribution with a new variant of BTreeMap

Splitting the BTreeMap on such a small change would be counterproductive the the rust ecosystem. It is already balkanized enough. The point of bringing this up herre was to stay in the bound the current one. If my view was "screw you I'm going to write my own " I wouldn't have posted here and would have just written my own (in a very C-like style because that is about where my rust knowledge is). It is a no-brainer for 90% of performance-sensitive cases.

jnordwick · April 8, 2020, 3:32pm

All that work to avoid two pointer and an extra cmp? This is a horrible idea. This is basically building your own copy of the first and last node in the tree. Why not just keep pointers to them in the structure?

I'm getting the feeling here that most people in this discussion don't have strong performance requirements in the software they write - which is fine - but it creates a very different mindset. Asymptotic complexity isn't what matters most here (although my original post wasn't very clear on that) - it is the dozen cache misses that kill things. You get rid of that, but now I have to do way more work that the BTreeMap would be doing if it was keeping track of the first land last node itself.

Topic		Replies	Views
Interface improvements for BTreeMap libs	8	2136	October 16, 2022
Suggestion: BTreeMap/BTreeSet O(log(n)) n-th element access libs	4	2106	May 26, 2019
BTreeMap nth() improvements	1	1029	December 7, 2021
BTreeMap: searching with a predicate? libs	10	2593	March 25, 2019
Comparator for BTree(Map\|Set)s libs	9	1950	October 16, 2022

BTreeMap constant time first/last iterators

Related topics