Hello! I’ve started implementing `split_off` for BTreeSet (a part of this issue), but there is a decision to be made.
There is an algorithm for splitting a B-tree by value using only O(log n) operations (a description is here), but to recalculate the sizes of the trees after
split_off, it is necessary to know the exact size of each subtree.
So, there are several possible solutions:

1) Add a `size` field to the node structure.
I think the benefits are significant.
First, it lets us do very flexible splits and merges in only O(log n) operations. It is already possible to merge trees in O(log n) operations, but that is not very interesting without split.
Second, some other interesting and useful functions need it, e.g.
kth_element or “get the number of elements between two values”, both in O(log n) operations.
Third, as I understand it (but I am not totally sure), it lets us implement a rope in an easy way, just as BTreeMap<(), V>.
Also, some optimizations of the good case in
`append` need it.
But the overhead is notable: 4 or 8 bytes per node, depending on the machine word size (there are about n / (1.5·B) = n / 9 nodes, where n is the number of elements in the container). I think it is possible to embed these sizes without memory overhead on 64-bit machines (due to alignment).
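To make the first option concrete, here is a minimal sketch of how a per-node `size` field enables kth_element in O(height) steps. It deliberately uses a plain binary tree rather than the real BTreeMap node layout, and all names (`Node`, `insert`, `kth`) are hypothetical:

```rust
// Hypothetical, simplified node: a size-augmented *binary* tree, not the
// real BTreeMap node layout; it only illustrates the technique.
struct Node {
    key: i32,
    size: usize, // number of keys in the subtree rooted here
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

fn size(n: &Option<Box<Node>>) -> usize {
    n.as_ref().map_or(0, |n| n.size)
}

// Insert a key, updating `size` on the way back up.
fn insert(n: Option<Box<Node>>, key: i32) -> Box<Node> {
    match n {
        None => Box::new(Node { key, size: 1, left: None, right: None }),
        Some(mut n) => {
            if key < n.key {
                n.left = Some(insert(n.left.take(), key));
            } else {
                n.right = Some(insert(n.right.take(), key));
            }
            n.size = 1 + size(&n.left) + size(&n.right);
            n
        }
    }
}

// 0-based k-th smallest key: descend, comparing k to the left subtree size.
fn kth(n: &Node, k: usize) -> i32 {
    let left_size = size(&n.left);
    if k < left_size {
        kth(n.left.as_ref().unwrap(), k)
    } else if k == left_size {
        n.key
    } else {
        kth(n.right.as_ref().unwrap(), k - left_size - 1)
    }
}

fn main() {
    let mut root: Option<Box<Node>> = None;
    for key in [5, 1, 9, 3, 7] {
        root = Some(insert(root, key));
    }
    let root = root.unwrap();
    assert_eq!(kth(&root, 0), 1); // smallest
    assert_eq!(kth(&root, 2), 5); // median
    assert_eq!(kth(&root, 4), 9); // largest
    println!("ok");
}
```

“Number of elements between two values” works the same way: descend once per bound, summing the `size` fields of the subtrees that fall entirely inside the range.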
2) `split_off` with O(log n + “size of one of the results”)
2.1) Split in O(log n) operations and then calculate the size of one of the results.
2.2) There is another algorithm which doesn’t use the fast splitting algorithm; it has the same complexity but (as I understand) a larger constant factor.
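The fix-up in 2.1 amounts to fully counting one result after the split. A sketch, assuming the structural split itself is already O(log n) (std’s current `split_off` stands in for it here, and `split_with_lens` is a hypothetical helper name):

```rust
use std::collections::BTreeSet;

// Sketch of 2.1: do the structural split, then recompute the lengths by
// fully counting one result; the other length is the original total minus
// that count. The extra cost is O(size of the counted result).
fn split_with_lens(set: &mut BTreeSet<i32>, at: i32) -> (usize, usize) {
    let total = set.len();
    // std's split_off stands in for the O(log n) structural split here;
    // the point is the length fix-up, not the split itself.
    let right = set.split_off(&at);
    let right_len = right.iter().count(); // O(size of `right`)
    (total - right_len, right_len)
}

fn main() {
    let mut set: BTreeSet<i32> = (0..10).collect();
    assert_eq!(split_with_lens(&mut set, 7), (7, 3));
    println!("ok");
}
```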
3) `split_off` with O(log n + “size of the smaller result”)
Split in O(log n) operations and then compute the sizes of the two trees using two alternating iterators, stopping as soon as one of the counts is finished.
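The alternating-iterator counting step could be sketched like this (`smaller_side_len` is a hypothetical helper; `left` and `right` stand in for the two results of an already-performed O(log n) split):

```rust
use std::collections::BTreeSet;

// Walk both results with alternating iterators and stop as soon as one
// side is exhausted, so the extra cost is O(size of the smaller result).
// Returns which side was smaller (ties report "left") and its length; the
// other length is the original total minus this one, which the caller
// knows from the pre-split len().
fn smaller_side_len(left: &BTreeSet<i32>, right: &BTreeSet<i32>) -> (&'static str, usize) {
    let mut a = left.iter();
    let mut b = right.iter();
    let mut n = 0;
    loop {
        if a.next().is_none() {
            return ("left", n);
        }
        if b.next().is_none() {
            return ("right", n);
        }
        n += 1;
    }
}

fn main() {
    let left: BTreeSet<i32> = (0..3).collect();
    let right: BTreeSet<i32> = (3..10).collect();
    assert_eq!(smaller_side_len(&left, &right), ("left", 3));
    println!("ok");
}
```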
So, which solution is better?
I personally prefer the first one.
What do you think? Is this overhead okay or not?