Hello! I’ve started to implement `split_off` for `BTreeMap` and `BTreeSet` (a part of this issue), but there is a decision to make first.

There is an algorithm for splitting a B-tree by value using only O(log n) operations (a description is here), but to recalculate the sizes of the trees after `split_off`, it is necessary to know the exact size of each subtree. So there are several possible solutions:

- Add a `size` field to the node structure.

  I think the benefits are great.

  First, it lets us do very flexible splits and merges with only O(log n) operations. Currently it is possible to merge trees in O(log n) operations, but that is not so interesting without split.

  Second, some other interesting and useful functions need it, for example `kth_element` or “get the number of elements between two values”, both in O(log n) operations.

  Third, as I understand it (though I’m not totally sure), it lets us implement a rope in an easy way, simply as `BTreeMap<(), V>`.

  Also, optimizing the good case in `append` needs it.

  But the overhead is notable: 4 or 8 bytes per node, depending on the machine word size, and there are about n / (1.5B) = n / 9 nodes, where n is the number of elements in the container and B is the node capacity. I think it is possible to embed these sizes without memory overhead on 64-bit machines (due to alignment).

- Implement `split_off` in O(log n + “size of one of the results”) operations.

  2.1) Split in O(log n) operations, then calculate the size of one of the results.

  2.2) There is another algorithm that does not use the fast split but achieves the same complexity, though (as I understand it) with a larger constant factor.

- Implement `split_off` in O(log n + “size of the smaller result”) operations.

  Split in O(log n) operations, then calculate the sizes of the two trees via two alternating iterators, stopping as soon as one of the sizes is fully counted.
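To make the third option concrete, here is a minimal sketch of the alternating-iterator fix-up. It assumes the container still tracks its total length (as `BTreeMap` does today) while individual nodes do not store subtree sizes; the function name and signature are mine, not an existing API:

```rust
use std::collections::BTreeMap;

// Hypothetical size fix-up after an O(log n) structural split: the total
// length of the original tree is known, but the two halves are not. Walk
// both halves with alternating iterators and stop as soon as the smaller
// one is exhausted; the other size follows by subtraction.
// Extra cost: O(min(|left|, |right|)).
fn sizes_after_split<K, V>(
    left: &BTreeMap<K, V>,
    right: &BTreeMap<K, V>,
    total: usize,
) -> (usize, usize) {
    let mut it_l = left.iter();
    let mut it_r = right.iter();
    let (mut n_l, mut n_r) = (0usize, 0usize);
    loop {
        match it_l.next() {
            Some(_) => n_l += 1,
            None => return (n_l, total - n_l), // left was the smaller half
        }
        match it_r.next() {
            Some(_) => n_r += 1,
            None => return (total - n_r, n_r), // right was the smaller half
        }
    }
}

fn main() {
    // std's split_off already maintains len(); this only demos the counting.
    let mut map: BTreeMap<u32, u32> = (0..10).map(|i| (i, i)).collect();
    let total = map.len();
    let right = map.split_off(&7);
    assert_eq!(sizes_after_split(&map, &right, total), (7, 3));
}
```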

So, which solution is better? I personally prefer the first one.

What do you think? Is the overhead acceptable or not?
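For reference, a sketch of the order-statistics idea behind the per-node `size` field from the first option. For brevity it uses a plain unbalanced binary search tree rather than a B-tree node (a real B-tree would keep a subtree size per child), and none of these names exist in std:

```rust
// Illustrative only: a BST with a per-node subtree size, showing how a
// `size` field enables `kth_element` in O(height) steps.
struct Node {
    key: i32,
    size: usize, // number of keys in this subtree, including this node
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

fn size(n: &Option<Box<Node>>) -> usize {
    n.as_ref().map_or(0, |n| n.size)
}

fn insert(n: &mut Option<Box<Node>>, key: i32) {
    match n {
        None => *n = Some(Box::new(Node { key, size: 1, left: None, right: None })),
        Some(node) => {
            if key < node.key {
                insert(&mut node.left, key);
            } else {
                insert(&mut node.right, key);
            }
            // Recompute the augmented size on the way back up.
            node.size = 1 + size(&node.left) + size(&node.right);
        }
    }
}

// 0-based k-th smallest key, descending by comparing k to subtree sizes.
fn kth_element(n: &Option<Box<Node>>, k: usize) -> Option<i32> {
    let node = n.as_ref()?;
    let left_size = size(&node.left);
    if k < left_size {
        kth_element(&node.left, k)
    } else if k == left_size {
        Some(node.key)
    } else {
        kth_element(&node.right, k - left_size - 1)
    }
}

fn main() {
    let mut root = None;
    for key in [5, 2, 8, 1, 9, 3] {
        insert(&mut root, key);
    }
    // Sorted order is [1, 2, 3, 5, 8, 9].
    assert_eq!(kth_element(&root, 0), Some(1));
    assert_eq!(kth_element(&root, 3), Some(5));
    assert_eq!(kth_element(&root, 6), None);
}
```

“Get the number of elements between two values” follows the same pattern, accumulating left-subtree sizes along the two search paths.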