`BTreeMap<A, B>` -> `BTreeMap<A, C>`: Fast and memory efficient

Considering the following problem:

  • I have a BTreeMap<A, B>
  • I have a Fn(B) -> C
  • I want a BTreeMap<A, C>

I currently have two options, to pick from:

  • BTreeMap::from_iter(map.into_iter().map(f)) - cpu efficient but memory inefficient: since from_iter can't assume the iterator is already sorted, it consumes the entire iterator into a temporary heap allocation, then sorts it, then calls the internal bulk_build_from_sorted_iter function (which is very fast). Assuming a BTreeMap<String, String> with 10_000_000 entries on a 64bit computer, this is going to cause an additional 457MiB allocation for the Vec however.
  • new_map.extend(old_map.into_iter().map(f)) - memory efficient but cpu inefficient: this calls .insert() for each item, effectively copying them over one-by-one. This doesn't have any unexpected spikes in memory usage, but each insert is going to traverse the tree to find the correct insert position, over and over.

There was some prior discussion about exposing bulk_build_from_sorted_iter so we can make our own function that is both fast+efficient:

To avoid the discussion about "should it panic, should it be unsafe { }" (me personally would prefer "may panic" over "unsafe"), and since BTreeMap<A, B> already gives us Ord guarantees, there could be the following APIs (slightly pseudo-code-y):

  • BTreeMap::map_values<K, V, F: Fn(K) -> V>(self, F) -> Self<K, V>
  • BTreeMap::try_map_values<K, V, E, F: Fn(K) -> Result<V, E>>(self, F) -> Result<Self<K, V>, E>

This is the first time I'm proposing language improvements like this (also first post in the internals forum), let me know what you think! :sparkles:

10 Likes

:+1: for having from_sorted_iter (and agreed that it should check sortedness and panic).

And :+1: for map_values and try_map_values.

2 Likes

How do you ensure that the mapped value has the same order as the original value?

What if we just gave a sorted happy path to extend? Have it track the maximum item so it can skip the traversal in that case.

2 Likes

BTreeMap is sorted by key, so if you're only mapping values the order doesn't change.

That seems like a great potential optimization.

7 Likes

A comment from irc:

15:31 <saati> kpcyrd: with the try version the original collection gets dropped on an error?

Most of the time I'd probably be fine with dropping both collections in case of an error, this would be analogous to:

impl<A, E, V> FromIterator<Result<A, E>> for Result<V, E>
where
    V: FromIterator<A>,

Alternatively BTreeMap::try_map_values could return a Result<BTreeMap<A, C>, (E, BTreeMap<A, B>, BTreeMap<A, C>)> (error, old collection, new collection - the type complexity is slightly unfortunate though, and the value that was causing the error would be missing/lost).

Having an optimized .extend() would also work for my use-case. :slight_smile:

1 Like

This whole "sorted happy path" for binary tree insertions sounds weird to me. The tree still has depth - you still need to traverse it and rebalance it on every insertion, and for that it doesn't really matter if the items are sorted or not - it's still O(log n) per insertion. I'd understand if bulk_build_from_sorted_iter was accepting an ExactSizeIterator (and then it could easily deduce the entire structure of the tree) but it doesn't - so I although I did not look into how it works, I doubt its optimizations are as good as what map_values can achieve.

The biggest advantage of map_values is that it can just copy the structure of the original tree over to the new one. This means that it doesn't need to navigate the tree from the root on each insertion, and doesn't need to rebalance it - it'll be unbalanced during its construction and only be fully balanced at the very end.

This, BTW, mean it's worthwhile to also make & (and &mut) versions of these methods (where K: Clone, of course)

1 Like

But BTree is not a binary tree. It's a b-tree with a much higher branching factor. Binary search trees are definitely not a very good idea on today's machines that love sequential memory access and hate indirection and branches. And unpredictable branches even more.

2 Likes

Okay, but my point stands - it's still a tree and insertions still require traversal and balancing, so copying the original tree's structure should be faster.

2 Likes

With specialization this could be made to just work™, similar to Vecs.

1 Like

Can it check that f makes no changes to the key?

1 Like

It'd have to check that keys remain in the same order, i.e. that each is Ordering::Greater than the previous. When that is violated it'd have to fall back to the regular approach that allocates new nodes.

If the keys are sorted then AFAIK you can just construct a balanced tree with empty nodes and copy the keys and values into it in the correct order. No balancing needed, because you create the new tree with a balanced structure from the outset.

1 Like

It's possible to build a B-tree from a sorted iterator in O(1) time per element. You keep a pointer to the rightmost leaf and most of the time you just append to the leaf in O(1) time. It's only when the leaf is full you have to go up one level and rebalance. Even less often you have to go up 2 levels. Etc. It all amortizes to O(1) per element. You can also make it even faster and just keep appending at all levels without rebalancing, and only once at the end rebalance the single rightmost path of non-full nodes.

2 Likes

How much less often? Like O(log n) so often? That is me being glib but constructing a B tree is going to make O(n log n) so much stuff. It's a small log n, which is part of what makes B trees a very nice data structure, but it is there. Edit: Ah nvm whoops.

One could pre-allocate the entire tree structure knowing n, and I'm sure there are other tricks. Construction from a sorted iterator sounds more generally useful, and only benchmarks will tell if it is useful/fast enough.

B times less often to go up one level, B^2 times less often to go up two levels, etc. So the whole thing amortizes to 1 + (1/B) * B + (1/B^2) * B + (1/B^3) * B + ... < 1 + 1 + 1/2 + 1/4 + ... < 3 = O(1) per element.

Memory use of the whole B-tree is O(n), not O(n log n). All nodes (except root) are at least half full, after all. Most space is taken up by leaves.

3 Likes

I see no reason why we couldn't just have both, like many other stdlib APIs that give a checked and an unchecked version.

And in this case, I don't think the unchecked version would even need to be unsafe. Per the docs on BTreeMap, it already handles the case of incorrectly-sorted keys because of interior-mutability:

It is a logic error for a key to be modified in such a way that the key’s ordering relative to any other key, as determined by the Ord trait, changes while it is in the map. This is normally only possible through Cell, RefCell, global state, I/O, or unsafe code. The behavior resulting from such a logic error is not specified, but will be encapsulated to the BTreeMap that observed the logic error and not result in undefined behavior. This could include panics, incorrect results, aborts, memory leaks, and non-termination.

So using from_sorted_iter_unchecked with an unsorted iterator wouldn't be any worse than using interior-mutable values to unsort the keys after insertion.

It might still be desirable to mark it as unsafe, if only to make people think twice about correctness before calling it. I can't think of any other functions in the standard library that can set up such deferred pathological-but-sound behavior for a wide range of possible inputs.

Verifying the order is very cheap compared to, say, looking up all those keys once, so there isn't much value in the unchecked version.

I think it would need to be unsafe because BTreeSet<i32> is guaranteed to be ordered.

1 Like

It handles that case by not causing UB, but it can result in panics or other weird behaviour. From the point of view of a user of BTreeMap this is currently the case only for keys with interior-mutability, i.e. if I receive a BTreeMap<i32, i32> I expect it to always behave correctly, and a safe from_sorted_iter_unchecked would break this.

3 Likes