Interface improvements for BTreeMap

In the last months, when implementing some algebra algorithms in rust, I stumbled upon a few shortcomings of the BTreeMap interface. I will list here the issues I had and the proposed solution.

  • I need a key in order to be able to search the BTreeMap, when it might be expensive to build one. This issue has already been brought up in this post, and I think the author's proposed solution is quite feasible, and we already have a similar solution with slice's binary_search_by.
  • Sometimes I want the key order to depend on some external data. There is no way to use this external data from Ord::cmp unless I store an extra reference on every key element. This issue is easily solvable by taking a comparator object in the constructor, like C++ std::map. I saw this issue already came up on this StackOverflow question.
  • Once you search for an entry in the BTree*, you should be able to iterate from that point in both ways, and still be able to insert a new element in the found entry. My specific problem is that I needed to retrieve an element, and if not found, create the value from the previous and next value and insert it into the BTreeMap. Using BTreeMap interface, the only way to do it was with 3 independent searches with the same key: map.range(key..), map.range(..key) and map.insert(key, new_value). In comparison, with C++ std::map, I can do it with a single map.lower_bound(key) search, and from the returned iterator I can get both the lower and higher values, and insert back in O(1) by using the hint argument of map.insert(). On rust, I am not sure what would be a good interface for this, but maybe btree_map::Entry should have two methods: iter_forward() and iter_backwards(), returning iterators in both directions from the entry. Maybe the entry itself is a pointer that can be advanced. I don't know. I would like to see your ideas about such interface that is idiomatic, fits well existing interface, and is as flexible as C++ std::map.
  • Finally, on a more general side, I missed some form of unsafe interface for the std::collections. In particular, coming from C++, I missed the ability to hold an iterator for a map and change the map while the iterator still remained valid (i.e. the ability to iterate and mutate simultaneously). This may be impossible or much harder in a B-tree, because each entry is not an independently allocated node, but the general feeling is that the std::collections are not meant to be used in potentially unsafe operations, and they lack an interface for those that need to do it. But being in std and all, maybe they should.
1 Like

I don't believe this will ever change as doing that is kind of asking for race conditions. In addition, creating an iterator either takes the collection by value (i.e. .into_iter()), in which case it consumes the collection, or it takes an shared borrow (i.e. &collection), in which case it's immutable¹, or it takes a mutable borrow (i.e. &mut), in which case it can't be shared to both iterate and mutate the collection at the same time.

In essence, when using iterators that access pattern is quite problematic. However, an alternative for it is to just use a loop:

for i in start..end {
    collection[I].mutate();
} 

¹Not counting interior mutability, which is very unlikely to be desirable for iterators in general.

1 Like

Rather than storing a comparator, the more likely choice is probably something like the hashmap raw_entry API, where you provide the hash and an equality check callback, and insertion takes a hashing function.

For BTreeMap IIUC you'd just need impl Fn(&Key) -> Ordering for lookup and Key, Value, impl Fn(&Key, &Key) -> Ordering for insertion.

A stored comparator object is certainly more convenient, but providing one only when it's needed is more general and Rust has already biased for intrinsic ordering (e.g. cmp::Reverse<T>) rather than comparators.

1 Like

Regarding mutation-during-iteration: With lending iterators it would be possible for the iterator itself to provide mutation methods that update both the map and adjust the iterator state after the map update.

With the current iterator trait that seems difficult because the iterator hands out references that must remain valid for the lifetime of the iterator so the map can't be modified in a way that moves around values. A cursor API would be needed instead.

4 Likes

A cursor would also solve the problem of multiple searches with the same key, specially if it can be used as a hint for insertion. Something like map.entry_from_cursor(cursor, key), that would linear search starting from the cursor position, instead of doing a tree search.

I'd be up to mentoring a raw_cursor implementation. Open an API Change Proposal and ping me as a willing mentor (though note I don't have authority to second the proposal).

Hi @lvella

I come from C++ and I share pretty much your concerns, which I summarized in my somewhat similar post today

  1. I dislike that I have to pass the key by value to the entry API.
  2. I dislike that cannot make my comparator rely on what you call external data. Actually I stumbled upon the exact same stackoverflow question because I was building the same kind of thing.
  3. I also would find it better to avoid searching multiple times if I'm trying to perform a "fetch or add" key in one shot. In C++ one can simply combine the following to achieve that: hint=lower_bound(k) and then use the hint to insert in the right location if the key is not found insert(kv,hint)

I believe having a raw_entry|cursor API similar to the one in the HashMap (HashMap::raw_entry_mut) can actually be the answer to all 3 problems.

1 Like
2 Likes