Comparator for BTree(Map|Set)s

Hi,

I was wondering if it is possible to have a custom comparator for the ordered collections (BtreeMap|Set) where one can pass closures. I have exactly the same use case described in the question below in which I have some dictionary (essentially a vector of strings) where all my strings are stored and I want to establish an order (using a BtreeMap) over that vector using the vector indices. So I want to lookup those integer values only for comparison. To me the current available options are really non appealing

  1. Having the dictionary global and having to deal with all the hassle that comes from that.
  2. Making my key carry an additional pointer to the dictionary seems like an overkill memory wise.

Perhaps another alternative can be adding a raw_entry API to the BTreeMaps similar to that in the HashMap where one can have access to the handle where the Entry is supposed to be inserted and it is up to the caller to find that entry with some closure.

How can I implement Ord when the comparison depends on data not part of the compared items?

As I said the previous time this came up

Though I've heard thought the grapevine that the std hashmap unstable raw_entry API might be slated for removal? I think a btree raw_cursor might be slightly easier to support, though, due to the more common use case of using a custom comparison, raw lookup in the btree being simpler than in the hashmap, and being able to provide cursor functionality through the raw API not available in the standard API.

In fact I prefer a raw_entry/cursor API addition to the BTrees over a stored comparator. I already find the hashmap counterpart very useful and allows for various optimisations and it would be sad to not have it as part of the stable std finally and also surprising at the same time because it is already part of the stable hashbrown used by the standard. How can we raise some attention to the topic?

Note that hashbrown still has the freedom of API breaking changes, whereas std never gets those. Additionally, hashbrown actually has a second raw_entry API used by std which is separate from the one available to crates-io consumers.

Sure you are right about breaking changes. But honestly I do not see any reason to remove that API in the future if it becomes stable. One can utilize it for very good optimizations and from comments on the topic there are very common use cases for it. The only counter argument is that one might mistakenly break the invariants of the maps but this is already apparently possible as mentioned by the implementer of the hashmap's API in that Comment

Anyway, at least if it's removed, one has the option to use the hashbrown crate. We don't have any options currently for the Btrees.

Now that I think about it, I believe the raw_entry API will be a "safer" option than a comparator. In the use case I described in the original post, I would need to capture some reference to the dictionary in the comparator closure which will be stored inside the BTree eventually for its entire lifetime. On the other hand with the raw entry API I can just do the comparison using the dictionary only right before insertion.

I imagine the API can look like this (obviously stolen from the HashMap counterpart)

pub fn BTreeMap::raw_entry_mut(&mut self) -> RawEntryBuilderMut<'_, K, V, S>
pub fn RawEntryBuilderMut::with_cmp<F>(self, cmp: F) -> RawEntryMut<'a, K, V, S>
where
    for<'b> F: FnMut(&'b K) -> Ordering,

I would skip the builder, personally, and only offer the fully raw API. That is,

pub fn BTreeMap::raw_entry_mut<F>(&self, cmp: F) -> RawEntryMut<'_, K, V>
where F: FnMut(&K) -> Ordering;

The problem is of course defining a proper Cursor API, which is... interesting to say the least. Where does the cursor point? At an element? Between elements? Best proposal: RawEntryMut is an enum of a "not found" which semantically points between elements, and a cursor, which points at an entry in the btree.

To make things simpler we can actually just make any cursor always point to a valid existing element or null for non existent lookups. In addition we simply support 3 operations all of which take a comparator and return a cursor.

  1. lower_bound
  2. upper_bound
  3. get

with the respective mut versions.

Since cursors support insert_after and insert_before then one can use the lower|upper_bound operations to insert new entries without doing multiple lookups or even better, imagine importing a sorted vector into your BTreeMap which would translate to one lower_bound operation and sequence of insert_after.

In that approach perhaps we can keep the builder as a holder for the comparator closure? So we end up with something like that:

pub fn BTreeMap::raw_entry_mut(&mut self, cmp: F) -> RawEntryBuilderMut<'_, K, V, S, OtherK>
where
    for<'b> F: FnMut(&'b K, &OtherK) -> Ordering,


pub fn RawEntryBuilderMut::get(self, &OtherK) -> RawEntryMut<'a, K, V>


pub fn RawEntryBuilderMut::lower_bound(self, &OtherK) -> RawEntryMut<'a, K, V>


pub fn RawEntryBuilderMut::upper_bound(self, &OtherK) -> RawEntryMut<'a, K, V>

enum RawEntryMut
{
Occupied(cursor),
Vacant
}

If K and OtherK are the same types then it is just a different way sorting K from the standard one provided by Ord but of course you get the advantage of having a stateful sorting closure.

This approach is heavily inspired by pretty much what you would do in C++ in a similar situation as I pointed out here

I also stumbled upon those interesting Intrusive Data Structures but the APIs unfortunately don't accept comparator closures.

FYI, a first attempt: BTreeMap Raw Entry API · Issue #73 · rust-lang/libs-team · GitHub

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.