BTreeMap: searching with a predicate?

jneem · September 26, 2018, 2:07am

The BTreeMap::range and BTreeSet::range functions are very handy for a variety of searching tasks; for example, they can be used to find the first element larger than something, or the last element smaller than or equal to something, etc. However, these functions have a common drawback: they require the caller to be able to construct something of a type Q for which K: Borrow<Q> holds. In some cases this might be difficult, for example if your key type is (i32, SomeLargeType) and you want to search for the first key whose first coordinate is at least 5.

One solution would be to add the method BTreeMap::range_by, with the signature

  fn range_by<F: FnMut(&K) -> Ordering>(&self, f: F) -> impl Iterator<Item=(&K, &V)>

(and similarly for BTreeSet) where the function f is supposed to return Ordering::Equal for some consecutive range of values, Ordering::Less for values before that range, and Ordering::Equal for values after that range. The return value would iterate over all values for which f returns Ordering::Equal.

There’s some precedent for this in the slice type, which has binary_search_by in addition to binary_search.

What do you think? Is there a better solution?

EDIT: I forgot to say it, but I want the solution to be as efficient as BTreeMap::range. That is, it should have a one-time cost of O(log n) (where n is the size of the map), and amortized cost of O(1) for each iteration.

jjpe · September 26, 2018, 7:25am

There is. BTreeMap has an iter() method, and Iterators have a .filter() method that takes a predicate.

burntsushi · September 26, 2018, 10:58am

Unfortunately, that does not solve @jneem’s problem. An arbitrary predicate isn’t good enough. You want something that defines an ordering such that it can be implemented efficiently by traversing only a subset of the map. A btreemap iterator’s filter method cannot be implemented that way because its semantics do not require it to define an ordering.

jneem · September 26, 2018, 2:26pm

Sorry, I forgot to include the bit about time complexity in the original post, so I’ve added it.

g2p · September 26, 2018, 3:52pm

This will be slower than the current search methods (because there needs to be a binary search within a single tree node, meaning more comparisons), but not algorithmically slower since nodes are currently small (B=6 as I recall).

jneem · September 26, 2018, 4:10pm

I don’t understand why it would need to be any slower than the current range function; can you elaborate? Whatever the search strategy is for a node, can’t the same strategy be used for both range and range_by?

g2p · September 26, 2018, 4:27pm

The strategy within a node will be different. When you know the boundary key, you can just keep track of the next highest element and do a linear scan of the node, when you don’t know it you have to do a bisection, and the comparison function gets called a few times with some unpredictable branching.

jneem · September 26, 2018, 5:12pm

But you can also do a linear scan with the comparison function, no?

g2p · September 26, 2018, 5:32pm

You would just get a bunch of bools, nothing sufficiently useful to know which node to visit next.

There’s a reason the higher-level interface you propose is mostly used for binary trees. It has a performance cost on most cache-friendly structures.

jneem · September 26, 2018, 7:55pm

I asked for a function returning std::cmp::Ordering, so I’ll get a bunch of those instead of a bunch of bools. And unless I’m missing something, I do think it’s sufficient to know which node to visit next. For example, suppose I’m looking for the lower bound on my range. I do a linear search through the current node, evaluating my comparison function on every element. Then I find the first element that returned Ordering::Equal, and I recurse on the subtree to the left of that element. (I could have done a binary search instead of a linear search, but as you pointed out it would be less cache-friendly.)

Really, I think that the implementation and performance characteristics of my proposed function would be identical to those of the existing range function. I might have a try at writing an implementation on the weekend, if that would help convince you…

system · March 25, 2019, 8:30am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Interface improvements for BTreeMap libs	8	1780	October 16, 2022
Suggestion: BTreeMap/BTreeSet O(log(n)) n-th element access libs	4	1883	May 26, 2019
BTree{Map,Set} range removal libs	2	1005	April 1, 2022
Pre-RFC: `{BTreeMap, BTreeSet}::drain` with a range libs	17	1685	April 8, 2022
Comparator for BTree(Map\|Set)s libs	9	1444	October 16, 2022

BTreeMap: searching with a predicate?

Related Topics