Proposal: BinaryHeap::remove

A by-index removal method for BinaryHeap is mostly for the case where the bulk of interactions are through push/pop, (ideally) amortizing the cost of search and removal of a particular element.

You might want to look at BinaryHeap::retain. It's pretty easy to use it to do what you want.

Retain has very different asymptotic costs. Removal of a single element (i.e. decrease-key + delete-min) shouldn't take more than log n operations to fixup in a heap, whereas retain must do a full n rebuild in the worst case (not to mention the n traversal that preceeds). You might get away with llvm trickery to unroll and remove the traversal but hardly the rebuild stage.

4 Likes

Sure, but that is much more powerful than what an index-based remove would require. Also, it would have asymptotically worse performance.

EDIT: HeroicKatora beat me to the second point.

How are you finding the index of the element to remove in the first place? AFAIK, you still have to do a linear search to find the index before remove can be used. That's the reason why I suggested retain.

1 Like

Finding the index would require linear search, yes. However, if paired with removal, that could be a partial scan followed by a partial down-sift. retain would require full traversal* and a tail rebuild.

*Unless LLVM optimizes out a full traversal, which I have no idea the likelihood of.

OK, so in the worst case you would have O(n) linear search to find the element to remove, followed by down-sifting. So, from a Big-O point of view retain and remove are the same, but you're hoping that from a practical point of view that remove will get lucky more often than retain does.

So, have you implemented this and tested it on real data? I'm asking because this feels like something that seems like a good idea, but which doesn't pan out in real life.

For the record, I'm not against the idea, I just want to make sure it's worth the effort and cost of stabilization. I've wanted something like this to deal with priority queues for a while, but for speed reasons I went with a completely different approach.

Is your assertion that most binary heaps are degenerate? That doesn't match my intuition or experience at all. It can happen, certainly, but the vast majority of binary heaps I've ever dealt with were reasonably balanced, and indeed remove would have lower complexity than retain.

IMO just send a PR to add this function as an unstable function. Then after awhile see if it can/should be stabilized.

We don't really treat BinaryHeap as indexable at all right now, so that would be a bit of a paradigm shift to introduce an index-based method.

1 Like

What is “by-index removal” even supposed to mean? I feel like different people might be talking about different things here.

By-index removal: BinaryHeap::remove(&mut self, i: usize), i.e. swap the element at index i and the smallest element in the heap, restore the heap from 0..heap.len() - 1, and return the last element.

This test program is about three times faster with scan + remove as opposed to retain:

fn main() {
    let prime = 104729;
    let gen = 12;
    let mut elt = gen;

    let mut heap = BinaryHeap::<i32>::new();
    for i in 1..prime {
        heap.push(elt);
        elt *= gen;
        elt %= prime;
    }

    let start = Instant::now();
    for i in (prime / 4)..(3 * prime / 4) {
        heap.retain(|&x| x != i);
        // let mut j = 0;
        // while heap.as_slice()[j] != i { j += 1; }
        // heap.remove(j);
    }
    let stop = Instant::now();
    println!("{}", stop.duration_since(start).as_millis());
}

Could you post a snippet of the program you are trying to use this for? I'm assuming you have an operation that uses as_slice or iter to find an index. Both of these methods claim to have arbitrary order, but obviously it is heapy order underlying. So you could find one or more indexes and then .remove them (largest to smallest!) and this would usually be faster than retain.

Can I recommend a BTreeMap BTreeSet for this usecase?

In broad terms, the program is something that would be dominated by push/pop, but with occasional search-and-delete. The balance would generally be such that it would be more expensive to maintain order at all times. As a toy example:

fn main() {
    let prime = 104729;
    let gen = 12;
    let mut elt = gen;

    // let mut xs = BinaryHeap::<i32>::new();
    let mut xs = Vec::<i32>::new();
    let start = Instant::now();
    for i in 1..prime {
        // xs.push(elt);
        let index = xs.binary_search(&elt).unwrap_err();
        xs.insert(index, elt);
        elt *= gen;
        elt %= prime;
    }

    for i in (prime / 2 - 1000)..(prime / 2 + 1000) {
        // let mut j = 0;
        // while xs.as_slice()[j] != i { j += 1; }
        // xs.remove(j);
        xs.remove(xs.binary_search(&i).unwrap());
    }
    let stop = Instant::now();
    println!("{}", stop.duration_since(start).as_millis());
}

On my computer, this is 50ms+-5ms for BinaryHeap, 160ms+-10ms for Vec in release mode.

I should add that the main goal of using BinaryHeap as opposed to BTree{Map,Set} was to have a contiguous memory layout. However, BTreeMap is flat-out better in general, so I am on the side of just tabling this unless there is anyone strongly in favor of seeing it added.

2 Likes

Or you could use a BTreeSet? How does that performance compare for you? Also, toy examples kinda suck; there’s so many ways to “improve” this code (up to the point of making it do nothing at all) and it’s impossible to deduce almost anything about actual, practical use-cases[1] from analyzing this example.

fn main() {
    let prime = 104729;
    let gen = 12;
    let mut elt = gen;

    let mut xs = BTreeSet::new();
    let start = Instant::now();
    for i in 1..prime {
        xs.insert(elt);
        elt *= gen;
        elt %= prime;
    }

    for i in (prime / 2 - 1000)..(prime / 2 + 1000) {
        xs.remove(&i);
    }
    let stop = Instant::now();
    println!("{}", stop.duration_since(start).as_millis());
}

  1. which might have different access patterns, a different balance of push/pop vs. remove, etc… (this toy example doesn’t have any pop at all) ↩︎

2 Likes

This is just:

let range = (prime / 4)..(3 * prime / 4);
heap.retain(|x| range.contains(x));

which is better than both solutions since it's O(n) rather than O(n²)

Couldn't something like heap.remove_value_range(start..end) exploit the heap-layout to excise the range in something like O(k + r) time with k = tail elements that need to be moved, r = number of elements covered by range?

That is probably what I will go for.

Sorry about that, I haphazardly threw something together to have at least some benchmarkable code.

Individual removals are more representative of the scenario I am working with, so I used the example program as a crude approximation. Again, my apologies about the uninformative example; the code I am working with is not easy to get free-standing in a reasonable amount of time.

No, I'm not making that assertion. What I meant was that once we're in the same Big-O complexity range we need to do more careful analysis. @BadAtUsernames gave a toy example (that @SkiFire13 promptly improved) that gives a flavor of what is intended, but before I went with any solution that was time sensitive like this, I'd spend a lot of time profiling the code.

Also, remember that I was arguing from a Big-O point of view, which is always maximally pessimistic, and only applies as the container's size gets large. So while in most cases the trees are reasonably balanced, that isn't the correct view with regards to Big-O.

Finally, a different (much more memory intensive) algorithm that is Big-O faster is to store the elements of interest in something like Arc<T>, and you use two collections concurrently, one a dict[1] and the other is the priority queue. When you pop off the queue, you check to see if the element is in the dict; if it is it's still valid to execute, if it isn't, then it was canceled, and you can get rid of it. The priority queue is still running in O(lg n) time, and the dict makes lookups and deletions fast. You might have to make some adjustments if there can be more than one equivalent T in the dict at a time, but that's pretty easy to deal with[2].

EDIT

Grammar and spelling


  1. You could use a binary tree or a hash map for the dictionary, so fill in whatever you want for look up times. ↩︎

  2. Yes, I know that the standard method of dealing with it is to have a dict of lists, which means that in the worst case you've now moved your linear scan over to the dict, but I'm assuming you've got something smarter, like a dict of dicts, where the secondary level uses the address of the raw objects, or something else smartDetails are left as an exercise for the reader ↩︎

1 Like

Also look at the solution in the priority_queue - Rust crate.

1 Like

As long as we're throwing datastructures at this another reasonable option might be a bitvec.