Append Vec to BinaryHeap?

AFAIK, there only exists a function that append BinaryHeap to a BinaryHeap, but its implememtation is just append a Vec to BinaryHeap. If someone really want to append a vec to the BinaryHeap, the conversion from Vec to BinaryHeap cost some time and it is a totally waste since the source code is

    #[stable(feature = "binary_heap_append", since = "1.11.0")]
    pub fn append(&mut self, other: &mut Self) {
        if self.len() < other.len() {
            swap(self, other);
        }

        let start = self.data.len();

        self.data.append(&mut other.data);//regard `&mut other.data` as a `Vec`, rather than a `BinaryHeap`.

        self.rebuild_tail(start);// assume self.data[0..start] is a vaild heap.
    }

things might be better changing the function to

    #[stable(feature = "binary_heap_append", since = "1.11.0")]
    pub fn append(&mut self, other: &mut Self) {
        if self.len() < other.len() {
            swap(self, other);
        }
        self.append_vec(&mut other.data)
    }
    pub fn append_vec(&mut self, vec: &mut Vec<T>){ // not tested, maybe something simillar like what I've shown below.
    // T is defined in `impl <T:Ord> for BinaryHeap`, no need to re-define it.
        let start = self.data.len();
        self.data.append(vec);
        self.rebuild_tail(start);
    }

Is it worth a PR or just a Pre-RFC?

If you find it could be an acceptable choice, please help me submit the PR and leave a message here. Due to some reason, I can hardly connect to github. I would be appreciate that you help me submit this possible PR.

Anyway, thank you for reading:)

You’ll usually use the .extend(…) method to append from a different collection. In order to do it with a &mut Vec<T> and keep it’s allocation and capacity intact, you can use Vec::drain. I.e. your proposed heap.append_vec(&mut vector) would behave similar to or the same as heap.extend(vector.drain(..)).


use std::collections::BinaryHeap;

pub fn append_binary_heap_with_vec<T: Ord>(heap: &mut BinaryHeap<T>, vector: &mut Vec<T>) {
    heap.extend(vector.drain(..));
}

(in the playground)

2 Likes

However looks like extending a BinaryHeap with a Drain falls back to just pushing each element, which may be much slower than rebuilding the heap if the Drain contained a lot of elements.

Good catch. Maybe we should simply change

    fn extend_desugared<I: IntoIterator<Item = T>>(&mut self, iter: I) {
        let iterator = iter.into_iter();
        let (lower, _) = iterator.size_hint();

        self.reserve(lower);

        iterator.for_each(move |elem| self.push(elem));
    }

to look more like

    pub fn append(&mut self, other: &mut Self) {
        if self.len() < other.len() {
            swap(self, other);
        }

        let start = self.data.len();

        self.data.append(&mut other.data);

        self.rebuild_tail(start);
    }

i.e. something like:

  • reserve lower
  • for_each: push to the underlying vec
  • finally, rebuild_tail

Regarding panic-safety, we don’t need leave the heap in a valid state (w.r.t. the ordering) in case of a panic.

Actually… we could just use extend on the Vector, right?

    fn extend_desugared<I: IntoIterator<Item = T>>(&mut self, iter: I) {
        let start = self.data.len();

        self.data.extend(iter);

        self.rebuild_tail(start);
    }
2 Likes

There's also the possibility of "just" specializing to do the efficient thing here.

2 Likes

That could be a really cool choice

..Since I finally figure out that, for std::Vec, .extend seems as fast as .copy_from_slice.


This question could be mark as solved if a PR is submitted to github.

But... I just found another problem: why we couldn't operate private fields or methods?

The first solution for me is, try unsafe{BinaryHeap.data.extend(iter)}, I know operate private field could be very unsafe, but I know what I am doing..

Why couldn't we operate private field even with the unsafe block?

access to private field, at least, is safer than std::mem::transmute, why we have the latter one?

BinaryHeap::append already has heuristics for how to do that; updating extend to do the same seems perfectly reasonable. (Though generally-descending vs unknown-order can affect things slightly.)

Because it's important to let libraries change internal implementation details without breaking compilation. It's unacceptable to say that you can't even rename private fields without a major version bump.

3 Likes

At this point specializing would just enable a more limited version of the swap trick used in append (but just for copying less memory, it wouldn't work to reduce the number of comparisons) and I'm not sure how important that may be.

1 Like

I found a solution

// RebuildTail is a struct that ensure `rebuild_tail` in `fn extend_desugared`
// is executed
struct RebuildTail<'a, T: Ord>(&'a mut BinaryHeap<T>, usize);
impl<'a, T: Ord> Drop for RebuildTail<'a, T> {
    #[inline]
    fn drop(&mut self) {
        self.0.rebuild_tail(self.1);
    }
}

impl<T: Ord> BinaryHeap<T> {
    fn extend_desugared<I: IntoIterator<Item = T>>(&mut self, iter: I) {
        let start=self.data.len();
        let extend=RebuildTail(&mut self, start);
        extend.0.data.extend(iter);
        // `self.rebuild_tail(start)` is done by `drop(extend)`
    }
}

even the iterator panic, the rebuild_tail is executed.

Now I am fighting with github... It is not easy for me to connect the github...

The PR is here

But we don't have the problem. The BinaryHeap<T>'s document says:

The behavior resulting from such a logic error is not specified, but will not result in undefined behavior. This could include panics, incorrect results, aborts, memory leaks, and non-termination.

Keeping heap ordering after the panic is not needed nor expected.

2 Likes

Doesn't that "panic" refer to what the BinaryHeap could do if you modify an element such that its ordering compared to the other elements changes. Although it's true that currently any BinaryHeap's method doesn't keep the heap ordering if the comparison function panics (and this includes the proposed solution)

The question is not just what belongs to the BinaryHeap.

if self.data.extend(iter) panics (just the iter panics), the heap should keep its order, but in the previous implementation of @steffahn self.rebuild_tail(start); could not be executed.

That is what I want to provide (although I wasn't realized that the compare function could also panic then..)

In Rust, mutable references are generally not considered unwind safe which means that using them anyway after catching a panic is always a logic error (unless the type in question specifies that it isn’t).

Also, the Ord function panicking with append, or the Ord function or closure panicking in retain (the other usage of rebuild_tail currently) leads to a broken binary heap as-well.