Why not use d-ary heap inside rather than binary heap

According to some experiments, d-ary heap (d>2, typically d=4) generally performs better than binary heap.

They have the same compact memory layout as binary heap. I don't see any drawback compared to binary heap. Plus, Rust has already chosen b-tree over red black tree for better cache performance. It's not weird to make the same decision once again.

3 Likes

Where do you believe a binary heap is being used?

I mean std::collections::BinaryHeap. Sorry for ambiguity.

Honestly, I didn't even know it existed, and I know the standard library quite well.

Sounds like you'd be interested in

5 Likes

Thanks. In the comments of that blog post, the author actually mentioned quaternary heap.

Yes, a quaternary heap will beat this min-max heap. You can already try it by using the d-ary heap from my code and setting D=4.

Given that min-max heap is more powerful than binary heap but less performant than quaternary heap, I prefer exposing it as another struct to release its full power rather than using it as the internal implementation of std::collections::BinaryHeap.

BinaryHeap example should show how to make a min-heap. #58174 talks about how the current one being a max-heap isn't inherently obvious.

Actually that makes sense from another perspective. If you heap-sort an max-heap, you get an ascending array, which is the default, thus max-heap is the default. As far as I know, C++'s std::priority_queue is also a max-heap by default. Off-topic anyway.

Is there anything in the interface of BinaryHeap that actually requires it being implemented by a binary heap as opposed to a k-ary heap for higher k? The name is poorly chosen.

That's where I ended up in the conversation too.

I'd opened a PR to stabilize BinaryHeap::into_iter_sorted, but ended up withdrawing that PR as the conversation in it made me think the API is too awkward to be worth it: Stabilize feature(binary_heap_into_iter_sorted) by scottmcm · Pull Request #76234 · rust-lang/rust · GitHub

With a new type, it could have pop_min and pop_max instead of just pop, and it could only offer sorted iterators (sending you through into_vec or whatever if people want other unsorted access to the items).

FWIW, I think that it would be worth it to (soft) deprecate BinaryHeap, and introduce a MinHeap with faster and less ambiguous API. Deprecating the whole data structure is a tall order, but it does seem that BinaryHeap is uniquely unfortunate:

  • The API is inconvenient, you almost always want a minheap in practice (the only common exception is sorting, but iirc into_vec().sort() is faster than into_sorted_vec() anyway).
  • The API is confusing. Nothing in the API tells that this is a max heap
  • The API is slow. I am not entirely sure, but I think we guarantee somewhere in the API that it’s a binary heap.

I don’t think we should do MinMax heap. The motivation for that was that it’s faster, but it’s faster not because it is MinMax, but because it’s n-ary heap. So, just n-ary MinHeap would be the fastest of all. Or, rather, we could add MinMax as I imagine sometimes you do want min-max property, but we should do that after we fix the common variation first.

All that being said, it feels like we don’t have too much spare capacity when it comes to the design of stdlib, and this heap of problems doesn’t feel particularly high priority.

9 Likes

Maybe just create MinHeap and MaxHeap and make BinaryHeap be a type alias to MaxHeap. Do not deprecate it, do not change anything from the API, but change its internals to not be a binary heap anymore.

I just read through the documentation; aside from the type name, the most explicit mention of the internal algorithm occurs in the preamble (emphasis mine):

A priority queue implemented with a binary heap.

This will be a max-heap.

It is a logic error for an item to be modified in such a way that the item’s ordering relative to any other item, as determined by the Ord trait, changes while it is in the heap. This is normally only possible through interior mutability, global state, I/O, or unsafe code. The behavior resulting from such a logic error is not specified, but will be encapsulated to the BinaryHeap that observed the logic error and not result in undefined behavior. This could include panics, incorrect results, aborts, memory leaks, and non-termination.

As long as no elements change their relative order while being in the heap as described above, the API of BinaryHeap guarantees that the heap invariant¹ remains intact i.e. its methods all behave as documented. For example if a method is documented as iterating in sorted order, that’s guaranteed to work as long as elements in the heap have not changed order, even in the presence of closures getting unwinded out of, iterators getting leaked, and similar foolishness.

Âą "The heap invariant" is not further defined anywhere on this page, either explicitly or by reference.


  • Documentation of the methods appear to use the phrases "the binary heap" and "the BinaryHeap" interchangeably to refer to self, but go into no further details about the internals.
  • Some time complexities are guaranteed by the documentation— I believe these would all be met or exceeded by a d-ary heap:
    • Push: O(1) (amortized)
    • Pop: O(log(n))
    • Peek: O(1)
  • The top of the heap is consistently referred to as "the greatest item"
  • Methods that access elements in storage order consistently refer to it as an "arbitrary order"
  • into_sorted_vec explicitly calls it "sorted (ascending) order"
  • The documentation for some of the experimental methods refer to "heap order," but this is never explicitly defined and not used outside experimental documentation.

As we have const generics now, it would be possible to define MaxHeap<D:usize = ??>and then alias BinaryHeap to MaxHeap<2>, which should even keep all of the "arbitrary order"s the same. But that risks locking std into a particular algorithm again, albeit a better one than before.

8 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.