Why is std::collections::BinaryHeap a max-heap?

This is mostly curiosity, but is picking max over min arbitrary, or is there a good reason to do so? In particular, Python's "heap" and Java's queue are min-heaps.

Reasons why min is natural:

  • sorted array is a min-heap
  • Dijkstra needs a min-heap
  • set of timers needs a min-heap

Reasons why max is natural:

  • it is useful for heap-sort
  • user-specified priorities need max heap

Additionally, while solving today's advent of code, I had a very fun, but frustrating bug, where I implemented Dijkstra, assuming the the heap is max heap. Because the code itself looked perfectly fine, debugging this was not trivial. I wonder if we could/should make it more obvious at the call-site that the heap is max. A straw man would be adding pop_max, peek_max, peek_max_mut methods, and deprecating unqualified versions.

I've actually wondered the same thing, and kind of wished that we could go back and change it to be a minheap. My issue with it being a maxheap is that it is different from what you'd get if you sorted a vector of the same values. I think I've seen other examples in the standard library where sorting something (or even iterating over something that is sorted) naturally implies starting at the minimum and monotonically increasing in value, but that isn't always the case.

In my personal opinion, this is one of the things that rust got wrong. All things that are sorted should have picked a direction (monotonically increasing or monotonically decreasing) and stuck with it. If all collections, iterators, (others?) used the concept of the same direction, then we wouldn't be surprised the way that @matklad was (BTW, you're not the only one that has gotten bitten by that 'feature'). I know that it is too late to fix this; rust is already past 1.0, and it would break guarantees to change the code. BUT, if I were planning out rust 2.0, I'd put this on the list of breaking changes to add in.

Could just add a MinBinaryHeap.

There is the binary-heap-plus crate, and see its references back to discussion in this forum.

I think, it is connected to heapsort. Let me remind how heapsort works:

  1. Heapify input array inplace.
  2. N times: 2.1 swap last heap (i.e. arr[n-i-1] item on i-th iteration) with first. 2.2. Decrement heap len by 1 2.3 Sift arr[0] down.
  3. Done

So, we maintain both heap and some answer suffix on one vector in same time. And since we only extract elements from heap tail, it remains correct. But if you pop_front() heap, it can easily break (e.g. 3 1 2 is correct maxheap, but 1 2 is not). So, we need maxheap to do heapsort.

BinaryHeap<std::cmp::Reverse<T>> is almost MinBinaryHeap<T>: https://doc.rust-lang.org/std/collections/struct.BinaryHeap.html#min-heap

2 Likes

Except that you always have to have that oh-so-slightly annoying Reverse in there (I always forget to add the .0 to pull out the object I actually want). I know that this isn't a real issue, but I stub my toe on it often enough that it leaves me muttering on a regular basis.

I don’t have an answer to that. However at this point it would be historical curiosity at most, we can’t change it now.

If you feel this it worth spending time and energy on this, I would suggest:

  • Make a library that provides a min-heap with an API identical to std::collections::BinaryHeap. (I think this should be possible by wrapping BinaryHeap<Reverse<T>>)
  • Put it up on crates.io
  • Gain some usage experience
  • Then propose adding it to std::collections

If the API ends up indeed the same then there isn’t much of a design discussion to be had so an RFC may be unnecessary and that last step could be in the form an #[unstable] implementation PR.

2 Likes

Honestly, I don't feel its worth the effort. This is just enough of an annoyance that I stub my toe on it regularly, but not quite enough for me to justify the effort of making a crate for it.