Pre-RFC: BinaryHeap Flexibility

sekineh · May 9, 2018, 12:15pm

Update (2018-05-17)

I published the crate binary-heap-plus for those who need custom ordered heap NOW. crates.io: Rust Package Registry
FnMut based solution might be better than Compare<T> for RFC inclusion, but it requires some unstable features to be written nicely.
I won't (can't) write the FnMut based RFC for the time being because I'm not familiar with the related unstable features.

Thanks for all suggestions!

Pre-RFC

Feature Name: binary_heap_flexivility
Start Date: (fill me in with today's date, YYYY-MM-DD)
RFC PR: (leave this empty)
Rust Issue: (leave this empty)

Summary

Extend BinaryHeap such that it supports max/min/custom heap in a backward-compatible way.

Motivation

The current BinaryHeap implementation only supports max-heap. It is useful if heap could be used in arbitrary orders.

In C++ version of the same library, priority_queue can pop elements in arbitrary orders. The following C++ codes are excerpt from std::priority_queue - cppreference.com

[C++] Max heap

    std::priority_queue<int> q;
 
    for(int n : {1,8,5,6,3,4,0,9,7,2})
        q.push(n);
 
    print_queue(q); // prints: 9 8 7 6 5 4 3 2 1 0

[C++] Min heap

    std::priority_queue<int, std::vector<int>, std::greater<int> > q2;
 
    for(int n : {1,8,5,6,3,4,0,9,7,2})
        q2.push(n);
 
    print_queue(q2); // prints: 0 1 2 3 4 5 6 7 8 9

[C++] Custom heap using lambda

    // Using lambda to compare elements.
    auto cmp = [](int left, int right) { return (left ^ 1) < (right ^ 1);};
    std::priority_queue<int, std::vector<int>, decltype(cmp)> q3(cmp);
 
    for(int n : {1,8,5,6,3,4,0,9,7,2})
        q3.push(n);
 
    print_queue(q3); // prints: 8 9 6 7 4 5 2 3 0 1

Guide-level explanation

[Rust] Max heap (Backward Compatible)

Existing code should work as max heap like before.

    let heap = BinaryHeap::from(vec![1, 8, 5, 6, 3, 4, 0, 9, 7, 2]); // Max heap
    assert_eq!(
        itertools::unfold(heap, |heap| heap.pop()).collect::<Vec<_>>(),
        vec![9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
    );

[Rust] Max heap

You can explicitly declare it as a max heap.

    let heap: BinaryHeap<_, MaxComparator> = BinaryHeap::from(vec![1, 8, 5, 6, 3, 4, 0, 9, 7, 2]); // Max heap
    assert_eq!(
        itertools::unfold(heap, |heap| heap.pop()).collect::<Vec<_>>(),
        vec![9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
    );

[Rust] Min heap

You can explicitly declare it as a min heap.

    let heap: BinaryHeap<_, MinComparator> = BinaryHeap::from(vec![1, 8, 5, 6, 3, 4, 0, 9, 7, 2]); // Min heap
    assert_eq!(
        itertools::unfold(heap, |heap| heap.pop()).collect::<Vec<_>>(),
        vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    );

[Rust] Custom heap

You can declare custom heap.

    let heap: BinaryHeap<_, CustomComparator> = BinaryHeap::from(vec![1, 8, 5, 6, 3, 4, 0, 9, 7, 2]);
    assert_eq!(
        itertools::unfold(heap, |heap| heap.pop()).collect::<Vec<_>>(),
        vec![8, 9, 6, 7, 4, 5, 2, 3, 0, 1]
    );

In order to implement CustomComparator, you need to read the reference (later in this document).

It requires more setup (defining CustomComparator) compared to C++ version, but the Rust version is potentially faster because compare function could be statically resolved and therefore inlined.

[Edit]

[/Edit]

Reference-level explanation

The `Compare<T>` trait

Compare<T> is a light-weight version of Ord that only carries one compare() function.

The difference than Ord is:

Much simpler:
- To implement Ord, you need to implement all of the family traits (Ord, PartialOrd, Eq and PartialEq).
- Compare<T> requires only one function.
- Compare<T> is nothing to do with operator overloading.
Support multiple ordering:
- Ord can only define one ordering per type T.
- Compare<T> can define arbitrary ordering for type T.

We'll provide the default Compare<T> for T: Ord in std:

    pub struct MaxComparator;
    impl<T: Ord> Compare<T> for MaxComparator {
        fn compare(a: &T, b: &T) -> Ordering {
            a.cmp(&b)
        }
    }

    pub struct MinComparator;
    impl<T: Ord> Compare<T> for MinComparator {
        fn compare(a: &T, b: &T) -> Ordering {
            b.cmp(&a)
        }
    }

In the user code, you'd implement Compare<T> for CustomComparator like this:

    pub struct CustomComparator;
    impl<T: Ord> Compare<T> for CustomComparator {
        fn compare(left: &T, right: &T) -> Ordering {
            (left ^ 1).cmp(right ^ 1)
        }
    }

Change to `BinaryHeap<T>`

The current type signature is:

    pub struct BinaryHeap<T> { /* fields omitted */ }

We'll change this to:

    pub struct BinaryHeap<T, C = MaxComparator> where C: Compare<T> { /* fields omitted */ }

Drawbacks

It does not support the following convenient method:

    let mut heap = BinaryHeap::with_cmp(|&a, &b| (a ^ 1).cmp(b ^ 1)); // custom heap creation using lambda

Rationale and alternatives

[Alt1] Use type alias

My experimental crate used slightly different design and achieved good backward compatibility. After implemented it, I read the API revolution RFC (RFC 1105) and concluded using the default type parameter is a better option.

    pub struct BinaryHeapPlus<T, C> where C: Compare<T>;

    /// A backward-compatibility type that can be used as a max heap.
    pub type BinaryHeap<T> = BinaryHeapPlus<T, MaxComparator>;

[Edit] simplified to illustrate more clearly. [/Edit]

See GitHub - sekineh/binary-heap-plus-rs: Enhancement over Rust's `std::collections::BinaryHeap`. Supports other than max heap. for the details.

[Alt2] Include function pointer as a struct member

This would enable BinaryHeap::with_cmp(|&a, &b| expr) use case, but we need to give up the optimization opportunities such as inlining.

It's fine in most other general-purpose languages, but Rust favors zero-cost abstraction. I'm not sure if it is acceptable in embedded, for example.

[Edit]

The above is wrong.

Using F: Fn trait, the implementation could be monomorphized into concrete f.

[/Edit]

[Alt3] Do nothing

Use new type pattern and implement all of Ord trait families. The code below is modified from a real-world example:

/// QueryOnNode orderd by frequency
#[derive(Debug, Clone, Eq)]
pub struct NodeRangeByFrequency(QueryOnNode);

impl Ord for NodeRangeByFrequency {
    fn cmp(&self, other: &Self) -> Ordering {
        // implement this
    }
}

impl PartialOrd for NodeRangeByFrequency {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(&other)) // boilerplate
    }
}

impl PartialEq for NodeRangeByFrequency {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal // boilerplate
    }
}

Prior art

C++ priority_queue
- URL: std::priority_queue - cppreference.com
Rust: heapless::BinaryHeap
- URL: https://japaric.github.io/heapless/heapless/binary_heap/index.html
- supports Max and Min heap depending on type args.
Rust: BinaryHeapPlus
- URL: GitHub - sekineh/binary-heap-plus-rs: Enhancement over Rust's `std::collections::BinaryHeap`. Supports other than max heap.
- Imcomplete prototype for this RFC

Unresolved questions

Is it really backward compatible?

RFC 1105 described that "adding defaulted type parameters" is a minor change, but is it always possible to implement in a compatible way?

github.com

arcnmx/rust-rfcs/blob/master/text/1105-api-evolution.md

- Feature Name: not applicable
- Start Date: 2015-05-04
- RFC PR: [rust-lang/rfcs#1105](https://github.com/rust-lang/rfcs/pull/1105)
- Rust Issue: N/A

# Summary

This RFC proposes a comprehensive set of guidelines for which changes to
*stable* APIs are considered breaking from a semver perspective, and which are
not.  These guidelines are intended for both the standard library and for the
crates.io ecosystem.

This does *not* mean that the standard library should be completely free to make
non-semver-breaking changes; there are sometimes still risks of ecosystem pain
that need to be taken into account. Rather, this RFC makes explicit an initial
set of changes that absolutely *cannot* be made without a semver bump.

Along the way, it also discusses some interactions with potential language
features that can help mitigate pain for non-breaking changes.

This file has been truncated. show original

// MINOR CHANGE

// Before
struct Foo { .. }

// After
struct Foo<A = u8> { .. }

In my experiment, BinaryHeap::from() was broken when I tried to follow this pattern.

https://github.com/sekineh/binary-heap-plus-rs/tree/eliminate_binaryheapplus/src produces the following error:

sekineh@MX3-sekineh MINGW64 /c/workmx3/01WaveletMatrix/binary-heap-plus (eliminate_binaryheapplus)
$ cargo test
   Compiling binary-heap-plus v0.1.0 (file:///C:/workmx3/01WaveletMatrix/binary-heap-plus)
error[E0282]: type annotations needed
  --> src\lib.rs:25:20
   |
25 |         let heap = BinaryHeap::from(data);
   |             ----   ^^^^^^^^^^^^^^^^ cannot infer type for `C`
   |             |
   |             consider giving `heap` a type

error: aborting due to previous error

For more information about this error, try `rustc --explain E0282`.

DDOtten · May 9, 2018, 2:43pm

How about something like this:

struct BinaryHeap<T, F>
where
    F: Fn(&T, &T) -> Ordering
{
    data: Vec<T>,
    f: F,
}

impl<T, F> BinaryHeap<T, F>
where
    T: Ord,
    F: Fn(&T, &T) -> Ordering,
{
    #[deprecated(since = "tbd", note = "please use `new_max` instead")]
    fn new() -> BinaryHeap<T, fn(&T, &T) -> Ordering> {
        BinaryHeap::new_max()
    }

    fn new_max() -> BinaryHeap<T, fn(&T, &T) -> Ordering> {
        BinaryHeap {
            data: Vec::new(),
            f: T::cmp,
        }
    }

    fn new_min() -> BinaryHeap<T, fn(&T, &T) -> Ordering> {
        BinaryHeap {
            data: Vec::new(),
            f: |first, second| { T::cmp(second, first) },
        }
    }
}

sekineh · May 9, 2018, 3:11pm

I presume f is like a function pointer, which involves indirect call. I do think this overhead is small enough in most cases. Do you think so?
It seems to me that type parameter F could be deleted because f’s type is always fn(&T, &T) -> Ordering. Am I right?

steven099 · May 9, 2018, 4:33pm

You could actually do it with static dispatch if you keep the type parameter. It just needs a fair bit of boilerplate to create named Fn* implementations (playground). In general, you could use impl Fn(&T, &T) -> Ordering, but it doesn’t work for inherent constructors (like new, new_max, …), since there’s no way to make the type of Self depend on an unnamed closure type.

CAD97 · May 9, 2018, 4:51pm

What you’re missing is the difference between impl Fn(...) (an instance of the Fn trait) and fn(...).

The latter is a function pointer: a pointer sized struct that virtually dispatches to the unknown variable pointee function.

The former is a monomorphized type. The Fn trait is implemented on a unique type for every[^1] function and closure in the compile, and is a zero-sized[^2] struct that statically dispatches calls. This means that calls are able to be inlined and other monomorphized optimizations.

Effectively, your for<T> Compare<T> is equivalent to for<T> Fn(&T, &T)->Ordering except that it’s not allowed to capture (immutable) state as a closure, since it doesn’t take self, and it’s not auto implemented for closures.

[^1]: not exactly, there’s reasons that the Fn trait won’t be upheald, such as extern or unsafe fn, or FnMut/FnOnce's use cases
[^2]: assuming no captures from the environment at the definition site for closures

steven099 · May 9, 2018, 5:16pm

I was thinking this at first too, but then I noticed that @DDOtten did use fn in the return types from the constructor methods. I'm guessing that was just because using unboxed closures here is a bit sticky, which is what I was addressing in my previous comment.

scottmcm · May 9, 2018, 5:40pm

I definitely prefer the F: Fn version to the custom trait, since trait Comparer<T> = Fn<&T, &T, Output = Ordering>. (See also what sort_by and friends take. Once trait aliases land it might be nice to have a few for things like this and a Predicate to make intent clear.) That might need abstract type before it can be implemented nicely, though.

Another option would be something like struct BinaryHeap<T, K = T> where T: Into<K>, K: Ord + Into<T>, so a max-heap is BinaryHeap<T> and a min-heap is BinaryHeap<T, cmp::Reverse<T>>. That has the potentially-interesting behaviour of allowing the caching comparison keys, if desired.

This isn't true; the C++ one is also a statically-dispatched function object, probably stored in something like a compressed_pair to make up for C++ not having ZSTs.

DDOtten · May 9, 2018, 8:29pm

We could use impl FnMut like I have done here. This would also be a backwards compatible change. Here we don’t explicitly define a function to create a min heap like we don’t have an explicit function to sort in reverse order. Instead we allow this functionality by having a _by function where you can give your own compare function.

However when using a clossure I get an error that an bound lifetime parameter was expected while a concrete lifetime was found.

I would also agree with your solution @scottmcm however I think it would be a bit less easy to work with. I was wondering however if we need the T: Into<K>, I think K: Into<T> would be the only conversion needed.

steven099 · May 9, 2018, 8:42pm

Have you tried calling BinaryHeap::new? You get type inference issues (simplified example). It seems to be trying to infer what _ is in BinaryHeap::<T, _>::new and can’t because it’s unconstrained, since new doesn’t return Self. That’s why I explicitly implemented Fn*. Also, you need a named type to be able to set a default. Otherwise it’s a breaking change.

DDOtten · May 9, 2018, 8:56pm

Yes it only compiles when using something like BinaryHeap::<usize, fn(&usize, &usize) -> Ordering>::new() to give it a concrete type. I don’t know how to work around this except for a hacky default C = fn(&T, &T) -> Ordering which would allow the slightly better BinaryHeap::<usize>::new().

steven099 · May 9, 2018, 9:13pm

That was the point of the playground I posted here. I had to create named types and manually implement the Fn* traits for them. You need this anyways in order to set the type parameter default. Maybe at some point we’ll have abstract types and this won’t be necessary.

ExpHP · May 9, 2018, 9:55pm

Seriously, I don’t think there’s anything wrong with having Compare<T> (except that it should take a &mut self argument).

Here’s wonderful things you can do today in Rust (even in 1.25) with a Compare trait: (playground). Waiting for abstract types buys you nothing of value, as far as I can tell.

fn f(x: &usize, y: &usize) -> Ordering {
    usize::cmp(&x, &y)
}

let _heap = BinaryHeap::<i32>::new();

let _heap = BinaryHeap::new_by(f);

let _heap = BinaryHeap::new_by(|x: &_, y: &_| usize::cmp(x, y));

let _heap = BinaryHeap::new_by_key(|x: &i32| -x);

// make a heap with context
fn expensive_function(x: i32) -> i32 { -x }
let memoized_function = {
    let mut memo = HashMap::new();
    move |&x: &_| {
        *memo.entry(x).or_insert_with(|| expensive_function(x))
    }
};
let _heap_with_context = BinaryHeap::new_by_key(memoized_function);

// make a heap with non-'static context
let data = vec![2.0, 5.0, 1.0];
let _heap_that_borrows = BinaryHeap::new_by(|&a: &usize, &b: &usize| {
    data[a].partial_cmp(&data[b]).unwrap()
});

(not that I know whether any of this is useful for typical Heap use cases. The last time I used a heap in my code was well over a year ago…)

Nemo157 · May 9, 2018, 10:49pm

Instead of having FnComparator couldn’t you impl<T, F> Compare<T> for F where F: FnMut(&T, &T) -> Ordering and have fun new_by(cmp: impl Comparator)? Maybe that breaks the closure type inference though.

ExpHP · May 9, 2018, 11:50pm

Instead of having FnComparator couldn’t you impl<T, F> Compare for F where F: FnMut(&T, &T) -> Ordering and have fun new_by(cmp: impl Comparator)? Maybe that breaks the closure type inference though.

To what end? As far as I can tell:

A user can already construct a BinaryHeap from an arbitrary closure
- ...and even return it from a function as BinaryHeap<T, impl Compare<T>>.
A user's generic function can take BinaryHeap<T, impl Compare<T>> as an argument.
A user's generic function can take F: FnMut(&T, &T) -> bool as an argument and construct a BinaryHeap using it.

So basically, beyond the fact that it shows up in the return type of new_by, hardly anyone needs to know that FnComparator even exists.

Meanwhile, a blanket impl like impl<T, F> Compare<T> for F where F: FnMut(&T, &T) -> Ordering often has a tendency to throw a wrench into basically every aspect of the design because blanket impls beget more blanket impls, which often readily overlap with other impls, which leads to a significant amount of headache for everyone. (which could be justified if it gave us some kind of clear and significant advantage; but as far as I can tell there is none)

scottmcm · May 10, 2018, 1:27am

The only thing that abstract type buys is the stdlib not needing to implement FnMut manually for the (could be unstable) type that's the default. So it can easily do that for now, and compatibly change later. (It could also use typeof(<T as Ord>::cmp) instead if we got support for that, since it's also a ZST function object.)

And overall I'd rather have new_by<F>(f: F) -> BinaryHeap<T, F> than -> BinaryHeap<T, FnComparator<F>>. (Like with_hasher.)

ExpHP · May 10, 2018, 3:04am

What I meant to say is, I don't see what having F: FnMut as the type parameter buys us. As far as I can see, the concern is purely aesthetic.

sekineh · May 10, 2018, 5:24pm

Thanks, all! What I posted in the first time was like a pre-pre-...RFC!

I learned a lot from your comments and read the first edition of the book’s closure section. I’d like to experiment with liballoc along with the suggestions.

By the way, impl Trait in the return position is coming in 1.26.0. I think it will help!

DDOtten · May 11, 2018, 1:21pm

I have a backwards compatible proposal that I think would be similar to the way sort already works. There are currently 4 methods that create a BinaryHeap. These are new, with_capacity, from, and into. I propose that we change the definition of a BinaryHeap to:

struct BinaryHeap<T, C>
where
    C: FnMut(&T, &T) -> Ordering,
{
    data: Vec<T>,
    cmp: C,
}

Here we use FnMut as it gives the greatest flexibility. If C does not capture any variables it would be a zero sized type. The current heaps are an example of this. Similar to sort, sort_by, and sort_by_key we define:

impl<T> BinaryHeap<T, fn(&T, &T) -> Ordering> {
    pub fn new() -> BinaryHeap<T, Max<T>>
    where
        T: Ord,
    {
        BinaryHeap {
            data: Vec::new(),
            cmp: Max::new(),
        }
    }

    pub fn new_by<C>(cmp: C) -> BinaryHeap<T, C>
    where
        C: FnMut(&T, &T) -> Ordering,
    {
        BinaryHeap {
            data: Vec::new(),
            cmp,
        }
    }

    pub fn new_by_key<K, F>(mut f: F) -> BinaryHeap<T, impl FnMut(&T, &T) -> Ordering>
    where
        K: Ord,
        F: FnMut(&T) -> K,
    {
        BinaryHeap {
            data: Vec::new(),
            cmp: move |x, y| K::cmp(&f(x), &f(y)),
        }
    }
}

We use fn(&T, &T) -> Ordering to have a concrete type to define these methods on. The type is not important here, the only thing we care about is that it implements FnMut(&T, &T) -> Ordering. We implement

with_capacity,
with_capacity_by,
with_capacity_by_key,
from,
from_by,
from_by_key

in similar ways. The From trait is only implemented for Vec so we can easily generalise it this way. I propose that we keep this trait implementation and define the other two functions as methods. The only function we can not easily generalize this way is into. For this we would have to define methods on Vec and I am not sure that would be worth it.

We could use new() -> BinaryHeap<T, impl FnMut(&T, &T) -> Ordering> instead of new() -> BinaryHeap<T, Max<T>> and do the same for from, and with_capacity. However this would mean that the compiler would think that these three functions don’t return the same type. This would be a breaking change as these functions currently do return the same type. So I propose we make a new struct:

#![feature(fn_traits, unboxed_closures)]

struct Max<T: Ord>(PhantomData<fn(&T, &T) -> Ordering>);

impl<T: Ord> Max<T> {
    pub fn new() -> Max<T> {
        Max(PhantomData)
    }
}

impl<'a, 'b, T: Ord> FnOnce<(&'a T, &'b T)> for Max<T> {
    type Output = Ordering;
    extern "rust-call" fn call_once(self, args: (&'a T, &'b T)) -> Ordering {
        T::cmp(args.0, args.1)
    }
}

impl<'a, 'b, T: Ord> FnMut<(&'a T, &'b T)> for Max<T> {
    extern "rust-call" fn call_mut(&mut self, args: (&'a T, &'b T)) -> Ordering {
        T::cmp(args.0, args.1)
    }
}

impl<'a, 'b, T: Ord> Fn<(&'a T, &'b T)> for Max<T> {
    extern "rust-call" fn call(&self, args: (&'a T, &'b T)) -> Ordering {
        T::cmp(args.0, args.1)
    }
}

What is currelty BinaryHeap<T> would become BinaryHeap<T, Max<T>>. We can define Min<T> in a similar way. You can create a max heap by calling new() or new_by(Max::new()), a min heap can be ceated by calling new_by(Min::new()). This would be simalar to sorting as one can now sort a slice increasingly by calling sort() or sort_by(Max::new()) and decreasingly by calling sort_by(Min::new()).

DDOtten · May 11, 2018, 1:36pm

I implemented this on the playground here. It works using the latest nightly and only requires impl Trait which will be stabilized in the next release.

ExpHP · May 11, 2018, 2:10pm

Did you share the right link? I see feature(fn_traits), which I certainly would not describe as only requiring impl Trait. An unboxed closure in the standard library would be completely unprecedented, and would make some people (cough cough hack me) extremely annoyed that only the standard library is allowed to do it.

P.S. 1.26 is released.

Topic		Replies	Views
BinaryHeap RFC, redux language design	16	1489	February 27, 2020
Specializing BinaryHeap to MaxHeap and MinHeap libs	4	1148	October 30, 2021
Pre-RFC: Add comparator-api for ordered collections libs	25	2452	April 11, 2022
Subteam reports: 2015-08-07 announcements	1	3136	March 25, 2019
Subteam Reports 2016-04-25 announcements	2	1392	March 25, 2019