Pre-RFC: BinaryHeap Flexibility

Seriously, I don’t think there’s anything wrong with having Compare<T> (except that it should take a &mut self argument).
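For concreteness, here is a minimal sketch of the trait shape I mean (the exact definition isn't spelled out in this thread, so treat the names as assumptions):

use std::cmp::Ordering;

// Assumed shape of the Compare<T> trait under discussion,
// with the suggested `&mut self` receiver:
trait Compare<T> {
    fn compare(&mut self, a: &T, b: &T) -> Ordering;
}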

Here are wonderful things you can do today in Rust (even in 1.25) with a Compare trait (playground). Waiting for abstract types buys you nothing of value, as far as I can tell.

use std::cmp::Ordering;
use std::collections::HashMap;

fn f(x: &usize, y: &usize) -> Ordering {
    usize::cmp(x, y)
}

let _heap = BinaryHeap::<i32>::new();

let _heap = BinaryHeap::new_by(f);

let _heap = BinaryHeap::new_by(|x: &_, y: &_| usize::cmp(x, y));

let _heap = BinaryHeap::new_by_key(|x: &i32| -x);

// make a heap with context
fn expensive_function(x: i32) -> i32 { -x }
let memoized_function = {
    let mut memo = HashMap::new();
    move |&x: &_| {
        *memo.entry(x).or_insert_with(|| expensive_function(x))
    }
};
let _heap_with_context = BinaryHeap::new_by_key(memoized_function);

// make a heap with non-'static context
let data = vec![2.0, 5.0, 1.0];
let _heap_that_borrows = BinaryHeap::new_by(|&a: &usize, &b: &usize| {
    data[a].partial_cmp(&data[b]).unwrap()
});

(not that I know whether any of this is useful for typical Heap use cases. The last time I used a heap in my code was well over a year ago…)

4 Likes

Instead of having FnComparator couldn’t you impl<T, F> Compare<T> for F where F: FnMut(&T, &T) -> Ordering and have fn new_by(cmp: impl Compare<T>)? Maybe that breaks the closure type inference though.

To what end? As far as I can tell:

  • A user can already construct a BinaryHeap from an arbitrary closure
    • ...and even return it from a function as BinaryHeap<T, impl Compare<T>>.
  • A user's generic function can take BinaryHeap<T, impl Compare<T>> as an argument.
  • A user's generic function can take F: FnMut(&T, &T) -> Ordering as an argument and construct a BinaryHeap using it.

So basically, beyond the fact that it shows up in the return type of new_by, hardly anyone needs to know that FnComparator even exists.

Meanwhile, a blanket impl like impl<T, F> Compare<T> for F where F: FnMut(&T, &T) -> Ordering tends to throw a wrench into basically every aspect of the design: blanket impls beget more blanket impls, which readily overlap with other impls, and that leads to a significant amount of headache for everyone. (This could be justified if it gave us some kind of clear and significant advantage, but as far as I can tell there is none.)
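To make the overlap hazard concrete, here is a minimal sketch (reusing the assumed Compare<T> shape from above); the last impl is rejected by coherence:

use std::cmp::Ordering;

trait Compare<T> {
    fn compare(&mut self, a: &T, b: &T) -> Ordering;
}

// The proposed blanket impl: every suitable closure is a comparator.
impl<T, F: FnMut(&T, &T) -> Ordering> Compare<T> for F {
    fn compare(&mut self, a: &T, b: &T) -> Ordering {
        self(a, b)
    }
}

struct Reversed;

// Rejected: the compiler cannot rule out that `Reversed` might also
// implement FnMut(&T, &T) -> Ordering, so this impl is considered to
// overlap with the blanket impl above.
// error[E0119]: conflicting implementations of trait `Compare<_>`
impl<T: Ord> Compare<T> for Reversed {
    fn compare(&mut self, a: &T, b: &T) -> Ordering {
        a.cmp(b).reverse()
    }
}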

The only thing that abstract type buys is the stdlib not needing to implement FnMut manually for the (possibly unstable) type that's the default. So it can easily do that for now, and compatibly change later. (It could also use typeof(<T as Ord>::cmp) instead if we got support for that, since that's also a ZST function object.)

And overall I'd rather have new_by<F>(f: F) -> BinaryHeap<T, F> than -> BinaryHeap<T, FnComparator<F>>. (Like with_hasher.)

1 Like

What I meant to say is, I don't see what having F: FnMut as the type parameter buys us. As far as I can see, the concern is purely aesthetic.

Thanks, all! What I posted the first time was more like a pre-pre-…RFC!

I learned a lot from your comments and read the first edition of the book’s closure section. I’d like to experiment with liballoc along with the suggestions.

By the way, impl Trait in the return position is coming in 1.26.0. I think it will help!

I have a backwards-compatible proposal that I think would be similar to the way sort already works. There are currently 4 ways to create a BinaryHeap: new, with_capacity, from, and into. I propose that we change the definition of a BinaryHeap to:

struct BinaryHeap<T, C>
where
    C: FnMut(&T, &T) -> Ordering,
{
    data: Vec<T>,
    cmp: C,
}

Here we use FnMut as it gives the greatest flexibility. If C does not capture any variables, it is a zero-sized type; the current heaps are an example of this. Similar to sort, sort_by, and sort_by_key, we define:

impl<T> BinaryHeap<T, fn(&T, &T) -> Ordering> {
    pub fn new() -> BinaryHeap<T, Max<T>>
    where
        T: Ord,
    {
        BinaryHeap {
            data: Vec::new(),
            cmp: Max::new(),
        }
    }

    pub fn new_by<C>(cmp: C) -> BinaryHeap<T, C>
    where
        C: FnMut(&T, &T) -> Ordering,
    {
        BinaryHeap {
            data: Vec::new(),
            cmp,
        }
    }

    pub fn new_by_key<K, F>(mut f: F) -> BinaryHeap<T, impl FnMut(&T, &T) -> Ordering>
    where
        K: Ord,
        F: FnMut(&T) -> K,
    {
        BinaryHeap {
            data: Vec::new(),
            cmp: move |x, y| K::cmp(&f(x), &f(y)),
        }
    }
}

We use fn(&T, &T) -> Ordering to have a concrete type to define these methods on. The type is not important here; the only thing we care about is that it implements FnMut(&T, &T) -> Ordering. We implement

  • with_capacity,
  • with_capacity_by,
  • with_capacity_by_key,
  • from,
  • from_by,
  • from_by_key

in similar ways (see the sketch below). The From trait is only implemented for Vec, so we can easily generalize it this way. I propose that we keep this trait implementation and define the other two functions as methods. The only function we cannot easily generalize this way is into. For this we would have to define methods on Vec, and I am not sure that would be worth it.
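To illustrate, here is a sketch of with_capacity_by; the other _by variants follow the same pattern:

impl<T> BinaryHeap<T, fn(&T, &T) -> Ordering> {
    // Same trick as new_by: the impl block's fn-pointer parameter is
    // only a hook to hang the associated function on; the returned
    // heap carries the caller's comparator type C.
    pub fn with_capacity_by<C>(capacity: usize, cmp: C) -> BinaryHeap<T, C>
    where
        C: FnMut(&T, &T) -> Ordering,
    {
        BinaryHeap {
            data: Vec::with_capacity(capacity),
            cmp,
        }
    }
}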

We could use new() -> BinaryHeap<T, impl FnMut(&T, &T) -> Ordering> instead of new() -> BinaryHeap<T, Max<T>>, and do the same for from and with_capacity. However, this would mean that the compiler would treat these three functions as returning different types. That would be a breaking change, as these functions currently do return the same type. So I propose we make a new struct:

#![feature(fn_traits, unboxed_closures)]

use std::cmp::Ordering;
use std::marker::PhantomData;

struct Max<T: Ord>(PhantomData<fn(&T, &T) -> Ordering>);

impl<T: Ord> Max<T> {
    pub fn new() -> Max<T> {
        Max(PhantomData)
    }
}

impl<'a, 'b, T: Ord> FnOnce<(&'a T, &'b T)> for Max<T> {
    type Output = Ordering;
    extern "rust-call" fn call_once(self, args: (&'a T, &'b T)) -> Ordering {
        T::cmp(args.0, args.1)
    }
}

impl<'a, 'b, T: Ord> FnMut<(&'a T, &'b T)> for Max<T> {
    extern "rust-call" fn call_mut(&mut self, args: (&'a T, &'b T)) -> Ordering {
        T::cmp(args.0, args.1)
    }
}

impl<'a, 'b, T: Ord> Fn<(&'a T, &'b T)> for Max<T> {
    extern "rust-call" fn call(&self, args: (&'a T, &'b T)) -> Ordering {
        T::cmp(args.0, args.1)
    }
}

What is currently BinaryHeap<T> would become BinaryHeap<T, Max<T>>. We can define Min<T> in a similar way. You can create a max heap by calling new() or new_by(Max::new()), and a min heap by calling new_by(Min::new()). This would be similar to sorting, as one could then sort a slice in increasing order by calling sort() or sort_by(Max::new()), and in decreasing order by calling sort_by(Min::new()).
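A usage sketch (assuming Min<T> is defined analogously to Max<T> above, and that push/pop keep their current behavior):

// Max-heap, as today:
let mut heap = BinaryHeap::new(); // BinaryHeap<i32, Max<i32>>
heap.push(1);
heap.push(3);
heap.push(2);
assert_eq!(heap.pop(), Some(3)); // largest first

// Min-heap via the new comparator parameter:
let mut heap = BinaryHeap::new_by(Min::new()); // BinaryHeap<i32, Min<i32>>
heap.push(1);
heap.push(3);
heap.push(2);
assert_eq!(heap.pop(), Some(1)); // smallest first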

1 Like

I implemented this on the playground here. It works using the latest nightly and only requires impl Trait, which will be stabilized in the next release.

1 Like

Did you share the right link? I see feature(fn_traits), which I certainly would not describe as only requiring impl Trait. An unboxed closure in the standard library would be completely unprecedented, and would make some people (cough cough hack me) extremely annoyed that only the standard library is allowed to do it.

P.S. 1.26 is released.

1 Like

I've implemented it using @ExpHP's method because I want to run the code on a stable compiler.

Almost all of the test code, except the unstable parts, works in a backward-compatible way.

While coding, Clone was difficult, but 1.26 added Clone for closures, so it compiles on 1.26!

The is_sorted crate needs callables representing binary operations, and it takes a very different approach from what is suggested here.

pub mod ordering {
    use std::cmp::Ordering;

    /// Equivalent to `PartialOrd::partial_cmp(a, b)`
    pub struct less_than();

    impl<'a, 'b, T: PartialOrd> FnOnce<(&'a T, &'b T)> for less_than {
        type Output = Option<Ordering>;
        extern "rust-call" fn call_once(
            self, arg: (&'a T, &'b T),
        ) -> Self::Output {
            arg.0.partial_cmp(arg.1)
        }
    }

    impl<'a, 'b, T: PartialOrd> FnMut<(&'a T, &'b T)> for less_than {
        extern "rust-call" fn call_mut(
            &mut self, arg: (&'a T, &'b T),
        ) -> Self::Output {
            arg.0.partial_cmp(arg.1)
        }
    }

    // less_equal, greater_than, greater_equal, equal, not_equal
}

It then defines some consts to make these easier to use:

pub const less_than: ordering::less_than = ordering::less_than();
// less_equal, greater_than, greater_equal, equal, not_equal

This way you can just write:

type MaxHeap<T> = BinaryHeap<T, ordering::greater_than>;

or directly use them, for example, with sort_by:

foo.sort_by(greater_than);  // works because greater_than is a `const`

One thing these types allow is specializing impls for different comparators. The is_sorted crate specializes the is_sorted_... iterator algorithms for {less,greater}_than to achieve up to 10x speed-ups using std::arch intrinsics. These speed-ups are not possible if the user passes a lambda to the is_sorted functions, because we can’t know what the lambda does, and there is no way to specialize on that.
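Here is a self-contained sketch of why a nameable comparator type is what makes this possible. It is nightly-only, and IsSortedBy plus this less_than are my own simplified stand-ins, not the actual is_sorted code:

#![feature(specialization, fn_traits, unboxed_closures)]
#![allow(non_camel_case_types)]

use std::cmp::Ordering;

// Simplified stand-in for the is_sorted comparator ZST.
pub struct less_than;

impl<'a, 'b, T: PartialOrd> FnOnce<(&'a T, &'b T)> for less_than {
    type Output = Option<Ordering>;
    extern "rust-call" fn call_once(self, arg: (&'a T, &'b T)) -> Self::Output {
        arg.0.partial_cmp(arg.1)
    }
}

impl<'a, 'b, T: PartialOrd> FnMut<(&'a T, &'b T)> for less_than {
    extern "rust-call" fn call_mut(&mut self, arg: (&'a T, &'b T)) -> Self::Output {
        arg.0.partial_cmp(arg.1)
    }
}

trait IsSortedBy<C> {
    fn is_sorted_by(&self, cmp: C) -> bool;
}

// Generic fallback: works for any comparator, including user lambdas.
impl<T, C> IsSortedBy<C> for [T]
where
    C: FnMut(&T, &T) -> Option<Ordering>,
{
    default fn is_sorted_by(&self, mut cmp: C) -> bool {
        self.windows(2)
            .all(|w| cmp(&w[0], &w[1]).map_or(false, |o| o != Ordering::Greater))
    }
}

// Only expressible because `less_than` is a distinct, nameable type;
// a user-supplied lambda could never be matched like this.
impl IsSortedBy<less_than> for [u8] {
    fn is_sorted_by(&self, _: less_than) -> bool {
        // stand-in for the std::arch SIMD fast path
        self.windows(2).all(|w| w[0] <= w[1])
    }
}

fn main() {
    assert!([1u8, 2, 3][..].is_sorted_by(less_than)); // specialized impl
    assert!(![3.0f32, 2.0, 1.0][..].is_sorted_by(less_than)); // generic fallback
}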

I think these types and consts are sufficiently generally useful to deserve their own RFC. Once an RFC about them is merged, customizable BinaryHeaps can be proposed in subsequent RFCs.

I also think that trying to do both things in the same RFC would be unproductive, because it would mix discussion of two completely unrelated concerns: being able to denote certain types of operations at the type level (e.g. less_than, add, …) so that one can do something based on them (customize data structures, specialize algorithms, etc.), and how to make a particular data structure more generic.

The C++ <functional> header added a bunch of these callables to C++ in C++11: https://en.cppreference.com/w/cpp/utility/functional. The main mistake they made was not making these callables generic over their arguments, but they fixed that in C++14.

@gnzlbg

My comments on the less_than approach are:

  • a less_than() function traditionally returns bool instead of Ordering, so the name is confusing.
  • the struct name should be LessThan.
  • the const should be LESS_THAN.
  • using the same less_than name for two different things (the struct and the const) is confusing.

For RFC purposes, both the above less_than approach and my Compare<T> trait are too controversial to be accepted widely. I think the FnMut approach feels more natural and makes people more comfortable. But to use FnMut elegantly, we might want extra unstable features…

As a short-term solution, I published an external crate for those who need various heaps NOW (me). An RFC may come later. Please note that I’m not familiar with the unstable features and won’t (can’t) write such an RFC for the time being.

Don't functions already have distinct types, though? They can't be named right now, but typeof or abstract type ought to make it possible in the future, so you could do something like impl Foo for typeof(<u32 as Ord>::cmp) { ... } or abstract type U32Cmp; fn _dummy() -> U32Cmp { <u32 as Ord>::cmp } impl Foo for U32Cmp { ... }.

1 Like

The problem is not the type of the function, but that two functions with the same type can do different things.

I can wrap Ord::cmp in a lambda, and coerce that to a function pointer a. I can also coerce Ord::cmp to a function pointer b with the same type as a. What I can’t do is tell what a and b are doing. They are both doing the same thing, just calling Ord::cmp, but there is no way to specialize on “what a function of a certain type actually does”.

I can specialize on the type of the function, but what should I implement there if I don’t know what the function is doing? For example, I can write a lambda that coerces to the same type as a and b but does Ord::cmp(x,y).reverse(). Specializing on the type of a function is not enough.
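A quick demonstration of the problem (my own example):

use std::cmp::Ordering;

fn main() {
    // Both coerce to the same type, fn(&u32, &u32) -> Ordering...
    let a: fn(&u32, &u32) -> Ordering = |x, y| Ord::cmp(x, y);
    let b: fn(&u32, &u32) -> Ordering = |x, y| Ord::cmp(x, y).reverse();

    // ...so nothing at the type level distinguishes what they do.
    assert_eq!(a(&1, &2), Ordering::Less);
    assert_eq!(b(&1, &2), Ordering::Greater);
}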

Thanks for the style nitpicks.

Using Max and Min is an alternative, but I'd like to be able to use PartialOrd as well.

Here is a memo of what I learned from your comments:

  1. A closure in Rust is not a function pointer (though a non-capturing closure can be coerced into one).

  2. When you create a closure, the compiler automatically defines a struct that implements the Fn/FnMut/FnOnce family of traits.

  3. So a closure is actually a struct! If it does not capture any variables, it is a zero-sized type (ZST); see the sketch after this list.

  4. You can put a closure type in a generic type parameter. By doing that, the implementation is monomorphized for each given closure.

  5. The above is somewhat difficult in today’s Rust because currently each closure struct doesn’t have its own name!

  6. Eventually, Rust will have features like typeof and/or abstract type. They will work as a substitute for the closure struct’s name.
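A small demonstration I wrote for points 3 and 4 (names are hypothetical, just for illustration):

use std::mem::size_of_val;

fn main() {
    let data = 10_i32;

    let captures_nothing = |x: &i32, y: &i32| x.cmp(y);
    let captures_data = move |x: &i32| *x + data;

    // Each closure is its own anonymous struct; capturing nothing
    // makes it a zero-sized type (point 3).
    assert_eq!(size_of_val(&captures_nothing), 0);
    assert_eq!(size_of_val(&captures_data), 4); // holds the captured i32

    // Generic functions are monomorphized per closure type (point 4).
    fn call_twice<F: Fn(&i32) -> i32>(f: F) -> i32 {
        f(&1) + f(&2)
    }
    assert_eq!(call_twice(captures_data), 23);
}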

That's only true for non-ZST types. So long as you don't have an fn type or a stateful closure, the same type will do the same thing.

Demonstration that u32::gt and u32::lt are ZSTs and distinguishable at the type level: Rust Playground

#![feature(get_type_id)]

use std::any::Any;
use std::mem::size_of;

fn assert_same_type_and_zsts<T: Any, U: Any>(x: T, y: U) {
    assert_eq!(size_of::<T>(), 0);
    assert_eq!(size_of::<U>(), 0);
    assert_eq!(x.get_type_id(), y.get_type_id());
}

fn main() {
    assert_same_type_and_zsts(u32::lt, u32::gt);
}
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `TypeId { t: 15723412695625198684 }`,
 right: `TypeId { t: 3437632669508324879 }`', src/main.rs:8:5

(The fact that these are distinct ZSTs, not just both fn(&u32,&u32)->bool, is core to the perf of things like sort that monomorphize based on an impl Fn of some kind.)

1 Like

@scottmcm

we still have the problem that two functions with different types can do the same thing, and that the user can define these and we can’t specialize for them.

I still don’t see how this allows us to do anything useful. Either we specialize for Ord::cmp, or for a callable type, or… but if the user passes us a different type that just calls Ord::cmp, the specialization that we provided won’t trigger, because it has a different type.

Or in other words, we can’t specialize for any callable type that is semantically equivalent to Ord::cmp.


Then there is the issue of how we specialize for Ord::cmp(a,b).reverse(). Is there something better than just defining our own types that the user must use for the specializations to trigger?

What specializations are you worried about not being able to apply?

Monomorphization is done, so inlining optimization is done, as has been said multiple times. That leaves default impl specializations, of which there are none today, obviously (as this generic parameter isn’t there and specialization isn’t stable).

1 Like

I am already specializing is_sorted_by for {i,u}{8,16,32,64} and f{32,64}, for Ord::cmp, Ord::cmp(a, b).reverse(), PartialOrd::partial_cmp(a, b).unwrap(), and PartialOrd::partial_cmp(a, b).unwrap().reverse().

These specializations are used by is_sorted as well, behind the scenes.

This delivers ~2x and ~10x speed-ups in some cases, and ~3-6x speed-ups in most cases:

On my laptop (2012 Intel Core i5, AVX, no AVX2), for slices (_baseline is with specialization disabled):

test run_gt_16i_baseline  ... bench:   1,056,679 ns/iter (+/- 143,130)
test run_gt_16i_is_sorted ... bench:     297,737 ns/iter (+/- 73,536)
test run_gt_16u_baseline  ... bench:   1,048,473 ns/iter (+/- 151,361)
test run_gt_16u_is_sorted ... bench:     330,464 ns/iter (+/- 57,180)
test run_gt_32f_baseline  ... bench:   5,859,745 ns/iter (+/- 643,377)
test run_gt_32f_is_sorted ... bench:     665,421 ns/iter (+/- 116,844)
test run_gt_32i_baseline  ... bench:   1,089,292 ns/iter (+/- 124,127)
test run_gt_32i_is_sorted ... bench:     627,826 ns/iter (+/- 85,923)
test run_gt_32u_baseline  ... bench:   1,108,998 ns/iter (+/- 163,938)
test run_gt_32u_is_sorted ... bench:     702,923 ns/iter (+/- 112,332)
test run_gt_64f_baseline  ... bench:   3,824,177 ns/iter (+/- 492,995)
test run_gt_64f_is_sorted ... bench:   1,364,920 ns/iter (+/- 197,156)
test run_gt_64i_baseline  ... bench:   1,310,525 ns/iter (+/- 245,387)
test run_gt_64i_is_sorted ... bench:   1,301,780 ns/iter (+/- 367,669)
test run_gt_64u_baseline  ... bench:   1,313,168 ns/iter (+/- 169,762)
test run_gt_64u_is_sorted ... bench:   1,300,316 ns/iter (+/- 209,528)
test run_gt_8i_baseline   ... bench:   2,010,967 ns/iter (+/- 175,342)
test run_gt_8i_is_sorted  ... bench:     303,082 ns/iter (+/- 68,407)
test run_gt_8u_baseline   ... bench:   2,029,840 ns/iter (+/- 300,009)
test run_gt_8u_is_sorted  ... bench:     337,869 ns/iter (+/- 114,609)
test run_lt_16i_baseline  ... bench:   1,037,078 ns/iter (+/- 104,067)
test run_lt_16i_is_sorted ... bench:     301,939 ns/iter (+/- 89,317)
test run_lt_16u_baseline  ... bench:   1,031,273 ns/iter (+/- 115,523)
test run_lt_16u_is_sorted ... bench:     334,392 ns/iter (+/- 107,146)
test run_lt_32f_baseline  ... bench:   3,201,105 ns/iter (+/- 257,790)
test run_lt_32f_is_sorted ... bench:     681,012 ns/iter (+/- 264,631)
test run_lt_32i_baseline  ... bench:   1,140,243 ns/iter (+/- 587,907)
test run_lt_32i_is_sorted ... bench:     789,255 ns/iter (+/- 890,137)
test run_lt_32u_baseline  ... bench:   1,192,369 ns/iter (+/- 615,291)
test run_lt_32u_is_sorted ... bench:     747,146 ns/iter (+/- 233,472)
test run_lt_64f_baseline  ... bench:   3,635,677 ns/iter (+/- 3,049,427)
test run_lt_64f_is_sorted ... bench:   1,565,629 ns/iter (+/- 521,948)
test run_lt_64i_baseline  ... bench:   1,321,831 ns/iter (+/- 265,014)
test run_lt_64i_is_sorted ... bench:   1,478,014 ns/iter (+/- 412,410)
test run_lt_64u_baseline  ... bench:   1,428,323 ns/iter (+/- 413,818)
test run_lt_64u_is_sorted ... bench:   1,313,273 ns/iter (+/- 223,336)
test run_lt_8i_baseline   ... bench:   2,000,606 ns/iter (+/- 253,341)
test run_lt_8i_is_sorted  ... bench:     299,591 ns/iter (+/- 45,478)
test run_lt_8u_baseline   ... bench:   1,989,555 ns/iter (+/- 247,432)
test run_lt_8u_is_sorted  ... bench:     327,004 ns/iter (+/- 74,697)

On an Xeon(R) CPU E5-2695 v3 @ 2.30GHz:

test run_gt_16i_baseline  ... bench:     530,724 ns/iter (+/- 9,966)
test run_gt_16i_is_sorted ... bench:     164,685 ns/iter (+/- 2,791)
test run_gt_16u_baseline  ... bench:     530,724 ns/iter (+/- 11,917)
test run_gt_16u_is_sorted ... bench:     166,817 ns/iter (+/- 2,649)
test run_gt_32f_baseline  ... bench:   3,870,192 ns/iter (+/- 58,284)
test run_gt_32f_is_sorted ... bench:     270,014 ns/iter (+/- 3,011)
test run_gt_32i_baseline  ... bench:     530,887 ns/iter (+/- 2,091)
test run_gt_32i_is_sorted ... bench:     329,381 ns/iter (+/- 5,795)
test run_gt_32u_baseline  ... bench:     530,927 ns/iter (+/- 19,677)
test run_gt_32u_is_sorted ... bench:     333,764 ns/iter (+/- 6,847)
test run_gt_64f_baseline  ... bench:   2,180,730 ns/iter (+/- 52,635)
test run_gt_64f_is_sorted ... bench:     538,037 ns/iter (+/- 8,761)
test run_gt_64i_baseline  ... bench:     691,102 ns/iter (+/- 14,588)
test run_gt_64i_is_sorted ... bench:     628,104 ns/iter (+/- 15,375)
test run_gt_64u_baseline  ... bench:     690,482 ns/iter (+/- 16,620)
test run_gt_64u_is_sorted ... bench:     690,421 ns/iter (+/- 15,802)
test run_gt_8i_baseline   ... bench:     909,540 ns/iter (+/- 16,632)
test run_gt_8i_is_sorted  ... bench:     136,164 ns/iter (+/- 1,880)
test run_gt_8u_baseline   ... bench:     909,536 ns/iter (+/- 14,981)
test run_gt_8u_is_sorted  ... bench:     140,761 ns/iter (+/- 2,443)
test run_lt_16i_baseline  ... bench:     530,707 ns/iter (+/- 2,364)
test run_lt_16i_is_sorted ... bench:     160,733 ns/iter (+/- 1,780)
test run_lt_16u_baseline  ... bench:     530,713 ns/iter (+/- 9,249)
test run_lt_16u_is_sorted ... bench:     168,386 ns/iter (+/- 2,356)
test run_lt_32f_baseline  ... bench:   1,529,191 ns/iter (+/- 11,369)
test run_lt_32f_is_sorted ... bench:     269,269 ns/iter (+/- 3,844)
test run_lt_32i_baseline  ... bench:     530,920 ns/iter (+/- 18,349)
test run_lt_32i_is_sorted ... bench:     327,124 ns/iter (+/- 5,909)
test run_lt_32u_baseline  ... bench:     530,893 ns/iter (+/- 749)
test run_lt_32u_is_sorted ... bench:     338,442 ns/iter (+/- 3,649)
test run_lt_64f_baseline  ... bench:   1,537,085 ns/iter (+/- 22,255)
test run_lt_64f_is_sorted ... bench:     537,958 ns/iter (+/- 15,249)
test run_lt_64i_baseline  ... bench:     690,423 ns/iter (+/- 21,300)
test run_lt_64i_is_sorted ... bench:     635,785 ns/iter (+/- 17,912)
test run_lt_64u_baseline  ... bench:     690,270 ns/iter (+/- 15,633)
test run_lt_64u_is_sorted ... bench:     690,380 ns/iter (+/- 13,587)
test run_lt_8i_baseline   ... bench:     909,537 ns/iter (+/- 13,486)
test run_lt_8i_is_sorted  ... bench:     131,389 ns/iter (+/- 1,350)
test run_lt_8u_baseline   ... bench:     909,535 ns/iter (+/- 20,730)
test run_lt_8u_is_sorted  ... bench:     140,781 ns/iter (+/- 1,980)

@LukasKalbertodt made some comparisons of this against the std facilities here:

So even if specialization isn't stable, I'd like this to not be a "solution that only works for BinaryHeap", but something that works for me too, given that both things are 99% similar.