BinaryHeap RFC, redux

Last year, @sekineh drafted an RFC and published a prototype crate proposing changing std::BinaryHeap to permit custom (potentially stateful) comparators.

The Internals thread was eventually closed for inactivity, and as far as I can tell, the RFC was never proposed on the official RFC repo; @sekineh or anyone else, are you still interested in working on such an RFC? I think this would be a valuable feature.

1 Like

Edit: playground codes revised after initial post. Edit: Problem title edited.

The biggest stopper for me was the problem described below. Edit: but the entire problem space is too big and I can’t write actual RFC for the time being.

Problem: Adding type parameter with default type is not sufficiently backward-compatible

The biggest problem I've encountered is backward compatibility of added type parameter. Generally adding default type parameter is considered 'compatible' but it isn't really.

Suppose you have struct DataStructure<T> and wrote some code:

    let data = DataStructure::from(vec![0, 1, 2]);
    println!("{:?}", data);

Then you extended your struct with second type parameter with default argument:

pub struct DataStructure<T, S = DefaultStrategy>
where S: Strategy
{ ... }

Looks nice, but unfortunately it won't compile old codes without added annotation:

   Compiling playground v0.0.1 (/playground)
error[E0282]: type annotations needed for `DataStructure<i32, S>`
  --> src/main.rs:24:16
   |
24 |     let data = DataStructure::from(vec![0, 1, 2]);
   |         ----   ^^^^^^^^^^^^^^^^^^^ cannot infer type for `S`
   |         |
   |         consider giving `data` the explicit type `DataStructure<i32, S>`, where the type parameter `S` is specified

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=368afa8860349e6bd2c59575a9fa8c04

It’s ridiculous because I supplied default type to S.

Possible solutions

(A) 'Fix' compiler to infer the type 'correctly'

I don't know whether it's feasible.

(B) use (ugly) type alias.

Rename the original struct, and provide alias to the old name:

pub type DataStructure<T> = DataStructureWithStrategy<T, DefaultStrategy>;
pub struct DataStructureWithStrategy<T, S> { ... }

Old code and new code both compile, but code becomes ugly:

    // old code compiles!
    let data = DataStructure::from(vec![0, 1, 2]);
    println!("{:?}", data);

    // new code looks ugly... :-( 
    let data: DataStructureWithStrategy<_, DefaultStrategy> = DataStructureWithStrategy::from(vec![0, 1, 2]);
    println!("{:?}", data);

    let data: DataStructureWithStrategy<_, SuperStrategy1> = DataStructureWithStrategy::from(vec![0, 1, 2]);
    println!("{:?}", data);

    let data: DataStructureWithStrategy<_, SuperStrategy2> = DataStructureWithStrategy::from(vec![0, 1, 2]);
    println!("{:?}", data);

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=fb82901dde2616b7108f809978630d3f

I am not sure that storing closure as a part of data structure is flexible enough. I suggest this as a flexibility test:

struct Person { first_name: String, last_name: String }

struct PersonTable {
    persons: Vec<Person>,
    // Index of persons sorted by last and by first name
    by_first_name: BinaryHeap<usize, ???>,
    by_last_name: BinaryHeap<usize, ???>,
}

impl PersonTable {
    fn push(&mut self, p: Person) { ... }
    fn first_by_first_name(&self) -> Option<&Person> {
        self.by_first_name.peek().map(|&idx| &self.persons[idx])
    }
    fn first_by_last_name(&self) -> Option<&Person> {
        self.by_last_name.peek().map(|&idx| &self.persons[idx])
    }
}

My gut feeling is that, due to borrowing constraints, we want to supply closure at a call-site, and not at the construction site (this is similar to the OnceCell/Lazy split).

In today's Rust, the above puzzle can be solved by using a sorted array as an index (because binary_search_by accepts a predicate), but won't work with heaps and maps.

1 Like

As far as I know, you can’t get actual type name from closure in the current rust. We’d need ‘type of’ operator which translates closure into its internal type/struct name for @matklad‘s PersonTable example.

I don’t think name of the type is a problem here: one can use boxed closures. The problem is lifetimes: closures can’t close over the containing struct.

I'd like to first see the motivation expanded. Quoting from the original pre-RFC:

The current BinaryHeap implementation only supports max-heap.

That is not true. BinaryHeap works with an Ord. If you want a min-heap, just add a newtype that swaps the order. Having a newtype with a different Ord implementation is the usual way to encode differences in the order in a type-based way.

We could even have generic types for this:

struct DualOrd<T>(T);
impl PartialOrd<T: PartialOrd> for DualOrd<T> { ... }

Now a BinaryHeap<DualOrd<i32>> is a min-heap.

1 Like

This exists: std::ord::Reverse.

6 Likes

I agree that max/min heaps are easy.

But in real code, I needed plenty of boilerplate codes like below where max/min heap is not sufficient.

Oh you are suggesting better explanation for pre-RFC. Thanks for that!

Yes motivation description was not enough, need expansion.

Sounds to me like the solution is to provide an easier way to generate such newtyped comparisons?

I think your fn cmp can be simplified by doing something like (self.0.count(), self.0.depth, self.0.pos_start).cmp((other.0.count(), other.0.depth, other.0.pos_start)). And for the rest we could have a proc-macro where all you give is the "key" which should be used for comparison and a name for the new type, and then it generates everything.

2 Likes

Easy new type is a possible way to implement flexible heap. But it requires additional call to get its inner type and it feels like redundant or unnatural because c++ can do the same thing by providing a single function.

Edit: I agree that my comparison is not optimum, tuple is good way to simplification. But it simplifies just Ord. The problem i have here is Ord requires other family traits (PartialOrd and PartialEq) also implemented. This is the reason why I feel it’s very, very redundant.

I agree that key comparison as in .sort_by_key() is useful. My experimental crate implemented key comparison: https://docs.rs/binary-heap-plus/0.2.0/binary_heap_plus/struct.BinaryHeap.html#method.new_by_key

It’s using Extract struct from ‘compare’ crate which I found very useful. https://contain-rs.github.io/compare/compare/struct.Extract.html

I don’t think my crate in the current state is ready for std inclusion. The reason includes it does not implemented traits in a very generic way. Many traits are still implemented for BinaryHeap<T, MaxComparator> only because of backward compatibility (type inference) problem.

My current plan is:

  1. Create an issue for the below. Although I don’t believe I’m a first person who encounter this problem, I couldn’t find similar one in existing issues...
  1. Expand traits in my crate and learn something. Fixing issue 1) will give broader implementation space, e.g. I can expand From<Vec> and other std trait for morE generic BinaryHeap<T, C>. (Currently they are only implemented for BinaryHeap<T, MaxComparator> because of the backward-compatibility problem.)

  2. Write a PR for std or RFC if the impl is not trivial enough. I can’t promise schedule etc.

That sounds like a useful crate; do you know if anyone has prototyped such a thing? I looked around a bit but didn't find anything along those lines.

1 Like

It’s an interesting suggestion. My current approach includes closure in its struct member and don’t support call-site closure supply. If I could eliminate closure from its member, it might support call-site supply at some trade off....

No I don't, sorry.

Well, this is definitely suboptimal (in fact it's the first macro I've written), but here's a quick stab at what such a thing would look like: https://github.com/BatmanAoD/flexible_bin_heap

Even with the macro, the user would still need to explicitly convert to and from the wrapper type, which seems inconvenient.

2 Likes