BinaryHeap RFC, redux

BatmanAoD · November 25, 2019, 2:57am

Last year, @sekineh drafted an RFC and published a prototype crate proposing changing std::BinaryHeap to permit custom (potentially stateful) comparators.

The Internals thread was eventually closed for inactivity, and as far as I can tell, the RFC was never proposed on the official RFC repo; @sekineh or anyone else, are you still interested in working on such an RFC? I think this would be a valuable feature.

sekineh · November 25, 2019, 7:30am

Edit: playground codes revised after initial post. Edit: Problem title edited.

The biggest stopper for me was the problem described below. Edit: but the entire problem space is too big and I can’t write actual RFC for the time being.

Problem: Adding type parameter with default type is not sufficiently backward-compatible

The biggest problem I've encountered is backward compatibility of added type parameter. Generally adding default type parameter is considered 'compatible' but it isn't really.

Suppose you have struct DataStructure<T> and wrote some code:

    let data = DataStructure::from(vec![0, 1, 2]);
    println!("{:?}", data);

Then you extended your struct with second type parameter with default argument:

pub struct DataStructure<T, S = DefaultStrategy>
where S: Strategy
{ ... }

Looks nice, but unfortunately it won't compile old codes without added annotation:

   Compiling playground v0.0.1 (/playground)
error[E0282]: type annotations needed for `DataStructure<i32, S>`
  --> src/main.rs:24:16
   |
24 |     let data = DataStructure::from(vec![0, 1, 2]);
   |         ----   ^^^^^^^^^^^^^^^^^^^ cannot infer type for `S`
   |         |
   |         consider giving `data` the explicit type `DataStructure<i32, S>`, where the type parameter `S` is specified

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=368afa8860349e6bd2c59575a9fa8c04

It’s ridiculous because I supplied default type to S.

Possible solutions

(A) 'Fix' compiler to infer the type 'correctly'

I don't know whether it's feasible.

(B) use (ugly) type alias.

Rename the original struct, and provide alias to the old name:

pub type DataStructure<T> = DataStructureWithStrategy<T, DefaultStrategy>;
pub struct DataStructureWithStrategy<T, S> { ... }

Old code and new code both compile, but code becomes ugly:

    // old code compiles!
    let data = DataStructure::from(vec![0, 1, 2]);
    println!("{:?}", data);

    // new code looks ugly... :-( 
    let data: DataStructureWithStrategy<_, DefaultStrategy> = DataStructureWithStrategy::from(vec![0, 1, 2]);
    println!("{:?}", data);

    let data: DataStructureWithStrategy<_, SuperStrategy1> = DataStructureWithStrategy::from(vec![0, 1, 2]);
    println!("{:?}", data);

    let data: DataStructureWithStrategy<_, SuperStrategy2> = DataStructureWithStrategy::from(vec![0, 1, 2]);
    println!("{:?}", data);

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=fb82901dde2616b7108f809978630d3f

matklad · November 25, 2019, 8:44am

I am not sure that storing closure as a part of data structure is flexible enough. I suggest this as a flexibility test:

struct Person { first_name: String, last_name: String }

struct PersonTable {
    persons: Vec<Person>,
    // Index of persons sorted by last and by first name
    by_first_name: BinaryHeap<usize, ???>,
    by_last_name: BinaryHeap<usize, ???>,
}

impl PersonTable {
    fn push(&mut self, p: Person) { ... }
    fn first_by_first_name(&self) -> Option<&Person> {
        self.by_first_name.peek().map(|&idx| &self.persons[idx])
    }
    fn first_by_last_name(&self) -> Option<&Person> {
        self.by_last_name.peek().map(|&idx| &self.persons[idx])
    }
}

My gut feeling is that, due to borrowing constraints, we want to supply closure at a call-site, and not at the construction site (this is similar to the OnceCell/Lazy split).

In today's Rust, the above puzzle can be solved by using a sorted array as an index (because binary_search_by accepts a predicate), but won't work with heaps and maps.

sekineh · November 25, 2019, 11:07pm

As far as I know, you can’t get actual type name from closure in the current rust. We’d need ‘type of’ operator which translates closure into its internal type/struct name for @matklad‘s PersonTable example.

matklad · November 26, 2019, 6:47am

I don’t think name of the type is a problem here: one can use boxed closures. The problem is lifetimes: closures can’t close over the containing struct.

RalfJung · November 26, 2019, 7:55am

I'd like to first see the motivation expanded. Quoting from the original pre-RFC:

The current BinaryHeap implementation only supports max-heap.

That is not true. BinaryHeap works with an Ord. If you want a min-heap, just add a newtype that swaps the order. Having a newtype with a different Ord implementation is the usual way to encode differences in the order in a type-based way.

We could even have generic types for this:

struct DualOrd<T>(T);
impl PartialOrd<T: PartialOrd> for DualOrd<T> { ... }

Now a BinaryHeap<DualOrd<i32>> is a min-heap.

matklad · November 26, 2019, 9:00am

This exists: std::ord::Reverse.

sekineh · November 26, 2019, 1:10pm

I agree that max/min heaps are easy.

But in real code, I needed plenty of boilerplate codes like below where max/min heap is not sufficient.

github.com

sekineh/wavelet-matrix-rs/blob/e3d9d611b68b2684edce388f1c0e1c0562b96bbe/src/node_range.rs#L68-L90


impl Ord for NodeRangeByFrequency {
    fn cmp(&self, other: &Self) -> Ordering {
        if self.0.count() != other.0.count() {
            self.0.count().cmp(&other.0.count()) // wide node first
        } else if self.0.depth != other.0.depth {
            self.0.depth.cmp(&other.0.depth) // deepest node first
        } else {
            self.0.pos_start.cmp(&other.0.pos_start) // just to make the order more predictable
        }
    }
}


impl PartialOrd for NodeRangeByFrequency {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(&other))
    }
}


impl PartialEq for NodeRangeByFrequency {
    fn eq(&self, other: &Self) -> bool {

This file has been truncated. show original

sekineh · November 26, 2019, 1:15pm

Oh you are suggesting better explanation for pre-RFC. Thanks for that!

Yes motivation description was not enough, need expansion.

RalfJung · November 27, 2019, 9:39am

Sounds to me like the solution is to provide an easier way to generate such newtyped comparisons?

I think your fn cmp can be simplified by doing something like (self.0.count(), self.0.depth, self.0.pos_start).cmp((other.0.count(), other.0.depth, other.0.pos_start)). And for the rest we could have a proc-macro where all you give is the "key" which should be used for comparison and a name for the new type, and then it generates everything.

sekineh · November 27, 2019, 12:09pm

Easy new type is a possible way to implement flexible heap. But it requires additional call to get its inner type and it feels like redundant or unnatural because c++ can do the same thing by providing a single function.

Edit: I agree that my comparison is not optimum, tuple is good way to simplification. But it simplifies just Ord. The problem i have here is Ord requires other family traits (PartialOrd and PartialEq) also implemented. This is the reason why I feel it’s very, very redundant.

I agree that key comparison as in .sort_by_key() is useful. My experimental crate implemented key comparison:

It’s using Extract struct from ‘compare’ crate which I found very useful.

sekineh · November 27, 2019, 9:25pm

I don’t think my crate in the current state is ready for std inclusion. The reason includes it does not implemented traits in a very generic way. Many traits are still implemented for BinaryHeap<T, MaxComparator> only because of backward compatibility (type inference) problem.

My current plan is:

Create an issue for the below. Although I don’t believe I’m a first person who encounter this problem, I couldn’t find similar one in existing issues...

Expand traits in my crate and learn something. Fixing issue 1) will give broader implementation space, e.g. I can expand From<Vec> and other std trait for morE generic BinaryHeap<T, C>. (Currently they are only implemented for BinaryHeap<T, MaxComparator> because of the backward-compatibility problem.)
Write a PR for std or RFC if the impl is not trivial enough. I can’t promise schedule etc.

BatmanAoD · November 27, 2019, 11:22pm

That sounds like a useful crate; do you know if anyone has prototyped such a thing? I looked around a bit but didn't find anything along those lines.

sekineh · November 28, 2019, 2:57am

It’s an interesting suggestion. My current approach includes closure in its struct member and don’t support call-site closure supply. If I could eliminate closure from its member, it might support call-site supply at some trade off....

RalfJung · November 28, 2019, 9:02am

No I don't, sorry.

BatmanAoD · November 29, 2019, 11:10pm

Well, this is definitely suboptimal (in fact it's the first macro I've written), but here's a quick stab at what such a thing would look like: https://github.com/BatmanAoD/flexible_bin_heap

Even with the macro, the user would still need to explicitly convert to and from the wrapper type, which seems inconvenient.

system · February 27, 2020, 11:10pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Pre-RFC: BinaryHeap Flexibility libs	35	4471	March 25, 2019
Pre-RFC: Traits for crates (or: canonical API portability) language design	11	710	January 18, 2025
Named type parameters	23	3312	March 25, 2019
Idea/Pre-RFC: Runtime-erased types and reflections around object safety language design	15	831	October 30, 2022
Subteam Reports 2016-04-25 announcements	2	1398	March 25, 2019

BinaryHeap RFC, redux

Problem: Adding type parameter with default type is not sufficiently backward-compatible

Possible solutions

(A) 'Fix' compiler to infer the type 'correctly'

(B) use (ugly) type alias.

Related topics