Pre-RFC: Add comparator-api for ordered collections

TennyZhuang · January 2, 2022, 1:30am

In fact, we have both Hash and Hasher.

Comparator is used in ordered collections, and it seems that no collections depends on Partial*.

steffahn · January 2, 2022, 1:35am

Hash and Hasher are different from Ord vs. Comparator. Hash is a trait that allows implementors to expose their data to a Hasher in the form of a sequence of bytes/numbers. A custom Hasher just controls the way this sequence of bytes/numbers is converted into a single u64. On the other hand, Comparator allows fancy things like e.g. "only compare by one field and ignore the remaining ones", or "work with data that doesn't implement any trait itself". The Hasher in a HashMap allows neither working with data that doesn't implement Hash nor to only hash certain fields.

A Hashor (intentionally silly temporary name) would allow both of these kinds of things. Its job would be to be able to provide a custom way of the "expose data as a sequence of bytes/numbers" part of hashing, so you can only hash certain fields, or can hash work with types that don't implement Hash. In fact, for a HashMap you wouldn't only need a Hashor but also a corresponding Equalitor (intentionally silly temporary name), and furthermore this Hashor+Equalitor pair would have to match, so the "if two keys are equal, their hashes must also be equal" condition is still fulfilled. Maybe it would even make sense to have a trait Hashor: Equalitor relationship, combining the two, since the ability to produce hashes without the ability to also check for collisions might not actually be something you'd need all that often.

xxchan · January 2, 2022, 8:15am

Oh, I got it. This is like C++:

template<
    class Key,
    class T,
    class Compare = std::less<Key>,
    class Hash = std::hash<Key>,
    class KeyEqual = std::equal_to<Key>,
>

Compare, Hash, KeyEqual are all template parameters instead of bounds on Key, so any fancy custom logic can be implemented. But I don't think the comparator Pre-RFC necessarily leads to that situation and this does not constitute a direct objection to the OP.

Let's put aside the API flavor first and just consider what practical fancy things are needed but cannot be done now: the two cases you mentioned ("only compare by one field and ignore the remaining ones" and "work with data that doesn't implement any trait itself") can be solved by newtype pattern with custom Ord and similarly Hash (although some people may complain it's not elegant).

For hashing, there are more magics like hash memoization and value interning, and thus raw_entry API and RawTable API are proposed to handle them.

For comparison, I think it's much less magical than hashing. The OP's runtime generated comparator in database is a good example. The current workaround is too expensive and thus comparator-api is proposed. We can discuss wether the need is real (and important) and then how to solve it.

sekineh · January 4, 2022, 11:05pm

I'm the author of the binary-heap-plus and the ultimate goal of my crate was to merge back to the std (Not Done!). I'm glad to see this thread as comparators are very important. Here are some prior attempts so far you might be able to learn from:

Prior Art

The binary-heap-plus crate uses the compare crate which is very useful and similar to this comparator-api proposal. You could use it as a source of ideas. https://crates.io/crates/compare/0.1.0
Using the knowledge that I've learnt, I wrote the following PR#. It used C: Fn(&T, &T) -> Option<Ordering> as a comparator trait. I had to change return type Ordering to Option<Ordering> because of some backward-compatibility problem. This might demonstrate the problems you'll encounter when you add comparator-api into std. [WIP] Add `From<(Vec<T>, C)>` for BinaryHeap a.k.a. flexible `BinaryHeap` by sekineh · Pull Request #69454 · rust-lang/rust · GitHub

jan_hudec · January 11, 2022, 1:09pm

The cases where the comparison is stateless can be handled by the newtype pattern even if less conveniently. What isn't supported is when the comparison is customized at runtime. E.g. when when strings should be sorted using locale-specific rules for a locale selected at runtime, and possibly different one for different collections, or like the initial example when implementing backing store for something with database-like interface with user-defined ordering.

That said it does make working with types that don't define Ord more convenient even for the cases where the newtype is adequate.

The comparator is a property of the collection, because you need it from each and every operation, including impl Index where you can't pass it otherwise. And it only need to be non-zero-sized if the different collections may need to use different ones, in which case they are not redundant.

So I'd say it not only isn't wrong API, it's the only useful extension of the existing API.

system · April 11, 2022, 1:10pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Pre-RFC: BinaryHeap Flexibility libs	35	4479	March 25, 2019
Traits for common data structures libs	45	3223	July 1, 2021
Collection Traits, Take 2	29	7685	May 8, 2019
BTreeSet/BTreeMap: Custom Ord libs	21	4029	March 25, 2019
Libcollections traits: a standard interface for all collections libs	36	5920	March 25, 2019

Pre-RFC: Add comparator-api for ordered collections

Prior Art

Related topics