BTreeSet/BTreeMap: Custom Ord


#1

Currently BTreeSet/BTreeMap are defined to take a <T> or a <K, V>.

I propose allowing them to take a <..., F: fn(T or K, T or K) -> std::cmp::Ordering + 'static = stdord> where stdord is defined as:

fn stdord<T: std::cmp::Ord>(left: T, right: T) -> Ordering {
  left.cmp(right)
}

I’m not sure if this is currently possible but it’d be really cool if it were.


#2

You can always make new wrapper type and impl Ord for it. Is there some problem which can’t be easily solved with this pattern?


#3

Hmm, I’m pretty sure this topic was recently discussed, possibly on the user’s forum, but I can’t find it. But basically, it’s boilerplate and inconvenient to need wrapper types for this. An “external” comparison, similar to how a Hasher is used in hash containers, would make this easier and more to the point.


#4

I don’t think Hasher approach fits well with Ord case. The Hasher trait represents specific hash algorithm, and the Hash trait represents the way how to apply that algorithm to this type. This separation is important as we should allow mark types Hashable without having specific algorithm in mind.

But for ordering, I believe that all types which marked as comparable should know how to compare themselves. And with this in mind, comparison-based containers like BTreeMap should always use the provided logic. When I see set.insert(1usize), I expect this 1usize will be compared as every usizes should be, not some other logic the evil yesterday myself wrote.


#5

I don’t think there’s anything fundamental precluding same line of reasoning for Ord - why must the comparison algorithm be intrinsic to the type itself? In its simplest form, you may want to order some elements in reverse of their natural order - currently this requires a newtype wrapper with all ensuing boilerplate.

There’s precedent for allowing custom user-specified comparisons for similar data structures - for ordered sets (maps is similar):

  1. C++
  2. C#
  3. Java

IMO and experience, these are absolutely natural and useful.


#6

I don’t have a strong opinion if this feature should be included, but one usecase i can think of would be types that are not ordered in the mathematical sense, but one wants to store in a BTreeMap/Set.

For example complex numbers. I’d be wrong to mark the type as Ord, but i’d be totally fine to store them in a BTreeSet with an arbitrary ordering like real component before imaginary component.


#7

Another usecase would be a min heap with |x, y| T::cmp(y, x). However currently there is no way to name the type of a function. One way to do this would be the existential type feature so this change would be blocked on that.


#8

existential type???


#9

#10

what does that have to do with

fn foo() {}
type Bar = foo;
// idk how to do type assertions but `Bar` should be `foo` should be `impl fn() for foo`.

#11
fn foo();
fn bar() -> Foo {
    || foo()
}
existential type Foo: Fn() -> ();

This is sort of a hack but could bu used to name a closure that does the same thing as the function. Being able to name the type of the function would be better however there isn’t something planned that allows you to do that as far as I know.


#12

Somewhat unrelated to the OP, but there’s indeed a pattern that could be served neither by newtype wrappers, nor by the proposed fn parameter API.

Consider, for example, this:

struct Entity {
    name: String,
    size: usize,
    // ...
}

struct Scene {
    entities: Vec<Entity>,
    entities_by_name: BTreeSet<u32>,
    entities_by_size: BTreeSet<u32>,
}

That is, I want to store a number of entities, and I want to order them by two properties. To avoid duplication, ideally I’d love to store entities in a single array and use indices. However, I just can’t write a newtype wrapper for u32 to do this, because the wrapper will need access to entities, and you have nothing to work with except for u32. Note as well that I can’t use &Entity instead of u32, b/c that’ll run into “storing owner and ref in a single object” issue.

I think the most general API, which would allow for such usages, is to accept closure with ops on the call site:

map.insert_with_cmp(92, |&i, &j| self.entities[i].name.cmp(&self.entities[j].name));

I’d love to see a crate.io crate with such APIs! The particular use case I had was to write a string interner like

struct Interner {
    data: Vec<String>,
    map: HashSet<u32>,
}

#13

Some questions about this, mostly for my own understanding:

  • It’s immaterial to the intent and the clarity, but I thought this would have to be usize rather than u32.
  • using a closure like this, where does the closure itself get stored for later comparisons as the tree grows?
  • if I have multiple call sites, what enforces that the closures (and thus the ordering) must be the same for all entities?

I think you’d need something more like .new_with_cmp() to address the latter two points at construction time.

Otherwise, I think I’d have to write an Entity type that stored a redundant field for its own vec index (yuck), and newtype (or full type) wrappers to impl each ordering.


#14

but I thought this would have to be usize rather than u32 .

I just use u32 by default for ECS-style indices. If there are a lot of indexes stored, you can save a bunch of memory.

using a closure like this, where does the closure itself get stored for later comparisons as the tree grows?

The trick in the API is that the closure is not stored anywhere, it’s just supplied on the call-site. I’ve cobbled together an interner based on the proposed API here: https://play.rust-lang.org/?gist=cae61bad89f1c77d3e14b0fa05d1586b&version=stable&mode=debug&edition=2015

if I have multiple call sites, what enforces that the closures (and thus the ordering) must be the same for all entities?

Nothing, it’s your responsibility as a programmer to make sure they are correct. If you mess them up, operations with the mapping will give nonsense results, but no memory unsafety (just like it happens with broken Ord implementations).

I think you’d need something more like .new_with_cmp() to address the latter two points at construction time.

Nope! :slight_smile: It I think is a pattern in Rust that supplying closures on the call site (and not on the definition site) makes the API more powerful (but less convenient to use). Another instance where it comes up is that Lazy<T, F: FnOnce() ->T> with fn get(&self) -> &T is less expressive than OnceCell<T> with get_or_init(&self, F: FnOnce() -> T) -> &T (https://docs.rs/once_cell/0.1.4/once_cell/#general-purpose-lazy-evaluation).

Otherwise, I think I’d have to write an Entity type that stored a redundant field for its own vec index (yuck), and newtype (or full type) wrappers to impl each ordering.

You’ll need to store the same entity twice then, or to solve the “storing owner and a borrow together” problem I think?


#15

Yeah, which is fine for insert calls. I hadn’t thought about it too carefully, but I was assuming the comparator would be needed for other operations as well; perhaps rebalancing on removal. If that’s not true for the specific tree implementation (or the set of allowed operations is suitably constrained), ok.

Ok, again not what I expected/assumed but fair enough once clarified and explicit. Thanks.


#16

again, I thought each function was its own type, implementing the fn trait?

you don’t need existential types to use a::foo;

this should just work:

struct Foo<F: fn()> {
    _f: PhantomData<F>
}
fn bar() {
}
const BAZ: Foo<bar> = Foo { _f: PhantomData };

#17

But it doesn’t. If you try to compile this, you will see errors due to the following facts:

  • fn() is not a trait. It is a type.
  • bar is not a type. It is a value of an unnameable type
    • This unnameable type can be coerced to the type fn()
    • It also implements the traits Fn(), FnMut(), and FnOnce().
    • If you try to get the compiler to write the type’s name, it writes fn() {bar}.
      (This is not valid syntax, it only does this for the reader’s sake.)

To see this, consider the following:

fn bar() {}
fn baz() {}

// 'bar' is not a type
let x: bar = bar;
//     ^^^ error[E0573]: expected type, found function `bar`

// bar and baz have different types.
let mut a;
a = bar;
a = baz; // error: mismatched types
         //  note: expected type `fn() {bar}`
         //           found type `fn() {baz}`

// but both can be coerced to fn()
let mut b: fn();
b = bar;
b = baz;

existential type is needed because the true type of bar cannot be written.


#18

so make it possible to write the true type of plain old fn.

it’s just const generics. (not “really confusing syntax and semantics that most ppl wouldn’t understand or need”)

imagine if you could write the type of iterators…


#19

Yes, const generics could probably be used for this, if support for const generics of type fn() are ever added. (but as it stands, const generics are currently not supported in any capacity, so that’s quite a ways off!)

…but people are attracted to existential type because it is strictly more powerful. For example, you can make a BTreeSet of indices sorted by the values in another array:

BTreeSet::new_with_cmp(|i, j| array[i].cmp(&array[j]))

You cannot write this comparator as a fn because the closure has a context. (it borrows array)


#20

okay, they may be more powerful but I don’t like them. I’d rather have const generics and variable generics (and the ability to combine both?).