Two distinct use of ordering: semantic ordering and... systematic ordering?

Two kinds of ordering

I feel there are two kinds of "ordering". One is "semantic" ordering, which arises when users (especially the humanity) interpret the value. And the other is... let's call it "systematic" ordering, which is required to put into or retrieve from containers, but not necessarily brings natural interpretation.

I think that the meaning and use of Ord is ambiguous. Does (or should) developers implement it as "semantic" ordering, or "systematic" ordering? And, does users always consider it as "semantic" ordering, or sometimes they expect Ord to be "systematic" ordering?

Should it implement Eq and Ord? If so, what they actually mean?

For example, consider defining a value type for generic data serialization format.

#[derive(Debug, Clone, Copy)]
enum Value {
    Null,
    Boolean(bool),
    Integer(bool, u64),
    Float(f64),
    String(String),
}

Question: Is it natural to implement PartialEq, Eq, PartialOrd, and Ord?

As a human:

  • They can implement PartialEq and Eq.
    • NaNs should be compared bitwise (as they have additional info).
    • Depending on specification or use cases, we want Integer(true, 42) and Float(42.0) to be equal, or not equal. Anyway some consistent ordering can be defined.
  • They shouldn't implement PartialOrd and Ord.
    • Thinking about the ordering among null, true, "foo", and 3.14 would be nonsense.
    • Should true be "less than" "foo", or not? In both case, why?

From this point of view, I consider [Partial]Eq and [Partial]Ord as "semantic" ordering.

But as a developer:

  • They can have total order (not necessarily natural nor meaningful for human).
    • It is quite natural to use the value as keys of BTreeSet, BTreeMap, or some other key-value containers.
      • Ordering would be non-intuitive for human, but it does not matter. Consistency is only we need.
    • Consider you want to normalize the set of the value. You would use BTreeSet<Value>, or use Vec<Value> and .sort() it.
      • It is quite natural to make Vec<Value> sortable, i.e. to give Value total order.

From this point of view, I see [Partial]Eq and [Partial]Ord are required by the containers as "systematic" ordering.

Ambiguity from language and std design?

I've faced this problem several times, and made a decision without 100% confidence every time. I think this relatively simple problem does not have perfect answer for now, and it is because of the language (and/or the standard library) design.

Since Rust 1.62.0 (released today), f64::total_cmp is provided. I feel that this is "systematic" ordering, and that containers requiring Ord should essentially require this kind of ordering, rather than "semantic" ordering provided by traditional PartialOrd and Ord.

Opinions?

So, I'd like to hear your opinions:

  • Are there really "semantic" ordering and "systematic" ordering? Do you need more variantions, or are they essentially equivalent and merged to single ordering?
  • If it is better to have them distinct traits, is it possible to change Rust and the standard libraries to use them with or without breaking changes? If possible, how?
    • Or, is this mitigated by the third-party libraries outside the std?
  • I'm not sure "semantic" and "systematic" is the best words... Any other better words?
9 Likes

There’s a bit of a grey area for containers like BTreeMap due to the fact that they offer API such as BTreeMap::range, which gives a “semantic” meaning to an otherwise mostly “systematic” Ord constraint. Another thing is that the order will (be guaranteed to) determine iteration order, which is “semantic” if (and only if) you care about iteration order.


That said, I did post about this kind of idea myself recently, here:

identifying yet-another distinction, which is that Ord/PartialOrd currently also serve as a way to do operator overloading for </<=/>/>=.

9 Likes

Definitely. We have a ton of things built on Ord that make no sense because people are encouraged to just #[derive(Ord)] without thinking about it.

For example, one of x.clamp(Ok(3), Err(3)) and x.clamp(Err(3), Ok(3) panics immediately for being in the wrong order, but I have no idea which because there's no good reason to pick one as smaller than the other.

Similarly, .fold(None, std::cmp::max) and .fold(None, std::cmp::min) do drastically different things because Option just got derive(Ord) slapped on it at some point in the past, without thinking about how it's not really the semantics that people want -- it's not how nullable values or in SQL or C#, for example.

Not to mention that the derives produce code that cares about the ordering of the variants, which is arguably a misfeature for nominal things like enum variants (and braced-struct fields).

This is the hard part.

We're stuck with min/max and such depending on Ord, and we can't change what Ord does on existing types because that could break code. Not to mention that derive(PartialOrd) does the (IMHO) silly semantic, but we probably can't change that either.

Which implies that we'd need to make it a new trait, but then it wouldn't be usable with things like BTreeMap, which we can't change from needing Ord because that's also a breaking change for people.

So I have no idea how to do anything reasonable here :confused:

15 Likes

Side note, I wish we had gotten median instead of clamp so it did not panic (idea shamelessly cribbed from one of the TAOCP fascicles).

2 Likes

I think we could introduce a new trait backwards-compatibly.

// name bikesheddable ofc
/// Contract: Same requirements as `Ord`.
/// Additionally, if the implementing type also implements `PartialOrd`
/// (which it isn't required to do),
/// then whenever `PartialOrd::partial_cmp` returns
/// `Some(Ordering::Less)` or `Some(Ordering::Greater)`, `SystematicOrd::cmp` 
/// should return `Ordering::Less` or `Ordering::Greater`, respectively.
/// (Should `Some(Ordering::Equal)` -> `Ordering::Equal`
/// also be part of the contract?)
pub trait SystematicOrd {
   ... // identical to Ord
}

impl<T: Ord> SystematicOrd for T { ... }

Because Ord would become a subtrait of SystematicOrd, Ord bounds could be loosened to SystematicOrd backward-compatibly when it makes sense.

Many std types would continue to implement Ord for backward-compatibility even when they should really be only SystematicOrd, but maybe some way to deprecate trait impls could help with that? We have #[deprecated_safe] now, which does something similar.

(Edited to suggest alternative SystematicOrd contract)

1 Like

Would it be similar to StructuralEq, but for Ord?

That wouldn't help e.g. with floats, where SystematicOrd would be expected, or for already existing enums which may have meaningless ordering currently defined.

I've wished for something like this in the past.

Notable thoughts from comparison:

  • Trying to introduce this backwards compatibly would have the issue that types with only #[derive(PartialOrd, Ord)] would now also have to derive SystematicOrd, making this derive much unlike other standard library derives.
  • The majority of use cases of #[derive(PartialOrd, Ord)] really want #[derive(SystematicOrd)]. At the end of the day, I'm not even sure if deriving PartialOrd would still be useful.

I imagine that separating the traits is something that was considered, possibly even attempted before rust 1.0, and was somehow decided against. I think I may have even been told this at one point. Hard to remember.

I think there might actually be two types of SystematicOrd people care about.

  • Float-style SystematicOrd (or SortOrd): useful mostly for sorting. Might compare unequal even when the PartialEq impl compares equal. Using in a BTreeMap is likely a huge footgun.
  • Enum-style SystematicOrd (or StructuralOrd): useful for sorting and for BTreeMap-like things. Should match PartialOrd in all cases where the latter defines an ordering.
1 Like

I hope this may contribute to the discussion, but I don’t really know how to classify (Ord/SystematicOrd) those to kind of sorts:

Pure alphanumeric sort (look at every characters one by one)

  • abc100def
  • abc3def
  • abc99def

Semantic sort (group number together, and compare their values, it’s slower, but more in line with human intuition)

  • abc3def
  • abc99def
  • abc100def

Should both of those be considered SystematicOrd since they can both compare any string? Or does Ord means « as a human would do » in that case only the second one should be Ord only?

1 Like

The way I think idiomatic is, defining dedicated alias types for them with special comparisons intended for specific usage, rather than using generic types as is.

For example, if you want "version-like string", you should define struct VersionString(pub String); and implement comparisons to satisfy VersionString("1.0.10".into()) > VersionString("1.0.9".into()).

In your example, generic string type will use the lexical sorting (i.e. the former ordering), and "number-aware string" type will use the latter ordering. I hope they won't conflict... more strictly, the idiomatic Rust code will distinguish them using type systems so that they can coexist. Both comparisons will be "natural" in some context. Both comparisons are "semantic", according to the intended usage of each type.

std::cmp::Reverse<T>(pub T) seems like the typical example (no pun intended :wink:) of Rust's stance with comparison and defining types: prepare another type for customizing comparisons.

2 Likes

I came up with an idea of providing struct SystematicCompare<T>(pub T); as a customization point for comparisons inside the container (Disclaimer: I am C++er), but it won't be useful for the keys with user-defined generic types since std-trait impls for std-types cannot be fully customizable :cry:.

pub mod std { pub mod cmp {

// In containers (e.g. `BTreeMap<K, V>`), compare keys
// using `Systematic<&K>` rather than `&K`!
#[derive(Default, Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Systematic<T>(pub T);

// Reuse `PartialOrd` and `Ord` by default.
// This may not cause breaking changes if the container switch to use
// `Systematic<&T>` instead of `&T` for internal comparison.
//
// Container types will require `Systematic<T>: Ord` instead of `T: Ord`,
// but the latter always implies the former!
// No breaking changes 🎉
impl<T: PartialOrd> PartialOrd for Systematic<T> {
    default fn partial_cmp(&self, other: &Self) -> Option<Ordering> { self.0.partial_cmp(&other.0) }
}
impl<T: Ord> Ord for Systematic<T> {
    default fn cmp(&self, other: &Self) -> Ordering { self.0.cmp(&other.0) }
}

}}

// Outside std, users will define custom `Ord for Systematic<T>` as they need.
impl Ord for Systematic<MyType> { /* ... */ }

// Oops, maybe this seems impossible... (´·ω·`)
// `Systematic` and `Ord` are both std.
impl<T> Ord for Systematic<MyGenericType<T>> { /* ouch */ }

This would require feature(specialization) to do what is effectively template specialization for user types.

However, maybe it would make sense to have a standard std::cmp::Total wrapper to provide an arbitrary Ord on.

1 Like

That could work, if it's #[fundamental] to let people add arbitrary implementations for it. It's certainly come up as one of the options for exposing f32::total_cmp in a way that would work with, say, BTreeMap.

2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.