# Fallible sorting

I'm encountering a limitation of `[T]::sort_by` (and `sort_unstable_by`), namely that I have a fallible comparison function:

``````fn my_sort(a: &T, b: &T) -> Result<Ordering, Error> { ... }
``````

However `sort_by` is defined as:

``````pub fn sort_by<F>(&mut self, compare: F)
where
    F: FnMut(&T, &T) -> Ordering,
{ ... }
``````

It seems like my use case could potentially be handled by a fallible `try_sort_by`:

``````pub fn try_sort_by<E, F>(&mut self, compare: F) -> Result<(), E>
where
    F: FnMut(&T, &T) -> Result<Ordering, E>,
{ ... }
``````
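For reference, such a `try_sort_by` can already be approximated in user code. This is only a sketch (the name `try_sort_by` and the `Error` type in the usage below are placeholders, not std API): it records the first error and degrades to `Ordering::Equal` afterwards, so on failure the slice is merely left as some permutation of the original.

```rust
use std::cmp::Ordering;

/// Sketch of a user-space `try_sort_by`: run `sort_by` with a wrapper
/// that records the first error and returns `Ordering::Equal` from then
/// on. An error-degraded comparator may be inconsistent with earlier
/// answers; std's sort keeps the slice a permutation of the input in
/// that case (and may panic if it detects the inconsistency), but never
/// corrupts memory.
fn try_sort_by<T, E>(
    slice: &mut [T],
    mut compare: impl FnMut(&T, &T) -> Result<Ordering, E>,
) -> Result<(), E> {
    let mut first_err = None;
    slice.sort_by(|a, b| {
        if first_err.is_some() {
            // Stop doing real comparison work once an error occurred.
            return Ordering::Equal;
        }
        match compare(a, b) {
            Ok(ord) => ord,
            Err(e) => {
                first_err = Some(e);
                Ordering::Equal
            }
        }
    });
    match first_err {
        Some(e) => Err(e),
        None => Ok(()),
    }
}
```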

Does that seem reasonable enough?

On error, I suppose the slice would be left in some arbitrary intermediate state?

FWIW, we're also talking about adding a `sort_floats` that will use `fNN::total_cmp`, if that's relevant to what you're doing...
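(Side note for readers: `total_cmp` is already stable, and since it implements the IEEE 754 `totalOrder` predicate it is a legal total-order comparator for `sort_by` today, even with NaNs. The wrapper name `sort_floats_total` below is just illustrative, not a std API:)

```rust
/// Sorting floats with the IEEE 754 total order, usable with `sort_by`
/// today. `f64::total_cmp` is a real total order, so it is a legal
/// comparator even when the slice contains NaNs, infinities, and ±0.0.
fn sort_floats_total(v: &mut [f64]) {
    v.sort_by(f64::total_cmp);
}
```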


Yes, I'd like to have it in the context of a `TryFrom<Vec<T>>` impl, which takes ownership of the `Vec` anyway, so if there's an error the entire `Vec` is thrown out.

Threading the fallibility through all of the sorting logic seems a bit iffy to me, though.

Here's a previous conversation: try_sort · t-libs · Zulip Chat Archive

Do you really need a `_by` version, as opposed to a `_by_key` version? What exactly is fallible? If it's really fallible in the comparator that feels like it'd be really easy to not be a legal comparator, as it could return inconsistent things if it fails sometimes.

I think `try_sort_by_cached_key` would be very reasonable, though. Then it would call the fallible key generator once per item -- in which it could load a file or deserialize something or whatever -- then re-use the same logic that `sort_by_cached_key` already has to do the sorting now that it has the `impl Ord` keys.
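Such a `try_sort_by_cached_key` is easy to sketch in user code. Here's a consuming variant (the name `try_sorted_by_cached_key` is made up) that fits the `TryFrom<Vec<T>>` use case mentioned above, since the input is dropped if any key fails:

```rust
/// Sketch of a fallible sort-by-cached-key: compute every key up front
/// (the first error aborts and consumes the `Vec`, which suits a
/// `TryFrom<Vec<T>>` that discards its input on failure anyway), then
/// do an infallible decorate-sort-undecorate.
fn try_sorted_by_cached_key<T, K: Ord, E>(
    items: Vec<T>,
    mut key: impl FnMut(&T) -> Result<K, E>,
) -> Result<Vec<T>, E> {
    // One fallible key computation per item, one temporary allocation.
    let mut decorated = items
        .into_iter()
        .map(|item| Ok((key(&item)?, item)))
        .collect::<Result<Vec<(K, T)>, E>>()?;
    // `sort_by` is stable, so items with equal keys keep their
    // original relative order.
    decorated.sort_by(|a, b| a.0.cmp(&b.0));
    Ok(decorated.into_iter().map(|(_, item)| item).collect())
}
```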

(And that method does about the same amount of allocation as `sort_by` does, since both make one O(n) allocation for temporary space.)


The context is an ASN.1 DER library and an implementation of `SET OF`, where elements of the set need to be lexicographically sorted by their DER encodings.

The comparison effectively performs a partial encoding of DER productions (namely their tag+length headers), and the encoder is fallible. The main thing that can go wrong is integer overflow when computing the lengths of nested data structures. The entire library is written with checked arithmetic with the goal of avoiding both overflows and panic-on-overflow.

I can't use the `*_by_key` versions because creating the "key" would involve serializing the entire message which is something I'm definitely trying to avoid.

Could you make the error value always compare as smallest, and then check whether the first sorted element is an error?


The error in this case is being returned from the compare function. By similar reasoning, assume an error means "don't change the current observed order". That isn't going to help if the error is inconsistent, but nothing was going to help that.

I don't think that's a legal comparator, because it's inconsistent if both `cmp(A, B)` and `cmp(B, A)` return an error.

I think in this case the comparisons should be consistent, because parsing is deterministic, so if some element fails to parse, it will consistently fail to parse on every comparison.

I agree on that part. The bit that I objected to was "an error means don't change the current observed order", because then the "same" items could be in different relative orders after a sort, which feels like it doesn't meet any reasonable postcondition.

But that would be resolvable by just saying "if it returns `Err`, the only constraint is that the slice will be a permutation of the original". I don't really know that we could offer anything specific when comparators error.

As a workaround, making the error case always `Ordering::Less` and then doing a final pass to make sure the elements are in the correct order should work OK for this particular use case.
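That workaround might look roughly like this (a sketch; the name `sort_then_verify` is mine, not std): errors collapse to `Ordering::Less` during the sort, and a final pairwise pass with the real fallible comparator surfaces any error that was swallowed. This relies on the error being deterministic per element, as in the DER case above, so that an element that fails to compare also fails against its neighbor in the final pass.

```rust
use std::cmp::Ordering;

/// Sketch of the workaround: treat `Err` as `Ordering::Less` while
/// sorting, then make one final pairwise pass with the real fallible
/// comparator so any error swallowed during the sort is reported.
fn sort_then_verify<T, E>(
    slice: &mut [T],
    mut compare: impl FnMut(&T, &T) -> Result<Ordering, E>,
) -> Result<(), E> {
    // An error-as-Less comparator may be inconsistent, which leaves the
    // slice in an unspecified order (recent std sorts may even panic if
    // they detect the inconsistency) but never corrupts memory.
    slice.sort_by(|a, b| compare(a, b).unwrap_or(Ordering::Less));
    // If an element errs deterministically against any partner, one of
    // these adjacent comparisons errs too and `?` propagates it.
    for pair in slice.windows(2) {
        let ord = compare(&pair[0], &pair[1])?;
        debug_assert_ne!(ord, Ordering::Greater, "comparator was inconsistent");
    }
    Ok(())
}
```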
