Fallible sorting

I'm encountering a limitation of [T]::sort_by (and sort_unstable_by): I have a fallible comparison function:

fn my_sort(a: &T, b: &T) -> Result<Ordering, Error> { ... }

However sort_by is defined as:

pub fn sort_by<F>(&mut self, compare: F)
where
    F: FnMut(&T, &T) -> Ordering, 
{ ... }

It seems like my use case could potentially be handled by a fallible try_sort_by:

pub fn try_sort_by<E, F>(&mut self, compare: F) -> Result<(), E>
where
    F: FnMut(&T, &T) -> Result<Ordering, E>, 
{ ... }

Does that seem reasonable enough?

On error, I suppose the slice will be left in its arbitrary intermediate state?
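On stable today, one way to emulate such a try_sort_by is to capture the first error in a local and degrade to a placeholder ordering afterwards. This is only a hypothetical sketch, not proposed std code; on Err the slice is indeed left as some arbitrary permutation of the input, since the comparator becomes inconsistent once it has failed.

```rust
use std::cmp::Ordering;

// Hypothetical emulation of the proposed try_sort_by: remember the first
// error and return Ordering::Equal for every comparison afterwards so the
// sort can finish. On Err, the only guarantee is that the slice is a
// permutation of the original.
fn try_sort_by<T, E, F>(slice: &mut [T], mut compare: F) -> Result<(), E>
where
    F: FnMut(&T, &T) -> Result<Ordering, E>,
{
    let mut err = None;
    slice.sort_by(|a, b| {
        if err.is_some() {
            // A comparison already failed; stop calling the fallible
            // comparator and return a consistent placeholder.
            return Ordering::Equal;
        }
        match compare(a, b) {
            Ok(ord) => ord,
            Err(e) => {
                err = Some(e);
                Ordering::Equal
            }
        }
    });
    err.map_or(Ok(()), Err)
}
```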

FWIW, we're also talking about adding a sort_floats that will use fNN::total_cmp, if that's relevant to what you're doing...


Yes, I'd like to have it in the context of a TryFrom<Vec<T>> impl, which takes ownership of the Vec anyway, so if there's an error the entire Vec is thrown out.

Flowing the fallibility through all of the sorting logic seems a bit iffy to me, though.

Here's a previous conversation: try_sort · t-libs · Zulip Chat Archive

Do you really need a _by version, as opposed to a _by_key version? What exactly is fallible? If the fallibility really lives in the comparator, it feels like it would be very easy for it not to be a legal comparator, since it could return inconsistent results if it fails only sometimes.

I think try_sort_by_cached_key would be very reasonable, though. Then it would call the fallible key generator once per item -- in which it could load a file or deserialize something or whatever -- then re-use the same logic that sort_by_cached_key already has to do the sorting now that it has the impl Ord keys.

(And that method does about the same amount of allocation that sort_by does, since they're both one O(n) allocation for temporary space.)
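Since the use case above takes ownership of a Vec anyway, a try_sort_by_cached_key can be approximated on stable by computing all the keys up front. A rough sketch (not std's actual implementation, which sorts in place):

```rust
// Sketch: run the fallible key function once per element, bail out on the
// first Err, then sort (key, value) pairs by the now-infallible Ord keys.
fn try_sort_by_cached_key<T, K: Ord, E>(
    v: Vec<T>,
    mut f: impl FnMut(&T) -> Result<K, E>,
) -> Result<Vec<T>, E> {
    // One O(n) allocation for the keys, much like sort_by_cached_key.
    let mut tagged: Vec<(K, T)> = v
        .into_iter()
        .map(|t| Ok((f(&t)?, t)))
        .collect::<Result<_, _>>()?;
    tagged.sort_by(|(a, _), (b, _)| a.cmp(b));
    Ok(tagged.into_iter().map(|(_, t)| t).collect())
}
```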


The context is an ASN.1 DER library and an implementation of SET OF, where elements of the set need to be lexicographically sorted by their DER encodings.

The comparison effectively performs a partial encoding of DER productions (namely their tag+length headers), and the encoder is fallible. The main thing that can go wrong is integer overflow when computing the lengths of nested data structures. The entire library is written with checked arithmetic with the goal of avoiding both overflows and panic-on-overflow.
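To illustrate the kind of checked length arithmetic involved, here is a hypothetical sketch (the names are invented here, not the library's actual API):

```rust
/// Hypothetical error type for arithmetic overflow while computing lengths.
#[derive(Debug, PartialEq)]
struct Overflow;

/// Encoded length of a constructed value: this element's tag + length header
/// plus the sum of its children's encoded lengths, with every addition
/// checked so overflow becomes an Err rather than a wrap or a panic.
fn nested_len(header_len: usize, child_lens: &[usize]) -> Result<usize, Overflow> {
    child_lens
        .iter()
        .try_fold(header_len, |acc, &len| acc.checked_add(len).ok_or(Overflow))
}
```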

I can't use the *_by_key versions because creating the "key" would involve serializing the entire message which is something I'm definitely trying to avoid.

Could you make the error value always compare as smallest, and then check whether the first element after sorting is the error?


The error in this case is being returned from the compare function. So, by similar reasoning, assume an error means "don't change the currently observed order". That isn't going to help if the errors are inconsistent, but nothing was going to help that.

I don't think this is a legal comparator, because it's inconsistent if cmp(A, B) and cmp(B, A) both return an error.

I think in this case the comparisons should be consistent, because parsing is deterministic, so if some element fails to parse, it will consistently fail to parse on every comparison.

I agree on that part. The bit that I objected to was "an error means don't change the current observed order", because then the "same" items could be in different relative orders after a sort, which feels like it doesn't meet any reasonable postcondition.

But that would be resolvable by just saying "if it returns Err, the only constraint is that the slice will be a permutation of the original". I don't really know that we could offer anything specific when comparers error.

As a workaround, making the error case always return Ordering::Less and then doing a final pass over the elements to make sure they really are in the correct order should work fine for this particular use case.
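A sketch of that workaround (a hypothetical helper, assuming the comparison errors are deterministic as described above): map errors to Ordering::Less during the infallible sort, then re-run the fallible comparator over adjacent pairs so any swallowed error is surfaced. Note the fallback makes the comparator inconsistent, so on error the slice ends up in an unspecified (but still permuted) order.

```rust
use std::cmp::Ordering;

// Sort with an infallible fallback, then verify with one O(n) pass of
// fallible comparisons over adjacent elements.
fn sort_then_verify<T, E>(
    slice: &mut [T],
    mut compare: impl FnMut(&T, &T) -> Result<Ordering, E>,
) -> Result<(), E> {
    // Treat a failed comparison as Less, per the workaround above.
    slice.sort_by(|a, b| compare(a, b).unwrap_or(Ordering::Less));
    for pair in slice.windows(2) {
        // `?` surfaces any error the infallible sort had to swallow.
        compare(&pair[0], &pair[1])?;
    }
    Ok(())
}
```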
