[Feature request] Methods for sorting/reordering with indices

Hi,

I'm enquiring as to whether two new methods would be of interest to the standard library.

The first is a method (argsort?) for getting the indices that would sort a slice/vector - akin to argsort in numpy (this SO question sums up this method).

The second is a method (sort_by_indices?) for sorting a slice/vector by given indices (such as the output of another slice's argsort; see this SO question).

I would be happy to have a go at implementing both (or one) if they seem in scope.

3 Likes

I've never even heard of these methods let alone use them. What's the use case?

1 Like

I would guess it might help sorting column-oriented data; where you grab the argsort of one column and then apply it to each of the others?

3 Likes

It could be interesting when you want to sort two slices using the same order, for exemple, let's say you have

let mut keys = [0, 2, 1, 3];
let mut values = ["d_val0", "c_val2", "b_val1", "a_val3"];
keys.sort_unstable();
values.???;

And you want to sort both slice "the same way", you can't just sort the values slice, or else you will get &["a_val3", "b_val1", "c_val2", "d_val0"] instead of &["d_val0", "b_val1", "c_val2", "a_val3"].

(ex: rust - How can I co-sort two Vecs based on the values in one of the Vecs? - Stack Overflow)

4 Likes

The problem with proposing new additions to the standard library is that you have to justify why this method must be in the standard library and can not exist as a utility function/crate.

In general only functions/methods are added that are definitely useful for a large number of people, otherwise one would have dead-code in the standard library.

Another problem is "what if the method was not the right decision?" and one later on regrets adding it to the standard library?

One can not remove (as far as I know) methods from the standard library through an edition, so the best thing one could do would be to deprecate it.

A lot of deprecated methods add noise to the documentation which makes reading the docs a bit more annoying with every new deprecated method.

So what I want to say with this, is that one has to think really carefully about which methods should be added to the standard library and you need to give very convincing arguments how this could be used.

I would recommend adding a few examples to your post that show how one could use it.

A working implementation with an extension trait on play.rust-lang.org would be very appreciated as well.

4 Likes

More generally, sort_by_indices is the act of applying a[n inverse] permutation.

1 Like

I don't think it should be called sort_by_indices. Because you are not necessarily sorting it, you are just reordering (or permuting) the array.

2 Likes

See also crates like GitHub - jeremysalwen/rust-permutations: Permutations library for Rust.

FWIW, I've wanted this functionality before. Right now, reordering requires cloning or copying, but it would be great if there was some general interface that would allow sorting based on an external order.

The code where I wanted it is here: instant-distance/lib.rs at main · InstantDomain/instant-distance · GitHub. In this case, I wanted to randomly order values of a generic type, and in this case I had to impose a Clone bound because there is no (straightforward) way to sort based on the generated random numbers (at least I wasn't able to figure one out).

Argsort is a function to get an array of indices that describe the order of the elements in an array. This index array can then be used to rearrange multiple arrays in the same manner.

A sample implementation can be found here: Rust Playground

It seems to me like the best way to apply a permutation to a slice in-place is to first have the permutation in represented according to cycle notation. Then you don't need that scratchpad anymore that the SO answer was talking about.

If the usage pattern is: create a permutation for one unsorted slice and apply it to multiple slices, then creating an optimal1 representation of the permutation first seems like the correcte approach.

1optimal for the application of in-place reordering slices

This approach would mean that an abstract "Permutation" datatype would be used instead of just a list/array of indices. This also seems suficciently specific to be better fitted for a custom crate than the standard library.

2 Likes

These are very common in array-oriented numeric code.

I think the library already has code that does this for sort_by_cached_key, so it might be reasonable to expose it so long as it works for that use.

2 Likes

Something similar is worth adding to Rust stdlib.

Thank you all for the feedback.

@Luro02

I would recommend adding a few examples to your post that show how one could use it.

A working implementation with an extension trait on play.rust-lang.org would be very appreciated as well.

Here is an example of something I used in a recent tool. I can transfer this to a playground if you want though. (I realise this may not be an ideal implementation.)

And some associated tests

@IndianBoy42

I don't think it should be called sort_by_indices. Because you are not necessarily sorting it, you are just reordering (or permuting) the array.

Well the definition of "sort" depends on the comparison function you are using. But I agree with you, there is probably a more descriptive way to name it. reorder, permute etc.

2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.