[Pre-RFC] Unstable sort in libcore


#1

Summary

EDIT: The scope of this pre-RFC is much smaller than it originally was.

This is a proposal to provide slice::unstable_sort in libcore, implemented as pattern-defeating quicksort (pdqsort).

There would also be variants slice::unstable_sort_by and slice::unstable_sort_by_key.

Motivation

slice::sort is by contract a stable sort. This is a decision that can’t be changed now, but stable sorting is a good default.

Rust users have expressed interest in unstable sorts and explained that they very rarely (some even say “never”) need stable sorting, and that it unnecessarily constrains the potential for faster sorting.

Another dissatisfaction is expressed towards libcore, which does not include a sorting method at all.

Wait, what is stable sorting and why would I care?

What does “stable” mean when we talk about sorting? A sorting algorithm is stable if it guarantees that it will not reorder equal elements. For example:

[(0, 5), (0, 4)].sort_by_key(|p| p.0);          // Will preserve the order.
[(0, 5), (0, 4)].unstable_sort_by_key(|p| p.0); // May or may not preserve the order.

Why would anyone care about stability? Take this table as an example.

You want to have a table of albums sorted by release year, while sorting by number of sales within the same year. To do that, first you click the “Claimed sales” column to sort by sales, and then click the “Year” column to sort by year. The original order within the same year will be preserved because that wiki page uses a stable sorting algorithm.

If the second sort were unstable, then the albums within the same year may have been unpredictably shuffled. This is a classic example of stable sorting being useful in practice.

Detailed design

Let’s reiterate why users don’t want stable sorting in most cases:

  1. Because stable sorting is generally slower than unstable sorting.
  2. Because stable sorting usually allocates a lot of memory. Technically, it doesn’t have to, but then it would be even slower!

This creates a fairly clear split between two kinds of sort functions:

  1. Stable sort, which is allowed to allocate.
  2. Unstable sort, which is not allowed to allocate.

In order to make an educated decision on API design, we must first understand what kinds of sorting algorithms are offered, and what are their advantages and deficiencies. The following algorithms are where the state of art is.

| Algorithm | Stable? | Allocates?    | Comparison-based? | Speed      | Complexity |
|-----------|---------|---------------|-------------------|------------|------------|
| timsort   | Yes     | Yes           | Yes               | 1   .. 1   | 1          |
| wikisort  | Yes     | No            | Yes               | 1.7 .. 1.7 | 3          |
| introsort | No      | No            | Yes               | 1   .. 1   | 1.5        |
| pdqsort   | No      | No            | Yes               | 0.6 .. 1   | 2          |
| radixsort | Yes     | ~2kB on stack | No                | 0.4 .. N/A | 1.5        |

Some notes:

  1. Speed is my rough estimate, where lower means faster. The first number indicates how fast the algorithm is when sorting primitive data types (e.g. integers), and the second how fast it is when sorting using a non-trivial comparison function.
  2. Complexity is roughly how much code is required and how difficult it is to implement & understand. Higher number means more complex.
  3. Timsort is the current implementation of slice::sort.
  4. Timsort and wikisort are similar - the difference is that wikisort does not allocate.
  5. Introsort is a typical fusion of quicksort with fallback on heapsort to guarantee O(n log n) worst-case.
  6. Pdqsort is a strict improvement over introsort. When sorting primitive integers or using branchless comparison functions, speed is fantastic - that is the estimated 0.6. In other cases, the performance is comparable to introsort.
  7. Pdqsort and radixsort are not always faster than timsort. Timsort wins when sorting few concatenated sorted lists, but other than that timsort is generally slower.

All right, so which algorithms of the bunch should we use?

  1. Timsort was a good choice for slice::sort.
  2. Now I propose pdqsort for slice::unstable_sort.

Why not use radix sort?

  1. It’s difficult to make it perform as well as pdqsort on partially sorted inputs, and it’s not much faster than pdqsort on random inputs anyway
  2. It’s not applicable as a general purpose comparison-based sort, so it would be used only as a specialization for sorting integers. This further complicates implementation.

How We Teach This

In the documentation for slice::sort just mention that there’s a faster non-allocating alternative: slice::unstable_sort.

Drawbacks

More complexity and code to review. However, pdqsort really isn’t that much more code.

Sometimes slice::unstable_sort might be slower than slice::sort, which could be surprising. An example of such cases would be several concatenated sorted lists. However, such cases are not too common and I’m not aware of any others.

Alternatives

Another possibility would be to simply provide pdqsort in an external crate for users who need it.

Unresolved questions

Since stable and unstable sorting are indistinguishable from each other when sorting integers, it might be a good idea to specialize slice::sort for those cases and switch to pdqsort. But I’m not too fond of this idea. I’d prefer to have more consistent and predictable performance.


#2

For slice::sort_unstably, if elements are primitive integers (u8, i8, u16, i16, etc.), switch to pdqsort.

Perhaps slice::sort also could use pdqsort in this case?


#3

Right. I edited “Unresolved questions” to make this more clear.


#4

Is it intended to be contractual that sort_unstably (in both std and core) does not allocate (or allocates at most O(1) space)? If so, this should be explicit.

Could you supply reference links to all the different sort implementations you mention, please?


#5

I find “unstably” pretty awkward – I don’t think this is commonly used as an adverb. So I’d prefer unstable_sort despite the small ambiguity, because the meaning is well-known in the context of sorting.


#6

Yes. I imagine that would be clearly stated in documentation.

  1. timsort - Our standard sort at the moment.
  2. wikisort - This one doesn’t exist in Rust (yet!) and looks very complex, but I’m confident it can be reasonably simplified.
  3. introsort/pdqsort - Recently released, I exchanged a few emails with the designer of pdqsort and have some ideas for further improvements. Introsort would be exactly the same as pdqsort, except have a simpler fn partition without partition_in_blocks.

Sorry that I don’t have anything more concrete at the moment. If this pre-RFC receives a green light from the community, a RFC with complete implementation will follow.

That’s fair. I don’t feel strongly about naming and would be open to alternatives. Whatever makes most people happy, makes me too.


#7

As I mentioned in my email, pdqsort is not slightly slower than introsort when not sorting integers.

There are two pdqsort variants: one with regular partitioning and one with block partitioning. The block partitioning is significantly (50-80%) faster than introsort when the comparison function is branchless. When the comparison function is not branchless, it’s 5-15% slower.

The version with regular partitioning is indistinguishable from a properly implemented introsort on random data, but is significantly faster on data with patterns (ascending, descending, one element out of order, all equal keys, etc).

pdqsort is intended to be a strict upgrade over introsort. The only confusion here is that I recently added block partitioning, which heavily sped up integer sorting, but slowed other things down slightly. The latest C++ version of pdqsort automatically dispatches the right version of partitioning depending on the comparison operator and data type given to sort.


#8

but in the table it’s the other way round.


#9

The table is correct.

EDIT: the table is incorrect for pdqsort. pdqsort is comparison-based.


#10

Understood. I fixed this in my writeup.

Fixed.

Fixed.


#11

Since I’m not a sort algorithm performance guru, my qualms with this proposal are:

  1. Having a function in std:: and a function in core:: with the same exact name (“sort”) but different implementations strikes me as misleading. I’m used to std::X always being either a reexport or something that can’t exist in core.

  2. Having “unstable_sort” be the name of the “sort that doesn’t allocate” function also seems misleading, unless there’s some deep theoretical reason I’m not aware of why instability obviously and necessarily means an in-place/non-allocating algorithm and instability is considered more important than non-allocation.

I’d rather be more explicit and have “noalloc_sort”, “unstable_sort” and maybe even “noalloc_unstable_sort” be distinct functions from just “sort” so there’s zero confusion over what functions guarantee what, or what is and isn’t a reexport. Then we can move on to figuring out whether pdqsort or wikisort or whatever is the optimal implementation for each one.


#12

After some more thinking, this is something I actually tend to agree with. I’m not so sure about having stable sort in libcore anymore.

Well, naming is hard. The reason why “unstable_sort” and “noalloc_sort” are the same function is that they go hand in hand:

  • If you’re implementing “unstable_sort”, you don’t need allocation because it won’t gain you anything in terms of performance. There isn’t a deep theoretical reason for that, but is something that has been consistently true in practice.
  • If you’re implementing “noalloc_sort”, then a stable variant will be considerably slower than an unstable variant (about twice as slow).

I think most Rust users in most situations prefer performance over stability. They also prefer zero allocation over stability. If they’re using #![no_std] there is no sort function at all!

All that said, stable sorting by default was a good decision, but there is obviously a strong desire for really fast unstable non-allocating sorting. It’s a fortunate fact that high performance, zero-allocation, instability, and inclusion into libcore can be uncompromisingly covered by a single function.


#13

One thing I forgot to consider in my last post is that “inclusion into libcore” already implies “zero-allocation”, so with that in mind I wouldn’t object to “unstable_sort” being the name of the sort function that gets to be in libcore. Then libstd would have “sort” (guarantees stability but might allocate) and a reexport of “unstable_sort” (guarantees no allocation but might be unstable), which seems reasonable to me.

The only case that wouldn’t satisfy would be a sorting algorithm that guarantees both stability and zero-allocation. I assume that’s sufficiently niche we can simply ignore it, i.e. leave it to 3rd-party crates?


#14

I’d be very happy with this solution! :slight_smile:

Exactly. Stable zero-allocation sort is very niche.


#15

I’ve reduced the scope of this pre-RFC.

Stable sorting in libcore (wikisort) is ditched, and all the fuss with block-partitioning was just a distraction. The unstable sort is now named slice::unstable_sort.


#16

I like this reduction in the RFC scope. Further changes are better moved in successive RFCs.

What about adding slice::unstable_sort_by and slice::unstable_sort_by_key too?

Since stable and unstable sorting are indistinguishable from each other when sorting integers, it might be a good idea to specialize slice::sort for those cases and switch to pdqsort. But I’m not too fond of this idea. I’d prefer to have more consistent and predictable performance.

What are exactly the disadvantages of adding this specialization? It seems nice.


#17

I thought it’s obvious that those variants would exist, too. Will mention them. :slight_smile:

I look at this issue from the standpoint of documentation.

If you look at how slice::sort is documented, you’ll see it’s not difficult to understand the performance characteristics of this function. Someone who has a rough idea of how timsort works might want to take advantage of it’s peculiarities. If a user wants to efficiently merge 5 sorted sequences, timsort will probably be faster than pdqsort.

Now, if you patch the function with special cases and say pdqsort is used for this, this, and that case, then it gets ugly. Then, the user might not have the choice of choosing timsort vs pdqsort anymore!

This might seem like a contrived example, but having straightforward implementations that don’t patch various cases with wildly different algorithms is certainly desirable.


#18

Let’s make sure we document the difference between a “stable sort” and an “unstable sort” and give some examples of when it matters.


#19

Done!


#20

I would prefer if the API of unstable sort in libcore would optionally take a buffer (as a slice) since typically is possible to significantly speed up unstable sort with one. Obviously, unstable sort should still work when provided with an empty buffer. I think that taking a slice for the buffer would be the correct thing to do, but I am not 100% confident about this. Are there any alternatives?

For lib std, on the other hand, it would make sense to have an unstable sort function that does not take a buffer but always tries to allocate one. Those who want to avoid the allocation can always resort to using the one from libcore.