[Pre-RFC] Unstable sort in libcore

Summary

EDIT: The scope of this pre-RFC is much smaller than it originally was.

12 Likes

Is it intended to be contractual that sort_unstably (in both std and core) does not allocate (or allocates at most O(1) space)? If so, this should be explicit.

Could you supply reference links to all the different sort implementations you mention, please?

I find "unstably" pretty awkward -- I don't think this is commonly used as an adverb. So I'd prefer unstable_sort despite the small ambiguity, because the meaning is well-known in the context of sorting.

12 Likes

As I mentioned in my email, pdqsort is not slightly slower than introsort when not sorting integers.

There are two pdqsort variants: one with regular partitioning and one with block partitioning. The block partitioning is significantly (50-80%) faster than introsort when the comparison function is branchless. When the comparison function is not branchless, it’s 5-15% slower.

The version with regular partitioning is indistinguishable from a properly implemented introsort on random data, but is significantly faster on data with patterns (ascending, descending, one element out of order, all equal keys, etc).

pdqsort is intended to be a strict upgrade over introsort. The only confusion here is that I recently added block partitioning, which heavily sped up integer sorting, but slowed other things down slightly. The latest C++ version of pdqsort automatically dispatches the right version of partitioning depending on the comparison operator and data type given to sort.

but in the table it's the other way round.

The table is correct.

EDIT: the table is incorrect for pdqsort. pdqsort is comparison-based.

Since I’m not a sort algorithm performance guru, my qualms with this proposal are:

  1. Having a function in std:: and a function in core:: with the same exact name (“sort”) but different implementations strikes me as misleading. I’m used to std::X always being either a reexport or something that can’t exist in core.

  2. Having “unstable_sort” be the name of the “sort that doesn’t allocate” function also seems misleading, unless there’s some deep theoretical reason I’m not aware of why instability obviously and necessarily means an in-place/non-allocating algorithm and instability is considered more important than non-allocation.

I’d rather be more explicit and have “noalloc_sort”, “unstable_sort” and maybe even “noalloc_unstable_sort” be distinct functions from just “sort” so there’s zero confusion over what functions guarantee what, or what is and isn’t a reexport. Then we can move on to figuring out whether pdqsort or wikisort or whatever is the optimal implementation for each one.

1 Like

One thing I forgot to consider in my last post is that “inclusion into libcore” already implies “zero-allocation”, so with that in mind I wouldn’t object to “unstable_sort” being the name of the sort function that gets to be in libcore. Then libstd would have “sort” (guarantees stability but might allocate) and a reexport of “unstable_sort” (guarantees no allocation but might be unstable), which seems reasonable to me.

The only case that wouldn’t satisfy would be a sorting algorithm that guarantees both stability and zero-allocation. I assume that’s sufficiently niche we can simply ignore it, i.e. leave it to 3rd-party crates?

1 Like

I like this reduction in the RFC scope. Further changes are better moved in successive RFCs.

What about adding slice::unstable_sort_by and slice::unstable_sort_by_key too?

Since stable and unstable sorting are indistinguishable from each other when sorting integers, it might be a good idea to specialize slice::sort for those cases and switch to pdqsort. But I'm not too fond of this idea. I'd prefer to have more consistent and predictable performance.

What are exactly the disadvantages of adding this specialization? It seems nice.

Let’s make sure we document the difference between a “stable sort” and an “unstable sort” and give some examples of when it matters.

1 Like

I would prefer if the API of unstable sort in libcore would optionally take a buffer (as a slice) since typically is possible to significantly speed up unstable sort with one. Obviously, unstable sort should still work when provided with an empty buffer. I think that taking a slice for the buffer would be the correct thing to do, but I am not 100% confident about this. Are there any alternatives?

For lib std, on the other hand, it would make sense to have an unstable sort function that does not take a buffer but always tries to allocate one. Those who want to avoid the allocation can always resort to using the one from libcore.

1 Like

That's the right attitude!

How would you significantly speed up unstable sort with a buffer?

So I just checked because I had in mind that some c++ sort implementations do this, but in fact none of the unstable ones do so I was mistaken. The stable sort implementations in libc++, libstdc++, and MSVC do allocate memory (using std::get_temporary_buffer for merge sort) but the unstable sort ones do not allocate memory. Sorry about the noise.

2 Likes

Everything you said applies to stable sort. The reason std::get_temporary_buffer exists is because it was believed that operating systems would have a way of allocating memory based on memory pressure, such that if you asked for 50MB and that request would end up being slow, but 20MB could be returned cheaply, then you’d get 20MB of memory instead.

Anyways, that never happened, but the stable_sort implementation is required to work whether it gets a temporary buffer or not and there’s no real reason that the sort implementation in either standard or core can’t use that strategy.

1 Like

The faster sort PR (https://github.com/rust-lang/rust/pull/38192) showed its improvement with a bunch of data sets. Will those be the right ones to demonstrate perf of this too, or are there other cases because of the different algorithm?

Should this RFC talk about what kind of quicksort is expected? Single/double pivot? Fat partition because cmp is ternary?

I’d call it inplace_sort(), in case a non-allocating stable sort is used in the future. :bike::house_abandoned:

1 Like

But don't worry, pdqsort does not suffer if the input slice contains a lot of duplicates.

Rather, the opposite happens. pdqsort is both O(n log n) and O(nk), where k is the number of unique elements you're sorting. Both are upper bounds, so when k is fixed, pdqsort is actually linear. This shows in the 'shuffled 16 values' benchmark from my github repo:

3 Likes

I think there may be a desire for really fast sorting and sometimes a requirement for non-allocating sorting, but I can't remember ever hearing anyone express a desire or requirement for unstable sorting. Usually, the requirement is for stable sorting or is indifferent to stability. Hence, I agree with

So I think the naming bikeshed boils down to this poll:

Which of these is more important to express in the name of the function?

  • “Use this function if you need an in-place sort.”
  • “Don’t use this function if you need a stable sort.”

0 voters

Why not just provide two functions: unstable_sort (which does not guarantee being inplace), and unstable_inplace_sort (which guarantees being inplace) ?

Whether unstable_sort just calls unstable_inplace_sort internally is an implementation detail, and we can change it at any time without breaking the API in case a better unstable non-in-place sorting algorithm is implemented in the future (which will probably be in place for some inputs, and out-of-place for others).

2 Likes

Today I’ve tried the unstable sort functions, and they seem rather nice, I’ve seen a small speedup of my code. :+1:

2 Likes