I’m very skeptical of this claim. How would you significantly speed up unstable sort with a buffer?
That’s the right attitude!
So I just checked, because I had in mind that some C++ sort implementations do this, but in fact none of the unstable ones do, so I was mistaken. The stable sort implementations in libc++, libstdc++, and MSVC do allocate memory (using std::get_temporary_buffer for merge sort), but the unstable sort implementations do not allocate at all. Sorry about the noise.
Everything you said applies to stable sort. The reason std::get_temporary_buffer exists is that it was believed operating systems would have a way of allocating memory based on memory pressure, such that if you asked for 50 MB and that request would be slow to satisfy, but 20 MB could be returned cheaply, you’d get 20 MB of memory instead.
Anyway, that never happened, but the stable_sort implementation is required to work whether it gets a temporary buffer or not, and there’s no real reason the sort implementation in either std or core can’t use that strategy.
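To make the “works with or without a buffer” strategy concrete, here is a minimal Rust sketch of a stable merge step that takes the scratch buffer as an `Option`. All names are illustrative, not std’s actual implementation, and `Copy` is required only to keep the sketch short; the fallback path trades allocation for an O(n²) worst case.

```rust
// Hypothetical sketch: a stable merge of two sorted halves of `v`
// (split at `mid`) that uses scratch space when available and falls
// back to a rotation-based in-place merge otherwise.
fn merge_with_optional_buffer<T: Ord + Copy>(v: &mut [T], mid: usize, buf: Option<&mut Vec<T>>) {
    match buf {
        Some(scratch) => {
            // Fast path: copy both halves out, then merge back into `v`.
            scratch.clear();
            scratch.extend_from_slice(v);
            let (left, right) = scratch.split_at(mid);
            let (mut i, mut j) = (0, 0);
            for slot in v.iter_mut() {
                // Taking from the left on ties keeps the merge stable.
                if j == right.len() || (i < left.len() && left[i] <= right[j]) {
                    *slot = left[i];
                    i += 1;
                } else {
                    *slot = right[j];
                    j += 1;
                }
            }
        }
        None => {
            // Slow path: no allocation; rotate each out-of-order right
            // element into place (quadratic worst case, but still stable).
            let mut mid = mid;
            let mut i = 0;
            while i < mid && mid < v.len() {
                if v[i] <= v[mid] {
                    i += 1;
                } else {
                    v[i..=mid].rotate_right(1);
                    i += 1;
                    mid += 1;
                }
            }
        }
    }
}

fn main() {
    let mut a = [3, 5, 7, 2, 4, 6];
    merge_with_optional_buffer(&mut a, 3, Some(&mut Vec::new()));
    assert_eq!(a, [2, 3, 4, 5, 6, 7]);

    let mut b = [1, 4, 9, 0, 5, 8];
    merge_with_optional_buffer(&mut b, 3, None);
    assert_eq!(b, [0, 1, 4, 5, 8, 9]);
}
```

The point of the `Option` is exactly the flexibility described above: the same algorithm is correct whether the allocation succeeded, was partial, or failed entirely.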
The faster sort PR (https://github.com/rust-lang/rust/pull/38192) showed its improvement with a bunch of data sets. Will those be the right ones to demonstrate perf of this too, or are there other cases because of the different algorithm?
Should this RFC talk about what kind of quicksort is expected? Single or double pivot? Fat partition, because cmp is ternary?
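For readers unfamiliar with the term: a “fat partition” exploits the ternary `Ordering` result of `cmp` to split the slice into three groups in one pass, so equal elements never need to be sorted again. Here is an illustrative Dutch-national-flag sketch (not what the RFC ended up using, per the reply below); the `Clone` bound is just to keep the pivot handling simple.

```rust
use std::cmp::Ordering;

// Illustrative three-way ("fat") partition: after the call,
// v[..lo] < pivot, v[lo..hi] == pivot, and v[hi..] > pivot.
fn fat_partition<T: Ord + Clone>(v: &mut [T], pivot_idx: usize) -> (usize, usize) {
    let pivot = v[pivot_idx].clone();
    let (mut lo, mut i, mut hi) = (0, 0, v.len());
    while i < hi {
        match v[i].cmp(&pivot) {
            Ordering::Less => {
                v.swap(lo, i);
                lo += 1;
                i += 1;
            }
            Ordering::Equal => i += 1,
            Ordering::Greater => {
                hi -= 1;
                v.swap(i, hi);
            }
        }
    }
    (lo, hi)
}

fn main() {
    let mut v = vec![3, 1, 3, 2, 5, 3, 0];
    let (lo, hi) = fat_partition(&mut v, 0); // pivot value is 3
    assert!(v[..lo].iter().all(|x| *x < 3));
    assert!(v[lo..hi].iter().all(|x| *x == 3));
    assert!(v[hi..].iter().all(|x| *x > 3));
}
```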
I’d call it inplace_sort(), in case a non-allocating stable sort is used in the future.
Those data sets are mostly okay, but I have plans to expand the set of benchmarks.
Yes, this is something worth discussing in the RFC.
The short story is: pdqsort gains massive speedups purely from block partitioning. Double pivoting gives us much smaller gains and is incompatible with block partitioning, so it goes out the window. The same thing applies to fat partitioning. But don’t worry, pdqsort does not suffer if the input slice contains a lot of duplicates.
Rather, the opposite happens. pdqsort is both O(n log n) and O(nk), where k is the number of distinct elements you’re sorting. Both are upper bounds, so when k is fixed, pdqsort is actually linear. This shows in the ‘shuffled 16 values’ benchmark from my GitHub repo:
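To see the few-distinct-values case concretely, here is a small self-contained demo of that input shape: many elements drawn from only 16 distinct keys, sorted with `sort_unstable` (the pdqsort-based function this thread is about). The inline LCG constants are just a crate-free way to get a deterministic shuffle, not part of any benchmark harness.

```rust
fn main() {
    // Deterministic "shuffled 16 values"-style input via a simple LCG.
    let mut x: u64 = 0x2545F4914F6CDD1D;
    let mut v: Vec<u64> = (0..10_000)
        .map(|_| {
            x = x
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            x % 16 // only 16 distinct values in the whole slice
        })
        .collect();

    v.sort_unstable();

    // The result is sorted; with k fixed at 16, pdqsort's O(nk) bound
    // means the work grows linearly in the input length.
    assert!(v.windows(2).all(|w| w[0] <= w[1]));
}
```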
I think there may be a desire for really fast sorting, and sometimes a requirement for non-allocating sorting, but I can’t remember ever hearing anyone express a desire or requirement for unstable sorting. Usually the requirement is for stable sorting, or the code is indifferent to stability. Hence, I agree with
So I think the naming bikeshed boils down to this poll:
Which of these is more important to express in the name of the function?
- “Use this function if you need an in-place sort.”
- “Don’t use this function if you need a stable sort.”
Thanks for making this poll!
I voted “don’t use this function if you need a stable sort”. My reasoning is that it’s more important to express what you lose when departing from the default (you lose stability), rather than expressing one of the benefits (being in-place, in addition to better performance). Especially so because there is just one drawback and multiple benefits. Moreover, the stable/unstable dichotomy is surely familiar to C++ users.
By the way, an RFC has already been submitted, so you may want to participate in the discussion there:
Why not just provide two functions: unstable_sort (which does not guarantee being in-place) and unstable_inplace_sort (which does guarantee it)? That unstable_sort just calls unstable_inplace_sort internally is an implementation detail, and we can change it at any time without breaking the API, in case a better unstable non-in-place sorting algorithm is implemented in the future (one which would probably be in-place for some inputs and out-of-place for others).
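The proposed split can be sketched in a few lines. The function names here are the hypothetical ones from the proposal (they are not std APIs), and `sort_unstable` stands in for the real in-place algorithm since it happens to satisfy the stronger guarantee:

```rust
// Hypothetical API sketch: the in-place variant carries the guarantee...
fn unstable_inplace_sort<T: Ord>(v: &mut [T]) {
    // Stand-in for the actual in-place unstable algorithm (pdqsort).
    v.sort_unstable();
}

// ...while the general variant makes no in-place promise, so forwarding
// to the in-place version is an implementation detail that a future
// allocating algorithm could replace without breaking callers.
fn unstable_sort<T: Ord>(v: &mut [T]) {
    unstable_inplace_sort(v);
}

fn main() {
    let mut v = vec![5, 1, 4, 2, 3];
    unstable_sort(&mut v);
    assert_eq!(v, [1, 2, 3, 4, 5]);
}
```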
A PR that implements unstable sort in libcore has been submitted: https://github.com/rust-lang/rust/pull/40601
Today I tried the unstable sort functions, and they seem rather nice; I’ve seen a small speedup in my code.