[pre-pre RFC] stable sorting building blocks in libcore


So… since I derailed #1884 with an offtopic discussion, I am opening this post to keep that here instead of there (sorry about that).

RFC: Refactor stable sort building-blocks into libcore

In a nutshell, std::sort is a stable sorting algorithm with O(N logN) complexity. It achieves this by allocating a memory buffer (currently of length N / 2 using a Vec<T>), and if allocation runs out of memory, it panics.

This RFC proposes refactoring some implementation details of std::sort out of libstd into libcore. Those details are “how to obtain the ideal length that a buffer should have”, and “how to stable sort a slice when given a buffer” (which can possibly have zero length). By doing this we achieve the following two goals:

  1. Allow easy, efficient, and correct stable sorting in no-std crates.
  2. Allow more efficient stable sorting in std crates.

Since this allows those who don’t have a heap or write no-std crates to stable sort with an user provided buffer that might live everywhere (the stack, static memory, …) or have zero length (in which case they get O(N log^2(N)) complexity instead of O(N log N)), and also those who call std::sort often enough (e.g. repeatedly in a loop) to notice the impact of the memory allocations, since now they can pre-allocate a buffer, and reuse it across stable sort calls.

Implementation Proposal

Currently, all the stable sorting algorithms in std (std::sort, std::sort_by, std::sort_by_key) just call std::merge_sort (private / not exported). It is both possible and trivial to refactor std::merge_sort into two functions (whose name can be bikeshed):

  • (&mut [T]).sort_with_buffer_by(buffer: &mut [T], comp: Comp) with O(N log^2(N)) complexity if the buffer has zero length, and O(N log N) complexity if the buffer’s length is >= (&[T]).sort_buffer_len():
  • (&[T]).sort_buffer_len() -> usize (currently (&[T]).len() / 2).

and then reimplement std::merge_sort on top of these in ~3 lines of code.

These functions do not allocate memory, and should be added to libcore (and made public).

Note that 99.99% of the work, which comprises writing a good stable sorting algorithm, is already done. This is just a minor refactor.

What this enables

It fulfills Goal #1, since now those writing no-std crates can trivially stable sort their slices. Whether they want to use a buffer of zero length, or a stack allocated buffer, or a buffer in static memory, is up-to-them.

It also fulfills Goal #2, since those writing std-crates and noticing the memory allocations in std::sort can trivially use the libcore functions with a pre-allocated buffer. Whether this buffer is just a Vec<T> with elements on the heap, a Box<[T]>, an array on the stack, static memory, or something else, is again, up to them. Writing small custom wrappers over the libcore functions becomes a ~3 liner.


We could add a couple of utility functions to libcore for symmetry with std::sort, e.g., non-allocating sort_with_buffer_by_key and friends, and maybe even a convenience “non-allocating” stable sort with the same interface as std::sort that uses a small buffer on the stack.

Note that we don’t need to provide these niceties, writing them is trivial, and users grabbing this low probably have special requirements anyways and want to write their own wrappers, but we could provide some of them in the future.


Explosion of sorting functions in std and core.


Do nothing: That is, expect users to implement their own correct and performant stable sorting algorithm to work around this limitations without leveraging any of the work that has already been done to improve the stable sort implementation in std. While this is always a fun exercise, it is also a titanic task, and not everyones’ kind of FUN.