Add a "Vec.sorted()" function

Doesn't that have wildly different performance characteristics?

I'd expect .collect<BTreeSet<_>>() to be terrible for cache locality, since it allocates a node for every element in your array.

BTreeSet doesn't allocate nodes per value, but for some array of values, so it's not bad for the cache. Yes, a Vec<_> is better, but in this case I don't think it would be too bad.

From the BTreeMap docs (BTreeSet<T> is a thin wrapper around BTreeMap<T, ()>)

A B-Tree instead makes each node contain B-1 to 2B-1 elements in a contiguous array. By doing this, we reduce the number of allocations by a factor of B, and improve cache efficiency in searches.

1 Like

Also, a BTreeSet can't be trivially converted to sorted slice. So there are use cases where Vec.sorted() is strictly better.

Theoretically an iterator adapter could be slightly better by first incrementally quicksort-partitioning incoming elements, but OTOH it wouldn't be able to pick a good pivot element, so that partitioning may be terrible. So in the end, perhaps .sorted() just on the Vec is not too bad.

In https://lib.rs codebase I use this pattern quite often:

let mut tmp: Vec<_> = iter.collect();
tmp.sort_by(compare_floats);
tmp.truncate(10);

Theoretically for top-N sorted elements it's possible to optimize the sort and avoid most of the work on discarded elements. Rust currently doesn't have anything for this.

Maybe there could also be .sorted_and_truncated(n)? Or if sorted() was an iterator itself, then magically specialized sorted().take(n) would be awesome.

You can do lazy quick sort to achieve O(N) complexity here, but a faster solution is to do one pass with a binary heap of fixed size. This would look like this:

let mut top = BoundedBinaryHeap::new(10);
top.extend(iter):
top.into_sorted_vec()
6 Likes

One way to get first-k-of-n in O(n + k log(n)) is to use BinaryHeap::from(the_vec).into_sorted_iter(), since that from is O(n). (I don't remember if into_sorted_iter() actually got added; if not then it's iter::from_fn(|| the_heap.pop()).)

2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.