Doesn't that have wildly different performance characteristics?
I'd expect .collect<BTreeSet<_>>() to be terrible for cache locality, since it allocates a node for every element in your array.
Doesn't that have wildly different performance characteristics?
I'd expect .collect<BTreeSet<_>>() to be terrible for cache locality, since it allocates a node for every element in your array.
BTreeSet doesn't allocate nodes per value, but for some array of values, so it's not bad for the cache. Yes, a Vec<_> is better, but in this case I don't think it would be too bad.
From the BTreeMap docs (BTreeSet<T> is a thin wrapper around BTreeMap<T, ()>)
A B-Tree instead makes each node contain B-1 to 2B-1 elements in a contiguous array. By doing this, we reduce the number of allocations by a factor of B, and improve cache efficiency in searches.
Also, a BTreeSet can't be trivially converted to sorted slice. So there are use cases where Vec.sorted() is strictly better.
Theoretically an iterator adapter could be slightly better by first incrementally quicksort-partitioning incoming elements, but OTOH it wouldn't be able to pick a good pivot element, so that partitioning may be terrible. So in the end, perhaps .sorted() just on the Vec is not too bad.
In https://lib.rs codebase I use this pattern quite often:
let mut tmp: Vec<_> = iter.collect();
tmp.sort_by(compare_floats);
tmp.truncate(10);
Theoretically for top-N sorted elements it's possible to optimize the sort and avoid most of the work on discarded elements. Rust currently doesn't have anything for this.
Maybe there could also be .sorted_and_truncated(n)? Or if sorted() was an iterator itself, then magically specialized sorted().take(n) would be awesome.
You can do lazy quick sort to achieve O(N) complexity here, but a faster solution is to do one pass with a binary heap of fixed size. This would look like this:
let mut top = BoundedBinaryHeap::new(10);
top.extend(iter):
top.into_sorted_vec()
One way to get first-k-of-n in O(n + k log(n)) is to use BinaryHeap::from(the_vec).into_sorted_iter(), since that from is O(n). (I don't remember if into_sorted_iter() actually got added; if not then it's iter::from_fn(|| the_heap.pop()).)
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.