Doesn't that have wildly different performance characteristics?
I'd expect .collect<BTreeSet<_>>()
to be terrible for cache locality, since it allocates a node for every element in your array.
Doesn't that have wildly different performance characteristics?
I'd expect .collect<BTreeSet<_>>()
to be terrible for cache locality, since it allocates a node for every element in your array.
BTreeSet
doesn't allocate nodes per value, but for some array of values, so it's not bad for the cache. Yes, a Vec<_>
is better, but in this case I don't think it would be too bad.
From the BTreeMap
docs (BTreeSet<T>
is a thin wrapper around BTreeMap<T, ()>
)
A B-Tree instead makes each node contain B-1 to 2B-1 elements in a contiguous array. By doing this, we reduce the number of allocations by a factor of B, and improve cache efficiency in searches.
Also, a BTreeSet
can't be trivially converted to sorted slice. So there are use cases where Vec.sorted()
is strictly better.
Theoretically an iterator adapter could be slightly better by first incrementally quicksort-partitioning incoming elements, but OTOH it wouldn't be able to pick a good pivot element, so that partitioning may be terrible. So in the end, perhaps .sorted()
just on the Vec
is not too bad.
In https://lib.rs codebase I use this pattern quite often:
let mut tmp: Vec<_> = iter.collect();
tmp.sort_by(compare_floats);
tmp.truncate(10);
Theoretically for top-N sorted elements it's possible to optimize the sort and avoid most of the work on discarded elements. Rust currently doesn't have anything for this.
Maybe there could also be .sorted_and_truncated(n)
? Or if sorted()
was an iterator itself, then magically specialized sorted().take(n)
would be awesome.
You can do lazy quick sort to achieve O(N) complexity here, but a faster solution is to do one pass with a binary heap of fixed size. This would look like this:
let mut top = BoundedBinaryHeap::new(10);
top.extend(iter):
top.into_sorted_vec()
One way to get first-k-of-n in O(n + k log(n))
is to use BinaryHeap::from(the_vec).into_sorted_iter()
, since that from is O(n). (I don't remember if into_sorted_iter()
actually got added; if not then it's iter::from_fn(|| the_heap.pop())
.)
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.