Doesn't that have wildly different performance characteristics?

I'd expect `.collect<BTreeSet<_>>()`

to be terrible for cache locality, since it allocates a node for every element in your array.

Doesn't that have wildly different performance characteristics?

I'd expect `.collect<BTreeSet<_>>()`

to be terrible for cache locality, since it allocates a node for every element in your array.

`BTreeSet`

doesn't allocate nodes per value, but for some array of values, so it's not bad for the cache. Yes, a `Vec<_>`

is better, but in this case I don't think it would be too bad.

From the `BTreeMap`

docs (`BTreeSet<T>`

is a thin wrapper around `BTreeMap<T, ()>`

)

A B-Tree instead makes each node contain B-1 to 2B-1 elements in a contiguous array. By doing this, we reduce the number of allocations by a factor of B, and improve cache efficiency in searches.

1 Like

Also, a `BTreeSet`

can't be trivially converted to sorted slice. So there are use cases where `Vec.sorted()`

is strictly better.

Theoretically an iterator adapter could be slightly better by first incrementally quicksort-partitioning incoming elements, but OTOH it wouldn't be able to pick a good pivot element, so that partitioning may be terrible. So in the end, perhaps `.sorted()`

just on the `Vec`

is not too bad.

In https://lib.rs codebase I use this pattern quite often:

```
let mut tmp: Vec<_> = iter.collect();
tmp.sort_by(compare_floats);
tmp.truncate(10);
```

Theoretically for top-N sorted elements it's possible to optimize the sort and avoid most of the work on discarded elements. Rust currently doesn't have anything for this.

Maybe there could also be `.sorted_and_truncated(n)`

? Or if `sorted()`

was an iterator itself, then magically specialized `sorted().take(n)`

would be awesome.

You can do lazy quick sort to achieve O(N) complexity here, but a faster solution is to do one pass with a binary heap of fixed size. This would look like this:

```
let mut top = BoundedBinaryHeap::new(10);
top.extend(iter):
top.into_sorted_vec()
```

6 Likes

One way to get first-k-of-n in `O(n + k log(n))`

is to use `BinaryHeap::from(the_vec).into_sorted_iter()`

, since that from is O(n). (I don't remember if `into_sorted_iter()`

actually got added; if not then it's `iter::from_fn(|| the_heap.pop())`

.)

2 Likes