Currently, the .dedup()
method only exists on Vec
. But it can be implemented on any Iterator
as well: https://gist.github.com/lfairy/0f952345ddf07c26eabe
Much of the complexity in the implementation boils down to one issue: how do we yield values as early as possible? The naive algorithm consumes a whole run before yielding, which is elegant but falls flat on infinite sequences (e.g. repeat(1)
). It is also suboptimal when the underlying .next()
is slow, say when we’re streaming data from the network or from disk.
Why can’t we yield the first element straight away? Because then we’ll have no element to compare with on the next iteration, since the previous element has already been moved out.
There are a few ways to approach this problem:
-
.clone()
the value before we return it. This is nice and simple but requires aClone
bound and an extra copy. -
Use
Peekable
as a one-element buffer. We.peek()
at the next item before returning the current one, effectively doing the comparison one step in advance. This has the advantage of not requiringClone
, but the drawback of consuming an extra element. -
Return a reference to the element, rather than moving it out. This would be the ideal solution, but requires a
StreamingIterator
interface.
My implementation uses solution (2). It’s unlikely we’ll get the trait needed for (3), and copying is often more expensive than consuming an extra element.
I haven’t run benchmarks yet, but the Vec
version should be faster as it works in-place. We could write the in-place .dedup()
in terms of the iterator one – it’s safe because we yield fewer values than we consume – but I’m not sure if that’s worth the trouble.
So, any questions, comments? In particular, is an iterator .dedup()
worth adding to the standard library?