Applying `take` on `Cycle` iterator should implement `ExactSizedIterator`

Applying `take()` on an infinite iterator should make it exactsizediterator is related.

Take<Repeat>> already implements it, so Take<Cycle should as well.

Take<Cycle> returns 0 elements if the inner iterator has 0 elements, even when you try to take more.

2 Likes

I guess it would still work for Take<Cycle<impl ExactSizeIterator>>.

1 Like

Unless the Clone implementation doesn't actually produce an equivalent iterator. However, that does already break Cycle's expected behaviour.

playground (defining an I: ExactSizeIterator so that Cycle<I> produces a handful of elements, but no more)

“Evil” implementations don’t matter, as none of this is relevant beyond logic errors. Also, ExactSizeIterator only requires (not for soundness, but for correctness) that .size_hint() returns a result of the form (l, Some(l)), where lower and upper bound match and upper bound is not None. Everything else (i.e. the bounds being correct themselves) is already a requirement (not for soundness, but for correctness) of Iterator itself.

And looking at the .size_hint() implementation of Cycle<I> already

pub struct Cycle<I> {
    orig: I,
    iter: I,
}

impl<I> Iterator for Cycle<I>
where
    I: Clone + Iterator,
{
    fn size_hint(&self) -> (usize, Option<usize>) {
        // the cycle iterator is either empty or infinite
        match self.orig.size_hint() {
            sz @ (0, Some(0)) => sz,
            (0, _) => (0, None),
            _ => (usize::MAX, None),
        }
    }
    …
}

this implementation does:

  • return (0, Some(0)) when the original iterator (passed to Iterator::cycle) returns size (0, Some(0))
  • return (usize::MAX, None) when the original iterator returns some size (l, u) with lower bound strictly greater than 0, i.e. l > 0
  • return (0, None) in the remaining case, lower-bound is 0, upper-bound is non-0

For a correct ExactSizeIterator inner iterator, the last case should never happen, so this boils down to

  • return (0, Some(0)) when the original iterator has .len() == 0
  • return (usize::MAX, None) when the original iterator has .len() > 0

Now, wrap with Take:

pub struct Take<I> {
    iter: I,
    n: usize,
}

impl<I> Iterator for Take<I>
where
    I: Iterator,
{
    fn size_hint(&self) -> (usize, Option<usize>) {
        if self.n == 0 {
            return (0, Some(0));
        }

        let (lower, upper) = self.iter.size_hint();

        let lower = cmp::min(lower, self.n);

        let upper = match upper {
            Some(x) if x < self.n => Some(x),
            _ => Some(self.n),
        };

        (lower, upper)
    }
    …
}

whose size_hint does essentially simply

  • return (min(n, l), min(n, r)) for an inner iterator size-hint of (l, r) if we interpret r == None like r == ∞
    • the actual implementation does short-circuit this for n == 0 because it saves the inner size_hint call
    • the actual implementation has to handle the None case manually, since the Option API does not offer a way to interpret None like some kind of

Applying this to the two possible outcomes of Cycle<impl ExactSizeIterator>:

  • the size-hint of Take always maps (0, Some(0)) to (0, Some(0)
    • trivially true for the short-circuiting case
    • cmp::min(lower, self.n) is always 0 for lower == 0
    • the match upper … for upper == Some(0) either results in Some(0) when 0 < self.n or in Some(self.n) if self.n == 0, but that’s also Some(0)
  • the size-hint of Take always maps (usize::MAX, None) to (self.n, Some(self.n))
    • trivially true for the shirt-circuiting case, as that’s the self.n == 0 case, so (0, Some(0) is (self.n, Some(self.n))
    • cmp::min(lower, self.n) is always self.n for lower == usize::MAX
    • the match upper … case for upper == None takes _ => branch for None producing Some(self.n)

Combining these facts characterizes the full .size_hint() behavior of Take<Cycle<I>> for (correct) contained I: ExactSizeIterator, namely:

  • return (0, Some(0)) when the original iterator has .len() == 0
  • return (self.n, Some(self.n)) when the original iterator has .len() > 0

all that really matters to us though is simply that both cases do have the form (l, Some(l)) i.e. lower and upper bound are the same and upper bound is not None.


The assumption that a correct “Clone” should preserve the size_hint is already relied upon for the correct implementation of size_hint of Cycle.

It looks like this assumption isn’t properly documented, as far as I could tell; perhaps someone should add this to the docs of fn size_hint? :thinking: Or perhaps on the whole trait or module; I believe the common practical assumption, with API such as Cycle is that .clone()ing an iterator should generally return another iterator that will produce the “same” list of items.

It’s certainly something that can be relevant for API design, e.g. a receiver iterator for a mpmc channel should not implement Clone because what you can get (by sharing or cloning a Receiver) is two iterators that share the load of items (in the sense that one receives exactly those items the other doesn’t receive). Whereas I suppose a broadcast channel could consider this[1].


  1. depending on the exact flavor of broadcast channel implementation ↩︎

5 Likes

Great explanation! Very well thought out. It sounds like we can add

impl<T: ExactSizedIterator> ExactSizedIterator for Take<Cycle<T>>

2 Likes