Order of evaluation for zip iterator


#1

The current documentation for std::iter::Iterator::zip is unclear about the order in which the two iterators are queried:

Creates an iterator that iterates over both this and the specified iterators simultaneously, yielding the two elements as pairs. When either iterator returns None, all further invocations of next() will return None.

If I had to guess from this documentation, I’d say that Zip's next calls next on both its iterators, then returns None if either is None. In fact, it only queries the second iterator if the first is Some(_):

    #[inline]
    fn next(&mut self) -> Option<(A::Item, B::Item)> {
        match self.a.next() {
            None => None,
            Some(x) => match self.b.next() {
                None => None,
                Some(y) => Some((x, y))
            }
        }
    }

Is this behavior intentionally unspecified?

The order can matter if the sequences being zipped are unbalanced in length. Currently, if the first sequence is longer, then the Zip iterator discards an item each time its next is called. If the second sequence is longer, no items are discarded.

core::fmt::write exploits this behavior by iterating over arguments zipped with string pieces. When the zipped iteration completes, it retrieves the trailing piece using the underlying piece iterator. I did a quick scan of the standard library, but I wasn’t able to find any other obvious instances.


#2

So you propose to add “the underlying iterators will be queried in order of appearance. The iteration will stop on encountering the first None.” to the documentation?


#3

Yes, something like that would work.


#4

Technically it is specified that it is unspecified.

The Iterator protocol does not define behavior after None is returned. A concrete Iterator implementation may choose to behave however it wishes, either by returning None infinitely, or by doing something else.


#5

In that case the behavior of core::fmt::write is a little dangerous, isn’t it?


#6

do you have an example code to explain that? I’m afraid I don’t understand the implications


#7

Link to the core::fmt::write code. The code does something like this:

fn main() {
    let args = [1, 2];
    let pieces = ["a", "b", "c"];
    let mut it = pieces.iter();
    for (arg, piece) in args.iter().zip(it.by_ref()) {
        print!("{:?} ", (arg, piece));
    }
    println!("trailing={:?}", it.next());
}

It outputs: (1, "a") (2, "b") trailing=Some("c")

If the order of the Zip inputs are swapped:

    for (piece, arg) in it.by_ref().zip(args.iter()) {

then it outputs (1, "a") (2, "b") trailing=None instead.


#8

No, it’s using specific iterators it knows to be sane (slice iterators). It’s not a generic context.


#9

I have already reported an issue for take_while: https://github.com/rust-lang/rust/issues/22802

It’s along the same lines. Passing stuff via by_ref can yield confusing results… Maybe by_ref should be documented better, but the issue also happens if you create a reference to an iterator by hand…

Sadly I assume it’s too late to suggest an overhaul for IteratorExt. Not that I’d know how to improve the situation really, since you can always implement Iterator for references to your own types and then have the issue again.