Pseudo RFC? Interleave Iterator


#1

Hey all, I’ve been directed to move this here since it doesn’t need a full-blown RFC.

I’m adding an interleaving iterator to the set of core iterator adaptors. That is a_iter.interleave(b_iter) yields a1,b1,a2,b2,... until one terminates.

There’s no easy way to accomplish this with the current set of iterator adaptors, and there are enough subtle corner cases that I wouldn’t expect someone to trivially implement it correctly when they need it.

Off the top of my head, this is desirable for when something expects a single iterator, but you want to interleave the contents of multiple iterators. For instance, if you wanted to create a stream of loosely unsorted data, you could do something like sorted.interleave(sorted.rev())

I’ve actually fully implemented this in this pull request, this is sort of a retroactive RFC-ish thing to see if this is in fact desirable to have in libcore at all.

My biggest open question on the whole thing is that it’s possible to have interleave not terminate when any one of the source iterators terminates, but rather just output the rest of the remaining iterator. This would increase complexity, and is not clearly desirable. In this vein, Interleave could potentially be augmented to provide all kinds of combinations of behaviour, such as offsetting the splicing of the iterators (a1, a2, a3, b1, a4, b2, a5, b3, ...) or unbalanced usage of the sources (a1, a2, b1, a3, a4, b2...).

Would we want any of these in addition or instead of my current default behaviour?


#2

zip also doesn’t continue to yield after any of the iterators is exhausted, so I think it’s fine if interleave doesn’t. This seems like a fine adapter to me. Anything more complex than the simple interleave shouldn’t be in core, they seem incredibly niche.


#3

I think there has to be a general approach that would fit the current set of iterator adaptors. Interleaving is flattening after transposition. Perhaps all we need is a new method, transpose.

// a, b, c: Vec<T>
a.iter().zip(b.iter()); // move_iter can't work on tuples or arrays...

(a.iter(), b.iter(), c.iter()).transpose()
                                  // -> Iterator<Iterator<&T>>
                              .flat_map(|it| it)
                                  // -> Iterator<&T>

transpose could be implemented for a trait IteratorTuple3<T, T, T> = Tuple3<Iterator<T>, Iterator<T>, Iterator<T>>.


#4

I’m generally in favor of interleave. I hadn’t considered allowing it to yield the remaining elements after one iterator is exhausted. That actually may be useful. There have in fact been multiple times where I wish you could at least test what caused Zip to stop yielding elements, though I don’t know a good way to do that (short of having the iterator yield Option<(Option<A>, Option<B>)> which isn’t very good).

To allow such behavior, perhaps just a method .chain_remainder() that changes the behavior to only stop once both iterators are exhausted. A boolean flag would be sufficient for the next() impl, but the downside is this may significantly complicate the impls of the other methods. This could also yield a separate type entirely, which would leave the methods simpler but would still require writing a lot of code similar to the current code. Because of this, I think it’s reasonable to not do this now, but I wouldn’t rule out adding such a method down the road if someone has a good use for it.


#5

What I miss is the ability to create iterators in a flat_map that yield a (short) sequence of values. This makes it easy to write things like interleaving on the fly. zip, flat_map, then creating an iterator that converts the pair into a sequence of two values we’re there. The naive approach of creating a fixed-size array in the flat_map, then iterating over it doesn’t work due to lifetimes issues. I’m guessing that a move_iter off a fixed-size array is all that’s needed to make this work.


#6

Note that somehow transforming the contents of a zip will never work here.

Given:

let a = [1,3,5,6]
let b = [2,4]

Zip will only yield 1-4, but interleave should yield 1-5 (as currently defined).


#7

something related that would be nice IMO is “intersperse” e.g. pretty printing values interspersing a separator. (is there an existing elegant way of doing that?) eg something like [1,2,3,4].iter().map(|x|x.to_str()).intersperse(",").concatenate() == “1,2,3,4”


#8

I like interleave without any additional behavior. I’m in favor of adding it to libcore.

Note that somehow transforming the contents of a zip will never work here.

Good point. But a zip might work after extending the second iterator…

a.zip(b.map(Some).chain(Some(None).move_iter())).flat_map(...)

something related that would be nice IMO is “intersperse”

Something like .interleave(Repeat::new(",")) should work… almost, producing "1,2,3,4," with the trailing comma.


#9

If you have exact size, you can do it perfectly:

fn intersperse <T:Clone, I: ExactSize<T>> (it:I, seperator:T) -> Interleave<I, Take<Repeat<T>>>{
    let len = it.len();
    it.interleave(Repeat::new(seperator).take(len-1))
}

#10

Actually, I think that the idea of intersperse being implemented in terms of interleave actually reveals why stopping whenever on of the two streams stop is so useful: it lets you use infinite streams.

However, I wonder if we need not consider the case of more than 2 streams, chaining interleave calls does not work, it seems:

a.iter().interleave(b.iter()).interleave(c.iter())

will produce a stream:

a0, c0, b0, c1, a1, c2, b2, c3, ...

which differs from what interleave(a.iter(), b.iter(), c.iter()) would produce:

a0, b0, c0, a1, b1, c1, a2, b2, c2, ...

Thus, I do wonder if interleave should be added to the iterator trait, or if it would be better to instead define a free function that would accept any number of iterators and produce an interleaving iterator out of them ? Of course, the biggest issue would be the “any number of iterators”.


#11

Oh, interleave has been formally rejected from Rust (as of last night) for being not sufficiently useful to justify supporting it as part of std.


#12

Maybe we should make a separate cargo crate for extra iterator tools

I’ve got a cartesian product (IterA x IterB) implementation that would be useful sometimes, and macros for arbitrary product (IterA x IterB x … x IterZ), and an iterator comprehension macro.


#13

Blake, if you want to make a library, you’re more than free to take the code I moved to: https://github.com/Gankro/rust/tree/interleave

licensed under Rust’s license, I guess.


#14

I’m working on the library now just for fun. I’m collecting all the iterator stuff I have in various files. I do have an interleave impl from 2013, and it’s very similar (except it runs both iterators to the end). Don’t be offended but I’m going to start with my code even for interleave. (Now pushed as rust-itertools)

I guess I’ll come back to your impl to look at the next_back impl.


#15

No worries man, have fun. I only wrote this to learn.


#16

@matthieum Exactly my thoughts, your interleave(a.iter(), b.iter(), c.iter()) would have been (a.iter(), b.iter(), c.iter()).interleave() in my example, because flattening zip/transpose has slightly different behavior, as pointed out.

I think remainder handling is the only distinction between intersperse / flattened zip / interleave / interleave with chain_remainder.


#17

Something like this would be super useful for things like fmt::Show implementations for collection-like things. I’ve really missed it in the past, for example:

impl<T: fmt::Show> fmt::Show for RingBuf<T> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        try!(write!(f, "["));
        for (i, e) in self.iter().enumerate() {
            if i != 0 { try!(write!(f, ", ")); }
            try!(write!(f, "{}", *e));
        }
        write!(f, "]")
    }
}

Stuff like this litter the standard libs. Ick.


#18

Maybe we could have a std::io helper function that looks like

fn write_sequence<T: fmt::Show, It: Iterator<T>, W: Writer>(w: &mut W, mut it: It, sep: &str) -> IoResult {
    for (i, e) in it.enumerate() {
        if i != 0 { try!(write!(w, ", ")) }
        try!(write!(w, "{}", e));
    }
    Ok(())
}

This way you can say write_sequence(f, self.iter(), ", ") (this differs subtlety from your function in that it writes e instead of *e but &T should render the same as T; alternatively it could take an Iterator<&T> instead and show *e).

Although you still need to map IoResult to fmt::Result, but I guess that’s not hard.


#19

That’s not to say that interleave wouldn’t be helpful here, though I don’t think it actually would work right for this. It requires both iterators have the same element type (whereas you want to mix arbitrary T: fmt::Show and &str elements), and assuming you interleave with an infinite cycle of ", " elements it would actually print a trailing ", " after the last element.