How to design a composable iterator adapter/consumer API?

This question origins from a question I put in user.rust-lang.org: How to count multiple filters of same iterator. To make it clear, it is common that there are multiple adapters and consumers for one iterator:

fn adapter1(iter: impl Iterator<Item=&Elem>) -> impl Iterator<Item=Elem1> {}
fn adapter2(iter: impl Iterator<Item=&Elem>) -> impl Iterator<Item=Elem2> {}
fn consumer1(iter: impl Iterator<Item=&Elem>) -> Vec<Elem3> {}
fn consumer2(iter: impl Iterator<Item=&Elem>) -> Elem4 {}

As they all just need &Elem, these adapters/consumers SHOULD be able to be driven with the input iter iterated EXACTLY ONCE (I call this pattern as composable iterator adapter/consumer). However, as answered in the link above, there is NO way to do this without changing the function implementation.

If we can change how function writes, what is the best way to define such composable iterator adapter/consumer? There are two approaches come to my mind:

Approach1: Convert iterator pull-based logic back to for-loop logic.

There may be a trait roughly like this:

enum ForLoopIteratorOutput<T> {
    Pending,
    Out(T),
    Finished,
}

trait ForLoopIterator<T> {
    type Item;
    fn on_new_item(item: T) -> ForLoopIteratorOutput<Item>;
}

In this format, we can easily compose these adapters/consumers. However, this requires large rewrite of current implementations.

Approach2: Use async/await logic.

Maybe we could use await every time we requires a new item from the input iterator, and a custom scheduler to execute the adapters/consumers one-by-one after each item is awaited. I don't know if this could work, or if this introduces extra runtime overhead.

In my opinion, both approaches are not good enough, and I wonder if we could design a good API for composable iterator adapter/consumer in std or itertools?

1 Like

You're listing 4 functions but aren't showing how you expect them to be used together. What should be the result?

If you want to build arbitrarily complex push-based forking and joining data pipelines you could use channels and threads. Or rayon.

If you just want to deal with two mismatched streams then maybe write an adapter that returns (Option<T>, Option<U>) that'll take one item from the source and then run it through the two adapters until they're exhausted and you get some mix of Ts and Us

Sorry for the ambiguity.

For example, I want to call sum() and count() to an impl Iterator<Item=&usize> to get its sum and item count with exactly one round. This could be easily done by using a for-loop style:

let mut sum = 0;
let mut count = 0;
for val in iter {
    sum += val;
    count += 1;
}

However, if it is not sum and count, but some complex iterator adaptors/consumers provided by other crates which could not be easily transformed to for-loop style, things become harder. Using multi-threads can handle this problem, but the overhead makes it not good enough.

write an adapter that takes one item from the source and then run it through the two adapters until they're exhausted

Sorry I could not understand how to do this, are there some examples?

You can look at existing adapters in itertools or the standard library. Peekable is a simple stateful one. Flatten is one that alters state between driving a source iterator and then getting items from the inner iterator.

However, if it is not sum and count, but some complex iterator adaptors/consumers provided by other crates

The issue here is that summing and counting are reducers/folders, not consumers or iterators. Those crates are simply providing the wrong API for what you want to do. Composable reduction could be done when provided as a simple binary operation.

Previous discussion: Adding Iterator::extrema, argmin, argmax - #4 by dlight

4 Likes

https://docs.rs/reductor/latest/reductor/ is indeed pretty neat! And should be more widely used.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.