Some thoughts on Generators and For-Loops

Edit: I am no longer as convinced as I used to be of the utility of for-loops over generators with resume arguments. But I still think this idea I introduce of stage 1 vs. stage 3 generators is important.

Original post follows...

There are issues with getting Rust's generators to work well with for-loops. In this post, I would like to explain my view of what generators really are, and show how that sheds light on the problem. I hope this helps!

The problem

We want for-loops to work with generators:

for val in gen {
    // Loop Body
}

How should this be desugared?

let mut arg = ???;
loop {
    match gen.resume(arg) {
        GeneratorState::Yielded(val) => {
            arg = {
                // Loop Body
            };
        }
        GeneratorState::Complete(_) => break;
    }
}

The first time through the loop, we don't have an argument to pass to resume!

What is a generator, really?

In my view, a generator is a cycle of four stages.

               Arg
        ╔═══╗   │   ╔═══╗
        ║ 1 ║ <─┴── ║ 4 ║
        ╚═══╝       ╚═══╝
          │           Ʌ
Return <──┤           │
          V           │
        ╔═══╗       ╔═══╗
        ║ 2 ║ ──┬─> ║ 3 ║ ────> Cancel
        ╚═══╝   V   ╚═══╝
              Yield

Stage 1

The generator might complete.

fn stage1(Stage1) -> Result<Stage2, Return>;

Stage 2

The generator yields a value.

fn stage2(Stage2) -> (Stage3, Yield);

Stage 3

You may choose to cancel the generator, consuming it.

fn stage3(Stage3) -> Stage4;
fn cancel(Stage3) -> Cancel;

Usually, Cancel will be (), and cancel will be equivalent to drop. But I could imagine Cancel containing some of the internal state of the generator.

Stage 4

You must provide a value to the generator.

fn stage4(Stage4, Arg) -> Stage1;

There is a nice duality here:

  • In stage 2, the generator gives you a value. In stage 4, you give it a value.
  • In stage 1, the generator might decide to stop. In stage 3, you might decide to stop the generator.

In fact, this duality is how I discovered stage 3.

The problem

I claim that the problem is this:

  • For loops want to work on stage 1 generators.
  • The current Generator trait describes stage 3 generators.

Loop

Desugaring of a for-loop over a stage 1 generator:

let mut generator: Stage1 = ...;
for y in generator {
    // loop body
    ...
    if ... {
        break;
    } else {
        continue arg;
    }
}
let mut generator: Stage1 = ...;
loop {
    match stage1(generator) {
        Ok(s2) => { 
            let (s3, y) = stage2(s2);

            // loop body
            ...
            if ... {
                let _: Cancel = cancel(s3);
                break;
            } else {
                generator = stage4(stage3(s3), arg);
                continue;
            }
        }
        Err(_) => {
            break;
        }
    }
}

Current Generator trait

Here is part of the current definition of Generator:

trait Generator {
    ...
    /// If `Complete` is returned then the generator has completely finished
    /// with the value provided. It is invalid for the generator to be resumed
    /// again.
    ...
    fn resume(self: Pin<&mut Self>, arg: R) -> GeneratorState<Self::Yield, Self::Return>;
}

Compare that to this function on Stage3:

fn resume(s: Stage3, arg: Arg) -> Result<(Stage3, Yield), Return> {
    let s: Stage4 = stage3(s);
    let s: Stage1 = stage4(s, arg);
    let s: Stage2 = stage1(s)?;
    Ok(stage2(s))
}

In both cases, the function acts on a generator, and returns either a Yield or a Return. If you get a Yield, you get the generator back. If you get a Return,

  • In the Stage3 implementation, you don't get the generator back.
  • In the current Generator, you get it back but it is now invalid.

Conclusion

A generator is a cycle of four stages. for-loops make sense over stage 1 generators. The Generator trait describes stage 3 generators. This mismatch is the cause of the trouble.

P.S.

(A collection of random notes)

  • Personally, I would like to see a generator API that provides types for both stage 1 and stage 3 generators. Maybe WaitingGenerator and ReadyGenerator? But I don't know if that is possible, because I don't fully understand Pin.

    • If it isn't, I think we should carefully consider the pros and cons of providing stage 1 vs stage 3 generators.

    • I don't think stage 2 or 4 generators are useful enough to warrant inclusion in the API.

  • Current generator syntax defines stage 3 generators, but this could probably be changed.

  • I would be fine with leaving Cancel out of the design. The fact that we can freely drop things removes much of its utility. And I don't see a nice way to integrate it into the generator syntax.

  • A for-loop over a stage 2 generator also makes sense, and the body of the loop is guaranteed to run at least once.

  • If you know linear logic / linear type theory, you might like this description of the generator cycle:

    • Stage1 = Stage2 ⊕ Return
    • Stage2 = Stage3 ⊗ Yield
    • Stage3 = Stage4 & Cancel
    • Stage4 = Stage1 ⅋ Arg^⊥
  • There is also another for-loop issue -- the question of how to extend the for syntax to not throw away the Return value.

    • I like the suggestion that the for-loop should return the final result of the generator, because I want to be able to call generators from within other generators with for y in gen { yield y }.
  • This is crossposted from my post on the Rust subreddit, with a few modifications.

    • It is also my first post on this forum.
1 Like

Why not just assume the args is ()? I can't imagine that wanting resume args in an iterator generator will be very common, and if the user does want that, they can write some sort of adapter that does that for them.

1 Like

Nearly any use of generators, including those with arguments, will end in iterating over them. I don't want that to require this boilerplate:

let mut tmp = ...;
loop {
    match (yield gen.resume(tmp)) {
        Yielded(y) => tmp = { /* Loop Body */ },
        Complete(r) => break r;
    }
}

As an important special case, suppose you are dealing with a complicated generator:

|arg: Arg| {
    // Lots of code
    while ... {
        // Lots of code
    }
    // Lots of code
}

You might want to refactor so that that while-loop is in its own generator. Currently, that requires the above boilerplate. Even if there were a map function for generators, you couldn't use it here, because the loop body might call yield!

If generators were compatible with for-loops, you could just say this:

|arg: Arg| {
    // Lots of code
    for y in other_generator {yield y}; // other_generator contains the `while`-loop
    // Lots of code
}

Personally, I really wouldn't expect the "return" value of a for loop to be the value passed to the generator's resume argument (especially in case where it's not obvious that the object being iterated is a generator).

Honestly, I don't really see any use case where the syntax you suggest would be idiomatic or natural.

1 Like

I would expect people to invent unexpected uses for generators, especially if they are eventually stabilized. For instance, I am currently experimenting with modeling a game loop with Generator<Event, Yield = (), Return = !>.

I got the idea from https://internals.rust-lang.org/t/pre-rfc-generator-integration-with-for-loops/6625/11.

But that is a fair criticism. I'll have to think of some examples before I decide whether I agree. I do think there should be some nice way to iterate over generators with return arguments, though. And a for_each combinator isn't enough, because we may want to yield in the body of the loop.

But I honestly think that the stuff about for-loops is the less-interesting part of my post. The more interesting part is the idea that there are different kinds of generators (stage 1 and stage 3), and that they are useful in different circumstances.

I might be misunderstanding something critical here, so I apologize in advance.

Assuming the following trait implementation:

impl<T> Iterator for Generator<R = (), Yield=T, Return=()> {
    type Item = T;
    
    fn next(&mut self) -> Option<T> {
        match self.resume() {
            GeneratorState::Yielded(value) => Some(value),
            GeneratorState::Complete(_) => None
        }
    }
}

...and a hypothetical shorthand for defining generators as standalone functions along the lines of:

gen fn evens() -> i32 {
    for i in 0..20.filter(|i| i % 2 == 0) {
        yield i;
    }
}

gen fn odds() -> i32 {
    for i in 1..20.filter(|i| i % 2 == 1) {
        yield i;
    }
}

Why wouldn't the following code work?

for i in evens() {
    for j in odds() {
        dbg!(i, j);
    }
}

I would expect people to invent unexpected uses for generators, especially if they are eventually stabilized. For instance, I am currently experimenting with modeling a game loop with Generator<Event, Yield = (), Return = !> .

But that is a fair criticism. I'll have to think of some examples before I decide whether I agree. I do think there should be some nice way to iterate over generators with return arguments, though. And a for_each combinator isn't enough, because we may want to yield in the body of the loop.

I am really wary of trying to find an ideal, highly general solution for this use-case when we can solve concrete issues today, allowing people to easily create streams and iterators without needing manual implementations or interacting with Pin. I'd really hate for perfect to become the enemy of good, especially when more exotic variations on generators could be explored in libraries with macros.

1 Like

This makes sense.

One specific issue I worry about is refactoring complex generators into smaller pieces. As an example, let's take my experiment where I model a game loop as a Generator<Event, Yield = (), Return = !>. Here is some sample code:

let game_loop = |mut event| {
    loop { // Game Loop
        match event {
            Key("e") => {
                loop { // Inventory Loop
                    match (yield) {
                        Key("e") => break,
                        ...
                    }
                }
            }
            ...
        }
        event = yield;
    }
};

I would like to factor the inventory loop into its own generator:

let inventory_loop = |mut event| {
    loop { // Inventory Loop
        match event {
            Key("e") => break,
            ...
        }
    }
    event = yield;
}

let game_loop = |mut event| {
    loop { // Game Loop
        match event {
            Key("e") => {
            	while let GeneratorState::Yielded(()) = inventory_loop.resume(yield) {}
            }
            ...
        }
        event = yield;
    }
};

... ... ...

Ok, that's nowhere near as much boilerplate as I was expecting. Never mind.

Also, "removing boilerplate" is the kind of design problem that should really be solved last, when a working implementation has been shipped and developers have used it in various real-life context so we can get a good model of what the boilerplate actually is.

2 Likes

Glad to hear! It's always fun to realize you can get away with the "ugly" solution, and besides, the "ugly" solution is rather nice!

+1 to that. I think about the canonical example of syntax simplification/reduction in Rust, is the ? operator. For those unfamiliar, Rust went from matching on a result, to the try! macro, to the ? operator. Trying to come up with a nice syntax for resuming a generator feels a bit like trying to come up with the ? operator before matching on a result was even supported.


I'm sure others can weight in on this, but the main thing blocking the stabilization of some subset of functionality with generators as outlined in withoutboats's "async interview" is a champion to push through the RFC + getting this feature implemented in the compiler?

Also note that you could use the while let syntax for generators if you don't care about the return value

while let GeneratorState::Yielded(item) = generator.resume(argument) {
}

I should mention that while I no longer care about the for-loop issue, I still like the idea I outlined about the different stages of the generator cycle. I'm thinking it may have been a mistake to present it as I did, because all of the focus ended up on the for-loop syntax, rather than the interesting part.

I mean, I don't really get how that's a better abstraction than "stage 1 is when it's yielding something, stage 2 is when it returns something".

As you point out yourself, the "cancel" value is usually empty (unless we want to implement cancel tokens, but I don't think resume arguments are relevant then); and if the "return" value is empty (or bottom), then you essentially have a regular iterator.

I don't get what your abstraction brings in concrete terms that iterators don't have.