[Pre-RFC]: Generator integration with for loops

newpavlov · January 21, 2018, 10:10pm

Motivation

Although Generator was initially introduced with async story in mind, it can be also used for enhancing core for loop functionality. Integration of Generator with for loops looks quite natural:

for value in generator {
    // ..
}

Here we iterate over yielded values and discard return value of the generator.

Introduction of this feature will also prevent a lot of confusion for people coming from Python, as syntax of for loops and generators are quite close, thus it will be natural for them to try to use Rust Generator with for loops. Of course the big difference between Generator and Iterator is ability to return result in addition to yielding values. This can be again quite naturally incorporated as resulting value of for loop expression, so it will be possible to write:

let result = for value in generator {
    // ..
};

Which can make some use-cases much more pleasant, for example in data processing:

let metadata = for record in parse_file(f) {
    // process record
}?;

Here parse_file returns Generator<Yield=Record, Return=io::Result<Metadata>>. On first encountered error in data processing it will return io::Error which will be bubbled by ?. If everything went without problems all records will be processed and user can get some additional data about processed file (number of records, time of creation, authors, description, etc.).

let leftover_data = for chunk in data.array_chunks(4) {
    // process chunk which has type &'a [T; 4]
};

Here we use imaginary array_chunks(&self, const N: usize) -> Generator<Yield=&'a [T; N], Return=&'a [T]> method for slices which yields chunks in the form of array references and returns leftover data in the form of slice &'a [T], so it could be e.g. asserted to be empty

As will be showed later Generator is a natural generalization of Iterator trait, which can be perceived as Generator<Yield=T, Return=()>;. Also it will solve problem of breaking with value from for loops without introducing (arguably) ad-hoc else blocks.

Implementation

Currently this code:

for value in values {
    // body
}

Desugars into:

let mut iter = IntoIterator::into_iter(values);
loop {
    match iter.next() {
        Some(value) => {
            // body
        },
        None => break,
    }
}

To allow Generators to be used with for loops we’ll have to change it, so this code:

let result = for value in values {
    // body
};

Will be desugared into:

let generator = IntoGenerator::into_generator(values);
let result = loop {
    match generator.resume() {
        GeneratorState::Yielded(value) => {
            // body
        },
        GeneratorState::Complete(result) => break result,
    }
};

As it can be seen this desugaring allows breaking with values from for loop body with appropriate type restrictions (borrowing Rusky’s examples):

// iterator returns (); loop returns ()
let _: () = for x in iter { }
// iterator returns (); loop breaks with ()
let _: () = for x in iter { ... break ... }
// ERROR: iterator's type R=() and break's type S don't unify
let _: () = for x in iter { ... break s ... }
// generator returns R; loop returns R
let _: R = for x in gen { }
// generator returns R; loop can break with R
let _: R = for x in gen { ... break r ... }
// ERROR: generator's type R and break's type S don't unify
let _: R = for x in gen { ... break s ... }

To allow convenient value breaking for iterators we can design several helper methods which will explicitly convert iterator into generator, for example:

fn generator_default<T>(self, value: T) -> impl Generator<Yield= Self::Item, Return=T> { .. }

Which can be used as follows:

let result = for value in iterator.generator_default(0u32) {
    match value {
        Foo => foo(),
        Bar => break 10u32,
    }
}

To save backward compatibility with Iterator based for loops we’ll need to add the following support code to the standard library:

impl<T: Iterator> Generator for T {
    type Yield = T::Item;
    type Return = ();
    
    fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return> {
        match self.next() {
            Some(val) => GeneratorState::Yielded(val),
            None => GeneratorState::Complete(()),
        }
    }
}

trait IntoGenerator {
    type Yield;
    type Return;
    type IntoGen: Generator<Yield=Self::Yield, Return=Self::Return>;

    fn into_generator(self) -> Self::IntoGen;
}

// this line does not currently work
impl<T: Generator> !IntoIterator for T {}

impl<G: Generator> IntoGenerator for G {
    type Yield = G::Yield;
    type Return = G::Return;
    type IntoGen = G;
    
    fn into_generator(self) -> Self::IntoGen {
        self
    }
}

impl<T: IntoIterator> IntoGenerator for T {
    type Yield = T::Item;
    type Return = ();
    type IntoGen = <Self as IntoIterator>::IntoIter;
    
    fn into_generator(self) -> Self::IntoGen {
        self.into_iter()
    }
}

Unfortunately this code relies on implementation of mutually exclusive traits. (otherwise known as negative trait bounds)

Alternatives

else block

Previously discussed approach for adding breaking with value to for loops was to use the following syntax:

let r: R = for val in iter {
    // .. do stuff, which can break with type `R`
} else {
    // this block must always evaluate to type `R`, this value
    // will be used if `for` loop body didn't break with a value
}

(see also then proposal)

The big drawback of this approach is that we replace currently used simple desugaring of for loops with a significantly more involved construct understanding by compiler, which will have a negative effect on the compiler complexity. Also arguably this solution is quite ad-hoc in nature and can be surprising for new users.

It does not integrate generators with for loops, but should be considered as an alternative due to the poor compatibility with this proposal. Also it can be used in conjunction with the next alternative.

Converting generators to iterators by discarding result value

The easiest approach for generators integration with for loops will be to use blank implementation of Iterator for T: Generator with discarded result value. It automatically provides integration with for loops.

impl<T> Iterator for Generator<Yield=T, Return=()> {
    type Item = T;
    
    fn next(&mut self) -> Option<T> {
        match self.resume() {
            GeneratorState::Yielded(value) => Some(value),
            GeneratorState::Complete(_) => None
        }
    }
}

The big drawback of this approach is that users lose return value unconditionally, otherwise they’ll have to use explicit loop which will be a less ergonomic and pleasant experience.

Rationale

While two listed alternatives combined can be a working solution, they both provide ad-hoc solutions to narrow problems and combined provide less compact and expressive overall result. On the other hand the proposed solution naturally solves two problems: improving Generator ergonomics through for loop integration and breaking with value from for loops, without losing desugaring simplicity and without extending for loops with new blocks, although at the expense of support code amount and by introducing additional IntoGenerator trait.

Unresolved questions

We cover only Iterator -> Generator conversions, but it could be usefull (especially for interacting with Iterator based code) to do Generator -> Iterator conversion as well with discarding return result. Probably the best approach will be introduction of explicit helper conversion method for Generator.
Is there alternatives for using negative trait bounds? And if yes, should we use them instead of waiting? For example, specialization with lattice rules could be such alternative, and it looks like a lot closer to being implemented.
What about possible extension of Generator with Arg associated type for resume(arg: Self::Arg)? Should we do for loop integration only for Arg = () or should we add something like continue val expression? (Personally I am not sure if we need Arg at all)
Can we somehow make Iterator<Item=T> just an alias for Generator<Yield=T, Return=()> and change accordingly all associated traits without breaking backward compatibility?

canndrew · January 22, 2018, 3:27am

I’ve been thinking about this aswell. Some things I’d like to add:

We should support else blocks (or whatever we want to call them. I prefer then) and allow them to take an argument. This argument is the final result that the generator yielded.

let foo: u32 = for x in wow {
    ...
} then y {
    ...
}

The then block allows you to map the final result of the generator. If the generator returns () then we can skip naming the argument (ie. just have then { .. }).

The Generator trait should also be extended to take a resume argument. If this happens then we need a way to pass the argument when iterating the for loop. For this we can use continue

let gen: impl Generator<Resume=u32>;
for x in gen {
    ...
    continue 123;
}

If the generator resumes with anything other than () then the continue is mandatory at the end of every path in the for-loop body. If the generator does resume with () then we can continue to use continue with no argument, or to omit it entirely and the end of the body. This makes the usage of continue resemble the usage of break and return with regard to ().

newpavlov · January 22, 2018, 7:20am

One of motivations to for this proposal was to remove the need for else/then blocks. I can't see how you example is better than:

let y: Foo = for x in wow {
    ...
};
let foo: u32 = {
    // compute `foo` based on `y`
};

Well, except one thing, it creates explicit scope for y which could make handling lifetime containing results better, but with NLL, I think, its importance significantly diminishes.

You probably meant the same as an Arg argument? I.e. Generator signature will look like this:

pub trait Generator {
    type Yield;
    type Return;
    type Arg;
    fn resume(&mut self, arg: Self::Arg) -> GeneratorState<Self::Yield, Self::Return>;
}

One problem with you example is that we can't decide what value to use as an argument on the first iteration. We could leave resume argumentless and add method resume_with_arg, or use Option<Self::Arg>, but it will make generators significantly more complex. And overall I think requiring to use continue in every path of the for-loop body will make for loops harder to write, modify and read.

Also it will make desugaring of for loops significantly more complex, as continue with the value currently does not work anywhere, in contrast with break expressions.

Could you please provide examples where argument based iterators and continue with value could be useful?

canndrew · January 22, 2018, 8:12am

My example only runs if the for loop doesn't break. Your idea doesn't negate the need for else clauses, it only makes them even more useful since sometimes people will want to handle the final result of the generator separately.

One problem with you example is that we can’t decide what value to use as an argument on the first iteration.

Generators should have separate start and resume methods. That's the way I'm imagining it at least.

And overall I think requiring to use continue in every path of the for-loop body will make for loops harder to write, modify and read.

Well the feature is only there if you want to use it. If you're resuming with () then you don't need to use continues. And if you do want to resume with a value then continue seems like the most natural place to put it.

Also it will make desugaring of for loops significantly more complex, as continue with the value currently does not work anywhere, in contrast with break expressions.

for loops don't necessarily have to desugar to a loop. They can be their own thing entirely.

Could you please provide examples where argument based iterators and continue with value could be useful?

Anywhere where yield returning a value is useful. eg. you want to write an async implementation where coroutines yield the set of events they're waiting on and get resumed with the event they were woken up for.

newpavlov · January 22, 2018, 8:56am

Ah, so you want to be able to distinguish between break result from loop body and from generator result.

And what should in general happen if resume method is called before start?

I think changing relatively simple desugaring to more involved understanding of for loops by the compiler is a big drawback of your proposal.

While I am personally not convinced that we need complexities which you proposal introduces, your ideas seems more or less compatible with the current proposal, so they probably can be introduced later as separate RFC(s).

HadrienG · January 22, 2018, 9:04am

I like the idea of improving the interoperability between iterators and generators a lot. I also agree with your point that this would be best done by making Generator<Yield=T, _> and Iterator<T> as equivalent as possible in the eye of the type system.

I would strengthen the point that Generator -> Iterator conversion and Iterator -> Generator conversion are both very important. We have a rather large body of Iterator-based code (standard Iterator methods, itertools, rayon, etc…), which should not need to be rewritten in order to be able to consume generators. Similarly, iterators should be usable in any place where generators are expected in order to ease adoption of generators in existing iterator-based codebases.

I am less sure about the proposed for loop changes. It would make for loops more special with respect to “raw” loop statements, since break value; would be much more usable with the later than with the former. That is because in the raw loop case, the user controls the type of the loop return value, whereas in the for loop case, the underlying iterator/generator type controls it. I’m not sure how much of a problem that would be in practice, though, and the proposed use cases do look pretty nice.

canndrew · January 22, 2018, 9:56am

Same thing that happens if resume is called after the generator has finished - it panics. (Though I'd prefer a Generator API based on move semantics rather than using &mut).

rpjohnst · January 22, 2018, 5:49pm

Move semantics for generators are insufficient. We want them to be able to self-borrow across suspension points, which means the trait needs to work with immovable types. Further, event loops are a major consumer of relatively large, heap-allocated generators that definitely shouldn’t be moved in and out of whatever collection they’re a part of, because that would mean a lot of unnecessary (de)allocation.

Even a partially move-based trait (say, fn start(A) -> Self and fn resume(&mut self, R) -> Y) isn’t great on this front. The trait could be implemented for G: Generator directly, forbidding self-borrows in start; or it could be implemented for Box<G>, forcing an extra allocation on the client.

Separate start and resume methods could technically have their ordering enforced via zero-sized token types, but that’s probably overkill. I do like the idea as a solution to resume arguments, though.

canndrew · January 23, 2018, 5:08am

Damn.

That's an interesting idea. Without dependent types though there's nothing to link the token to a specific instance of the generator. So it wouldn't be fool-proof. Might still be worth protecting against inappropriate resumes though:

trait Generator {
    type Yield;
    type Resume;
    type Return;

    fn start(&mut self) -> GeneratorResult<Self>;
    fn resume(&mut self, arg: Self::Resume, token: ResumeToken<Self>) -> GeneratorResult<Self>;
}

enum GeneratorResult<G> {
    Yield(Self::Yield, ResumeToken<G>),
    Return(Self::Return),
}

struct ResumeToken<G>(PhantomData<G>);

for-loop integration would make juggling the ResumeTokens much less of a chore of course.

rpjohnst · January 23, 2018, 5:23am

Now that you mention it, that reminds me of generative types. We already will probably need generative lifetimes to make self-referential types sound, so maybe we could apply the same thing to the resume tokens.

That is a lot of machinery to bake this low in the stack, though. Not sure if it's worth it.

idanarye · January 23, 2018, 11:30am

canndrew:

The Generator trait should also be extended to take a resume argument. If this happens then we need a way to pass the argument when iterating the for loop. For this we can use continue
let gen: impl Generator<Resume=u32>;
for x in gen {
    ...
    continue 123;
}
If the generator resumes with anything other than () then the continue is mandatory at the end of every path in the for-loop body. If the generator does resume with () then we can continue to use continue with no argument, or to omit it entirely and the end of the body. This makes the usage of continue resemble the usage of break and return with regard to ().

What's wrong with the way regular way Rust returns a value from a block?

let gen: impl Generator<Resume=u32>;
for x in gen {
    ...
    123
}

troiganto · January 23, 2018, 12:30pm

Something the motivation section somewhat glazes over is why it would be preferrable to make for-loops based on generators instead of implementing IntoIterator for a subset of generators. (i.e. why to prefer the conversion iterators->generators over generators->iterators) To me, the latter seems more backwards-compatible, easier to implement, and easier to gather community support for.

Maybe the answer is obvious to someone more intimate with the issue than me, but I think an eventual RFC should also cover this question.

jnicklas · January 23, 2018, 12:53pm

Why not simply impl Iterator for Generator where Return=()? That seems like a far simpler solution, and it maps well to working with functions which yield some type and return (). Kind of like this:

impl<T> Iterator for Generator<Yield=T, Return=()> {
    type Item = T;
    
    fn next(&mut self) -> Option<T> {
        match self.resume() {
            GeneratorState::Yielded(value) => Some(value),
            GeneratorState::Complete(_) => None
        }
    }
}

Playground link: https://play.rust-lang.org/?gist=c594c65f258ca5a97e5d79acb4d0e176&version=stable

jdahlstrom · January 23, 2018, 1:14pm

Simpler, yes, but less expressive. You couldn't extend the for loop syntax to support generators with non-() return type (or would have to ignore the return value), or those that take resume arguments.

newpavlov · January 23, 2018, 1:15pm

I’ve updated pre-RFC text with alternatives section, in which I’ve tried to answer question from @troiganto and @jnicklas.

jdahlstrom · January 23, 2018, 2:27pm

I'd add that the biggest drawback with using else specifically (yeah, I realize this is a bikeshedding issue) is definitely the fact that in python which has the exact feature it has never been intuitive to anyone what the else branch does (mostly people expect it to be evaluated if the loop never executes). And the construct's rare enough that every time you encounter it you have to go and check what the semantics were again.

idanarye · January 23, 2018, 6:31pm

Instead of adding an else clause, why not use an enum to determine if the for has finished or if it was terminated with a break? Something like this: https://play.rust-lang.org/?gist=f2a1eb4cd59ebba46485c7d3278a9bff&version=nightly

nikomatsakis · January 23, 2018, 6:49pm

(Disclaimer: I've not read the full thread yet, just the first post.)

I find this idea simultaneously very elegant -- using Generators return value as the result of a for loop is quite clever -- and simultaneously quite confusing. This example certainly caused me to do a double take:

Where is this number_of_records value coming from? It looks like it's somehow being computed by the loop, but it's not, not really, it's being computed by the generator returned by parse_file I guess. It feels quite non-obvious.

(In contrast, when you break out of a loop with a value, the flow of the value feels more direct -- the same code that lets you reach uses of that variable gives you its value.)

I think my overall feeling is that this might be an interesting direction, but it's a step I would not want to take until we've gained a lot more experience with generators. It is an interesting idea to have around.

(For the record, I remain pretty opposed to things like else blocks being attached to for loops and so forth. It just feels like too much to me, and I've been persuaded time and time again that nobody knows what such constructs ought to mean.)

newpavlov · January 23, 2018, 7:04pm

The main point which I wanted to demonstrate with that example is an alternative to this pattern, which I don’t like much:

for record in parse_file(f) {
    let record = record?;
    // process record
}

With generators we can decouple error reporting channel and “business” data channel.

number_of_records here is just to demonstrate unwrapped Ok value. I’ll try to add a bit more explanation. But I get your point that for unaware user it can be non-obvious from where he receives result value, I guess it can be only solved by becoming accustomed to generator based for loops.

nikomatsakis · January 23, 2018, 7:52pm

Ah, I kinda' missed that at first, I admit. I was wondering what that ? at the end of the for loop was all about, kinda took it for a typo. =)

I'm not wild about the let x = x? pattern either.

@withoutboats had another proposal of ? patterns, though more for iterators (but I guess it applies):

for r? in ... {
}

Topic		Replies	Views
Some thoughts on Generators and For-Loops language design	13	1337	June 22, 2020
Generators in rust language design	12	5083	September 19, 2022
Rust generators: exploration of potential syntax language design	24	2975	March 12, 2022
Streaming Generators? language design	3	4791	March 25, 2019
About optimizations of `for` loops compiler	6	848	August 25, 2023