[Pre-RFC]: Generator integration with for loops

I like the idea of improving the interoperability between iterators and generators a lot. I also agree with your point that this would be best done by making Generator<Yield=T, _> and Iterator<T> as equivalent as possible in the eyes of the type system.

I would strengthen the point that Generator -> Iterator conversion and Iterator -> Generator conversion are both very important. We have a rather large body of Iterator-based code (standard Iterator methods, itertools, rayon, etc…), which should not need to be rewritten in order to be able to consume generators. Similarly, iterators should be usable in any place where generators are expected in order to ease adoption of generators in existing iterator-based codebases.
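To make the Iterator -> Generator direction concrete, here is a minimal sketch. The Generator trait and GeneratorState enum below are simplified local stand-ins (no resume argument) rather than the unstable standard-library items, and IterGen and sum_yields are hypothetical names used only for illustration.

```rust
// Simplified stand-ins for the unstable std types (assumption: no resume argument).
enum GeneratorState<Y, R> {
    Yielded(Y),
    Complete(R),
}

trait Generator {
    type Yield;
    type Return;

    fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return>;
}

// Hypothetical adapter: wraps any iterator so it can be used where a generator is expected.
struct IterGen<I>(I);

impl<I: Iterator> Generator for IterGen<I> {
    type Yield = I::Item;
    type Return = ();

    fn resume(&mut self) -> GeneratorState<I::Item, ()> {
        match self.0.next() {
            Some(item) => GeneratorState::Yielded(item),
            None => GeneratorState::Complete(()),
        }
    }
}

// A consumer written purely against the generator trait.
fn sum_yields<G: Generator<Yield = u32, Return = ()>>(mut generator: G) -> u32 {
    let mut total = 0;
    while let GeneratorState::Yielded(value) = generator.resume() {
        total += value;
    }
    total
}
```

With this in place, sum_yields(IterGen(1..=4)) consumes an ordinary range through the generator interface, so existing iterator code would not need to be rewritten.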

I am less sure about the proposed for loop changes. It would make for loops more special with respect to “raw” loop statements, since break value; would be much more usable with the latter than with the former. That is because in the raw loop case, the user controls the type of the loop return value, whereas in the for loop case, the underlying iterator/generator type controls it. I’m not sure how much of a problem that would be in practice, though, and the proposed use cases do look pretty nice.
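For reference, this is how the two loop forms differ in today's stable Rust. The function names are made up for the example. In the raw loop the user picks the result type through break; a for expression always has type (), so the result has to leave through a separate variable.

```rust
// Raw `loop`: the loop expression takes the type of the value given to `break`.
fn first_square_above(limit: u32) -> u32 {
    let mut n = 0u32;
    loop {
        n += 1;
        if n * n > limit {
            break n * n;
        }
    }
}

// `for` loop: the expression itself evaluates to `()`, so the result is
// carried out through a mutable binding instead.
fn first_square_above_for(limit: u32) -> Option<u32> {
    let mut found = None;
    for n in 1u32.. {
        if n * n > limit {
            found = Some(n * n);
            break;
        }
    }
    found
}
```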

4 Likes

Same thing that happens if resume is called after the generator has finished - it panics. (Though I’d prefer a Generator API based on move semantics rather than using &mut).

Move semantics for generators are insufficient. We want them to be able to self-borrow across suspension points, which means the trait needs to work with immovable types. Further, event loops are a major consumer of relatively large, heap-allocated generators that definitely shouldn’t be moved in and out of whatever collection they’re a part of, because that would mean a lot of unnecessary (de)allocation.

Even a partially move-based trait (say, fn start(A) -> Self and fn resume(&mut self, R) -> Y) isn’t great on this front. The trait could be implemented for G: Generator directly, forbidding self-borrows in start; or it could be implemented for Box<G>, forcing an extra allocation on the client.

Separate start and resume methods could technically have their ordering enforced via zero-sized token types, but that’s probably overkill. I do like the idea as a solution to resume arguments, though.

1 Like

Damn.

That’s an interesting idea. Without dependent types though there’s nothing to link the token to a specific instance of the generator. So it wouldn’t be fool-proof. Might still be worth protecting against inappropriate resumes though:

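use std::marker::PhantomData;

// `Sized` is needed because `Self` is used as a type argument below.
trait Generator: Sized {
    type Yield;
    type Resume;
    type Return;

    fn start(&mut self) -> GeneratorResult<Self>;
    fn resume(&mut self, arg: Self::Resume, token: ResumeToken<Self>) -> GeneratorResult<Self>;
}

enum GeneratorResult<G: Generator> {
    // A token is handed out only together with a yielded value.
    Yield(G::Yield, ResumeToken<G>),
    Return(G::Return),
}

struct ResumeToken<G>(PhantomData<G>);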

for-loop integration would make juggling the ResumeTokens much less of a chore of course.
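To show what that juggling looks like by hand, here is a sketch built on the token-based trait from this post. Accumulator and drive are hypothetical names, and in a real design the ResumeToken constructor would presumably be private to the generator machinery instead of being callable from user code as it is here.

```rust
use std::marker::PhantomData;

// Token-based trait as sketched in the post (not the standard library's API).
trait Generator: Sized {
    type Yield;
    type Resume;
    type Return;

    fn start(&mut self) -> GeneratorResult<Self>;
    fn resume(&mut self, arg: Self::Resume, token: ResumeToken<Self>) -> GeneratorResult<Self>;
}

enum GeneratorResult<G: Generator> {
    Yield(G::Yield, ResumeToken<G>),
    Return(G::Return),
}

// Zero-sized proof that the generator yielded and may be resumed once.
struct ResumeToken<G>(PhantomData<G>);

// Hypothetical toy generator: yields its running total, returns it once `limit` is reached.
struct Accumulator {
    total: u32,
    limit: u32,
}

impl Accumulator {
    fn step(&self) -> GeneratorResult<Self> {
        if self.total >= self.limit {
            GeneratorResult::Return(self.total)
        } else {
            GeneratorResult::Yield(self.total, ResumeToken(PhantomData))
        }
    }
}

impl Generator for Accumulator {
    type Yield = u32;
    type Resume = u32;
    type Return = u32;

    fn start(&mut self) -> GeneratorResult<Self> {
        self.step()
    }

    fn resume(&mut self, arg: u32, _token: ResumeToken<Self>) -> GeneratorResult<Self> {
        self.total += arg;
        self.step()
    }
}

// Drives the generator manually, passing `arg` on every resume and
// threading each token back into the next `resume` call.
fn drive(mut generator: Accumulator, arg: u32) -> (Vec<u32>, u32) {
    let mut yielded = Vec::new();
    let mut state = generator.start();
    loop {
        state = match state {
            GeneratorResult::Yield(value, token) => {
                yielded.push(value);
                generator.resume(arg, token)
            }
            GeneratorResult::Return(value) => return (yielded, value),
        };
    }
}
```

Because the token is consumed by resume and a new one only arrives with the next Yield, resuming after Return no longer type-checks. As noted above, nothing ties a token to one particular generator instance.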

Now that you mention it, that reminds me of generative types. We will probably already need generative lifetimes to make self-referential types sound, so maybe we could apply the same thing to the resume tokens.

That is a lot of machinery to bake this low in the stack, though. Not sure if it’s worth it.

3 Likes

What’s wrong with the regular way Rust returns a value from a block?

let gen: impl Generator<Resume=u32>;
for x in gen {
    ...
    123
}

Something the motivation section somewhat glosses over is why it would be preferable to base for-loops on generators instead of implementing IntoIterator for a subset of generators (i.e. why prefer the conversion iterators->generators over generators->iterators). To me, the latter seems more backwards-compatible, easier to implement, and easier to gather community support for.

Maybe the answer is obvious to someone more intimate with the issue than me, but I think an eventual RFC should also cover this question.

1 Like

Why not simply impl Iterator for Generator where Return=()? That seems like a far simpler solution, and it maps well to working with functions which yield some type and return (). Kind of like this:

// Sketch only: a blanket impl like this can only live in the crate that defines Iterator.
impl<G: Generator<Return = ()>> Iterator for G {
    type Item = G::Yield;

    fn next(&mut self) -> Option<Self::Item> {
        match self.resume() {
            GeneratorState::Yielded(value) => Some(value),
            GeneratorState::Complete(()) => None,
        }
    }
}

Playground link: https://play.rust-lang.org/?gist=c594c65f258ca5a97e5d79acb4d0e176&version=stable
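Outside the standard library, the same idea can be approximated with a newtype wrapper, since the orphan rule rejects a blanket impl of Iterator for every generator. The sketch below assumes a simplified local Generator trait without resume arguments (a stand-in, not the unstable std trait); GenIter, Countdown, and countdown_from are hypothetical names.

```rust
// Simplified stand-ins for the unstable std types (assumption: no resume argument).
enum GeneratorState<Y, R> {
    Yielded(Y),
    Complete(R),
}

trait Generator {
    type Yield;
    type Return;

    fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return>;
}

// Hypothetical newtype adapter: only generators returning `()` become iterators.
struct GenIter<G>(G);

impl<G: Generator<Return = ()>> Iterator for GenIter<G> {
    type Item = G::Yield;

    fn next(&mut self) -> Option<G::Yield> {
        match self.0.resume() {
            GeneratorState::Yielded(value) => Some(value),
            GeneratorState::Complete(()) => None,
        }
    }
}

// Toy generator: yields n, n-1, ..., 1 and then completes.
struct Countdown(u32);

impl Generator for Countdown {
    type Yield = u32;
    type Return = ();

    fn resume(&mut self) -> GeneratorState<u32, ()> {
        if self.0 == 0 {
            GeneratorState::Complete(())
        } else {
            let current = self.0;
            self.0 -= 1;
            GeneratorState::Yielded(current)
        }
    }
}

// The wrapped generator works with `for` loops and every Iterator method.
fn countdown_from(n: u32) -> Vec<u32> {
    GenIter(Countdown(n)).collect()
}
```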

2 Likes

Simpler, yes, but less expressive. You couldn’t extend the for loop syntax to support generators with non-() return type (or would have to ignore the return value), or those that take resume arguments.

I’ve updated the pre-RFC text with an alternatives section, in which I’ve tried to answer the questions from @troiganto and @jnicklas.

I’d add that the biggest drawback of using else specifically (yeah, I realize this is a bikeshedding issue) is definitely the fact that in Python, which has this exact feature, it has never been intuitive to anyone what the else branch does (most people expect it to be evaluated if the loop never executes). And the construct is rare enough that every time you encounter it you have to go and check what the semantics were again.

8 Likes

Instead of adding an else clause, why not use an enum to determine if the for has finished or if it was terminated with a break? Something like this: https://play.rust-lang.org/?gist=f2a1eb4cd59ebba46485c7d3278a9bff&version=nightly
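One way to picture the enum approach with what is in std today is std::ops::ControlFlow together with Iterator::try_for_each. This is only an illustration of the general idea and not necessarily what the linked playground does; find_first_negative is a made-up example function.

```rust
use std::ops::ControlFlow;

// `Break(v)` means the traversal was cut short with a value;
// `Continue(())` means every element was visited.
fn find_first_negative(values: &[i32]) -> ControlFlow<i32> {
    values.iter().try_for_each(|&value| {
        if value < 0 {
            ControlFlow::Break(value)
        } else {
            ControlFlow::Continue(())
        }
    })
}
```

The caller can then match on the result to tell "finished normally" apart from "terminated early", without any else clause on the loop.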

(Disclaimer: I’ve not read the full thread yet, just the first post.)

I find this idea simultaneously very elegant (using a generator’s return value as the result of a for loop is quite clever) and quite confusing. This example certainly caused me to do a double take:

Where is this number_of_records value coming from? It looks like it’s somehow being computed by the loop, but it’s not, not really, it’s being computed by the generator returned by parse_file I guess. It feels quite non-obvious.

(In contrast, when you break out of a loop with a value, the flow of the value feels more direct – the same code that lets you reach uses of that variable gives you its value.)

I think my overall feeling is that this might be an interesting direction, but it’s a step I would not want to take until we’ve gained a lot more experience with generators. It is an interesting idea to have around.

(For the record, I remain pretty opposed to things like else blocks being attached to for loops and so forth. It just feels like too much to me, and I’ve been persuaded time and time again that nobody knows what such constructs ought to mean.)

18 Likes

The main point which I wanted to demonstrate with that example is an alternative to this pattern, which I don’t like much:

for record in parse_file(f) {
    let record = record?;
    // process record
}

With generators we can decouple the error-reporting channel from the “business” data channel.

number_of_records is only there to demonstrate the unwrapped Ok value. I’ll try to add a bit more explanation. But I get your point that, for a user unfamiliar with the feature, it can be non-obvious where the result value comes from. I guess that can only be solved by becoming accustomed to generator-based for loops.

1 Like

Ah, I kinda’ missed that at first, I admit. I was wondering what that ? at the end of the for loop was all about, kinda took it for a typo. =)

I’m not wild about the let x = x? pattern either.

@withoutboats had another proposal of ? patterns, though more for iterators (but I guess it applies):

for r? in ... {
}

2 Likes

I’m wary of a pattern syntax that matches the unwrapping/projection syntax rather than the constructing syntax like everything else (except ref, which is confusing and something we’re moving away from for precisely this reason), but I do like the idea of integrating ? into for loops somehow, since it can’t be just applied to the iterator like with other patterns.

3 Likes

This is an interesting perspective, but wouldn’t it be better if the solution worked with Iterators of Results, which are already not uncommon today?

It would also ideally work with Streams; today in the futures-await crate, #[async] for elem in stream implicitly ?s the value, which I think most people would agree is probably wrong. Changing this would make using #[async] for with streams essentially the same as an Iterator of Results.

Iterator<Item=Result<T, E>> and Generator<Yield=T, Return=Result<(), E>> have different semantics. The first can produce valid results after encountering error(s), while the second produces values until either the source is successfully exhausted or it terminates on the first error, without any ability to continue iterating. I think a significant amount of code uses the first variant while implying the second (i.e. you should stop iterating on the first encountered error) and leaves it up to users to properly short-circuit errors.
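The difference can be seen with plain iterators on stable Rust. In this small sketch (parse_all and sum_until_error are made-up names), the iterator of Results happily produces an Ok item after an Err, and it is the caller's let value = parsed? that imposes the stop-at-first-error behaviour.

```rust
use std::num::ParseIntError;

// An iterator of results keeps going after an error: every input is visited.
fn parse_all(inputs: &[&str]) -> Vec<Result<i32, ParseIntError>> {
    inputs.iter().map(|s| s.parse::<i32>()).collect()
}

// The `let x = x?` pattern: the caller, not the iterator, stops at the first error.
fn sum_until_error(inputs: &[&str]) -> Result<i32, ParseIntError> {
    let mut total = 0;
    for parsed in inputs.iter().map(|s| s.parse::<i32>()) {
        let value = parsed?;
        total += value;
    }
    Ok(total)
}
```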

In a way your proposal is orthogonal to the one in the OP, but if the suggested for integration is implemented, it will probably cover a significant number of the potential use cases for your proposal. (Although I am not ready to comment on the async part of your message.)

3 Likes

An iterator/stream adapter along the lines of collect::<Result<Vec<_>, _>>() might be sufficient so we wouldn’t need a new ? language feature, though such an adapter would need to be implemented as a generator and would be most useful with something like this proposal.
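For comparison, this is the existing collect behaviour the adapter would be modelled on: collecting an iterator of Results into a Result of a Vec stops at the first Err. The function name is made up, and this shows only today's eager collect, not the lazy generator-based adapter described above.

```rust
use std::num::ParseIntError;

// Collects into Ok(Vec) if every item parses, or returns the first error encountered.
fn parse_numbers(inputs: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    inputs
        .iter()
        .map(|s| s.parse::<i32>())
        .collect::<Result<Vec<_>, _>>()
}
```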

It might be a crazy idea, but wouldn’t it be possible to somehow add a Return associated type to the Iterator trait, defaulting to (), and remove Generator from std completely? Merging them into one trait would simplify things IMO.

Besides that, awesome RFC

3 Likes