Syntax for generators with resume arguments

Not a very baked idea at this point, but what if syntax for generators with resume arguments were something like this:

generator {
    resume x,y,z;
    ...
    yield x resume u,v,w;
    ...
    while u < v {
        ...
        yield a + b * c resume u,v,w;
        ...
    }
    ...
    return w;
}

  • All resume's must bind the same number of variables.
  • Variables bound by resume live until the next yield.
  • resume must be the first statement of the generator body, and must follow each yield (but not return). It may be omitted for argument-less generators.

I think this nicely side-steps the issue of unpacking tuples returned by yield and is also less surprising than implicit re-binding after each resumption.

Previous discussions involving this topic:
[RFC] Unified coroutines a.k.a. Generator resume arguments
[Post-RFC] Stackless Coroutines
Pre-RFC: Generator resume args

If we want this style of syntax, here's an alternative that avoids the resume semikeyword on yield statements: yield $expr => a,r,g,s;.

Note that this syntax would require an edition boundary to turn at least generator into a keyword. resume could probably get off as a weak (context-sensitive) keyword, but I'm not exactly certain.

The main problem with resume arguments is the asymmetry between the initial control flow entrance and resuming at yield points. I don't think any syntax can possibly be "obviously correct" unless it corrects for that incongruity.

Personal opinion

I still personally think that resumable closures ("argument re-binding") is the best approach, and am slowly collaborating on an RFC putting forward that position with its full picture. Not to derail this thread, though.

The TL;DR is that || return 5 and || loop { yield 5 } are theoretically equivalent, so what falls out of actually unifying them, and |x| return x with |x| loop { yield x } as well?

1 Like

Yes, but I think some new keyword for generators is pretty much unavoidable, unless we want to go with something like yield fn { ... }.

And the above does that, does it not?

But a lot of people felt that was too magical, hence this proposal.

There's still asymmetry between the initial resume $arglist; and later yield $expr resume $arglist.

Unless you'd like to suggest separate yield and resume statements?

And the version of that proposal I'm working on is quite unmagical, so long as I can explain it correctly. Every proposal in this space is matter of what direction you look at it from.

If some kind of explicit resume syntax will be made, I have a general opinion on its syntax (I still prefer the implicit way).

Because the "resume argument" position expects a pattern (it can be restricted to the simplest variable patterns but semantically it is a pattern, not an expression), it should be similar to the existing syntax expecting a pattern.

yield $expr => arg is not good at this because existing match arm syntax $pat => $expr expects the pattern at the left-hand side of "=>", not the right-hand side.

I don't have good alternative syntax, though. Maybe let $pat yield $expr or something.

Not quite. I believe it would take FnArgs, not Pat. (i.e. a,r,g,s is not a pattern, (a,r,g,s) is; a,r,g,s is an argument list (without types).

I meant that the argument position syntax is more of a pattern than an expression, relatively. I know that is not precisely a pattern, but with the type ascription pattern, an argument list (in a closure, ignoring self thing) is a list of patterns separated by commas.

What's wrong with let (u, v, w) = yield x; ?

5 Likes

How would you get the resume arguments before the first yield?

You could just treat it as a normal function call until the first yield.

However, I do like @CAD97 proposal of modifying the parameters.

I wrote a blog post on the first-yield/last-yield asymmetry a while ago if people are curious about some of the difficulties. @vadimcn's proposal handles it pretty well on the whole! It suppors multiple arguments, explicitly shows the location of the first resume, and it's even compatible with "magic mutation" in the sense that the resume ... postfix could control which parameters are re-assigned across yields. Magic mutation used to scare me but I now think that it is the right approach for a few reasons:

  • It isn't really mutation. Think of it more like re-assignment of the parameter binding.
  • It does allow parameters to be moved. Just as let x = ...; loop { drop(x); x = ...; } is allowed, so is |x| { drop(x); yield; }.
  • In practice, generators rarely care about old inputs. Futures are strictly forbidden from using old contexts, streams tend to fully process each item after moving onto the next, etc. Usually people yield because they are finished with whatever they already have and need something new.
  • I can do stuff like |c: char| while c.is_whitespace() { yield; } (or generator { resume c; while c.is_whitespace { yield resume c; } } under this proposal) which becomes really second-nature after a while.
  • Rust is really good at preventing odd mutations at odd times from having odd effects. If you try to hold a reference to the input across the yield, you will get told to move it to a dedicated binding first.

The proposal myself, @CAD97, and @pcpthm are working on is here if people have feedback. The more input the better!

1 Like

Meh... what's wrong with

|mut c: char| while c.is_whitespace() { c = yield; }

That makes it much clearer that c is changing.

Edit: But I do look forward to seeing the details of the proposal you come up with.

4 Likes

Does anyone have non-toy examples where resume arguments are helpful? Some algorithm expressed using resume arguments which does not send and receive values like ‘foo’ and ‘bar’ just because it can, but something actually useful? It doesn’t have to be very sophisticated, just something beyond ‘A’, ‘B’, ‘C’; and preferably it should not be equally easy to express in some other way. (A ‘map’ function expressed using resume arguments would fulfil the former criterion, but not the latter.)

Because to be frank, I don’t believe the biggest problem with resume argument is the syntax. Contemplating resume arguments for some time led me to think that resume arguments are not such a great feature after all. I don't like how there seems to be no good answer to all the state machine mismatch problems (e.g. where should the value go if the control flow hasn’t arrived at the first yield yet). I also don’t like how resume arguments tie sending a value to the coroutine with receiving a value from it; sometimes when you want to receive a value, you don't necessarily have one ready to send, but using yield for this purpose forces you to make one up anyway. Even in languages which already have it, I tend to regard resume arguments as a hack, which often necessitates further hacks to make up for those deficiencies; see for example the function.sent proposal in JS.

So if I am to be convinced that resume arguments are useful, I think I’d have to see some relatively concrete examples of it.

Right now I'd be more favourable towards something like asynchronous iterators: having asynchronous functions that could either block (suspend without generating a value), yield (suspend with a value) or return (finish execution with a value). This feature would subsume resume arguments just by having two such asynchronous iterators receive each other’s endpoints and send values to each other. It would also nicely capture the symmetry between receiving and sending, which in the generator proposal is broken by having one endpoint be the ‘caller’ and the other the ‘callee’.


(Much, much later edit: apparently I am not alone in doubting how useful this feature is in the first place.)

2 Likes

The best example that I can think of is a parser. It takes in input and "saves" the DFA state in the control flow of the generator.

How about some code? Need not even be Rust. The point is that I want to see how the semantics of the resume-with-value operation are taken advantage of by the algorithm.

The main use case I have in mind for generators is video games (and more generally, any program that needs something complicated to be done across multiple iterations of an update loop).

An example using a resume argument would look like:

let enemyAi = |enemy_self: &Entity, terrain: &Geometry, player: &Entity| {
  'main_loop: loop {
    // ...
    if enemy_self.can_see(&player) {	
      while distance(&enemy_self, &player) > SOME_DISTANCE {
        yield terrain.compute_movement(&enemy_self, player.pos);
      }

      while !player.is_dead {
        if !enemy_self.can_see(&player) {
          // Must've been rats. They're everywhere.
          continue 'main_loop;
        }
        yield ActionType::Shoot(player.pos);
      }
    }
    else {
      yield ActionType::StandAround;
    }
  }
}

In this case, the interest of having resume arguments is that the values this function uses (eg the player's position) will need to change across yield points.

Of course, you could achieve the same thing using refcells, but:

  • Refcells are icky.
  • Refcells induce a tiny bit of overhead and make it harder to reason about eg multithreading.
  • The above approach Just Works. The function body only needs to manipulate POD types without worrying synchronization or destructors. Composition is as easy as calling functions.
4 Likes

A parser example:

fn* lexar(i: Input) -> _ {
    loop {
        if i.is_whitespace() {
            yield None;
        }

        if i.is_numeric() {
            let mut store = i.into::<usize>();
            loop {
                yield None;
                if i.is_numeric() {
                    store += i::into::<usize>();
                    continue yield None;
                } else if i.is_whitespace() {
                    yield Some(Ok(Num(store));
                    break;
                } else {
                    yield Some(Err("unexpected non-numeric"));
                }
            }
        } else {
             ...
        }
    }
} 

@felix.s You can see a real-world example here: https://github.com/drone-os/itmsink/blob/97e1791927d8d9d008d065c6c477a068bfb9a158/src/itm.rs

Currently to pass values to the generator, it uses a trick with Rc<Cell<T>>, which is an ugly hack (look at the next_byte macro). But still feels better than writing the state machine manually.

1 Like

I don’t know how the lexar example by @Nokel81 is supposed to be understood. And while I like the other examples for showing how resume arguments may be useful, they don’t really provide an answer to how to design this feature:

  • The enemy AI example doesn’t look like it would care about the relative order of yielding and receiving values. It could receive the initial player position on the argument list just as easily as through the resume argument(s), and it seems it would be fine with discarding spuriously received values.

  • The drone-os parser example yields () and so seems entirely fine with spurious yielding (without also receiving); in a sense it’s not much of a generator at all, in its current form it could have been a Future instead. It could serve as motivation for some kind of consumer construct, dual to generators, that can suspend execution to receive a value instead of returning one. (I assume neglecting to call pump between resuming is an error.)

    (What is this static move syntax, by the way? I haven’t seen it before.)

Compare this with an implementation of map using generators with resume arguments. I’m going to write it in JavaScript (with function.sent), not because I love it, but for the sake of having some concrete, familiar syntax.

function* map(f) {
    for (;;) {
        yield f(function.sent);
    }
}

const gen = map(x => x * 2);
console.info(gen.next(1).value); // 2
console.info(gen.next(2).value); // 4
console.info(gen.next(3).value); // 6

This clearly demonstrates why decoupling receiving from yielding (what function.sent enables) is necessary: without it, map would have to either collect its first resumed value on its regular argument list (leaving the question of how to handle the case where there aren't any values is handled) or yield a dummy value at the start, which is how one would write in in JS now:

function* map(f) {
    let val = yield;
    for (;;) {
        val = yield f(val);
    }
}

const gen = map(x => x * 2);
gen.next(); // skip the dummy yield
console.info(gen.next(1).value); // 2
console.info(gen.next(2).value); // 4
console.info(gen.next(3).value); // 6

This little case study shows that conflating yielding with receiving is a built-in fencepost error. In a statically-typed language it’s going to be felt much more strongly: the Rust equivalents of the latter two workarounds would have to use Option<_> somewhere.

But that’s a silly example, because implementing map as a generator it doesn’t afford us any convenience over just using the function directly. I’m looking for a less silly one. Can such be provided?

The currently implemented design avoids any "start parameters" for generators, these are carried through the closed over environment. A direct translation of your example that works on nightly rust looks like (playground):

fn map<T, U>(
    mut f: impl FnMut(T) -> U,
) -> impl core::ops::Generator<T, Yield = U, Return = !> + Unpin {
    move |mut t| loop {
        t = yield f(t);
    }
}

fn main() {
    let mut gen = Box::pin(map(|x| x * 2));
    dbg!(gen.as_mut().resume(1));
    dbg!(gen.as_mut().resume(2));
    dbg!(gen.as_mut().resume(3));
}

(The playground uses a little local helper to implicitly unwrap the yielded value from infinite generators like that).

2 Likes