This is less a response to the bikeshed request (though it contains some bikeshed for communication purposes), but actually a semantics argument. Apologies, this is only lightly copy-edited, and maybe could be (should be?) its own blog post. I have some previous discussion on a very draft RFC.
I don't want trait Generator<Args> { type Yield; type Return; }
. I want trait FnMut<Args> { type Output; }
. (And, well, FnPinMut
for semicoroutines[1] which require pinning.)
My core argument here is that semicoroutines[1] and FnMut
have the exact same interface from the caller's perspective:
- You have some state object
- You call a function with a unique reference to said state object
- You receive a return value, and the state object is potentially mutated
- In
FnMut
case,FnMut::Output
- In
Generator
case,GeneratorState<Generator::Yield, Generator::Return>
- In
These are the same API, thus, imho, should be represented by the same trait. (This is similar to how Future
was originally designed with Result
baked into it, with both an Item
and Error
type, but today was stabilized with just a singular Output
type. With today's Generator
, it's GeneratorState<Yield, Return>
, which would be unified into a singular Output
.)
In this world, the only difference between yield
and return
would be where execution resumes after execution. Whether you yield
or return
(or implicitly return
with e.g. ?
or tail expression), the yield
ed/return
ed value is returned as-is to the caller. If you yield
ed, then execution upon next call resumes after the yield
. If you return
ed, then execution upon next call resumes at the top of the function.
This also solves the -> Return
versus -> yield Yield return Return
problem nicely; there's still only a single "return" type which is returned to the caller to care about. It's the exact same relationship, so it uses the same symbol.
Per OP, I don't want to get into full semicoroutines, which could potentially accept arguments to each call. Instead, we'll stick to just that required to implement a generators feature: Args = ()
, and Output = Option<T>
.
In my envisioned semicoroutine feature, merge_overlapping_intervals
could potentially look like this:
fn merge_overlapping_intervals(input: impl Iterator<Interval>) -> impl Iterator<Item = Interval> {
// !1 !2
std::iter::from_fn(move || loop {
// !3
let mut prev = input.next()?;
for i in input {
if prev.overlaps(&i) {
prev = prev.merge(&i);
} else {
yield Some(prev); // !4
prev = i;
}
}
yield Some(prev); // !4
yield None; // !5
});
}
Notes:
- Our semicoroutine just produces
impl FnMut() -> Option<Interval>
. To turn this into an iterator, we useiter::from_fn
. - We wrap the body in a
loop
. This is for convenience: otherwise, the implicit tail expression wouldreturn
a()
, which is notOption<Interval>
, leading to a type error. Asloop
evaluates to type!
, this coerces to our closure's return type just fine. Alternatively, we could have written the last line as a tailNone
. Note thatreturn None;
would not work[4], as we would still have the implicit tail expression returning()
after it. -
?
works here. As our closure returnsOption<Interval>
,?
just works as normal. Ifinput.next()
yields aNone
, we resume from the top of the function the next time the closure is called. - We
yield Some(interval)
here, because this is a-> Option<Interval>
closure.[5] Throughfrom_fn
, this fills theIterator
contract of returning items viaSome(interval)
. - We
yield None
to indicate the end of the iterator stream. If and only ifinput
is fused, we could instead leave this off,loop
ing back to the start of the closure, andinput.next()?
wouldyield None
; however, because we don't requireinput
to be fused, we have to yield it ourselves to ensureNone
is yielded before callingnext
on theinput
iterator again. It also just helps with code clarity to do so explicitly. This could also be written as a tail-returnNone
rather than the closure being a bigloop
(see point 2).
Having closures with yield
in them implicitly become FnMut
(or FnPinMut
as necessary) when they yield
is fine; closures are already FnMut
objects if they capture state mutably. However, item fn
are never FnMut
; they're always stateless fn()
(which is more restrictive than even Fn
, which can have state). As such, item fn
obviously cannot just contain yield
; that'd change their signature, and the signature is important to be fully prescriptive[6].
So, we do the same thing async
does. async fn f(Args...) -> Ret { body... }
desugars (modulo lifetime specifics) to fn f(Args...) -> impl Future<Output=Ret> { async move { body... } }
. Similarly, we would now have yield fn f(Args...) -> Ret { body... }
desugar (modulo lifetime and pinning specifics) to fn f(Args...) -> impl FnMut<Output=Ret> { move || { body... } }
.
Note that this does not return impl Iterator
, it returns impl FnMut
(or impl FnPinMut
). This is, I think, the sharpest point in this model. This is the breakage point that could potentially convince me against this. We don't get gen fn
to declare impl Iterator
in a declarative form; we get semicoroutines, which are more powerful, but put a bit more salt in the way of the desired end goal of straightline generators.
This gap can fairly easily be bridged by a macro or other syntactic transform on top: all you have to do is: error on return
, replace yield $expr
with yield Some($expr)
, and add a tail expression None
. (Or, if you want poisoning[7], a tail yield None; loop { panic!("exhausted generator"); }
.)
In the name of proposing a minimized subset, rather than the fully general semicoroutines, the minimal stabilization target would be yield fn
. This is a bit more than just supporting creating impl Iterator
, but still I think represents a nicely contained surface level feature that stands on its own merit, even without "yield closures" to go with it. (And if yield closures are bundled with yield fn, for the love of Ferris, limit it to argumentless closures, to avoid the massive second bikeshed over semicoroutine resume argument handling.[8]) I'd even be happy to see it severely limited to only semicoroutines that are FnMut
, or being conservative and only generating FnPinMut
semicoroutines, so long as the path to providing both remains open. (Perhaps a blanked impl FnMut
for impl FnPinMut + Unpin
? Adding the ability to make FnPinMut
when previously limited to FnMut
is a lot easier, though the auto pinning requirement adds another auto trait like inferred trait to the return type.)
[1] A bit of formal terminology, paraphrased:
- "Subroutines" are normal single-entry/single-exit[2] routines: function calls. When you call them, execution starts at the top of the routine, and exits back to the caller upon
return
, potentially with a value. - "Coroutines" are multiple-entry[3]/multiple-exit routines, though structured enough to not just be unstructured
goto
. Multiple-entry[3] is familiar enough, since we've familiarized ourselves withyield
and async/generators enough: when you resume a coroutine, it resumes at the last place ityield
ed, rather than the top of the routine. Multiple-exit is the surprising part: rather than returning control flow to a single caller (control flow is a tree), the coroutine itself chooses which coroutine to yield control flow to next (control flow is a directed graph). This can allow for very powerful state machines to be written clearly (and I believe basic block IR to be a form of coroutines). - "Semicoroutines" pull back the full freedom of coroutines to maintain tree structured control flow, with multiple-entry[3]/single-exit. As this functionality is often used to implement iterables, it's also commonly called "generators." Specifically: when you call a semicoroutine, it resumes execution at whatever its last yield point was, and whenever the semicoroutine yields control flow, it returns to its caller.
I use the word "semicoroutine" because it semantically separates the concept of yieldable control flow from the idea of creating an iterable/generator, which yields some Result<Yielded, Finished>
.
[2] Note that the single-entry/single-exit principle does not actually mean "only one return
, no break
, continue
, etc" like many dogmatists try to claim. Rather, it refers to the core property of functions that we take for granted today: that they only "enter" at one place (callers always call
the start of the function, and don't jump into the middle; functions don't share asm tails), and only "exit" to one place (we always ret
to the caller). (Fun note: jmp
for tail calls breaks this principle if you have more than one tail call option, as your function could exit to more than one other label! But from an outside perspective, the principle is held, as it will eventually ret
.) This is about structured control flow and encapsulation of subroutines; the callee knows exactly how it's going to be called, and the caller knows that it will regain control in an expected manner.
[3] I use multiple-entry because formally it is correct -- the routine resumes in the middle -- but it's important to note that you still call the semi/coroutine in a uniform way, which then automatically picks the correct resume point for you. This form of "structured multiple-entry" is isomorphic to a single-entry which immediately checks the state and jumps to the correct execution point. (Similarly, multiple-exit can be implemented via returning to a trampoline which calls the next routine as indicated.)
[4] However, it can easily be made to work. Because the tail expression is unreachable, the type checker could be made to ignore the tail return (and only the tail return) if it is statically unreachable. This would allow an explicit tail return
to typecheck, as the implicit return is unreachable, thus suppressed. However, an unreachable explicit return
should still be a typecheck error. I believe this matches the behavior of straightline functions, so should be the default behavior. Then why have I made it a footnote extension instead? I guess maybe it makes my return
point clearer? ¯\_(ツ)_/¯
[5] This could even maybe be composed with try
. While try
blocks wouldn't be helpful (yield
/return
targets the function directly, rather than the block, and you can't yield
a block), a try
closure could provide Ok
-wrapping (Some
-wrapping) for yield
/return
ed values, and yeet
[9] could specify the end case (None
). This only works for generator-like semicoroutines, though, where you have many yield
s terminated by a single yeet
.
[6] Well, auto traits for existential (impl Trait
) return types…. These are, in my opinion, the exception that proves the rule. Still need to find a solution for trait functions, though…
[7] I consider explicit control over poisoning semantics a benefit of this scheme, by the way.
[8] For what it's worth, I believe this model works best with the "magic mutation" syntax, though I despise that it's called that. Rather, you just get a new set of argument bindings each time the closure is resumed. The old ones are just dropped or shadowed (I honestly don't know which is better semantics).
[9] throw
, bail
, etc: https://areweyeetyet.rs/