[Pre-RFC] Coroutines


#1

Hi All,

Given the resurgence of interest in async IO libraries, I am thinking of re-introducing my RFC for coroutine support in Rust, updated to keep up with the times (though coroutines are useful for more than just async).

Anyhow, here’s the latest draft. I have a feeling it is coming off a bit terse… I’d appreciate feedback on what seems unclear and what sections should be expanded.

Thanks!


#2

Seems nice, but can you iterate a generator using the regular for syntax? Please add an example.

I guess “yield” is a contextual keyword? Specify this in the RFC. I don’t think the “coro” contextual keyword is needed.

Python has added a new syntax that yields all values from an iterator. In Rust this could lead to a nice efficiency optimization: https://www.python.org/dev/peps/pep-0380/#optimisations

F# has a similar feature.


#3

One minor problem is maybe that you can’t explicitly specify the type of the yielded values.

For example if you do this:

fn takes_coro<Y, R, F: FnMut() -> CoResult<Y, R>>(f: F) { ... }

takes_coro(|| yield Default::default());

You will get an error because Y couldn’t be inferred. This isn’t a problem with regular closures because you can specify the return type with || -> Ret. With coroutines you’ll instead have to invoke takes_coro::<String, _, _>(...), which can be confusing.

However I don’t know in practice if that’s just a minor nit or if it’s really a problem.


#4

I suppose that one of the major uses of coroutines is for async servers, where yield would yield a Future.

For example something like this, where we query a row from the database then load the content of a file:

fn query_database(sql: &str) -> impl Future<DatabaseRow> { ... }
fn load_file<P: AsRef<Path>>(path: P) -> impl Future<Vec<u8>> { ... }

listen(move |http_request| {
    let db_row = (yield query_database("SELECT foo, bar FROM foo LIMIT 1")).unwrap();
    let file_content = yield load_file("foo.txt");
    return build(response, &db_row, &file_content);
});

But this wouldn’t work, since yield must always have the same type.

How do other languages handle that?


#5

And the previous post made me think of this question: what would be the precedence between yield and the ? operator?

Sometimes you want yield (a?) and sometimes you want (yield a)?.


#6

Yes, if you impl Iterator<Item=T> for CoResult<T,()>: see Motivating Example #1.

No, yield is already reserved. I’ll mention that.

I’ve considered that, but then we’d need to add a third variant to CoResult. And how would that work with parameterized coroutines? Also, we’d probably have to box the inner iterator.
I am hoping that Rust doesn’t need this, because, unlike Python, it should be able to inline iteration over the inner.


#7

I didn’t explicitly mention this, but you should be able to write || -> CoResult<String,()> {...}, just like for regular closures.
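
A minimal sketch of the two ways to pin down the yielded type. Stable Rust has no yield, so an ordinary closure stands in for the coroutine here, and the CoResult enum is a hand-written assumption modelled on the RFC, not a real std type:

```rust
// Hypothetical stand-in for the RFC's CoResult; not part of std.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum CoResult<Y, R> {
    Yield(Y),
    Return(R),
}

// Resumes the "coroutine" once and hands back whatever it produced.
fn takes_coro<Y, R, F: FnMut() -> CoResult<Y, R>>(mut f: F) -> CoResult<Y, R> {
    f()
}

fn main() {
    // Option 1: annotate the closure's return type, as with regular closures.
    let a = takes_coro(|| -> CoResult<String, ()> { CoResult::Yield(Default::default()) });

    // Option 2: name the type parameters at the call site with a turbofish.
    let b = takes_coro::<String, (), _>(|| CoResult::Yield(Default::default()));

    assert_eq!(a, CoResult::Yield(String::new()));
    assert_eq!(b, CoResult::Yield(String::new()));
}
```

Without either annotation, the call is rejected because nothing constrains Y.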

Please see Motivating Example #3. We’ll probably need some syntax sugar for this to be ergonomic, but Rust syntax extensions should be able to cover that.

I am thinking that yield would have the same precedence as return; anything else would be confusing. So you’d have to parenthesize in the latter case: (yield a)?
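
Today's return already behaves this way, which gives a feel for the proposed rule. A small sketch in stable Rust (flatten is a hypothetical helper; yield itself is not used because it is not available on stable):

```rust
fn flatten(a: Option<Option<i32>>) -> Option<i32> {
    // Type-checks only when read as `return (a?)`: `a?` unwraps the outer
    // Option (or exits early with None), and the inner Option is returned.
    return a?;
}

fn main() {
    assert_eq!(flatten(Some(Some(3))), Some(3));
    assert_eq!(flatten(Some(None)), None);
    assert_eq!(flatten(None), None);
}
```

If yield followed the same rule, yield a? would mean yield (a?), and the other grouping would need explicit parentheses.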


#8

Oh right, my bad. I got a bit confused by the fact that there are two “return types”: the type you put behind return, and CoResult.


#9

Firstly, thanks for this (pre) RFC, it looks really promising :smile:

Wrt. the Iterator adapter, wouldn’t the following be better, or did I miss something?

impl<G,T> Iterator for G where G: FnMut() -> CoResult<T,()> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        match self() {
            Yield(x) => Some(x),
            Return(_) => None
        }
    }
}
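
A runnable approximation of this adapter, as a sketch only. Outside the standard library, a blanket impl of Iterator for every G would most likely be rejected by the coherence (orphan) rules, so this version attaches the impl to a newtype. CoIter, count_to_three and the CoResult enum are all hypothetical names, and a plain closure stands in for the coroutine:

```rust
// Hypothetical stand-in for the RFC's CoResult.
enum CoResult<Y, R> {
    Yield(Y),
    Return(R),
}

// Newtype wrapper so the Iterator impl belongs to a local type.
struct CoIter<G>(G);

impl<G, T> Iterator for CoIter<G>
where
    G: FnMut() -> CoResult<T, ()>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        // Resume the wrapped closure and translate its result.
        match (self.0)() {
            CoResult::Yield(x) => Some(x),
            CoResult::Return(()) => None,
        }
    }
}

// Hand-written closure standing in for a coroutine that yields 0, 1, 2.
fn count_to_three() -> impl FnMut() -> CoResult<u32, ()> {
    let mut i = 0;
    move || {
        if i < 3 {
            i += 1;
            CoResult::Yield(i - 1)
        } else {
            CoResult::Return(())
        }
    }
}

fn main() {
    let mut seen = Vec::new();
    for x in CoIter(count_to_three()) {
        seen.push(x);
    }
    assert_eq!(seen, vec![0, 1, 2]);
}
```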

Lastly, I wonder whether the restrictions on borrows across yields might sometimes be a bit surprising (though needed), e.g. if the borrow is to some value on the heap guarded by a hoisted struct, or e.g.:

//in coro
|outer| {
    for inner in outer.iter() {
        // isn't the OuterIter now on the stack (=> hoisted), with
        // inner holding a borrow to it?
        for element in inner.iter() {
            // now inner and InnerIter live across a yield, which they are
            // not allowed to, because they hold a borrow to the hoisted OuterIter?!
            yield element;
        }
    }
}

I probably misunderstood/overlooked something, but if not, wouldn’t this be kind of a problem (coros would still be helpful, but a bit less so, and they would also be more confusing)?

EDIT: I just noticed there is an error: |outer| would be a parameter returned from yield, so the correct version brings outer into the scope of the coroutine by closing over it.

Correction:

type ElCoro = FnMut() -> CoResult<El, ()>;
fn random_fn<'a>(outer: &'a Some2DCollectionOfEl) -> impl ElCoro + 'a {
    || {
        for inner in outer.iter() {
            // ...
            for element in inner.iter() {
                // ...
                yield element;
            }
        }
    }
}

#10

Indeed, it would! Those examples are more like sketches, I doubt they’d compile. :slight_smile:

Depends on the thing you are iterating over:

  • If it’s something like Vec<Vec<i32>>, this should be fine, because the outer iter returns a &Vec<i32>, and the inner iter would reference into that value (which lives in the outer collection), not into the outer iter.
  • If, however, you are iterating by value over, say, Vec<[i32;10]>, the outer iter would return a [i32; 10], which would get hoisted, and, yes, we’d have a problem.
    IMO, this case should be infrequent and is easy to fix by switching to iteration by reference.
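
The first case can be checked in ordinary Rust today. In this sketch (all_elements is a hypothetical helper), the references handed out while iterating by reference borrow from the outer collection, so they remain valid after both iterator objects are gone. It does not demonstrate the coroutine borrow rules themselves, only where the references point:

```rust
// Collects references to every element while the iterators come and go.
fn all_elements(outer: &Vec<Vec<i32>>) -> Vec<&i32> {
    let mut out = Vec::new();
    for inner in outer.iter() {
        // `inner` is a `&Vec<i32>` pointing into `outer`, not into the iterator.
        for element in inner.iter() {
            // `element` also borrows from `outer`, so it can outlive both iterators.
            out.push(element);
        }
    }
    out
}

fn main() {
    let data = vec![vec![1, 2], vec![3]];
    let refs = all_elements(&data);
    assert_eq!(refs, vec![&1, &2, &3]);
}
```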

#11
while let Yield(x) = coro1 {
    print!("{} ", x); // prints "0 1 2 3 4 5 6 7 8 9 " 
}

@vadimcn Is the absence of parentheses intended? coro1() seems more logical and readable.

Should coroutines be introduced with a special keyword, to distinguish them from regular closures? For example, coro |a, b| { … }

I would prefer so, even if coro sounds a bit weird and a better name might be preferred. I think this would be more readable:

fn iter(&'a self) -> impl Iterator<T> + 'a {
    coro {
        let mut i = 0;
        while i < self.len() {
            yield self[i];
            i += 1;
        }
    }
}

as || could become optional in that case. And when the body of a function is only a coroutine it would allow a sugar like:

coro fn iter(&'a self) -> impl Iterator<T> + 'a {
    let mut i = 0;
    while i < self.len() {
        yield self[i];
        i += 1;
    }
}

Now, I’m a bit dissatisfied with the idea of encoding async/await as a combination of functions and macros on top of generators. I would very much like all monadic types to benefit from the same sugar. It is still unclear to me whether we could allow something as powerful as F# computation expressions without relying on pure source transformation.


#12

I think we need to support efficient iterator nesting. An interesting approach is described in this paper from Spec#, in the section titled Translation of nested iterators. This approach allows nested iterators to perform in linear time. Have you considered this approach?


#13

@burakumin: I think the absence of parentheses is not intended there. Wrt. the coro (or similar) keyword, I think it is neither good nor bad, but I’m not the biggest fan of the “more readable” variants you provided. For one thing, I’m not a fan of unnecessary special cases like dropping the || when the coro takes no parameters (coro {...} instead of coro || {...}); it’s just not worth the 2 fewer characters to type. Also, I prefer to have nothing in a function signature which is not part of the actual signature; this includes struct destructuring in the parameter list, the mut in (..., mut foo: Bar, ...) and your coro function prefix. (Though I still use the mut in the end :wink:). While this is just a very subjective preference, the main reason against it is that yield is already a reserved keyword, while coro isn’t (and using yield there also seems suboptimal).


#14

TL;DR: nested loop/coro optimization: yes! But as an implementation detail

@gabomgp I don’t think it is necessary to include any language/API elements for this; in many cases (assuming -> impl FnMut() -> CoResult<...> can be used) the compiler has all the information and freedom it needs to do any necessary optimization.

Though it is true that e.g. in the following case:

|| {
    let another_coro = some_how();
    for e in another_coro {
        yield e;
    }
}

it might be nice to have some additional optimization. It should be possible to do so with a MIR pass which detects such yield-from patterns and transforms the code accordingly. Though yes, it might not necessarily be super easy to detect such patterns, so to give the compiler a hint, and for usage convenience, a yield_from!(coro) macro could be introduced, which is defined to be semantically equivalent to the while let loop from the RFC (not depending on the Iterator adapter). Note that this might need a bit of extra work wrt. cases which include some “wrapping” similar to the await! example.

E.g. (note: I rarely write macros, so there might be some errors in it)

macro_rules! yield_from {
    ($coro:expr) => {
        let mut hoisted_coro = $coro;
        while let CoResult::Yield(e) = (hoisted_coro)() {
            yield e;
        }
    }
}

//... in a function or similar
|| {
    // normal use, compiler gets a hint for optimization
    let some_coro = get_it();
    yield_from!(some_coro);
}

#15

@vadimcn: I wonder about an implementation detail wrt. hoisted variables. If I’m not mistaken, a variable being hoisted can mean it lives on the heap. Now when a coro is continued, will the variable be used from there, moved back onto the stack, or a mixture of both? Both seem to make sense(1) depending on the actual coro, and I’m not sure if we can rely on LLVM to optimize this correctly.

(1): just another function call before the next yield -> stays where it is (heap?); more complex coro -> gets moved back onto the stack


#16

I thought it could make sense to post my notes about borrows over yield:

  1. borrows to a possibly hoisted variable can’t be allowed
    • hoisted values are moved, so their memory position can change
  2. borrows whose lifetime “outlives” (>=) the coroutine’s lifetime are ok
    • else &'static str would not be allowed
    • they clearly can’t be hoisted in this coroutine
  3. technically a &T based on a hoisted Box<T> can be done
    • because it does not point to the (stack) memory of Box<T>, but to the memory that the Box points to/owns
    • basically this works for most smart pointers and handles
    • but we can’t allow this: Rust is defined through the interfaces, not their implementation
      • basically we can’t rely on how Deref is implemented for Box/Rc and so on
    • to allow this we have to have a way to tell the compiler that’s ok
      • this would be another feature, possibly another RFC
      • it can be done by somehow “tagging” methods returning a reference (or their lifetime)
        • which would include Box so we can just box values where references across yield are needed
        • but it also would work with all kinds of custom smart pointers, Rc's etc.
        • other, mainly unsafe, code could also profit from having a non-stack borrow guarantee
        • this might be generalizable to not just non-stack borrows but also borrows which “just live long enough”
      • or we could make this a property of the Box lang item
        • so if we want borrows to live across yield we have to Box what we borrow
        • less flexible than the first method but also less complex
      • in both cases extra care might have to be taken wrt. returning such borrows
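
Point 3 rests on the fact that moving a Box moves only the pointer, not the heap value. A minimal sketch of that fact in stable Rust (box_target_addresses is a hypothetical helper). It says nothing about other Deref implementations, which is exactly the concern raised above:

```rust
// Returns the address of the boxed value before and after the Box is moved.
fn box_target_addresses() -> (usize, usize) {
    let boxed = Box::new(42_i32);
    let before = &*boxed as *const i32 as usize;
    // Moving the Box moves only the pointer; the heap allocation stays put.
    let moved = boxed;
    let after = &*moved as *const i32 as usize;
    (before, after)
}

fn main() {
    let (before, after) = box_target_addresses();
    assert_eq!(before, after);
}
```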

#17

No, just a typo. Invocations should use parens, of course.


@gabomgp, @naicode: I think optimization of nested iterators should be done by the compiler. We might not even have to do anything special, because inlining passes should take care of tight inner loops over nested iterators.

The hoisted variables live in the coroutine environment struct, just like captured vars of regular closures do. The environment itself may end up on the heap, but coroutine code doesn’t need to worry about that, it will always access them through the self pointer.
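
To make that concrete, here is a hand-written approximation of what such an environment struct could look like for a simple counting coroutine. This is a sketch under assumptions: CounterEnv, resume and the CoResult enum are hypothetical names, and the actual layout would be up to the compiler:

```rust
// Hypothetical stand-in for the RFC's CoResult.
#[derive(Debug, PartialEq)]
enum CoResult<Y, R> {
    Yield(Y),
    Return(R),
}

// Hand-written "environment" for a coroutine roughly like:
//     || { for i in 0..limit { yield i; } }
// `limit` plays the role of a captured variable, `i` of a hoisted local.
struct CounterEnv {
    limit: u32,
    i: u32,
}

impl CounterEnv {
    fn new(limit: u32) -> Self {
        CounterEnv { limit, i: 0 }
    }

    // Each resume reads and writes the hoisted state through `self`.
    fn resume(&mut self) -> CoResult<u32, ()> {
        if self.i < self.limit {
            let current = self.i;
            self.i += 1;
            CoResult::Yield(current)
        } else {
            CoResult::Return(())
        }
    }
}

fn main() {
    // The environment can sit on the stack or in a Box; `resume` is unchanged.
    let mut on_stack = CounterEnv::new(2);
    let mut on_heap = Box::new(CounterEnv::new(2));
    assert_eq!(on_stack.resume(), CoResult::Yield(0));
    assert_eq!(on_heap.resume(), CoResult::Yield(0));
    assert_eq!(on_stack.resume(), CoResult::Yield(1));
    assert_eq!(on_stack.resume(), CoResult::Return(()));
}
```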


#18

Submitted this to rfcs.

Thanks for your comments and suggestions!