An idea for TCP closures and rust's effect system

I know that I made the thread messy, sorry for that.

There are basically two cases:

  • everything is unboxed and inlined:
    In this case the compiler knows directly where any yield statement yields to, and generates one big state machine to store all the logic, just like async fn does.
  • something is not, or cannot be, inlined. The main case where we can't inline effectful functions is their use in trait objects.
    That's why I'm playing with labels - I want a mechanism which allows us to "borrow" the coroutine context (i.e. take its resume and yield from it) dynamically.
    There might also be cases when the optimizer would rather not inline a state machine, so the described mechanism would come into play there too.

The example you specifically referred to is going to be converted into one state machine.

As for that, there is in fact no enum involved (initially I imagined that there would be, but that just doesn't play well with what @SkiFire13 brought up here), so no use :grinning_face_with_smiling_eyes:.

I'm the slow one here, please bear with me.. I really want to understand so this may take a long time and entail lots of questions from me.

Is the state machine not an enum? The state machine generated for an async fn is an enum. What is this state machine? What trait does it implement? How do you use it? I'm trying to understand the simple optimistic case that does work. I'm less interested in the complex case with multiple layers of functions.

No, usually it's an enum. Yes. Likewise, the state which generator effects produce is a state machine, backed by an enum. However, it's not intended to be created by hand; instead this state is stored inside that of the consumer function.

By example:

gen fn consumer() -> () -> bool {
    let effectful = || { yield 'fn true };
} // it produces an `impl Generator<...>`, which is an enum containing the state

Here, effectful also has state: it is backed by an enum which is stored inside our impl Generator<...>. This way unboxed effectful closures are just parts of their owners' state machines.
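To make the nesting concrete, here is a minimal hand-written sketch in stable Rust of what such a state machine could look like. The names ConsumerState and EffectfulState and the shape of resume are assumptions made for illustration only; the real layout would be chosen by the compiler.

```rust
// Hypothetical stand-in for the state of the `effectful` closure.
#[derive(Debug, PartialEq)]
enum EffectfulState {
    Start,    // nothing has run yet
    Finished, // the single `yield` has already happened
}

// Hypothetical state of the owning `consumer` generator. The closure's state
// is an ordinary field of one variant, so no allocation or indirection is
// involved.
#[derive(Debug, PartialEq)]
enum ConsumerState {
    Running { effectful: EffectfulState },
    Done,
}

impl ConsumerState {
    fn new() -> Self {
        ConsumerState::Running { effectful: EffectfulState::Start }
    }

    // Drives the machine one step; `None` signals the end of execution.
    fn resume(&mut self) -> Option<bool> {
        match self {
            ConsumerState::Running { effectful: EffectfulState::Start } => {
                // The closure's `yield 'fn true` surfaces as a yield of the owner.
                *self = ConsumerState::Running { effectful: EffectfulState::Finished };
                Some(true)
            }
            ConsumerState::Running { effectful: EffectfulState::Finished } => {
                *self = ConsumerState::Done;
                None
            }
            ConsumerState::Done => None,
        }
    }
}
```

Calling resume on a fresh ConsumerState::new() yields true once and then reports the end of execution; the closure never exists as a separate object.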

If we:

  1. extract the closure into a separate function clo
  2. and instead write this:
let effectful = Box::new(clo('fn));

then we will be working with a dynamic effectful closure (or function, which under a lazy evaluation strategy can be turned into a kind of closure), as specified above in the last post regarding labels.
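For contrast, a rough sketch of the boxed variant in today's Rust. The Effectful trait, CloState and Consumer are hypothetical names, and the label argument 'fn is left out because it has no counterpart in current Rust; the point is only that the owner now holds a pointer instead of the closure's state.

```rust
// Hypothetical object-safe interface for the state of an effectful closure.
trait Effectful {
    // `None` signals the end of execution.
    fn resume(&mut self) -> Option<bool>;
}

// Stand-in for the state produced by the extracted function `clo`.
struct CloState {
    yielded: bool,
}

impl Effectful for CloState {
    fn resume(&mut self) -> Option<bool> {
        if self.yielded {
            None
        } else {
            self.yielded = true;
            Some(true)
        }
    }
}

// Stand-in for `clo`; the `'fn` label parameter is omitted (assumption).
fn clo() -> CloState {
    CloState { yielded: false }
}

// The owner stores only a fat pointer; the concrete state lives on the heap
// and every resume goes through the vtable.
struct Consumer {
    effectful: Box<dyn Effectful>,
}

impl Consumer {
    fn new() -> Self {
        Consumer { effectful: Box::new(clo()) }
    }

    fn resume(&mut self) -> Option<bool> {
        self.effectful.resume()
    }
}
```

Because the concrete type is erased, the owner cannot know statically where the closure yields to, which is the gap the label mechanism is meant to fill.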

Is Generator exactly the same as Generator in std::ops - Rust? I'm sorry, I'm not sure... Is this already part of the language? Is there already some way to invoke it? How would I call such an fn?

That's not exactly in the scope of the thread's topic, but I assumed generators to have a trait like this:

use std::pin::Pin;

trait Generator<R = ()> {
    type Yield;
    fn resume(
        self: Pin<&mut Self>,
        arg: R,
    ) -> Option<Self::Yield>; // None signals the end of execution.
}

gen fn as a notion is also not an entirely new design.
The syntax used is gen fn(InitType)->ResumeType->YieldType. There is no return type.

An example of usage:

gen fn g(msg: &'static str)->u64->String{...};

fn main() {
    let mut generator = g("Hello world"); // got a state
    // then we have to `Pin` it
    let mut pg: Pin<&mut _> = Pin::new(&mut generator); // or the unsafe unchecked version
    // now we can resume the generator
    pg.resume(0); // gives Option<String>
}

The feature is on nightly, under development, and works akin to what I've shown.
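Since gen fn with a resume type is not available, here is a sketch on stable Rust that implements the assumed trait by hand. The struct G, its remaining counter and the helper run are invented for illustration, and the trait is the one assumed in this thread, not the one from std::ops.

```rust
use std::pin::Pin;

// The trait as assumed in this thread (not the one in std::ops).
trait Generator<R = ()> {
    type Yield;
    // `None` signals the end of execution.
    fn resume(self: Pin<&mut Self>, arg: R) -> Option<Self::Yield>;
}

// Hypothetical hand-written state for
// `gen fn g(msg: &'static str) -> u64 -> String`.
// It yields `msg` followed by the resume argument, twice, then finishes.
struct G {
    msg: &'static str,
    remaining: u8,
}

fn g(msg: &'static str) -> G {
    G { msg, remaining: 2 }
}

impl Generator<u64> for G {
    type Yield = String;

    fn resume(self: Pin<&mut Self>, arg: u64) -> Option<String> {
        // `G` is `Unpin`, so taking a plain `&mut` out of the pin is safe.
        let this = self.get_mut();
        if this.remaining == 0 {
            return None;
        }
        this.remaining -= 1;
        Some(format!("{} {}", this.msg, arg))
    }
}

// Resumes the generator three times and collects what it produced.
fn run() -> Vec<Option<String>> {
    let mut generator = g("Hello world");
    let mut pg = Pin::new(&mut generator);
    // `as_mut` reborrows the pin so it can be resumed more than once.
    vec![
        pg.as_mut().resume(0),
        pg.as_mut().resume(1),
        pg.as_mut().resume(2),
    ]
}
```

A compiler-generated state would usually be self-referential and therefore not Unpin, which is why the trait takes Pin<&mut Self> even though this toy state does not need it.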

You forgot the Return type

What if ResumeType is fn()? That would become ambiguous

I'm in favor of MCP-49 and of gen fn's being extended to support specifying a resume type, so the core is in fact generalized semi-coroutines while the user still faces pretty syntax.
Also, gen closures can be made to only specify types, not the bindings, in headers (|here|): we work around the first yield problem, but have to decide when the code before the first yield is executed...

My take on the topic is that the Generator trait would only be used when mixing both yield and return, otherwise FnPinMut (not both at the same time).

Fair point. We solve this by requiring mandatory round brackets around the type, with the only exception being () - the unit type.

Not to be rude, but this design is an entirely different topic - I intentionally stated that these are assumptions. If you want to discuss it, let's move to a different thread.

I would rather ask you:
What if we do non-local returns by stack unwinding? Given that we know:

  1. The precise location of the "exception" handler is known from the label (no search for a handler);
  2. The transfer of return data is done by a pointer write prior to the unwinding process (no data transfer and no "exception objects").

If I understood correctly, the only thing it has to do is unwind all stack frames up to the one we return to, calling their destructors.
Would that reduce the unwinding overhead to a tolerable level?
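The idea can be emulated today with the standard panic machinery, which gives a feel for the control flow even though it says nothing about the cost of a dedicated implementation. In this sketch the names NonLocalReturn, Noisy, inner, middle and labelled are made up; the result is written into a slot before unwinding starts, and std::panic::resume_unwind is used because it does not invoke the panic hook. It assumes the default panic=unwind setting.

```rust
use std::cell::{Cell, RefCell};
use std::panic::{self, AssertUnwindSafe};

// Marker payload: it carries no data, the result travels through the slot.
struct NonLocalReturn;

// Records its name when dropped, so destructors run during unwinding are visible.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn inner(slot: &Cell<Option<u32>>, log: &RefCell<Vec<&'static str>>) {
    let _guard = Noisy { name: "inner", log };
    slot.set(Some(42)); // 1. write the returned value first
    panic::resume_unwind(Box::new(NonLocalReturn)); // 2. then start unwinding
}

fn middle(slot: &Cell<Option<u32>>, log: &RefCell<Vec<&'static str>>) {
    let _guard = Noisy { name: "middle", log };
    inner(slot, log);
    log.borrow_mut().push("after inner"); // skipped by the non-local return
}

// Plays the role of the labelled frame: the destination of the return.
fn labelled(log: &RefCell<Vec<&'static str>>) -> Option<u32> {
    let slot = Cell::new(None);
    let result = panic::catch_unwind(AssertUnwindSafe(|| middle(&slot, log)));
    match result {
        Ok(()) => None,
        // Our marker: the value is already sitting in the slot.
        Err(payload) if payload.is::<NonLocalReturn>() => slot.take(),
        // A genuine panic: keep propagating it.
        Err(other) => panic::resume_unwind(other),
    }
}
```

Here the intermediate frames are unwound in order (the guard in inner is dropped before the one in middle) and the code after the call in middle never runs. A dedicated mechanism could skip the handler search that catch_unwind still performs.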

Also, what would happen if we do a non-local return from a destructor?

The only way this can happen is via a call to a closure that captured a label.

In this case we obviously don't run the destructor any further, leading to logical and soundness issues.

There is no clear way to mitigate it.

At least:

  1. runtime traps for this case (debug builds only).
  2. make the return not actually pass control to the label, but still have the data write happen.
  3. ???

As a last resort we can say that doing non-local returns from inside destructors is UB.

Edit: As another option, we can make non-local returns a kind of deferred statement:
When we encounter such a return, we

  1. write the returned value by ptr::write,
  2. store the address where we are supposed to return, *
  3. proceed (continue) running the destructors of the current stack frame (whether the non-local return was triggered inside a destructor or in plain code), **
  4. once those destructors have run, we unwind further stack frames, running the destructors in their scopes, until we reach the label's destination. **

* We need to store the address somewhere, so it's accessible by unwinder:

In register

  • An LLVM one?
  • One of a real processor's registers?

In memory

  • no thread-locals or heap, because no_std crates don't have these.
  • no globals, because they are not thread safe.
  • somewhere on stack...

** If code run by the unwinding library triggers another non-local return, we don't start a new unwinding process but just change the destination address of the current one. This way only the last non-local return issued takes effect - this both simplifies the implementation and makes sense.
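The "last return wins" rule can also be emulated on top of the panic machinery. This is only a sketch under several assumptions: labels are plain integer ids, the pending destination and value live in a Cell on the stack, and the names Pending, return_to, label, Redirect and demo are invented. A destructor that issues a non-local return while unwinding is already in progress only overwrites the pending request and never starts a second unwinding.

```rust
use std::cell::Cell;
use std::panic::{self, AssertUnwindSafe};

// Marker payload; the destination and value travel through the slot.
struct NonLocalReturn;

#[derive(Clone, Copy)]
struct Pending {
    target: u32, // id of the label we are returning to
    value: i32,  // value written before unwinding
}

type Slot = Cell<Option<Pending>>;

// Requests a non-local return to label `target` carrying `value`.
fn return_to(slot: &Slot, target: u32, value: i32) {
    let already_unwinding = slot.get().is_some();
    slot.set(Some(Pending { target, value })); // the last request wins
    if !already_unwinding {
        panic::resume_unwind(Box::new(NonLocalReturn));
    }
}

// Runs `body` under the label `id`.
fn label(slot: &Slot, id: u32, body: impl FnOnce() -> i32) -> i32 {
    match panic::catch_unwind(AssertUnwindSafe(body)) {
        Ok(v) => v,
        Err(payload) => {
            let is_ours = payload.is::<NonLocalReturn>()
                && slot.get().map_or(false, |p| p.target == id);
            if is_ours {
                slot.take().unwrap().value
            } else {
                // Not our destination (or a genuine panic): keep unwinding.
                panic::resume_unwind(payload)
            }
        }
    }
}

// A destructor that itself issues a non-local return to label 0.
struct Redirect<'a> {
    slot: &'a Slot,
}

impl Drop for Redirect<'_> {
    fn drop(&mut self) {
        return_to(self.slot, 0, 7);
    }
}

fn demo() -> i32 {
    let slot: Slot = Cell::new(None);
    label(&slot, 0, || {
        let inner = label(&slot, 1, || {
            let _guard = Redirect { slot: &slot };
            return_to(&slot, 1, 1); // initially aimed at label 1
            0
        });
        inner + 100 // skipped: the destructor redirected the return to label 0
    })
}
```

In demo the return is first aimed at label 1, the guard's destructor retargets it to label 0 with value 7 while unwinding, label 1 sees that the request is no longer its own and keeps unwinding, and label 0 finally produces 7.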