An idea for TCP closures and rust's effect system

... yinz realize you are literally in the process of reinventing the C++ exception unwinder, including the never-actually-resolved-people-just-got-sick-of-arguing-about-it question of how it should interact with thread cancellation panic.

1 Like

Share the exact same machinery with panics? Probably not, that's optimized for the "doesn't unwind" case, which is the opposite of what would happen with non-local returns.

This also doesn't handle generators/async.

In particular this feels very non-zero cost.

Shouldn't cost more than catch_unwind? On each step you compare frame pointer to the one on which you want to stop or something to the same effect.

And you know on which you want to stop because the &PanicInvoker<T> you passed down the call stack has the address of the base frame on which you want to stop. A bit different from C++.

Certainly

It probably costs a bit more than catch_unwind, but the problem is that catch_unwind is already expensive.

Important detail to notice.

We simply forbid non-local returns from async and gen fn's - closures are allowed to return to local fn's scope producing a value. Then rest of code in scope can do something reasonable with labels fn was provided with..

I see, we'll need to store these states inside of their callers states. Representation for labels was only to allow to not inline most of code (not state) to not duplicate it in monomorphic copies.

Now I see that it's smaller optimization than I originally thought.

Fundamentally there should be nothing preventing you to have non-local returns from generators/async, so this sounds like an artificial limitation that creates inconsistency.

Closures are pretty much inlined anyway, so does it change anything if you don't require closures with non-local control flow to be inlineable?

1 Like

Indeed, they can be safely allowed.

My initial concern was like "what if a future is spawned on an executor, and then, after being moved to another thread returns to a non-async code from a runtime's code" that would break runtime as of one of its threads terminated to sync scope.

However, my concerns were just overstatement, as of these futures cannot be spawned: they have local lifetime.

Generators also share this property.

Another point: TPC closures aren't Send nor Sync.

Now, interactions between Asynchrony and Multiplicity.

My view is that asynchrony dominates multiplicity, so consumer of async fn(gctx: label(u64->bool)) will use it like:

...
while let GeneratorState::Yielded(it) = gctx.resume(64).await {
   ...
}
...

Pretty syntaxes are not there yet.

And this brings us to their "full" counterparts. Probably, together they should result in stream literals... But that's outside of current topic.

Edit: what's about denoting asynchrony effect by label(async)?

1 Like

dyn safety and fn pointers:

There are no place for state, produced by asynchrony or multiplicity in case of fn pointers => cannot allow.

Thereof, fn pointers only can have non local returns.

In case of explicit boxed dyn Trait, we allow Fn trait to accept only non local return effect (it does not mutate anything); FnMut is allowed to have rest of effects as long as their state is Unpin. AFAIK boxed things have stable location in memory even in face of allocators API, Unpin here is actually not necessary.

This is also the case for a representation of labels:

The destructors should be run before returning (setting the value, then jumping).

The information passed in the second repr. is indeed insufficient, my bad, but... - what if we do it like that:

(
   *const fn(*mut ResumeTy), /*a callback for setting where to place resume arg*/
   (
   *mut YieldTy /*place for a value*/,
   *mut *const fn()->bool /*next resume address, also tells whether parent coroutine should yield*/
   ) //`yield`info
)

Then:

  1. When a consumer invokes an effectful function, the first thing function does, is set up resume argument address via callback. A place for a next resume argument is set up by via this callback on each (effectful) yield, if the address has changed;

  2. When effectful function yields, it sets up yield value, and a pointer to code where to pass control (it returns).

Effectively, this is "runtime place resolution" like thing.

Lifetimes:

for now, because of the rules, labels always have either lifetime of the caller, or local stack frame lifetime.

In case of lifting the rules and allowing full non local control flow...

In this case we treat them as a structs with references (what they in fact are), any label has one lifetime parameter which can be specified like label<'a>(...) when it's necessary.

Lifetime in signatures are elided, if they are not specified (elision failed => annotation needed).

asynchrony effect syntax:

As of syntax for capturing async env., but not producing a future, I would suggest smth. like fn<async>(...)... as a notation. I don't think we should allow an async fn or an async closure to accept this effect.

What if a function 1 passes a closure (that does a non-local return) to a function 2, which in turn passes it to a function 3. Now the moment the function is run and the return is triggered it needs to run the destructors of function 3's local variables, then function 2's local variables and the function 1's local variables. How can it know it?

I'm not entirely sure that would work (it probably does? not considering the destructors), but anyway this looks a bit too overhead for the average case where callee and closure can just be inlined.

Edit: I'm sorry, re-read the definition of label, I updated it (at least, will) - there were no way to tell when to yield from parent coroutine.

This is necessary for dyn Fn(label(A->B)) and fn<async> only. Any other use case is probably to be inlined.

As of usage, consider an example:

gen fn exmp4(data: impl Iterator<Item=char>) -> () -> [char;2] {
   data.map( //imagine that this takes box
      gen |_: char| loop {
         let ch1 = yield; let ch2 = yield; // here we accumulate 2 characters.
         yield 'fn [ch1,ch2]; //here we yield these right from original context.
      }
   )
}

Given the types from signature, the label type here is:

struct Label<'a>( fn(&'a mut char),(&'a mut [char;2],&'a mut fn() -> bool ) );

original generator uses it like following:

...
let mut r_place: &mut char;
let mut rp_upd = |ch: &mut char| r_place = ch;
let mut f: fn();
let mut yield_slot: [char;2]
//construct a label (all the variables from above),
// pass it into a (boxed) closure; 
//closure inits everything right
while let Some(ch) = data.next() {
   *r_place = ch; //pass data in;
   let should_yield = f(); //resume inner coroutine, at this point `yield_slot` may not be initialized;
   if should_yield  {yield yield_slot;} //else { continue; }
}
...

Non local returns

By example:

fn c(clo: fn()->u64) -> u64 {
   clo();
   unreachable!()
};
fn b(clo: fn()->u64) -> u64 {
   let t: DropType = ...;
   c(clo) + 1
}
fn a(){
   let res = 'lab: {
      let clo = || {return 'lab 10;};
      b(clo);
   };
}

The best solution I could come up with is magic, which is the following:

  1. The closure has captured a return label. => closure's struct has it as a field.
  2. The function b got this closure, sees* captured label, lets call it 'cap => we generate a new label 'replace inside of b, where we:
    1. store 'cap locally;
    2. run destructors;
    3. proceed up by the label 'cap.
  3. When the closure clo is passed down to c it's captured label 'cap is replaced by 'replace.

* Such insight is only possible if such closure comes as a generic parameter where (fn a<F: Fn(label(u64))>(f: &F)...) => no effects for function pointers at all :frowning_face:.

The example then becomes:

fn c<F: Fn()->u64>(clo: F) -> u64 {
   clo();
   unreachable!()
};
fn b<F: Fn()->u64>(clo: F) -> u64 {
   let t: DropType = ...;
   c(clo) + 1 //new, anonymous, label is created => closure is patched.
}
fn a(){
   let res = 'lab: {
      let clo = || {return 'lab 10;};
      b(Box::new(clo)); //original label
   };
}

The advantages of the approach:

  • it chains;
  • it's not (that) expensive;
  • it's transparent;
  • it's still possible to use with existing APIs like map<F: Fn...>.

The disadvantages of the approach:

  • it's transparent:
    We start to generate silent code, it's akin to drop glue, but feels fishy...
  • impossible to use with function pointer and trait objects (without a crazy additional implied trait, which I'll try to design). (I (still) want to allow dynamic use case, just for HOFs and (in future) dylib use trait objects) but the approach below is suitable for them.
  • special implicit(!) trait in trait object for extracting captured labels. only if we want to somehow abstract this magic.

In case of trait objects, I would like to use unwinding machinery for this.

Generator effects

Around a design of label representation I came up with few important points:

  1. how code which is only resumed\called by an fn() can know where the actual state of effectful function\closure is stored?
  2. the consuming coroutine should itself be responsible for providing effects (redirecting resumes) to an effectful code.

So because of these I have to (once again) alter the representation of label:

(
   //Same
   *const fn(*mut ResumeTy), /*a callback for setting where to place resume arg*/
   (
   //Same
   *mut YieldTy /*place for a value*/,
   //Changed
   *mut *const fn(*mut State)/*next resume address, takes state, returning from this indicates readiness to yield*/
   ) //`yield`info
)

So by the logic, now, calling a "resume" function pointer logically borrows corresponding effect. This borrow lasts until effectful function\closure doesn't return.
This mean that having two closures capturing the same label in the same time is not possible.

Now an interesting case:

gen fn a() -> u32 -> u32 {
	let genu = 'upper: gen |u64| -> u64 {
		let geni = |inp: usize| -> usize {
			loop {
				let b32 = yield 'fn 0u32;
				let b64 = yield 'upper 0u64;
				println!("{} and {}",b32,b64);
				if (b64 + b32) >= 200 {return inp};
			}
		}
		//use geni somehow
	}
	//use genu somehow
}

How geni would be implemented?\

  1. geni has captured two effects ('fn and 'upper), thus borrowed two scopes, these borrows are activated only by calling the closure, and last until it returns.
  2. geni, because of coroutine transform, has got its State, which is stored inside of surrounding state (if we were to box genu, this state would reside in an allocation).
  3. on each generator's yield-resume pass geni is resumed with a pointer to its State, resume argument is passed implicitly.
  4. by the time geni returns, scopes it has captured are released, and are again available for use.
    This means that the next user will have to call the callback with pointer to a place it wants the resume argument to be placed, so that consuming coroutine will be able to redirect resumes there.
    The price is one function call and one memory assignment.

Hey, I've totally lost the plot I'm afraid..

So this gets converted into some sort of enum? Like async fn? What trait does the resulting enum implement? How do you use it?

P.S. This is an interesting way to spell the "type", with an extra -> ..

I know that I made thread messy, sorry for that.

There are basically two cases:

  • everything is unboxed, inlined:
    In this case compiler directly knows where any yield statement yields, and generates one-big-state-machine to store all the logic, just like async fn does.
  • something is not\ cannot be inlined. The very case where we can't inline effectful functions is usage in trait objects.
    That's why I'm playing with labels - I want to make a mechanism which allows to "borrow" the coroutine context( i.e take its resume and yield from it) dynamically.
    Then, there might be cases when optimized would rather not to inline state machine, so described mechanism will come to play.

The example you specifically referred to is going to be converted into one state machine.

As of there in fact no enum involved (initially I imagined that there would, but that just doesn't play well with what @SkiFire13 brought up here), so no use :grinning_face_with_smiling_eyes:.

I'm the slow one here, please bear with me.. I really want to understand so this may take a long time and entail lots of questions from me.

Is the state machine not an enum? State machine genereated for async fn is an enum. What is this state machine? What trait does it implement? How do you use it? I'm trying to understand the simple optimistic case that does work. I'm less interested in the complex case with multiple layers of functions.

No, usually it's enum. Yes. Same, the state which generator effects produce is state machine, backed by enum. However, it's not intended to be creatable by hand, instead this state is stored inside of that of consumer function.

By example:

gen fn consumer() -> () -> bool {
   let effectful = || {yield 'fn true};
   ...
} //it produces an `impl Generator<...>`, which is enum containing state

Here, effectful also has state - its state is backed by an enum which is stored inside our impl Generator<...>. This way unboxed effectful closures are just parts of their owners state machines.

If we:

  1. extract out closure in separate function clo
  2. and will instead write this:
...
let effectful = Box::new(clo('fn));
...

then we will work with dynamic effectful closure(or function, which in case of lazy evaluation strategy can be turned into kind of closure), as specified above, in last post regarding labels.

Is Generator exactly same as Generator in std::ops - Rust ? I'm sorry I'm not sure.. is this already part of the language? Is there already some way to invoke it? How would I call such an fn?

That's not exactly in scope of thread's topic, but I assumed the generators to have such trait:

trait Generator<R=()> {
   type Yield;
   fn resume(
        self: Pin<&mut Self>, 
        arg: R
    ) -> Option<Self::Yield>; //None is for end of execution.
}

gen fn as a notion is also not entirely new design.
The used syntax is gen fn(InitType)->ResumeType->YieldType. There are no return type.

An example of usage:

gen fn g(msg: &'static str)->u64->String{...};

fn main() {
   let mut generator = g("Hello world"); //got a state
   //the we have to `Pin` it
   let mut pg: Pin<&mut _> = Pin::new(&mut generator); //or unsafe unchecked version
   //now we can resume the gen
   pg.resume(0) //gives Option<String>.
}

The feature is on nightly, under development, works akin to what I've shown.

You forgot the Return type

What if ResumeType is fn()? That would become ambiguous

I'm in favor of MCP-49 and gen fn's extended to support specifying resume type, so the core is in fact generalized semi-coroutines while user still faces pretty syntax.
Also gen closures can be made to only specify types, not the bindings, in headers(|here|): we walkaround the first yield problem, but have to decide when the code before the first yield is executed...

My take on the topic is that Generator trait would only be used in case of mixing both yield and return, otherwise FnPinMut (not both at the same time).

Fair point. We solve this by requiring a mandatory round brackets around the type with only exception of () - the unit type.

Not to be rude, but this design is an entirely different topic - I intentionally stated that these are asumptions. If you want to discuss it - let's proceed to a different thread.

I would rather ask you:
What if we do non-local returns by stack unwinding? Given that we know:

  1. The precise location of an "exception" handler is known from label; (no search for handler)
  2. The transfer of return data is done by pointer write prior to unwinding process. (no data transfer and "exception objects")

If I understood correctly, only thing it has to do is to just unwind all stack frames prior to one where we returned and call their destructors.
Would that reduce unwinding overhead down to tolerable?

Also, what would happen if we do non-local return from a destructor?

The only way this can happen, is via a call to closure that captured a label.

In this case we obviously don't run a destructor further, leading to logical and soundness issues.

There are no clear way to mitigate it.

At least:

  1. runtime traps for this process. (debug builds only)
  2. make return to not actually pass control to a label, but still have data write happen.
  3. ???

As a last resort we can say that doing non-local returns from inside of destructors is UB.

Edit: As an another way we can make non-local returns a kind of deferred statement:
When we face such a return, we

  1. write returned value by ptr::write,
  2. store an address where we are supposed to return, *
  3. proceed(continue) running destructors of current stack frame (whether non-local return was triggered inside of a destructor or in plain code), **
  4. when destructors are ran, we unwind stack frames further, running destructors in their scopes until we reach destination of a label. **

* We need to store the address somewhere, so it's accessible by unwinder:

In register

  • In an llvm one?
  • One of a real processor's?

In memory

  • no thread-locals or heap, because no_std crates don't have this.
  • no globals, cause this is not thread safe.
  • somewhere on stack...

** If we are in situation when a code run by unwinding lib triggers another non-local return, we don't start a new unwinding process, but just change the destination address of current, this way only the last non-local return issued takes effect - this both simplifies implementation and makes sense.