Proposal: Bounded continuations - non-local control flow without the mess

I’d like to propose a mechanism that exposes stack unwinding as a bounded-continuations API in Rust: a limited, type-safe, and explicit form of non-local control flow. I believe it would give us first-class non-local control flow without many of the concerns that have been raised about classical exceptions.

I’ll lay out the basic signatures in the API first:

API & Usage

A continuation type, parameterized over a value type, with a diverging resume method:

pub struct Cont<T> { /* opaque to users */ }

impl<T> Cont<T> {
    pub fn resume(&self, value: T) -> !; // diverges; body elided
}

A top-level function, call_cc, with the following signature:

pub fn call_cc<F, T>(f: F) -> T
  where F: FnOnce(&Cont<T>) -> T

In usage, this would look like the following:

...
let result = call_cc(|cont| {
    ...
    if some_condition {
        // Equivalent to "return value;"
        cont.resume(value);
    }
    ...
    // Pass continuation to some helper function
    // which may call cont.resume(), which would
    // perform a non-local return from this closure.
    try_compute_result(cont);
    ...

    // Return from the closure without using "cont"
    value
});
...

Benefits

Firstly, this API leverages Rust’s type system to ensure that the reference to the continuation cannot escape the dynamic scope of the closure activation. The closure receives an immutable reference to an opaque Cont<T>, and because that reference’s lifetime is fresh for each activation, the borrow checker prevents it from escaping.
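For example, here is a sketch of the kind of escape the compiler would reject (hypothetical code, assuming the API above):

let mut escaped: Option<&Cont<i32>> = None;
let result = call_cc(|cont| {
    // ERROR: `cont` cannot escape the closure - its lifetime is
    // fresh for each activation, so it cannot be stored into the
    // longer-lived `escaped` binding.
    escaped = Some(cont);
    0
});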

Secondly, every instance of a Cont<T> corresponds to exactly one resume point on the stack. Exceptions have the downside that the connection between the throw and the catch is opaque and nonspecific: a catch can occur essentially anywhere up the call stack, and the thrower has neither knowledge of, nor control over, where the exception is caught. This approach doesn’t exhibit that problem, since every Cont<T> value corresponds to a specific call_cc activation on the call stack.

Thirdly, this approach avoids the problem of a caller not knowing where exceptions can be thrown within a callee. Even with checked exceptions, a caller only knows that the callee may throw an exception of a particular type, not where that exception is caught and handled. This approach guarantees that the programmer can easily see both which callee functions may invoke a non-local resume, and exactly where those resumes lead. For example:

let result = call_cc(|cont1| { // *1
    let result2 = call_cc(|cont2| { // *2
        // The following call cannot perform any non-local returns,
        // since neither cont1 nor cont2 are passed in.
        some_func_1(x, y, z);

        // The following call may resume to *1, but not *2
        some_func_2(bar, bang, cont1);

        // The following call may resume to *2, but not *1
        some_func_3(cont2, quux);

        // The following call may resume to either *2 or *1
        some_func_4(cont1, cont2, ...);
    });
});

In all cases, the programmer writing the code is explicitly aware of which pieces of their code are able to perform non-local returns, and where those non-local returns lead.

Issues Addressed

One of the big itches I’ve had with the Result<T, E> (and Option<T>) mechanisms currently available is that in many cases, it feels like I’m forced to use them purely for the cosmetic purpose of wrapping values that I want “bubbled up”.

In many cases, what I want (conceptually) is non-local control flow where I resume to a previous point higher up the call stack, but the only way I have of expressing that is to wrap values with disjunctive types like Result or Option, and then explicitly clutter my code to process those using try! macros, match, or if let.

Making stack-unwinding explicit via this mechanism would address many circumstances where I’ve had to begrudgingly wrap values with disjunctives and manually thread them up the call stack.

It would also reduce compiled code size in these circumstances - fewer explicit conditionals for the sole purpose of bubbling up values. It would likewise improve performance, by removing the branches that exist solely to bubble those values up.

8 Likes

Dear @kvijayan, could you please provide an example that shows how you would use this? Currently I don’t see why we need it.

Best Regards.

Sure. So one of the places where I ran into issues was when I was writing an incremental tokenizer, where the input buffer could be “incomplete”.

I’ll use pseudocode to describe this since it’s a bit clearer. At the top-level of the tokenizer, the next_token method would have the following signature:

Tokenizer {
  next_token() -> (Token | Error | EndOfInput | NeedMoreData)
}

Here we either get the next token, run into a tokenizing error (bad input), reach the end of input on a complete buffer (EndOfInput), or reach the end of input on an incomplete buffer (NeedMoreData). If you get NeedMoreData, you wait until you have more input data, add it to the tokenizer, and call next_token() again.
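In real Rust, that union-style return type would be an enum; something like the following sketch (names are mine, purely illustrative):

struct Token; // stand-in for whatever token representation the tokenizer uses

enum TokenResult {
    Token(Token),   // the next token
    Error,          // bad input (e.g. an invalid UTF-8 sequence)
    EndOfInput,     // reached the end of a complete buffer
    NeedMoreData,   // reached the end of an incomplete buffer
}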

Internally, the core of this code will revolve around a next_char() method that pulls out the next char.

  next_char() -> (char | End | Incomplete | Err) {
    if (self.cur < self.end) {
      // Try to parse a unicode char from the buffer.
      // Return err if you find an invalid UTF-8 sequence
      ...
          return *self.cur; // success path;
      ...
          return Err; // failure path, bad UTF-8 sequence
    } else if (self.buffer_complete) {
      End
    } else {
      Incomplete
    }
  }
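A rough Rust rendering of that pseudocode might look like this (a sketch: the field names are my own assumptions, and decoding is simplified to bare ASCII):

enum CharResult {
    Char(char), // the next character
    End,        // end of a complete buffer
    Incomplete, // end of an incomplete buffer
    Err,        // invalid byte sequence
}

struct Tokenizer {
    buf: Vec<u8>,
    cur: usize,
    buffer_complete: bool,
}

impl Tokenizer {
    fn next_char(&mut self) -> CharResult {
        if self.cur < self.buf.len() {
            let b = self.buf[self.cur];
            if b < 0x80 {
                self.cur += 1;
                CharResult::Char(b as char)
            } else {
                // Real code would decode a multi-byte UTF-8 sequence here.
                CharResult::Err
            }
        } else if self.buffer_complete {
            CharResult::End
        } else {
            CharResult::Incomplete
        }
    }
}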

In-between next_token() and next_char(), we’d have the actual tokenizer implementation. I’ll just throw down a simple one:

  next_token() -> (Token | Error | EndOfInput | NeedMoreData) {
    // Read first char.
    let ch = match self.next_char() {
        End => return EndOfInput,
        Err => return Error,
        Incomplete => return NeedMoreData,
        ch => ch
    };
    if (is_ident_start(ch)) { return self.read_identifier(); }
    if (is_digit(ch)) { return self.read_number(); }
    if (is_whitespace(ch)) { return self.read_whitespace(); }
    Error
  }

  read_identifier() {
    loop {
      let ch = match self.next_char() {
          End => return Token(Identifier),
          Err => return Error,
          Incomplete => return NeedMoreData,
          ch => ch
      };
      if is_ident_cont(ch) { continue; }
      if is_whitespace(ch) { self.rewind_char(); return Token(Identifier); }
      /* No valid next char. */ return Error;
    }
  }

  read_number() {
    /* similar to read_identifier */
  }
  read_whitespace() {
    /* similar to read_identifier */
  }

The key point is this: each arm of the matches above corresponds to a conditional in the emitted code. In every single place where we match on self.next_char(), the handling of Err and Incomplete is identical: bubble the result up. We can write a macro to help with this, but that doesn’t hide the underlying problem: the branches are still present in the emitted code, where they bloat the instruction count, pollute the code cache, and stress the branch-prediction machinery. Any tokenizer implemented like this will spend a significant amount of time in the next_char() == Err and next_char() == Incomplete checks.
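For reference, such a macro might look like this (using the illustrative CharResult/TokenResult enums sketched above), in the spirit of try!:

// Hypothetical helper: bubble up the two "common" outcomes, hand
// everything else (Char or End) back to the call site.
macro_rules! try_char {
    ($e:expr) => {
        match $e {
            CharResult::Err => return TokenResult::Error,
            CharResult::Incomplete => return TokenResult::NeedMoreData,
            other => other, // Char(c) or End, left for the caller
        }
    };
}

Each expansion still emits the same comparisons and branches, though; the macro only hides them from the source.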

Additionally, what we are conceptually trying to do is a non-local return from next_char() when we see an incomplete buffer or a bad UTF-8 sequence. We know for a fact that next_char() is being called as part of fetching the next token, and we know for a fact that when we reach the end of an incomplete buffer, or see a bad UTF-8 sequence, tokenization should stop entirely and control should return to where it was started. Semantically, a non-local return is what the programmer is trying to express, but the language doesn’t afford the tools to express it directly. We’re forced to bundle these outcomes up as values in a disjunctive type, and then manually bubble them up through a call stack of read_whatever() calls.

So there’s both a performance and a readability issue here. Custom macros help with the readability, but not the performance.

With proper first-class support for non-local returns, we can express this very explicitly:

next_token() -> (Token | Error | EndOfInput | NeedMoreData) {
  call_cc(|cont| {
    let ch = match self.next_char(cont) {
      End => return EndOfInput,
      ch => ch
    };
    if (is_ident_start(ch)) { self.read_identifier(cont); }
    if (is_digit(ch)) { self.read_number(cont); }
    if (is_whitespace(ch)) { self.read_whitespace(cont); }
    Error
  })
}

  next_char(next_token_cont) -> (char | End) {
    if (self.cur < self.end) {
      // Try to parse a unicode char from the buffer.
      // Return err if you find an invalid UTF-8 sequence
      ...
          return *self.cur; // success path;
      ...
          next_token_cont.resume(Error); // failure path, bad UTF-8 sequence
    } else if (self.buffer_complete) {
      End
    } else {
      next_token_cont.resume(NeedMoreData);
    }
  }

  read_identifier(next_token_cont) {
    loop {
      let ch = match self.next_char(next_token_cont) {
          End => next_token_cont.resume(Token(Identifier)),
          ch => ch
      };
      if is_ident_cont(ch) { continue; }
      if is_whitespace(ch) {
        self.rewind_char(); 
        next_token_cont.resume(Token(Identifier));
      }
      next_token_cont.resume(Error);
    }
  }

We’ve eliminated two type variants entirely (Err and Incomplete) that were only there to bubble those results up. We’ve also eliminated two comparisons and two branches from every call to next_char (which is called at least once per character in a token) - comparisons and branches that would be taken extremely rarely, but that still pollute the runtime execution of the tokenizer.

This is just one example, but in general, non-local returns let us express the concept of “I’ve conceptually reached the termination of some computation that was started somewhere up the call stack, and I’d like to yield that termination value now”. The bonuses are smaller emitted code (fewer conditional arms checking method-call return values) and faster code (fewer branches in the branch-predictor table, comparisons eliminated).

1 Like

I still can’t follow. Could you simplify it? For example, I don’t know what self.next_char(cont) will return.

Ok. I can’t do that immediately - have some more pressing work to take care of. I’ll write up the example with proper Rust syntax, but it might be a day or two. Is that ok?

Of course. A simple example would be good. I’d also like to see comments above the functions, saying what each function returns in which case.

Do I understand correctly that this only allows for non-local returns, and not the other fancy continuation-based constructs that a callCC enables in other languages, like expressing non-deterministic computations?

You mention this callCC. What is it?

For those that haven’t seen call/cc before, this explanation was useful to me. Scheme and SML/NJ implement this feature. Essentially, the mechanism is stronger than exceptions, in that you could easily implement C++-style exception handling using call/cc. It is also quite confusing to people who are not accustomed to programming in continuation-passing style.

For us to support something like this, we’ll have to support garbage-collected stack frames. I think that is going to be the biggest hurdle for this kind of stuff.

2 Likes

While Scheme’s call/cc requires GCed call stacks, the bounded continuations proposed here do not. They can’t escape their stack frame, so are really just like a type-safe longjmp with proper unwinding.

4 Likes

That’s also my understanding, so probably no non-deterministic engines can be built on it.

But still this proposed feature would be very helpful IMHO as shown in the code above.

I’m not familiar with bounded continuations myself, but a talk at Curry On 2015 (see also the accompanying blog post) discussed “scoped continuations” as a more general, composable, and imperative-language friendly alternative to monads for things like error-handling. Is that the same thing as these “bounded continuations”?

Not an expert, but I have some concerns about this; I don’t see how this achieves most of the stated benefits over panic!.

  • It’s not less destructive than panicking: you have to do this with unwinding, and unwinding means poisoning locks and such because you can’t know whether it’s safe to blindly unwind.
  • It’s only marginally more “obvious” than panicking: you can hide a continuation behind a closure, so you can’t always know where one might happen, or where it might lead.
  • I don’t see how it would improve performance, since panic! already imposes a perf penalty due to how it messes with the optimiser.

I mean, if I understand this proposal correctly, you could implement this right now by catching a panic with a unique payload type.
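For the record, here is a minimal sketch of that idea, assuming T: Send + 'static so the value can travel in the panic payload:

use std::any::Any;
use std::marker::PhantomData;
use std::panic::{catch_unwind, resume_unwind, AssertUnwindSafe};

// Private wrapper type so we only intercept our own payloads.
struct Resumed<T>(T);

pub struct Cont<T>(PhantomData<T>);

impl<T: Send + 'static> Cont<T> {
    pub fn resume(&self, value: T) -> ! {
        // resume_unwind skips the panic hook, unlike panic!.
        resume_unwind(Box::new(Resumed(value)))
    }
}

pub fn call_cc<T, F>(f: F) -> T
where
    T: Send + 'static,
    F: FnOnce(&Cont<T>) -> T,
{
    let cont = Cont(PhantomData);
    match catch_unwind(AssertUnwindSafe(|| f(&cont))) {
        Ok(v) => v,
        Err(payload) => match payload.downcast::<Resumed<T>>() {
            Ok(resumed) => resumed.0,
            // A genuine panic (or someone else's payload): keep unwinding.
            Err(other) => resume_unwind(other),
        },
    }
}

(As written, a nested call_cc over the same T would intercept an outer continuation's payload; each Cont would need a unique per-invocation token to really target one catch site.)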

The one advantage from this interface would be that you can target a specific catch site, but I still don’t feel it’d be appropriate for “normal” flow control.

1 Like

I don't think so. As far as I can tell so far, this 'only' allows non-local returns, while scoped continuations let you do much more elaborate things.

I'm not sure what you mean by "destructive", but semantically speaking, this should do the same thing as early-returning or breaking out of a scope within a function: destructors will be run, and mutexes wouldn't be poisoned - just unlocked. Unwinding is possibly not great from a performance perspective, though.

My concern is this:

fn do_thing<F>(f: F) where F: FnOnce() {
    begin_something();
    f();
    finish_something(); // paired teardown - relies on f() returning normally
}

(Usually, this is in the context of temporarily breaking apart some value into multiple, disjoint borrows.)

With this, the call to finish_something() can be skipped without panicking. I don't see how you can introduce a totally new form of non-local control flow without breaking code like this.

Unless you do what panic! does and taint/destroy everything it unwinds through... but then we're back to "how is this different to panic!?"

1 Like

That seems a valid concern. “Called functions can return outside their callers” is both the good and the bad. (This seems like the kind of thing you’d use a type system to track - but I’m not sure how?)

I wonder whether the other advanced control-flow features that have been under discussion - async/await, coroutines, and so on - have analogous issues?

This is kinda getting into the difference between Go coroutines and Python coroutines. In Go, coroutine suspension can happen pretty much anywhere. In Python, it can only happen if you have an explicit yield or yield from.

Sorta like the difference between these continuations and, say, oh I don't know, try!. :stuck_out_tongue:

1 Like

Explicit stackless “coroutines” (i.e. what everyone else calls “generators”) are indeed free of unwinding problems, just as Result sidesteps exception-handling nightmares.

It’s the transformation - from relying on an implicit traversal of the stack and code, to always branching on a returned result to decide which path to take next - that straightens things out.

Moreover, we could actually do this to panicking: add an extra pointer return value to every function that can panic, and switch to the cleanup path when it is non-null.
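A sketch of what that transformation looks like in source terms (purely illustrative - not how rustc actually lowers panics today):

use std::any::Any;

// Every call that could unwind instead returns a tagged value; the
// caller branches on the tag and jumps to its cleanup path when set.
enum Outcome<T> {
    Done(T),
    Panicking(Box<dyn Any + Send>),
}

fn might_panic(x: i32) -> Outcome<i32> {
    if x < 0 {
        // The moral equivalent of panic!: hand back a payload.
        return Outcome::Panicking(Box::new("negative input"));
    }
    Outcome::Done(x * 2)
}

fn caller(x: i32) -> Outcome<i32> {
    let doubled = match might_panic(x) {
        Outcome::Done(v) => v,
        Outcome::Panicking(payload) => {
            // This frame's cleanup (drops) would run here before
            // propagating the payload to our own caller.
            return Outcome::Panicking(payload);
        }
    };
    Outcome::Done(doubled + 1)
}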

cc @Manishearth who’s interested in this sort of arcana.

2 Likes

@eddyb I feel like your first paragraph is talking about semantics, and the next two are about implementation, which is orthogonal…? Or am I misunderstanding?

[quote="kvijayan, post:1, topic:4270"] It would also reduce compiled code size in these circumstances - fewer explicit conditionals for the sole purpose of bubbling up values.[/quote]

That's not clear to me. Unwinding generates side tables; you may just be trading code size for table size here.

If less code is generated, you may lessen the pressure on the instruction cache... maybe.

Doubtful.

As far as I know, Rust uses the zero-cost exception model, in which you get:

  • much better performance on the non-exceptional path
  • terrible performance on the exceptional path

It's fine for exceptions, since they are by definition NOT supposed to occur in general.

In your case, performance will probably suffer.

Now, beyond performance, I would like to note that the approach seems to lead to really clean code, and maybe sacrificing some performance is a good trade-off?


That being said, @DanielKeep raises a good point: once the Cont<T> is embedded inside another value (a struct, a lambda, ...), it is completely invisible to the user of that outer value that a method call may diverge.

This is annoying because it means that, contrary to your claim, the control flow is NOT always visible at a glance.
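To make that concern concrete (a hypothetical sketch, again assuming the proposed API): nothing in run_step's signature reveals that calling its argument may diverge:

fn begin_something() { /* e.g. acquire a resource */ }
fn finish_something() { /* the matching teardown */ }

// Nothing here says `f` might capture a continuation and never return.
fn run_step<F: FnOnce()>(f: F) {
    begin_something();
    f();
    finish_something(); // silently skipped if f() resumes a continuation
}

let result: i32 = call_cc(|cont| {
    // The continuation is hidden inside an ordinary closure.
    run_step(|| cont.resume(-1));
    0
});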

1 Like