pre-RFC: Allow return, continue, and break to affect the captured environment

I’d like to see something like break 'outer;. This could be very handy in some cases. This shouldn’t conflict with the possibility of adding break with values later on though.


Often when I have found myself craving a feature like this, it has been because handling an exceptional result or an error inside a closure is a pain. But this feature itself doesn’t fit into Rust as it currently stands, because it may need to unwind deep through the stack to be able to return, and that is exceptional and surprising control flow for anyone who has been handed a closure: the same as panicking, which we should discourage.

Instead, I wish that the standard library had some helpers to ease the common cases of handling errors inside closures. I’ve implemented try_map and flip methods in the try_map crate for cases like mapping Option<Result<T>> to Result<Option<T>>, so that errors from inside the closure can be handled at the outer layer. Stuff like that would be nice to have in std too.
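A minimal sketch of what such a flip helper can look like. The Flip trait here is hypothetical and written for illustration only; it is not the actual API of the try_map crate. The standard library's Option::transpose performs the same Option<Result<T, E>> to Result<Option<T>, E> conversion.

```rust
use std::num::ParseIntError;

// Hypothetical helper trait for this sketch; not the API of the try_map crate.
trait Flip<T, E> {
    fn flip(self) -> Result<Option<T>, E>;
}

impl<T, E> Flip<T, E> for Option<Result<T, E>> {
    // Moves the error from the inner layer to the outer one.
    fn flip(self) -> Result<Option<T>, E> {
        match self {
            None => Ok(None),
            Some(Ok(v)) => Ok(Some(v)),
            Some(Err(e)) => Err(e),
        }
    }
}

fn parse_optional(input: Option<&str>) -> Result<Option<i32>, ParseIntError> {
    // The closure yields a Result; flip() lets `?` propagate the error at the outer layer.
    let parsed = input.map(|s| s.parse::<i32>()).flip()?;
    Ok(parsed)
}

fn main() {
    assert_eq!(parse_optional(Some("42")), Ok(Some(42)));
    assert_eq!(parse_optional(None), Ok(None));
    assert!(parse_optional(Some("x")).is_err());
    // The standard library's Option::transpose performs the same conversion.
    assert_eq!(Some("7".parse::<i32>()).transpose(), Ok(Some(7)));
}
```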

I think that break 'outer; would be nice (not long ago I felt the need for something like this, used a return from an inline closure as a stop-gap measure, and got scolded by Clippy), but it would need to be essentially local control flow.
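For comparison, a sketch of that kind of stop-gap next to a purely local alternative available in current Rust: a labeled block that breaks with a value (stable since Rust 1.65, well after this thread). The function names are illustrative only.

```rust
// Stop-gap: an immediately invoked closure, so `return` leaves only the closure.
// Clippy typically warns about this pattern.
fn first_negative_closure(rows: &[Vec<i32>]) -> Option<(usize, usize)> {
    (|| {
        for (i, row) in rows.iter().enumerate() {
            for (j, &x) in row.iter().enumerate() {
                if x < 0 {
                    return Some((i, j));
                }
            }
        }
        None
    })()
}

// Purely local control flow: break out of a named block with a value.
fn first_negative_labeled(rows: &[Vec<i32>]) -> Option<(usize, usize)> {
    'search: {
        for (i, row) in rows.iter().enumerate() {
            for (j, &x) in row.iter().enumerate() {
                if x < 0 {
                    break 'search Some((i, j));
                }
            }
        }
        None
    }
}

fn main() {
    let grid = vec![vec![1, 2], vec![3, -4]];
    assert_eq!(first_negative_closure(&grid), Some((1, 1)));
    assert_eq!(first_negative_labeled(&grid), Some((1, 1)));
}
```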

Not surprisingly, this is something that I’ve thought about a lot, though I don’t have a ready answer. I agree that the current setup doesn’t play well with Result propagation – basically every consumer of a closure must anticipate whether its callees will want to propagate errors. But I’m very wary of trying to “split the world” of closures.
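One concrete instance of a closure consumer having to anticipate error propagation: Iterator::for_each takes a closure returning (), so ? cannot be used inside it, whereas Iterator::try_for_each was designed to accept a closure returning a Result and stops at the first error. A small sketch (sum_all is an illustrative name):

```rust
use std::num::ParseIntError;

fn sum_all(inputs: &[&str]) -> Result<i32, ParseIntError> {
    let mut total = 0;
    // for_each would require the closure to return (), leaving no way to use `?`.
    // try_for_each accepts a closure returning a Result and stops at the first Err.
    inputs.iter().try_for_each(|s| -> Result<(), ParseIntError> {
        total += s.parse::<i32>()?;
        Ok(())
    })?;
    Ok(total)
}

fn main() {
    assert_eq!(sum_all(&["1", "2", "3"]), Ok(6));
    assert!(sum_all(&["1", "oops", "3"]).is_err());
}
```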

First, I want to point out one subtle consideration. Allowing callee closures to “break out” across stack frames greatly magnifies exception-safety concerns. You may recall that when we were discussing catch_panic, one of the major concerns was that, so far, safe Rust could by and large ignore exception safety (this is not true for unsafe Rust). Exception safety essentially means “keeping in mind that a function might panic”; put another way, it means that all of your cleanup needs to go in destructors. If we add TCP-preserving closures (closures that preserve Tennent's Correspondence Principle), then whenever one of them chooses to return, the same considerations apply, only now outside of a panic scenario, during regular execution (the possibility of this becoming commonplace is precisely why catch_panic was controversial). So imagine I have this function:

fn foo<F>(&mut self, f: F) where F: FnOnce() {
    self.counter += 1;
    f();
    self.counter -= 1;
}

This kind of logic is fine today, but if f() may in fact “skip” the remainder of my function, then self.counter will get out of date.
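Rust has no non-local return today, but a panic skips the rest of foo in the same way, so it can stand in for one. The sketch below (Tracker, CounterGuard and the two foo variants are illustrative names) shows the counter going stale, and the destructor-based cleanup that exception safety calls for. It assumes the default panic = "unwind" strategy.

```rust
use std::panic::{self, AssertUnwindSafe};

struct Tracker {
    counter: u32,
}

// Drop guard: undoes the increment when it goes out of scope,
// including while a panic unwinds through the frame.
struct CounterGuard<'a> {
    counter: &'a mut u32,
}

impl Drop for CounterGuard<'_> {
    fn drop(&mut self) {
        *self.counter -= 1;
    }
}

impl Tracker {
    // The function from the post: the decrement is skipped if `f` never returns.
    fn foo_fragile<F: FnOnce()>(&mut self, f: F) {
        self.counter += 1;
        f();
        self.counter -= 1;
    }

    // Exception-safe variant: the cleanup lives in a destructor.
    fn foo_guarded<F: FnOnce()>(&mut self, f: F) {
        self.counter += 1;
        let _guard = CounterGuard { counter: &mut self.counter };
        f();
    }
}

fn main() {
    // Suppress the default panic message for this demonstration.
    panic::set_hook(Box::new(|_| {}));

    let mut t = Tracker { counter: 0 };
    let _ = panic::catch_unwind(AssertUnwindSafe(|| t.foo_fragile(|| panic!("early exit"))));
    assert_eq!(t.counter, 1); // stale: the decrement never ran

    let mut t = Tracker { counter: 0 };
    let _ = panic::catch_unwind(AssertUnwindSafe(|| t.foo_guarded(|| panic!("early exit"))));
    assert_eq!(t.counter, 0); // the guard's destructor restored it
}
```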

This leads me to conclude that a TCP-preserving closure really wants to be another kind of return value. This also means we can avoid building “longjmp” into the language. The return value might look something like this:

enum ControlFlow<T> {
    // indicates that a TCP-preserving closure did not return, instead executing a break/return/etc
    Nonlocal,

    // indicates the TCP-preserving closure completed normally
    Local(T)
}

Now the intermediate frames can propagate this naturally, just as they do a Result:

fn foo<F, T>(&mut self, f: F) -> ControlFlow<T>
    where F: FnOnce() -> ControlFlow<T>
{
    self.counter += 1;
    let r = f();
    self.counter -= 1;
    r
}
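A runnable version of this idea, assuming the ControlFlow enum from the post is defined locally (it is a sketch, not a std type; the later std::ops::ControlFlow, with Continue and Break variants, is a separate thing). The explicit match in the illustrative describe function stands in for the code the compiler would insert at the call site to perform the non-local return.

```rust
// The enum proposed above, defined locally for this sketch.
enum ControlFlow<T> {
    // The closure executed a break/return/etc. instead of completing.
    Nonlocal,
    // The closure completed normally with this value.
    Local(T),
}

struct Counter {
    counter: u32,
}

impl Counter {
    // Intermediate frame: always runs its own cleanup, then propagates the result.
    fn foo<F, T>(&mut self, f: F) -> ControlFlow<T>
    where
        F: FnOnce() -> ControlFlow<T>,
    {
        self.counter += 1;
        let r = f();
        self.counter -= 1;
        r
    }
}

fn describe(c: &mut Counter, n: i32) -> &'static str {
    let doubled = match c.foo(|| {
        if n > 10 {
            // Stands in for writing `return "too large"` inside the closure.
            ControlFlow::Nonlocal
        } else {
            ControlFlow::Local(n * 2)
        }
    }) {
        // Hand-written version of the code the compiler would insert.
        ControlFlow::Nonlocal => return "too large",
        ControlFlow::Local(v) => v,
    };
    if doubled > 10 { "doubled past ten" } else { "small" }
}

fn main() {
    let mut c = Counter { counter: 0 };
    assert_eq!(describe(&mut c, 3), "small");
    assert_eq!(describe(&mut c, 8), "doubled past ten");
    assert_eq!(describe(&mut c, 20), "too large");
    assert_eq!(c.counter, 0); // cleanup ran on every path
}
```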

This seems nice, but there are a couple of (maybe) problems. First, we want the compiler to understand that when I do break in such a closure it has special meaning. This suggests we probably want some kind of syntax to express that (seems reminiscent of coroutines to me). This special syntax would also allow the compiler to insert some code that wraps the location where the closure is used to execute the non-local control flow (e.g., returning a value etc). We actually had basically this mechanism long ago and it worked ok; I imagine we’d consider something similar.

Second, it means that these closures can easily interoperate with functions that don’t understand their special semantics. This is good and bad. It’s nice that existing closures work, but those functions don’t know to interpret ControlFlow specially and avoid extra work. So you might imagine that you want different traits, which will start to split the world into “control-flow-aware” and not control-flow-aware. Sounds vaguely dystopian to me.

Finally, one major use case here (I believe) is things like iterators etc, and their signatures already exist, so we should consider if there are ways to backwards compatibly allow them to accept either regular closures or TCP-preserving closures.

So, I don’t have anything close to a solution here, but I suspect that the right one will (a) involve some return types that are understood specially; (b) allow some kind of syntax to have callees “opt in” to a different interpretation of return/break; and (c) consider how to distinguish an ordinary closure from a TCP one, and whether we can get some kind of backward compatibility.


Your enum ControlFlow<T> bears a strong resemblance to Result, and I have a feeling that a lot of the cases where people want early returns are going to involve ?. I wonder if we might get 80% of the desired feature from just making sure that all the combinators in the stdlib do something sensible with closures that can spit out a Result.

For example: BufRead::lines() iterates over Result<String>, so it's natural to want to do this:

fn get_all_uncommented_lines<B: BufRead>(file: B) -> io::Result<Vec<String>> {
    Ok(file.lines().map(|l| l?)
        .filter(|l| !l.is_empty() && !l.starts_with("#"))
        .collect::<Vec<String>>())
}

This doesn't work for several reasons, starting with "|l| l? is an identity function". You have to write either

fn get_all_uncommented_lines<B: BufRead>(file: B) -> io::Result<Vec<String>> {
    let lines = file.lines().collect::<io::Result<Vec<String>>>()?;
    Ok(lines.into_iter()
        .filter(|l| !l.is_empty() && !l.starts_with("#"))
        .collect::<Vec<String>>())
}

or

fn get_all_uncommented_lines<B: BufRead>(file: B) -> io::Result<Vec<String>> {
    let mut ret = Vec::<String>::new();
    for line in file.lines() {
        let line = line?;
        if line.is_empty() || line.starts_with("#") { continue; }
        ret.push(line);
    }
    Ok(ret)
}

I'm not currently seeing a way to make .map(|l| l?) DWIM, but if we could think of one, that might be enough. It needn't be spelled .map(|l| l?); I think people might be happier with .try_filter!() even...
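For reference, a self-contained version of the two workarounds that do compile, run against an in-memory std::io::Cursor instead of a file (function names are illustrative):

```rust
use std::io::{self, BufRead, Cursor};

// First workaround: collect all lines (stopping at the first error), then filter.
fn uncommented_collect<B: BufRead>(file: B) -> io::Result<Vec<String>> {
    let lines = file.lines().collect::<io::Result<Vec<String>>>()?;
    Ok(lines
        .into_iter()
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .collect())
}

// Second workaround: an explicit loop, where `?` and `continue` act on this function.
fn uncommented_loop<B: BufRead>(file: B) -> io::Result<Vec<String>> {
    let mut ret = Vec::<String>::new();
    for line in file.lines() {
        let line = line?;
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        ret.push(line);
    }
    Ok(ret)
}

fn main() {
    let input = "# comment\nalpha\n\nbeta\n";
    assert_eq!(uncommented_collect(Cursor::new(input)).unwrap(), vec!["alpha", "beta"]);
    assert_eq!(uncommented_loop(Cursor::new(input)).unwrap(), vec!["alpha", "beta"]);
}
```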

You could also do something along the line of:

    Ok(file.lines()
        .filter_ok(|l| Ok(!l.is_empty() && !l.starts_with("#")))
        .collect::<Result<Vec<String>, _>>()?)
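filter_ok is not a standard-library method. Assuming it is an extension method whose predicate returns Result<bool, E>, so that the predicate itself can use ?, a minimal hypothetical implementation could look like this (FilterOk and FilterOkExt are illustrative names, not taken from any crate):

```rust
use std::io::{self, BufRead, Cursor};

// Hypothetical adapter: yields Ok items for which the predicate returns Ok(true),
// and passes through both input errors and predicate errors.
struct FilterOk<I, F> {
    iter: I,
    pred: F,
}

impl<I, F, T, E> Iterator for FilterOk<I, F>
where
    I: Iterator<Item = Result<T, E>>,
    F: FnMut(&T) -> Result<bool, E>,
{
    type Item = Result<T, E>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            match self.iter.next()? {
                Ok(item) => match (self.pred)(&item) {
                    Ok(true) => return Some(Ok(item)),
                    Ok(false) => continue,
                    Err(e) => return Some(Err(e)),
                },
                Err(e) => return Some(Err(e)),
            }
        }
    }
}

// Hypothetical extension trait making `.filter_ok(...)` available on iterators of Results.
trait FilterOkExt: Iterator + Sized {
    fn filter_ok<F, T, E>(self, pred: F) -> FilterOk<Self, F>
    where
        Self: Iterator<Item = Result<T, E>>,
        F: FnMut(&T) -> Result<bool, E>,
    {
        FilterOk { iter: self, pred }
    }
}

impl<I: Iterator> FilterOkExt for I {}

fn uncommented<B: BufRead>(file: B) -> io::Result<Vec<String>> {
    Ok(file
        .lines()
        .filter_ok(|l| Ok(!l.is_empty() && !l.starts_with('#')))
        .collect::<Result<Vec<String>, _>>()?)
}

fn main() {
    let lines = uncommented(Cursor::new("# comment\nalpha\n\nbeta\n")).unwrap();
    assert_eq!(lines, vec!["alpha", "beta"]);
}
```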

Douglas Crockford writes about it: http://java.sys-con.com/node/793338/

It turns out that the Correspondence Principle is descriptive, not prescriptive. He uses it to analyze the (by now forgotten) Pascal programming language, showing a correspondence between variable definitions and procedure parameters. Tennent does not identify the lack of correspondence of return statements as a problem.


Why doesn’t this work?

// Inside FromRequest::from_request method from Rocket
let connection = get_connection().map_err(|error| Outcome::Failure((Status::InternalServerError, error.into())))?;

As for changing the semantics of return or anything similar: Scala tried this, and people just ended up being advised never to use return. I think I’ve used it once in production code, and only because Scala also doesn’t have a way to break out of a loop.

I’m mostly in favor of ‘simply encouraging people to design better APIs with closures in them’.

I’ll add my voice here against this proposal. I don’t believe return inside a closure should return from the outer function; you create a new scope for return when you make a closure. That behavior is actually useful in some cases where you would otherwise end up with a bunch of ugly if/else chains.
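A small illustration of that usage (classify is an illustrative name): each return exits only the closure, which turns what would be an if/else chain into guard clauses.

```rust
fn classify(values: &[i32]) -> Vec<&'static str> {
    values
        .iter()
        .map(|&v| {
            // Each `return` leaves only this closure, acting as a guard clause.
            if v < 0 {
                return "negative";
            }
            if v == 0 {
                return "zero";
            }
            if v % 2 == 0 { "even" } else { "odd" }
        })
        .collect()
}

fn main() {
    assert_eq!(classify(&[-3, 0, 4, 7]), vec!["negative", "zero", "even", "odd"]);
}
```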

Now if you wanted to add a scope parameter to return allowing it to specify which scope it’s returning from, sure. But that’s a whole different ball of wax.


This is interesting. I googled "scala return" and this is the first relevant link I saw: tpolecat

According to that post, there are two unforced errors in the design of Scala's return that make it more problematic than the nonlocal control flow itself requires:

  • It is implemented by throwing an exception, which can be caught by user code, including accidentally by a "blanket catch"! This is obviously really bad. (This kind of design error shows up all over the place, such as Haskell letting you catch the ThreadKilled exception, and so on. In any case, if we want to do this, we should do it so that catch_unwind can't accidentally catch it. That means it probably shouldn't be implemented using unwinding at all, because an unwind passing through catch_unwind into FFI code would also be really bad.)

  • A non-locally-returning closure is allowed to pass out of the scope which it was to return from, and then be called, resulting in the aforementioned exception blowing up the program. I suspect this would not be an issue for Rust, because only borrowed closures would be allowed to nonlocally return, and they would not be able to escape the given scope. Am I right?
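To make the first point concrete, a sketch assuming a non-local return were implemented by unwinding with a marker payload (NonlocalReturn and library_helper are hypothetical). An unrelated catch_unwind in an intermediate function silently absorbs the "return", and the outer function carries on as if nothing happened. It assumes the default panic = "unwind" strategy.

```rust
use std::panic::{self, AssertUnwindSafe};

// Hypothetical payload marking an unwind as a "non-local return".
struct NonlocalReturn(&'static str);

// An intermediate function with a blanket catch; it knows nothing about NonlocalReturn.
fn library_helper<F: FnOnce()>(f: F) -> bool {
    panic::catch_unwind(AssertUnwindSafe(f)).is_ok()
}

// Intends to return "early" from inside the closure via the marker payload.
fn outer() -> &'static str {
    let result = panic::catch_unwind(|| {
        let completed = library_helper(|| panic::panic_any(NonlocalReturn("early")));
        // Reached only because library_helper swallowed the unwind.
        if completed { "closure finished" } else { "return was swallowed" }
    });
    match result {
        Ok(s) => s,
        // Where the "non-local return" should have landed.
        Err(payload) => payload
            .downcast_ref::<NonlocalReturn>()
            .map_or("some other panic", |r| r.0),
    }
}

fn main() {
    // Suppress the default panic message for this demonstration.
    panic::set_hook(Box::new(|_| {}));
    assert_eq!(outer(), "return was swallowed");
}
```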

The rest of the post just seems like surprise at return returning nonlocally in the first place, when locally might've been expected (a concern to be sure), and a preference for functional style (no separate control flow) over procedural style (using continuations a.k.a. control flow).

If you have any other references for why return in Scala is considered harmful, which are about problems inherent to nonlocal return itself rather than Scala's particular design mistakes, I'd be very interested.

In Kotlin, non-local returns work out quite nicely (docs 1, 2).

They are allowed in lambdas which are explicitly marked for inlining and are quite important for the language as a whole because a lot in Kotlin is based on [extension] lambda DSLs.


This works in Kotlin because only functions marked as inline can have control transferred through them in such a way. Furthermore, closure parameters to them may not be stored unless marked noinline.


I wouldn’t mind having a way to implement a solution with those restrictions, perhaps without using the return keyword, though obviously the keyword is a bikeshed.

Related: Ruby reject! accidentally quadratic.


I’d really like to avoid having to consider non-local exits when writing any function which takes a closure.


I learned about this topic through the convenient "Rust-internals summary". I think that the proposed design does not work, and I will explain why (on both a practical and a theoretical level). The topic could have been left to rest instead, but in my experience language questions tend to come back, as they stay on people's minds, so it is useful to gather feedback and criticism in a central place.

First, I would like to point out that the proposal's justification in terms of a Correspondence Principle is built on a misinterpretation or misuse of said principle.

The principle says: abstracting a function and then immediately calling it ((|| { ... })()) is the same as running the code directly. This is a special case of an operation that has been known since the 1930s, called beta-reduction (after the Greek letter β), which says that a function call can be understood by replacing it with the function body, where the formal parameters have been replaced by the values of the call's arguments. This is all just fine. But the following translation, part of the first post, is not:
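A concrete reading of the principle in today's Rust (function names are illustrative): wrapping an expression in a closure and calling it immediately preserves its value, but once the expression contains return, the wrapped version only returns from the closure, so the two forms diverge.

```rust
// Plain expression: direct and closure-wrapped forms agree.
fn direct(x: i32) -> i32 {
    x * 2 + 1
}

fn wrapped(x: i32) -> i32 {
    (|| x * 2 + 1)()
}

// Expression containing `return`: here `return` exits `direct_return` itself...
fn direct_return(x: i32) -> i32 {
    let y = if x < 0 { return -1 } else { x };
    y + 100
}

// ...but inside the closure it exits only the closure, so the result differs.
fn wrapped_return(x: i32) -> i32 {
    let y = (|| if x < 0 { return -1 } else { x })();
    y + 100
}

fn main() {
    assert_eq!(direct(5), wrapped(5));
    assert_eq!(direct_return(5), wrapped_return(5)); // both 105
    assert_eq!(direct_return(-5), -1);
    assert_eq!(wrapped_return(-5), 99);
}
```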

For a given expression expr, || expr should be equivalent.

Notice that on the right side, we are now talking about (|| expr), not (|| expr)(). This is completely wrong! This equivalence fails as soon as expr is a printing statement for example (the left expression prints, the right doesn't). This might be a typo, but this typo is necessary to understand the proposed code example

fn main() {
  // `return` would be a divergent expression and return from `main`
  let id: i32 = "321".parse().or_else(|| return);
}

Here as well, || return is not applied.

Then, a general point on language design: as soon as you start doing things that are not obvious, choosing between several options (where to return to, for example), a generally safe approach is to start naming things. goto is a feature worth complaining about in various ways, but it did get one thing exactly right: when you have non-local control flow, it is safer to name the places where you want to go (labels). If you want to introduce non-local control flow in Rust, any rule such as "well, if it's an anonymous function, we could say that return goes to the outer one" should be killed with fire. If you don't do the normal thing, you should be more explicit about it, through naming. (return from main would be one idea, but it would be even better to explicitly name the return point independently of the function, as this design also scales to labels for loop break/continue, etc.)
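Rust's existing loop labels already follow this naming approach for break and continue. A sketch (rows_without_zero is an illustrative name):

```rust
// The label names which loop `continue` applies to.
fn rows_without_zero(grid: &[Vec<i32>]) -> Vec<usize> {
    let mut result = Vec::new();
    'rows: for (i, row) in grid.iter().enumerate() {
        for &x in row {
            if x == 0 {
                // Skip to the next row, not merely the next element.
                continue 'rows;
            }
        }
        result.push(i);
    }
    result
}

fn main() {
    let grid = vec![vec![1, 2], vec![0, 3], vec![4, 5]];
    assert_eq!(rows_without_zero(&grid), vec![0, 2]);
}
```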

(This is all completely orthogonal to the implementation considerations, and the design considerations of whether adding control effects is a good idea for your language.)

Finally, a bit of theory. There are theoretical programming languages with a notion of "co-variable" (the dual of a variable) that represents the return points of a program. They are often called "abstract machine calculi"¹, and are studied for two different reasons: they help us understand fundamental things about programming languages (for example, how to combine call-by-value and call-by-name evaluation order in a type-directed way), and they give a beautiful computational meaning to classical logic (the excluded middle, or proof by contradiction, are non-local control operators); pretty cool.

In those theoretical languages, it is natural for functions to declare not only variables for their formal arguments, but also one co-variable for their return point. For example¹, in System L the natural notation for an anonymous function (usually (λx.t) in the lambda calculus) is (μ(x·α).c), where α is a co-variable denoting the return point of the function. Now, it is known that while this syntax can express all of classical logic (control operators), it can be restricted to recover intuitionistic logic (pure functional programming) by imposing that co-variables be used linearly (consumed exactly once, as a linear resource).

You can restrict your type system to check this linearity restriction on co-variables, but it is also possible to express it purely syntactically, by imposing that there is a single co-variable, traditionally written ★ (star), that occurs in the program. Each binding occurrence of ★ shadows all previous occurrences, making non-local return syntactically inexpressible. This explains why the λ-calculus does not need to talk about co-variables explicitly: you can always implicitly assume that all co-variables are ★.

Introducing a control-flow-breaking return in this model is a reasonable extension: it implicitly refers to the ambient co-variable ★, and there is only one. (break and continue muddy the picture a bit, as you would distinguish function co-variables from loop co-variables.) But as soon as you want something more expressive, to be able to return to "other functions than the current one", you should absolutely resort to explicitly naming your co-variables again.

¹: some (theoretical) bibliographic references would be Polarised Intermediate Representation of Lambda Calculus with Sums, Guillaume Munch-Maccagnoni and Gabriel Scherer, 2015, A dissection of L, Arnaud Spiwack, 2016, and all of Paul Downen's recent works with Zena Ariola and co-authors.

This does sound very theoretical but it coincides with actual language designs that get this right: in abstract machines, co-terms correspond to stacks and continuations, and co-variables thus correspond to continuation variables in the Scheme community. While call/cc's continuations are in general more powerful/expressive (and dangerous) than mere return points (they can escape and be duplicated), Racket for example has a call/ec construction for "escape continuations" that correspond to return points, and a let/ec construction that precisely lets you give a name to the current return point.


Finally, one major use case here (I believe) is things like iterators etc, and their signatures already exist, so we should consider if there are ways to backwards compatibly allow them to accept either regular closures or TCP-preserving closures.

Why not keep the existing result type (so no new enum) and let the new trait handle the cleanup?

pub trait FnOnceTcpPreserving<Args> {
    type Output;
    extern "rust-call" fn call_once_by_preserving_tcp<C>(self, args: Args,
        cleanup_closure: C) -> Self::Output where C: FnOnce();
}

pub trait FnOnce<Args> : FnOnceTcpPreserving<Args> { … }

fn foo<F>(&mut self, f: F) -> usize
    where F: FnOnceTcpPreserving(usize) -> usize
{
    self.counter += 1;
    let args : (usize,) = (self.counter,);

    // never called if f is not tcp-preserving
    let cleanup_closure = || { self.counter -= 1 };

    let r = f.call_once_by_preserving_tcp(args, cleanup_closure);
    self.counter -= 1;
    r
}

fn bar(&mut self) -> &'static str
{
    // called with a FnOnce(usize) -> usize
    let c2 = self.foo(move |n| n * 2);

    // called with a FnOnceTcpPreserving(usize) -> usize
    let c1 = self.foo(move |n| {
        if n < 10 {
            10 - n
        }
        else {
            // 'fn return' is used to exit the enclosing function:
            fn return "get out"
        }
    });
    
    …
}

To be clearer: the new fn call_once_by_preserving_tcp(args, cleanup_closure) would just return a normal value to the calling code if no control-flow break happens (the fn return keywords in the example) or if the closure is an ordinary one. If a break happens, call_once_by_preserving_tcp never returns; instead, the cleanup_closure is called to run any finalization, and execution then resumes in the function that defined the closure (bar in the example), which returns immediately.

A hack to do this: https://docs.rs/control-flow/

