[Pre-RFC] Safe goto with value

Summary

Allow goto 'a and goto 'a expr to jump to a labeled block or block with arguments. Allow 'a: |args| { ... }; or 'a: { ... } to declare a label that can be the target of a jump or jump with arguments.

Motivation

The intent of this feature is to allow control flow patterns that cannot be expressed directly in terms of break 'a value, continue, and loops. This is needed when writing code with irreducible control flow, which is common in state machines.

Here is an example where we rewrite a recursive dfs algorithm as an iterative algorithm with a stack. The jumps in the resulting code cannot be expressed in terms of loop and match without added overhead.

fn dfs_recursive(graph: &[&[usize]], mark: &mut [bool], i: usize) {
  for &j in graph[i] {
    if !mark[j] {
      dfs_recursive(graph, mark, j)
    }
  }
  mark[i] = true;
}

fn dfs_iterative(graph: &[&[usize]], mark: &mut [bool], mut i: usize) {
  let mut stack = vec![];

  'iter: |mut it: std::slice::Iter<'a, usize>| {
    if let Some(&j) = it.next() {
      if !mark[j] {
        stack.push((i, it));
        i = j;
        goto 'start
      } else {
        goto 'iter it
      }
    } else {
      goto 'done
    }
  };
  'done: || {
    mark[i] = true;
    match stack.pop() {
      None => return,
      Some((j, it)) => {
        i = j;
        goto 'iter it
      }
    }
  };
  'start: {
    goto 'iter graph[i].iter()
  }
}

This is also useful for syntactic transfomations of other languages which can be expressed as a CFG but not necessarily with reducible control flow to Rust: MIR to Rust, C to Rust, LLVM to Rust. Even in the case of MIR, the printed syntax is very rust-like but has to invent syntax for goto because it doesn't exist in the surface language.

Detailed design

The syntax goto 'a and goto 'a value are new expressions, as well as 'a: block and 'a: |(pat: expr),*| expr (this builds on the feature for labeled blocks).

  • A labeled block executes immediately, while a labeled closure does nothing but can be used as a target for gotos.

  • Block and closure labels are function scoped, so any goto can target any label in the same function from the point of view of name resolution, but there are scoping restrictions on when such a jump is valid.

  • A goto 'a or goto 'a expr can target any labeled block, labeled loop or labeled closure provided that the scope of the goto expression (in the sense of MIR Scope) extends the scope of the label. That is, backward jumps are okay, as well as forward jumps, provided they do not jump over a let binding which would put a variable in scope while skipping that variable's initialization.

    • Variables in the goto scope but not the label scope are dropped in reverse order after the goto expression is evaluated but before the label is entered, as with all other non-local control flow.
  • The expression goto 'a targets labeled blocks, loop, while, and nullary closures; goto 'a (expr,*) targets labeled closures only. (for loops cannot be targeted because they have a loop counter that would violate the scope restriction.)

  • Labeled closures act more like blocks than closures, in that return targets the containing function, ? jumps out to the function (or enclosing try block), and so on. (You can wrap the body of the closure in another closure if you want to redirect these operations.)

  • Labeled closures implicitly have return type () (and the closure expression itself also has type ()); if control reaches the end of the block, the code after the closure is executed.

This is desugared to control flow on MIR, so in particular this is not an unsafe feature and does not circumvent the borrow checker.

Drawbacks and Alternatives

  • Gotos are considered harmful, after all. However, as they say, "this isn't your grandfather's goto". In particular, the ability to have a fully safe, type- and borrow-checked control flow construct means that this does not have the same dangers as its C namesake.

    • That said, wow this construct has a lot of bad press. Perhaps it's worth finding a different name just to get out of the stigma. Alternate suggestions: jump, tail_call, call_cc, or reusing continue (since the two functions can mostly coexist in the same keyword).
  • It is possible to get the semantic equivalent to this kind of goto using loop { state = match state { ... } } where state is an enum containing all of the labels and their arguments. However:

    • Currently, this is not optimized very well. #80475 might be able to solve the issues here without language changes, but sometimes the optimizer just doesn't do what we need it to, and if this is a critical loop (like the core loop of an interpreter), then it is better to be explicit about these things.
    • The direct goto version is better for borrow checking, which is implemented as a data flow pass and would be able to see through gotos but not state match loops like this. For example, the following code using the match state idiom is a borrow check error:
      fn main() {
          let x = (String::new(), String::new());
          enum State { A, B }
          let mut state = State::A;
          loop {
              state = match state {
                  State::A => { drop(x.0); State::B }
                  State::B => { drop(x.1); return }
              }
          }
      }
      
      but the equivalent with goto is not:
      fn main() {
          let x = (String::new(), String::new());
          'a: { drop(x.0); goto 'b }
          'b: || { drop(x.1); return }
      }
      
      because it's equivalent to straight line code that just does drop(x.0); drop(x.1);.

Unresolved questions

  • goto is not a reserved keyword, so there are the usual issues about keyword reservation, edition changes, etc. Maybe this isn't the final syntax anyway.

  • There is a syntactic ambiguity regarding goto 'a (expr,): is this passing expr or (expr,) to the 1-arg closure? My suggestion would be to always treat goto 'a (expr,*) as being a list of arguments, and goto 'a expr where the expr is not a tuple only works with 1-arg closures, even if it has a tuple type. So 'a: |a,b,c| ...; let args = (a,b,c); goto 'a args would not be legal.

  • Should labeled closures return ! instead? The presently described behavior makes labeled nullary closures vs labeled blocks most similar, but perhaps it would be better to enforce that they don't reach the end of the body anyway to avoid surprises about where the control flow goes after that (it certainly can't return to after the goto like a regular function call because these are tail calls, they forget their caller).

Future possibilities

This does not cover computed goto, which require some extensions to the type system. (You can simulate them with an array of function pointers but this incurs function call overhead.) Unlike this proposal, computed goto does not have a natural translation to MIR, so would require changes at that level as well.

There may also be a market for unsafe goto similar to C's, but I think that any goto that does not match the safe patterns here will be quite difficult to make not always trivially UB. Plus the goto considered harmful crowd will probably pick that version apart before it gets very far.

6 Likes

Rust has become as a reserved keyword.

I think that your example can be rewritten with it, instead of with goto.

I'm on mobile, so writing the complete example is tedious. But it can be something like this:

impl DFA {
    fn start(&mut self) {
        become self.iter(self.graph[self.i].iter())
    }

    // More states
3 Likes

Isn't this the same as your example? (Where's the overhead?)

fn dfs_iterative(graph: &[&[usize]], mark: &mut [bool], mut i: usize) {
    let mut stack = vec![];
    let mut it = graph[i].iter();

    loop {
        if let Some(&j) = it.next() {
            if !mark[j] {
                stack.push((i, it));
                i = j;
                it = graph[j].iter();
            }
        } else {
            mark[i] = true;
            match stack.pop() {
                None => return,
                Some((j, other)) => {
                    i = j;
                    it = other;
                }
            }
        }
    }
}
4 Likes

Note that anything with irreducible control flow does arguably have the core problem that goto does. Loop-break-value's RFC was accepted in part because it maintains the "progress of the process remains characterized by a single textual index" property, the lack of which is the core argument against goto in the famous paper -- and even that has proven too controversial to get through stabilization.

I'm skeptical of any direct goto feature (whatever the keyword), but things like the state machine use are definitely interesting. Perhaps other things could be explored first? Like encoding these with guaranteed TCO, or some syntactic form (or macro) that's more restricted and thus can be codegen'd better?

(For example, there have been complications with fully-general TCO, but TCO that's only guaranteed between a specific set of inner functions seems comparatively plausible. It reminds me of how let rec in SML bounds the problem to a known set of items.)

8 Likes

Hm, I hadn't heard about that. I'll add a section in alternatives about become. It looks like it would involve new functions though, which means that it has the same issues as other optimization-based approaches; it wouldn't be able to handle the lifetime example. Plus this arguably involves a much greater code change to support since you have to put all your locals in a struct and pass it around.

I'll concede the point on that example. The only advantage I can see in the goto version is that the graph[i].iter() code is not repeated. I will try to find another example, but suffice it to say, not all code can be expressed with reducible control flow.

I'm not really sure what "progress of the process remains characterized by a single textual index" means. It is always true that the state of the machine is given by the state of memory and the program counter. As that applies to source languages, that means you need to know the values in locals and the position in the code. That's just as true for irreducible control flow as for reducible. In particular, the match state idiom has equivalent behavior to the goto program, and goes through the same motions too, so I don't see how it is innately any easier to reason about, they are equivalent.

That said, perhaps this is an argument for keeping the goto name, if only to keep people from using it unless they need to.

For me, the killer feature of this incarnation of the goto proposal (not mandatory tail calls, not optimized match state) is how it interacts with the borrow checker. We will be able to do really cool intraprocedural analysis with this that simply isn't possible if everything becomes function calls, because function calls have additional semantics beyond just encapsulating code. Partially moved objects are only an intraprocedural concept, and stack protectors in stacked borrows mean that outlining is not always safe.

2 Likes

See the paper. It even addresses looking at the state of the machine.

8 Likes

I took another look at the paper, and David Tribble's annotations on it also shed some light on this terminology. In any case, I disagree with Dijkstra that one cannot find "suitable coordinates" for goto programs written in this style: the coordinates are the position in the program and the values of the locals (especially the label parameters). Dijkstra didn't have this kind of goto to look at though so it's hard to say to what extent he would agree with this characterization.

But as an academic who does software verification, I definitely disagree that one can't put invariants on goto programs to make them sensible - you just have preconditions on all the labels just as if they were function calls. Finding a suitable decreasing metric can be domain specific but that's no different than when dealing with while loops that aren't just bounded for loops. I think it is sad that most programmers have taken only one thing out of this article, about banning some syntax, when the points are about understanding invariants in code.

2 Likes

You may want to look at label-break-value. It would, however, be nice if labels were lexically scoped like everything else in the language.

We can rewrite anything goto-based into a label-break-value-based equivalent - which is lexically scoped.

Sure, that's why I left "some syntactic form" as another option here, not just TCO.

For example, if loop { state = match state { ... } } works but is optimized poorly, what might, say, a special combined special-fsm-loop-match state { ... } look like? Perhaps it could it continue to different arms? (Lots of bikeshedding possible here, obviously.)

As I read the proposal, it seems to me like it would, for example, enable the goto fail; pattern as mentioned in Tribble's post. It's not clear to me that that would be a good thing, even if it were a case that passed all the MIR validation checks for variable initialization and such.

Does it need to be that open? Could it be a more restricted syntactic form while still being sufficiently usable? Are there cases other than the FSM and external-CFG-translation examples that it would make sense for people to write using a fully-general goto?

3 Likes

I don't know whether writing interpreters counts as a distinct category of example; in that case, the complex control flow is not because the target language has complex control flow but because it has an explicit stack.

Actually, a lot of explicit stack examples have a tendency to need complex control flow. It's really hard to make the equivalent of the call stack's borrow structure in your own functions, because you get self-referential objects, but you are also stashing partial work in a stack and jumping to some other code. Manually written async fn's have all the same characteristics.

Why am I rewriting things the compiler does for me manually, I hear you ask? Sometimes the data layout isn't great, maybe I want information about the virtual call stack for my interpreter, maybe I need something that isn't really a Future but still yields like one, or it's a general coroutine / something not yet supported by Rust.

I'd be fine with that, since it's equivalent in power. One way I like to present this is as a mutual let rec construction in functional style:

let foo a = bar (a+1)
and bar a = if a = 10 then () else foo a
in foo 0

The important constraints here are that the "function calls" only appear in tail position. This is equivalent to the proposal, since these blocks correspond to a sequence of labels in a row:

'foo: |a| goto 'bar (a+1);
'bar: |a| if a == 10 { return } else { goto 'foo a };
goto 'foo 0

To transform in the other direction, you just take all labels with the same scope and coalesce them into a block, and put them into a mutual let rec which scopes until the end of the enclosing block.

I guess the equivalent in your proposed syntax would be something like

special-fsm-loop-match state {
  State::Foo(a) => continue State::Bar(a+1),
  State::Bar(a) => if a == 10 { return } else { continue State::Foo(a) },
}

While this could in principle support arbitrary values of type State in the continue expression, it is important for parity with the given proposal that if the provided value is a constructor like State::Foo, then it compiles to a direct jump to the corresponding match arm.

I don't think there is anything wrong with the goto fail; pattern, but I think most Rustaceans will instead reach for destructors if they want this. We already have a mechanism for LIFO cleanup blocks, and while yes, you can use gotos for this (you can use gotos for anything so this isn't particularly surprising), we can and should discourage the use of goto for anything where another language feature will also work.

Personally, I just want feature parity with MIR, at least as far as control flow is concerned. I'm not too fussed about the exact syntax; the syntax I've proposed here is the most logical I could think of but anything with equivalent power is fine by me.

2 Likes

Funnily enough I was mulling over a similar sort of proposal today too. Though mine was more about continue 'label; where 'label must be lexically "after" where you jump from (similar to what Soni mentioned).

Though I definitely think that for the DFA problem, mapping it with a guaranteed TCO loop { state = match state { ... } } or something might be better. This has been proposed by a form of "labels in variables" which GCC exposes as a compiler extension (iirc).

I think having an array of functions that are of the form FnMut(State, Input) -> State which either call the next one itself, or is guaranteed to TCO correctly, would solve this specific problem.

1 Like

Doesn't break-with-value completely cover this situation? You can write

'label: loop {
  ... break 'label; ...
  break
};

where break 'label is anywhere in a sub-scope. As long as you are jumping "down" and "out" in the code, this is sufficient. (You can also do "up" and "out" using continue 'label, although you can't pass a value with continue.) clippy::never_loop will get angry at you if you do this, but I use this trick quite often when the goto problem is sufficiently restricted that it applies.

Well sort of, continue 'label; would a sort of non-lexical form. But yes it is only a minor improvement.

Edit: actually I forgot, continue would have different drop semantics namely you don't drop things.

Won't that cause problems? In rust, everything that exits a scope drops all values in the scope except for the ones that are moved out. If you don't do this, you get resource leaks all over the place (it's not unsafe but it's certainly "unclean").

I don't really want to coop this post but since you asked ...

Consider the following code:

let x = 0;
continue 'foo;
let y;

if x > 0 {
   y = 10;
}

'foo;
// A

At point A the binding y is "maybe-initialized" so it cannot be either used or read from (since it is not mut. If you tried to use it, a compiler error would occur.


Consider the following code:

let x = 0;

if x == 0 {
    let j = String::from("hello world");
    let mut k = true;
    ...
    if k {
        continue 'foo; // A
    }
}

...

'foo;
// B

If the jump at point A occurs, then before that happens all relevant drops are called. Since the point that you are jumping to is statically known. This is a solved problem.


And finally, consider this code.

let x = 0;
let k: bool = ...;

if k { // A
    continue 'foo;
}

let y;

if x > 0 {
   y = String::from("eh");
} else {
    y = String::from("A");
}

// C
'foo;
// B

Here is another interesting case. Even if k is false at point A once the control flow has reached point B the binding y is still "maybe initialized" so it can not be read or written too (since no mut).

Now consider your point about drops. Since y is "unusable" after point 'foo (because after that point it might have gotten there by way of the continue. The drop would be placed at point C. This prevents memory leaking without having even more code being run after the jump.


At least this is what I had in mind. It is also the major reason that I didn't post it. These rules seem to be required but they are also subtle which may cause it to be frustrating. Namely I am not terribly convinced to its usefulness.

1 Like

I see. The goto proposal would not allow your third example, because you are jumping into a sub-scope, since y is introduced after the continue. However I had not considered that we could still support this, where all variables that have been brought into scope are considered unitialized at the language level, in the same way as let y;. In any case you can still make it work with my proposal by moving the let y up before the continue:

let x = 0;
let k: bool = ...;
let y;

if k { // A
    goto 'foo;
}

if x > 0 {
   y = String::from("eh");
} else {
   y = String::from("A");
}

// C
'foo: {}
// B

As for destruction order, I think you have it wrong. y is not destroyed at point C, it is destroyed after point B, when the enclosing scope ends, if it was initialized. Rust actually tracks initialization state with auxiliary variables so it can determine this. For an example that actually compiles on today's rust:

let x;
if cond {
    x = HasDrop;
}
println!("hi");
// x is destroyed here if `cond` is true

We use RAII transaction guards and label-break-value extensively here: datafu 0.0.7 - Docs.rs

Any goto can be replaced with a RAII transaction, and it'll instantly look a lot nicer and be more readable. Now if only there was a crate that made it easier to build RAII transaction guards. https://crates.io/crates/scopeguard exists but it's not designed for transactions - it doesn't expose a "forget" method on the guard, or anything designed to play nice with Result/Try. (huh, thought this was a different crate, oops.)

Anyway there's a lot that can be explored here but our opinion is that unscoped/function-scoped labels need to go, and we should have block-scoped labels instead. Honestly unscoped labels are just annoying to work with in our experience.

This is false. It is true for the goto fail; idiom, but RAII only helps for cleanup code that is executed in LIFO order relative to the scope in which it was defined. The examples where goto makes a difference are in state machines and similar things where the main control flow itself jumps between multiple different states. For the equivalent of finally blocks you should be using destructors, they are more likely to be used reliably on any exit from the block and so map better to resource cleanup activities (which is what they were designed for, after all).

I'm sympathetic to wanting to get rid of function-scoped labels; this proposal uses function scoped labels mainly because that's what is already implemented for loops and labeled blocks. The "correct" scoping pattern for labels here is equivalent to the let rec example:

// 'a, 'b, 'c not visible
{
  // 'a, 'b, 'c visible
  'a: { expr /* 'a, 'b, 'c visible */ }
  'b: || { expr /* 'a, 'b, 'c visible */ }
  'c: loop { expr /* 'a, 'b, 'c visible */ }
  // 'a, 'b, 'c visible
}
// 'a, 'b, 'c not visible

This is the same as scoping for item-likes inside a function. But note in particular that 'b has to be visible inside the body of 'a and vice versa, or else it won't have the full expressivity of let rec.

1 Like

Ah yeah for state machines you can use label-break-value.

I know this adds little to the general discussion, but the repetition can be avoided, too:

fn dfs_iterative(graph: &[&[usize]], mark: &mut [bool], mut i: usize) {
    let mut stack = vec![];
    let mut it;

    'start: loop {
        it = graph[i].iter();
        loop {
            if let Some(&j) = it.next() {
                if !mark[j] {
                    stack.push((i, it));
                    i = j;
                    continue 'start;
                }
            } else {
                mark[i] = true;
                match stack.pop() {
                    None => return,
                    Some((j, other)) => {
                        i = j;
                        it = other;
                    }
                }
            }
        }
    }
}

In fact, any state graph with 3 states that’s missing one (non-self and not back to the start) transition is reducible (in your case, 'start to 'done is missing)

// missing C to B transition (or missing B to C by swapping B, C)
fn no_c_b() {
    'a_end: loop {
        'a_c: loop {
            'a_b: loop {
                // state A
                /*
                goto A by continue 'a_b or continue 'a_c or continue 'a_end
                goto B by break 'a_b
                goto C by break 'a_c
                terminate by break 'a_end
                */
            }
            'b_c: loop {
                // state B
                /*
                goto A by continue 'a_c or continue 'a_end
                goto B by continue 'b_c
                goto C by break 'b_c or break 'a_c
                terminate by break 'a_end
                */
            }
        }
        'c_a: loop {
            // state C
            /*
            goto A by break 'c_a or continue 'a_end
            goto B IMPOSSIBLE
            goto C by continue 'c_a
            terminate by break 'a_end
            */
        }
    }
}

// missing A to C transition (or missing A to B by swapping B, C)
fn no_a_c() {
    'a_end: loop {
        'a_b: loop {
            // state A
            /*
            goto A by continue 'a_b or continue 'a_end
            goto B by break 'a_b
            goto C IMPOSSIBLE
            terminate by break 'a_end
            */
        }
        'b_a: loop {
            'b_c: loop {
                // state B
                /*
                goto A by break 'b_a or continue 'a_end
                goto B by continue 'b_c or continue 'b_a
                goto C by break 'b_c
                terminate by break 'a_end
                */
            }
            'c_a: loop {
                // state C
                /*
                goto A by break 'c_a or break 'b_a or continue 'a_end
                goto B by continue 'b_a
                goto C by continue 'c_a
                terminate by break 'a_end
                */
            }
        }
    }
}
1 Like