[Pre-RFC] Safe goto with value

Arbitrary jumps are usually not a good Idea... Unavoidable in Low-level languages, but Please Don't... A Case against the GO TO Statement.

1 Like

Is this a reply to something lower down in the thread? If not, my reading of the pre-RFC would describe these jumps as far from arbitrary. The section on Drawbacks and Alternatives addresses this directly.

1 Like

The main drawback we see with this is the non-lexical(?) scoping of labels, which is just confusing.

Also: label-break-value (not to be confused with loop-break-value) isn't stable because it feels too similar to goto. Maybe stabilize that first?

Indeed, this RFC subsumes label-break-value, in the sense that it assumes (and requires) that 'a: { code } is valid syntax. So that feature can be considered a baby version of this proposal. But label-break-value doesn't let you do anything that can't be done in other ways (it is equivalent to 'a: loop { break code }, except for issues around the target of an unlabeled break), while this version makes it possible to write code that is currently impossible to express.

I will go out on a limb and say that changing label scoping to item-like is independent of this RFC - the goto stuff can be implemented the same way in either case, and function-scoped labels are already observable on stable.

@JRAndreassen I have no interest in responding to links to Dijkstra's paper that don't come with more specific explanations of what you are taking away from it. This is a tired argument, and it is already addressed specifically in the RFC text. Also, I will be the first to agree that "arbitrary jumps are usually not a good idea", with emphasis on usually. The purpose of this feature is to serve the unusual cases. Indeed, I think a clippy restriction lint or similar is worthwhile to ban the use of this feature for folks who agree with Dijkstra here.

1 Like

This. Too much talk of control flow primitives (calling it ā€˜discussionā€™ would give it too much credit) is ultimately reducible (ahem) to subjective impressions and thought-terminating clichĆ©s like ā€˜is this like gotoā€™, without addressing any of the reasons why supposedly being like goto is bad.

That said, this proposal is bad, and the bad press goto has is very much deserved. The labelled-block feature at least maintains the hierarchical character of control flow, and allows reading code mostly sequentially, only introducing extra early exit points; all it added was the equivalent of a return statement scoped to the block. This proposal makes it difficult to visually establish where execution begins. This is already visible in the example in the motivation section, as Iā€™ll discuss below. This is not a good sign.

The ā€˜labelled closureā€™ syntax is especially misleading and inconsistent. The rest of the proposal makes you believe that goto 'a; 'a: X will execute X. But if X is a ā€˜labelled closureā€™, itā€™s instead equivalent to X(). And while youā€™re calling them ā€˜closuresā€™, you already special-case return in them to target the ambient function, unlike how it works with actual closures. Itā€™s also apparent to me they canā€™t actually be handled like closures in any other context. Otherwise, how do you expect this to work?

pub fn foo(f: impl FnOnce()) {
	if random() % 2 {
		f()
	}
}

/* crate boundary */

fn bar() {
	loop {
		foo('a: || { goto 'b; });
	}
	'b: || println!("Where is your god now?");
}

With an example like this, the obvious question that leaps to mind is: why not just write the straight-line code? I feel like this corner case is only going to come up in toy cases like this, where one state inevitably leads to the other, so you might as well refactor the code to merge them.

The DFS example is somewhat better at motivating the feature, but now the readability problems become apparent. Not being used to the syntax, it took me a while to realise that the algorithm starts at 'start and not 'iter, which appears first in the source code. Itā€™s also not too hard to come up with a goto-free equivalent. Itā€™s shorter, and itā€™s much easier to read too: itā€™s obvious where execution starts and why the algorithmā€™s called ā€˜iterativeā€™ in the first place. (Though on closer inspection, itā€™s just recursion in disguise.)

I think that would be my preference. But since creating an ad-hoc enum type can be a bit unwieldy, I think this could be wrapped in a new generalised looping construct:

fsm! $variant($payload: expr /*, ... */) {
	$variant($payload: type /*, ... */) => $body,
	/* ... */
}

This could be thought of as desugaring to the loop/match idiom with an ad-hoc-defined enum E whose variants are defined by the left-hand sides of each =>. Each $body would return the next state to transition into. A break expression would terminate the fsm! loop (of course, it should be able to take an expression to return); a continue with an expression e: E would enter state e. The initial state would be specified right after the fsm! introducer. (I am specifically not making it a keyword; I feel like this construct is perhaps too niche to warrant a keyword, though not niche enough to be excluded entirely.)

I am not sure if this would allow the compiler to be more cognizant of lifetimes to the point where the drop example compiles. But as above, Iā€™m sceptical that this will be actually an issue.

And finally, as a tangential remark,

Given how the labelled-block proposal has been received (despite formally being accepted, it has been filibustered into the ground at stabilisation), and how people were itching to call it goto from the very start, I think the name should be the least of your concerns. The biggest hurdle is convincing people that non-sequential control flow can be useful in the first place.

5 Likes

Thank you for the solid review.

It's true that these "labeled closures" are not really closures at all. That's simply a fact, regardless of the syntax used to denote them, so the open question is whether using closure syntax for them provides more helpful associations (for example, labeled closures are not executed when the surrounding block is) or more harmful associations (return doesn't target them). Incidentally, we could make return target them, but I contend that this behavior is strictly less useful than making it target the function, because of the double closure trick I mention in the RFC.

One way to address the goto 'a; 'a: X issue is to require that labeled closures get called with parentheses:

goto 'a; // can target 'a: loop, 'a: while, 'a: {}
goto 'a (); // can target 'a: loop, 'a: || {}
goto 'a (e); // can target 'a: loop, 'a: |x| {}
goto 'a (e, e2); // can target 'a: loop, 'a: |x, y| {}

I don't know whether it is better to use a space before the parentheses or not; if the syntax is to be modeled after function call syntax rather than break-with-value then maybe it's better to skip the space. Or even the goto itself? 'a(e1, e2) is a possibility, but I wouldn't be surprised if it conflicts with something.

Regarding the example, that would be a type error because 'a: || { goto 'b; } is an expression that evaluates to (), not a closure. Alternatively, it could be a statement, like let x = 1, meaning that it is a syntax error to use it directly as the argument to a function. You could put it in a block, as in foo({ 'a: || { goto 'b } }), but it would still pass () to the function. Note, however, that these examples are easily detectable by a lint that can explain the situation.

I freely concede that this is a potential for confusion, and I'm open to alternative syntaxes for "labeled closures" that retain the impression that this is a function-like thing taking arguments and able to refer to bindings from the outer context.

Note that 'start can come first, if you put it first. Originally the code was written to use an extra label and call 'start after the all the labels were declared, but this sort of thing is mostly up to the author to arrange, much like the order of function declarations.

Would you still think that iter comes first if it said fn iter(...) { }? Inner functions in rust are not that uncommon, but they suffer from the same issue.

That example was certainly oversimplified to be a roughly minimal example to substantiate the claim "the borrow checker does not treat these cases equivalently", not to motivate the added power that goto programs get. The DFS example is a much better motivation in that case, because when translating the original recursive program to a goto program a lot of the control flow linkage gets obscured by the transformation, and it is very hard to express one variable that is valid in states 1, 2, and 3 and another one that is valid in 2, 5, and 6, without goto making the connections explicit. If you lift all the variable declarations to the top level initialization checking becomes nearly impossible without a compiler-tracked goto.

Yes, it's the same algorithm, so it is "just recursion in disguise". But consider that the rewritten version will use less memory and can handle much deeper recursion, because it doesn't need to keep space for the return values (since there is only one kind of stack element in this version) or all of the registers in the calling convention. It's using the heap as the stack, and Vec handles growth automatically, so this can traverse billions of nodes deep without a "stack overflow".

Regarding fsm!: Like I said to @scottmcm, personally I don't really care what syntax is used, and if it helps grease the wheels I'm completely okay with accepting a macro-based syntax for this. addr_of! is a recent example of a compiler feature with unstable syntax getting stably exposed as a macro. But this would have to be a compiler supported feature, since you don't get the borrow checking stuff any other way, and the optimization stuff is still not yet supported and may never be sufficiently reliable.

With my PL researcher hat on, I am a bit flummoxed as to how to answer this question. Some programs don't have reducible control flow. This is just a mathematical fact - not all control flow graphs have that property that all structured programs have. So if your problem has an irreducible solution, then it can't be expressed today in Rust. It can in Haskell, it can in SML, it can in C, but not in Rust. There are some ways to encode something that may possibly optimize to something equivalent to the irreducible program, notably the state = match state approach and mutual tail recursion, but neither of these occur reliably.

I have a state = match state program in the core loop of an interpreter, and this interpreter's performance is currently the bottleneck of my whole program. It's hard to tell how much I stand to gain by getting this correctly optimized, but 10-20% would not surprise me. Perhaps others can chime in with their own experiences with programs that need complex control flow.

5 Likes

FWIW, I think the loop { state = match state { pattern is a pretty good candidate for further MIR peephole optimization. It wouldn't be a perfect solution, but it would improve the situation.

4 Likes

Somewhat surprisingly, given the example set in Fortran 77 for impenetrable and hard to trace use of gotos, many modern languages have goto some way or another, in many cases some may not realize it mostly because the processes of learning these languages, and how they've been documented, implicitly discourages using goto.

And many modern languages have garbage collection, shared mutable state, dynamic typing, whitespace-sensitive syntax, null pointers, the list goes on. Rust has none of these because they conflict sharply and irresolvably with its dual goal of safety and performance.

Putting a feature in a language just because other languages have it is the anti-thesis of progress. Sure, there are many features which are pretty much necessary for any sort of productive programming in general, and which have been proved to be more useful than dangerous over time. But then one should include those features with regards to their merits, not because every other language is using them.

Goto is pronouncedly not such a generally-useful (or more useful than confusing) tool, and arguing that we should have goto becuse I don't know, C# has it too amounts to getting causality and correlation mixed up.

5 Likes

On a purely syntactical matter, I suspect that a syntax such as

'label => expr

would be clearer than a fake closure, as it can be traced back to match expressions.

That being said, I suspect that we can go very far with

loop { state = match state { ... } }

loop takes a block...

could have loop match {} and compile the state transitions to goto internally, on a best effort basis? e.g.

loop match Thing::Start {
  Thing::Foo if thing => Thing::Bar,
  Thing::Bar if something => break false,
  Thing::Start if whatever => Thing::Foo,
  _ => return Err(...)
}

this could be neat we guess. usually when you do an interpreter tho you just have a large switch/case. we don't get why language support for state machines is necessary.

This was also my first reaction when I read the code. If goto was added to Rust, I think that:

  • those closure-like should look like something that is not executed immediately
  • the initial jump should be explicit
  • given what you need is more of a limited form of tail call optimization than unstructured goto, the goto call should looks more like a function call.

Are a pure experimentation:

  • all closure-like could be declared with 'label => |argsā€¦| { body }
  • the initial start could be an explicit goto 'start ()
  • parenthesis could be added to all goto 'label (argsā€¦) calls.
This gives for the initial example
fn dfs_iterative(graph: &[&[usize]], mark: &mut [bool], mut i: usize) {
  let mut stack = vec![];

  'iter => |mut it: std::slice::Iter<'a, usize>| {
    if let Some(&j) = it.next() {
      if !mark[j] {
        stack.push((i, it));
        i = j;
        goto 'start ();
      } else {
        goto 'iter (it);
      }
    } else {
      goto 'done ();
    }
  };
  'done => || {
    mark[i] = true;
    match stack.pop() {
      None => return,
      Some((j, it)) => {
        i = j;
        goto 'iter (it);
      }
    }
  };
  'start => || {
    goto 'iter (graph[i].iter());
  }
  goto 'start ();
}

Also, if such closure-like syntax would be added to Rust, I think that we should consider if we could re-use themĀ¹ to write closure that interact directly with the control flow of the parent.

Silly example
fn foo() {
    bar('self => |x| {
        if something {
            return; // return from foo directly
            // or maybe
            //    break 'self
            //    return 'self
            //    goto 'self
            // ā€¦ you get idea
        }
        // do other stuff
    });
}

Ā¹: if and only if we want to add them, Iā€™m not advocating for them


My whole post is nothing more than the idea of a thought, do not consider anything I said as something I would want Rust to become, just possibilities we should think about before probably discarding them.

I think a better syntax would be something like this:

let result = arbitrary_control_flow! {
    'start => goto 's1 42,
    's1: n: i32 => {
        if n % 2 == 0 {
            goto 's3 (n / 2, n + 1);
        } else {
            goto 's2;
        }
    }
    's2 => goto 's1 84,
    's3: (x, y): (i32, i32) => match foo(x, y) {
        None => x + y,
        Some(n) => goto 's1 n,
    }
};

Given that goto is rarely needed, I think that a built-in macro is sufficient. There's no need to further complicate the Rust grammar for a niche feature. The macro also has the advantage that it can have an expressive name and its own documentation on docs.rs (contrary to something like top-level 'label: || {} or a keyword combination like loop match, which would be more difficult to search for on Google).

I'd also prefer if these labelled sections behave more like match arms: Execution starts with the first section (which mustn't have an argument); there's no implicit fall-through. A section may have at most one argument (which can be a tuple).

4 Likes

One could translate this into some kind of become-pseudo-function-call syntax, e.g.

let result = arbitrary_control_flow! {
    initial state start() -> i32 {
        become state s1(42)
    }
    state s1(n: i32) -> i32 {
        if n % 2 == 0 {
            become state s3(n / 2, n + 1)
        } else {
            become state s2()
        }
    }
    state s2() -> i32 {
        become state s1(84)
    }
    state s3(x: i32, y: i32) -> i32 {
        match foo(x, y) {
            None => x + y,
            Some(n) => become state s1(n),
        }
    }
};

Thereā€™s probably also the open question of whether one would prefer

  • no return values at all (so nothing like the x + y expression in

    ), or

  • return values of arbitrary (but all the same type), where

    • the goto/become calls are type !, or
    • the goto/become calls are the same type as their return-type and must be in some kind of terminal/tail-call position

    (my code example above supports both of these)

If thereā€™s good reasoning about variable initialization, one could also ask whether goto with value is even necessary. Something like this could be sufficient

let result;
let mut n: i32;
let (mut x, mut y): (i32, i32);
// in case of integers itā€™s not possible,
// but these mutable variables could also
// be moved out of on the receiving side
arbitrary_control_flow! {
    'start: {
        n = 42;
        continue 's1
    }
    's1: {
        if n % 2 == 0 {
            x = n / 2;
            y = n + 1;
            continue 's3;
        } else {
            continue 's2;
        }
    }
    's2: {
        n = 84;
        continue 's1
    }
    's3: {
        match foo(x, y) {
            None => {
                result = x + y;
            },
            Some(n1) => {
                n = n1;
                continue 's1;
            }
        }
    }
};
// result is initialized at this point
2 Likes

This sounds like a good idea. If it turns out to be too restrictive, goto with value could still be implemented later, but goto without value is sufficient as a MVP, and maybe even long-term.

I don't like that this gives the syntax continue 'label a new meaning. Consider this code for example:

arbitrary_control_flow! {
    'start: {
        'label: loop {
            continue 'label;
        }
    }
    'label: {}
};

I guess the parser in this macro will need to track locally introduced labels and make sure that continue 'label does not get replaced by the ā€œgotoā€ in this case. Actually there is no ā€œmacroā€ since it has to be compiler-build-in, I guess the compiler can do this. The more relevant question would be if something like

'label: loop {
    arbitrary_control_flow! {
        'start: {
            continue 'label;
        }
        'label: {}
    }
}

is confusing.

Though there are other language features in Rust which do this and are discouraged through clever language design and documentation.

The goto is not something I'm advocating per se [as someone who's, for a project, very painfully studied legacy F77 routines for numerical analysis based around goto]. These algorithms were written at a time before using goto in high-level code was considered a sign of very bad logic. Bad logic meaning: a hot mess.

So my point was that while the example from F77 or Dijkstra is to forbid the goto [which Fortran famously did and which hasn't ever been correlated to present day outcomes for that language] since then the goto has largely survived and gone largely unnoticed through some combinations of historical accident, good language design, documentation, and developer best-practice. Does that make it a necessary feature of a language? No. Is it the evil Dijkstra and others foretold? Yes. Can it's harm be contained? Yes.

The macro syntax used in the last few posts is growing on me. I think that helps to signpost the weird closure syntax enough that people won't confuse them with actual closures, and it also allows for more function-call like syntax.

Regarding the initial state thing, I have a marginal preference for having an unnamed initial state that can't be jumped to, so that it matches the let rec style.

let result = arbitrary_control_flow! {
    state s1(n: i32) {
        if n % 2 == 0 {
            become state s3(n / 2, n + 1)
        } else {
            become state s2()
        }
    }
    state s2() {
        become state s1(84)
    }
    state s3(x: i32, y: i32) {
        match foo(x, y) {
            None => x + y,
            Some(n) => become state s1(n),
        }
    }
    become state s1(42)
};

The unfortunate thing with this is that it's less obvious how to put the initial state first, since rust doesn't have the equivalent of haskell where. I suppose you could support initial state as well, although I don't like the parentheses in start() in @steffahn 's version since nothing calls that "function".

As for return values, the "correct" answer is that all states return the same type as the arbitrary_control_flow! expression itself (which solves the issue with fallthrough between states in the RFC, which I agree is not great). It's a bit sad if this syntax means we have to declare all the types though; I think it should be possible for type inference to infer all the types in this example, although if I leave them off in a function call syntax like this it looks like it's returning (), which isn't an issue shared by closure syntax (well, that's rust being inconsistent I guess).

It's true that you can use uninitialized variables instead of goto-with-value. MIR already desugars all let x = e; statements to lift let x; to the top of the file, so I'm sure this will work. But I think that having a giant block of let statements at the top of the function is less clear and a throwback to old C programs. (Believe it or not I do care about this syntax being as clear as it reasonably can be given the constraints. Goto programs need all the help they can get.)

EDIT: Actually there is another issue with lifting the let binding, which is that destruction doesn't happen until the end of the outer scope. With local let bindings the variables will be destroyed as soon as you jump to another state, unless you pass the variable to the other state.

If we're going with block scoped labels, then that should be legal and the inner 'label will shadow the outer one. If they are function scoped (like currently), then that would be a duplicate label error. Since the compiler is handling the parsing it should be possible to do it either way. Although I think that having two different labels in nested scopes is confusing enough that we might want to ban it entirely; I don't think the case for name shadowing is as strong here as with let bindings.

Unlike those examples though, I don't think goto "conflicts sharply and irresolvably with [Rust's] dual goal of safety and performance". Indeed, I think Rust can bring an entirely new spin on goto because of the way it interacts with the borrow checker: it really is a great match of safety and performance. I don't think goto will ever be a popular tool, but it doesn't deserve all of the hate thrown on it, and if we can put it through some rehab and turn it into a respectable (if eccentric) citizen that would be a good thing for everybody.

This unfortunately isn't an option. I have definitely wanted this for a long time, however a closure that "returns" directly to its parent isn't returning, it's using longjmp, and this is fabulously unsafe due primarily to the fact that it doesn't run the destructors of intervening call frames. Even if it somehow did (and if it was somehow scope-bound to the stack frame of the parent), it would break scoped locking patterns in the same way as panics.

The other piece that I'd add here that's good about it is that it's nicely restricted in terms of the targets of the control flow. It scopes them all nicely into that block, it keeps you from having to "hunt" for them all through the function. It removes all the questions about jumping into things like the middle of for loops. It weakens the temptation to use this for goto fail structures. Etc.

Basically it's just EBBs, like cranelift uses.

4 Likes

I think that the exit point should be more visible (since the control flow is already complicated). So I think we should use an explicit return x+y, or (in this case) something like break x+y.

Another possibility is to anotate the state itself:

state s3(x: i32, y: i32) -> break { // either a keyword like `exit`/`break`/`return` or an explicit type like `i32`
    match foo(x, y) {
        None => x + y,
        Some(n) => become state s1(n),
    }
}

Other that that, I really think that the current iteration in very readable.