[Pre-RFC] Safe goto with value

digama0 · April 16, 2021, 7:22am

I thought about using return there, but that would prevent the use of return anywhere in the expression with the meaning "return from the function", which is a big expressive loss. break is a reasonable choice if we think of this as a kind of generalization of loop.

With the -> break thing, you could also use that to allow indicating the "default action" of fallthrough, for example you could put break 'a or continue 'a or become state s1 and it acts as though the whole function is wrapped in a block which does that. Then again... you could just wrap it in a block that does that.

One option for the type inference issue is to bring back the => like this:

state s3(x: i32, y: i32) => {
    match foo(x, y) {
        None => break x + y,
        Some(n) => become state s1(n),
    }
}

and if you want to put the break first it would be

state s3(x: i32, y: i32) => break {
    match foo(x, y) {
        None => x + y,
        Some(n) => become state s1(n),
    }
}

which by coincidence looks very similar to your second example. If the break is mandatory then that means the expressions all have type !, which means we don't have to worry about annotating the type on the state expression.

steffahn · April 16, 2021, 9:05am

Can be fixed by starting each block '𝑠𝑡𝑎𝑡𝑒 that’s supposed to have “parameters” 𝑎, 𝑏, 𝑐, … with some statements

'𝑠𝑡𝑎𝑡𝑒: {
    let 𝑎 = 𝑎;
    let 𝑏 = 𝑏;
    let 𝑐 = 𝑐;
    …
    // contents
}

Some advantages of a value-free version might be:

closer to already-existing break/continue for loops, and also syntactically related to labeled blocks that might exist in the future
more minimalistic feature
a with-arguments version could be added later or even implemented with macros alone in some crate

digama0 · April 16, 2021, 9:09am

While I don't think this is necessarily a good thing, I can see an argument for making the compiler macro as minimal as possible while retaining the arbitrary CFG goodness, and relying on a proc-macro to get the rest of the feature, especially if the compiler macro is difficult to implement and/or doesn't easily fit into the rest of the compiler.

digama0 · April 16, 2021, 9:14am

In fact, it's possible to get arbitrary CFG goodness with zero syntax changes: just make this work:

{
  'a: loop { continue 'b }
  'b: loop { continue 'a }
}

These two are already in scope of each other, but this particular combination is disallowed. Then again, this has the same issues as were previously levied against non-macro solutions: it might be unexpected that we're using loop as a form of goto without any keywords to signal that something funny is going on.

robinm · April 16, 2021, 10:25am

steffahn:

Can be fixed by starting each block '𝑠𝑡𝑎𝑡𝑒 that’s supposed to have “parameters” 𝑎, 𝑏, 𝑐, … with some statements
'𝑠𝑡𝑎𝑡𝑒: {
    let 𝑎 = 𝑎;
    let 𝑏 = 𝑏;
    let 𝑐 = 𝑐;
    …
    // contents
}

I really dislike this, because it makes it much harder spot an error (like if you forgot to do let c = c), and arbitrary control flow are already very hard to reason about.

However it could be possible that this would feature be just the addition of an non_local_jump block attribute that would allow non-local jump

#[non_local_jump] {
  'a: loop { continue 'b }
  'b: loop { continue 'a }
}

and then arbitrary_control_flow! could be a regular macro, provided in std, that wraps #[non_local_jump]. I’m not even sure that #[non_local_jump] needs to be visible outside of std.

EDIT: I’ve used an attribute for non_local_jump, but it could totally be a pseudo-macro instead.

Aloso · April 16, 2021, 12:37pm

Another solution to make exit points explicit is by annotating each state block's return type:

let result = arbitrary_control_flow! {
    state s1(n: i32) -> ! {
        if n % 2 == 0 {
            become state s3(n / 2, n + 1)
        } else {
            become state s2()
        }
    }
    state s2() -> ! {
        become state s1(84)
    }
    state s3(x: i32, y: i32) -> i32 {
        match foo(x, y) {
            None => x + y,
            Some(n) => become state s1(n),
        }
    }
    become state s1(42)
};

Which makes the state blocks look even more like functions. This is nice because it makes the syntax easier to remember. However, we should clearly distinguish between functions and these state blocks, since they're very different concepts.

My other concern right now is the become state syntax. It is rather long, it consists of two words, and also isn't very intuitive. If people see it and search for "become", they probably won't get relevant results. That's why I'm in favor of a new keyword. With RFC 3101, a new keyword can be reserved in a new edition, but still be used in earlier editions with the k#keyword syntax. My first idea was k#goto, but that word carries a lot of baggage. Other options are k#proceed and k#transition.

Note that the state identifier at the start of state blocks doesn't need to be reserved, since it can be parsed unambiguously by the arbitrary_control_flow! built-in macro. Only the state transitions need a keyword, since they can appear in arbitrarily nested expressions.

P.S. Perhaps a better name for arbitrary_control_flow! would be state_machine!.

lordan · April 16, 2021, 12:42pm

Perhaps this is obvious, but if you separate the parameters from the state machine this way, the end result seems reeeeally close to closure code generated for async (and generator) functions/blocks. IOW, it would be great if you could go from one to the other seamlessly.

Aloso · April 16, 2021, 12:51pm

That sounds like a good idea, but I'd like to stress that I wouldn't want #[non_local_jump] to be exposed to crate authors. It should just be an implementation detail.

Soni · April 16, 2021, 3:07pm

Labels should be scoped to their inner expression tbh, not the outer expression. Stuff like this is just confusing.

simmet · April 16, 2021, 8:39pm

Is there an unsafe goto in Rust?

Mokuz · April 16, 2021, 9:40pm

you can always use asm! and insert arbitrary jmp instructions

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=59f1640e8da8c13baacba6598ecbaede

quaternic · April 16, 2021, 10:51pm

Some have proposed ideas without any fallthrough, but that would make writing a simple loop with multiple entry points very annoying. You'd have to split the sections between the entry points into separate blocks and then chain those together with unconditional gotos. I find the first one of these much clearer:

contains_safe_goto! {
    match k {
            0 => goto 'a,
            1 => goto 'b,
            _ => goto 'c,
    }
    loop {
        'a: a();
        'b: b();
        'c: c();
    }
}

state_machine! {
    'start => {
        match k {
            0 => goto 'a,
            1 => goto 'b,
            2 => goto 'c,
        }
    }
    'a => {
        a();
        goto 'b;
    }
    'b => {
        b();
        goto 'c;
    }
    'c => {
        c();
        goto 'a;
    }
}

I think it's a good idea to have this restricted to some clearly limited scope with something like a macro. However, where enabled, the feature should be convenient to use. If you require unstructured control flow in the first place, a state machine won't always be the best way to organize that piece of code.

CAD97 · April 16, 2021, 11:13pm

EBBs are formally studied and used by basically every lowish level IR at this point, including MIR. (Well, some use BBs and phi nodes exclusively, such as LLVM, rather than having EBBs, but those are isomorphic.)

Providing a statemachine syntax I can see making it into Rust, as it exposes a low level concept to the surface language that already exists in MIR, and is materially useful for code generation that goes through an EBB form, or other algorithms with nonreducible control flow.

goto into blocks is how you enable Duff's Device, and at that point you aren't going to win against the people who saw "goto considered harmful" and didn't internalize the core of the argument (using structured control flow makes for more maintainable code; use goto in a structured way if you have to use it) as well as you might've, just that goto is bad.

EBBs is the building block that enables all sorts of things to build on it. goto into blocks is an extra footgun that is not needed for the functionality, so should be avoided. A userland transform can potentially implement a multiple entry loop on top of EBB functionality, without completely destructuring the control flow.

digama0 · April 16, 2021, 11:17pm

I think it is confusing that the different states return different types, because they can't actually return different types - the constraint is that all types in the return values of the states have to be coercible to the type of result, which works as long as they are all either ! or T = i32 in this example but suggests more flexibility than actually exists. We don't put types on the individual branches of a match or if either for this reason.

By the way, hand-writing async functions without compiler magic was one of the intended use cases of this feature. Obviously it's better to use async directly when it is an option, but for example if you want generators on stable, or coroutines, you will have to do the transformation yourself, and right now the syntax to do so in the general case doesn't exist.

That should be possible. Maybe rename it to #[rustc_non_local_jump]. As Soni says, it's confusing to have a loop label that scopes beyond the loop. This works today because of function-level scoping, but I have come close to filing a bug along the lines "why doesn't 'a: loop { break 'a }; 'a: loop { break 'a }; work?" before running across threads like #24162. So while I acknowledge the existence of the loophole that would allow #[rustc_non_local_jump] to work, I don't think it is a very good choice at all, and I definitely don't want anything other than the compiler or a proc macro to have to write code in that style.

I think implicit fallthough starts to touch on the actual "unstructured programs" issue that was originally brought up by the "goto considered harmful" essay. This is actually obscuring control flow, not just enabling control flow patterns that can't be mimicked with lots of labeled loops. I don't think making Duff's device easy to write is a net positive. If you really want that, see the plutonium::fallout macro, which is a joke for a reason.

comex · April 17, 2021, 6:06am

I once wrote a code generator targeting Rust, which naturally produced 'irreducible' control flow and would have benefitted from being able to generate gotos.

Well, not technically irreducible. The control flow wasn't expressible in any natural structured way. But it only ever needed to branch forward, rather than looping. Which made it possible to use the hilarious strategy of implementing goto in terms of labeled breaks:

'goto_stmt9: {
    'goto_stmt8: {
        'goto_stmt7: {
            'goto_stmt6: {
                'goto_stmt5: {
                    'goto_stmt4: {
                        'goto_stmt3: {
                            'goto_stmt2: {
                                stmt1;
                            }
                            stmt2;
                        }
                        stmt3;
                        break 'goto_stmt5; // example
                    }
                    stmt4;
                }
                stmt5;
            }
            stmt6;
        }
        stmt7;
    }
    stmt8;
}
stmt9;

…I actually implemented this. The output had at least 100 levels of nested blocks, I forget exactly how many. Unfortunately, this approach resulted in massive compilation time, probably because something in the compiler wasn't expecting such a huge nesting depth.

As much as I was proud of generating Rust code that was such a work of art, perhaps surpassing even Duff's device in beauty, it did seem that a less creative goto-based approach could have been more easily digested by both compilers and humans, were the feature supported.

felix.s · April 17, 2021, 9:08pm

I’ll note that I did, at one point, contemplate having something like

fn foo<T, 'a: break(T)>(val: T) -> ! {
    break 'a val;
}

fn bar() {
    let x = 'a: {
        foo<'a>(5)
    };
    println!("{}", x); // 5
}

Which is why I chose fsm! for my version of it, as after all one of things it stands for is ‘finite-state machine’. Here’s your example translated:

let result = fsm! s1(42) {
	s1(n: i32) => {
		if n % 2 == 0 {
			continue s3(n / 2, n + 1)
		}
		s2
	},
	s2 => s1(84),
	s3(x: i32, y: i32) => match foo {
		None => break x + y,
		Some(n) => continue s1(n)
	},
};

No new keywords necessary, and it can be easily read from the top, just like a match expression. I think it could be implemented on top of loop and match as a procedural macro. A simplified version (without continue and with slightly different initial state syntax) is easily implementable with a pattern-match macro.

(Aside: the funclets proposal for WebAssembly seems related, although it is a bit more general: by the look of it, it should be able to express things like binary tree traversal, unlike fsm!.)

robinm · April 17, 2021, 9:32pm

This syntax cannot work. The whole content of a macro must be contained into a single pair of braces (parenthesis, curly, or square), otherwise Rust parsing would become non-deterministic.

felix.s · April 17, 2021, 9:40pm

How come macro_rules! works, then?

robinm · April 17, 2021, 9:42pm

It's special. That's the only macro that works like this and is known by the compiler.

felix.s · May 17, 2021, 12:38pm

After some thinking, I think a slight modification to the syntax could be made to work, initially as a special case like macro_rules!, and later available to all macros:

fsm! s1(42) => {...}

Now, doing the latter would require the macro call sites to be always parseable as any of the following forms:

m! (...)
m! IDENT (...)
m! IDENT(...) => (...)

where (...) is a balanced token tree contained in a matching pair of brackets. That’s easily done: look at the next token after the macro invocation. If it’s an opening bracket, it’s the first form, parse it as before. If it’s an identifier, look for an opening bracket after it, consume the token tree and the closing bracket that follow, then look if the next token is =>; if yes, consume that as well, and the token tree that follows.

I think this might even be generalised further, but the above should be enough to make it work.

Now, making macro_rules! be able to consume the latter two of those forms can be solved independently.

Topic		Replies	Views
Why does Rust not support goto statements? language design	16	16110	December 6, 2021
Gotos in restricted functions language design	16	13944	March 25, 2019
Idea: continue-to-else	10	1111	March 25, 2019
Loop Match - a syntax for iterative evaluation language design	13	715	October 11, 2024
Idea for safe computed goto using enums language design	8	620	November 10, 2024

[Pre-RFC] Safe goto with value

Related topics