Pre-RFC: Labels in variables

Instruction is that enum as seen in the link.

If you mean a loop around a match statement then that is what montivated this. Since closures are a zero-cost abstraction then they can be thought of like portable code blocks.

I looked to them for a possible solution to this because of that.

As long as you are passing naked closures, this is impossible.,You have to have a type for infinite recursion structres. This is actually a linked list. You enum isn't the type that represent potentially infinite recursion. You need something similar to a linked list for sure

IMHO it's better to have an extension that allows to compile matches to direct jumps by making the enum values be the addresses of the labels.

#[repr(jump)]
enum State
{
  A,
  B,
  C
}

fn foo(..) {
  let mut state = State::A;
  loop {
    #[jump]
    match state {
      State::A => {state = ...},
      State::B | State::C => {state = ...},
  }
}

Rules:

  • Exactly one #[jump] match can exist for each type with #[repr(jump)], and it must be in the same crate
  • #[jump] match can only be used on types that are #[repr(jump)]
  • #[repr(jump)] can only be applied on C-like enums with no assigned values
  • If the output IR supports indirect jumps, the enum values must be the addresses of labels in the match arm and compiler must implement #[jump] match as an indirect jump instruction; if there is an "or pattern", then the compiler must emit extra nops or near jumps so that each enum value gets a distinct value and jump address
  • If the output IR does not support indirect jumps, then the repr of the enum is as if the jump specification was absent, enum values are assigned as sequential integers in the order they are mentioned in the match starting from 1 (this results in the ordering and non-zero-ness being the same as compilations using jumps)

Having several indirect jumps to the same set of places can be accomplished by having multiple CFG edges to the indirect match statement and applying the "jump threading" optimization, possibly adding the possibility of specifying an #[inline] attribute on the match to force it (but LLVM should hopefully be able to always do that anyway).

2 Likes

If this was used then it might be easier for the compiler to optimize a loop { match expr { ... } } into a self referential jump table loop (or spaghetti loop)

For anyone that is interested, I have managed to get my version to compile and with -C opt-level=3 it produces the expected tail call optimized multi-jump location version.

Here is what most people would write:

From this I can tell that rustc can const eval loops rather easily (since if you #[inline(never)] the fn foo it just returns 16. But it is nice to see that we can describe this problem without any additional features.

3 Likes

Unfortunately, tail call optimization is not currently guaranteed, so your code could be compiled into something that overflows its stack – e.g. on a different target (wasm) or a different version of the compiler.

Also, although the compiler seems to have optimized away all stack manipulation in the functions in your example, that wouldn't work if the functions were more complex. The goto-label version allows register and stack allocation to be done globally across all cases.

So there still seems to be a need for this sort of feature?

I can see why some people think that this could just be an optimization on matches. I could even see #[jumps_in_branches] as a marker on a loop that only contains a match on an tag only enum.

That could solve this simple case but wouldn't really allow for more complicated situations.

Agreed on all counts. Not sure what the best design would be, but it would be good to have something.

I'm not sure I would want this to solve a more complex problem because that would be too easy to make spaghetti code.

1 Like

Makes sense I guess and this is really just supposed to be a guarantee of a specific optimization and not for making bad code.


So I was thinking that it would have the following requirements:

  1. Enum MUST be tag only (like a c-style enum)
  2. Enum MUST NOT have spaces in its values (when specifying the values manually)
  3. The macro must be on a loop that contains only on a variable of such an enum type (not Option<E>)

Why these restrictions? I don't see any reason for normal enums to be incompatible, or even ones with gaps. I think a better restriction is no if guards, because that would break this optimization.

match, not macro?

No macro, since I envision the following to be how the feature being used:

fn foobar(v: Iter<Enum>) {
    #[jumps_in_match_arms]
    for next in v {
        match next {
            Enum::Var1 => { ... },
            ...
        }
    }
}

The reason why I said that was because if there are holes then the jump table would still have to have space for those values (making it more inefficient in terms of size).

I agree that no if guards should also be a restriction.

Note: for loop uses Iterators, and therefore uses Option

Yes, but it stops when the iterator returns None and auto unwraps it

You said earlier,

Which means that it can't work with iterators given the rules you specified.

I mistyped, I meant "... contains only a match on ..."

The key part is you disallowing Option, which means that Iterators can't work under your model

I don't quite see what you mean. The following code compiles and the match is not on an option:

use std::iter::Iterator;

fn foo(iter: impl std::iter::Iterator<Item = u32>) -> u32 {
    for i in iter {
        match i {
            0 => {},
            1 | 2 => println!("hello world"),
            3..=10 => return 0,
            _ => {}
        }
    }
    
    1
}

For loops are desugared something like this,

for x in iter {
    // work
}

To

let mut __iter = IntoIterator::into_iter(iter);

loop {
    match __iter.next() {
        None => break,
        Some(x) => {
            // work
        }
    }
}

So there is that implicit match in the for loop desugarring.

I think it would be better to make the guarantee for the loop loop first because it is simpler, the for loop can be handled later.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.