Pre-RFC: Labels in variables

Preface: I know that there is sometimes little appetite for drawing direct things from c++ but I figured that lifetimes and the bwrc might help make this feature even better (and safer).

So on to my actual proposal:

A new type Label in core somewhere that is specially know by the compiler.

Allow for the getting of this Label type from labels I also propose a new syntax for it let l = &'label; which currently does not compile.

For using these variables (which can also be put into arrays) I proposed the following syntax continue *l. This is distinct enough to be eye catching and I believe is not currently possible.

Safety:

The value in a variable of type core::label::Label is Copy but it is unsafe to move or take a reference to it. If it is possible to check this then maybe it shouldn't be unsafe for moves and borrows within the same function.

Since labels are only allowed on blocks (not expressions) I don't believe that it is possible to break the borrow rules with this.

I think it would be helpful if the Pre-RFC included some information on what this feature is and what the expected use cases are for it.

3 Likes

Sure that makes sense.


This would allow for creating jump tables in code (that don't rely on match) since it can be more efficiant (from the branch predictors point of view) to have many "jump" points instead of one (each jump point can get associated with a different pattern)

Ok, so this is GCC's "Labels as Values" feature (aka Computed GOTO)?

4 Likes

Could you explain this a bit more? Why is it unsafe to reference or move the Label? Is it unsafe to continue *l?

Yes it is basically that feature (which also exists in clang)


I would say that it is unsafe to move or take a reference to a label if that actions allows it to be transferred out of the current function context. However, if the move is just within the same function then it would be safe.

Example (not safe):

use core::label::Label;

fn is_unsafe(l: Label) {
    // initialize some variables on stack
    continue *l;
}

fn foo() {
    let k = &'iceloop;

    unsafe { is_unsafe(k); }

    let mut l = 0;

    'iceloop: 
    for i in 0...10 {
        // do something with `l`
    }
}

I am pretty sure that this is unsafe because it could much up the offsets for local variables

Safe example:

fn foo() {
    let k = &'iceloop;

    let o = k; // move

    // POSITION A

    let mut l = 0;

    continue* k; // POSITION B

    // some more computation

    'iceloop: 
    for i in 0...10 {
        // do something with `l`
    }
}

I believe that at POSITION B the continue* is safe to do but it wouldn't be safe to do so at POSITION A.

So the proposed rule would be that it is safe to do so if there are no references to any variables defined after the continue* that are possibly used in any block that the continue* could jump to. Otherwise leave it up to the programmer and mark it as unsafe

It feels weird to me that creating the variable is unsafe but using it is safe. Raw pointers, for example, are totally safe to create, it's using the pointer that's unsafe.

1 Like

Yes I would agree that it would be unsafe to create it. I don't think I said that (went back and checked).

What I am trying to get at is when it would be "unsafe" to use such a label variable.

Sorry, I inferred that from your opening post but you're correct, you didn't say it. However, I don't think it should be unsafe to create a Label variable, I think it should be unsafe to use it (always). This would be inline with raw pointers where creating, moving, copying, etc the pointer is always safe, but using it (deref) is always unsafe.

Psudo code:

fn main() {

  let b = rand::get_next_bool();

  let l = if b { &'label1 } else { &'label2 };

  let l2 = l; //move or even Copy

  //do stuff

  unsafe { continue *l2; }

  'label1: {
    println!("foo");
  }
  'label2: {
    println!("bar");
  }
}
2 Likes

That might make it better. In the sense that it would be easier to implement.

In terms of implementing the actual continue 'label, cost is effectively none: MIR already uses goto 'label for control flow IIRC. (For printing; in memory it's a control flow graph.)

Making the continue 'label operation unsafe is "enough" to make it not unsound to include. But what more can we do? e.g. Cranelift's EBB model is fully correct for their SSA values.

(All code here should be treated as in an implicit unsafe context.)


Using a label from a function that does not define it is insant UB, as it trashes the stack by jumping the call stack without actually modifying the stack:

fn outer_ub() {
    let l = make_label();
    continue *l;
}

fn main() {
    let l = &'label;
    ub(l);
    'label: { }
    return l;
}

fn inner_ub(l: Label) {
    continue *l;
}

The UB in outer_ub can be categorically prevented just by making the type of l &Label and tying it to a function-local lifetime. The UB in inner_ub is not possible to be prevented with just current Rust's checks, and would require some extra magic to prevent label references from being passed to other functions. If we have that extra magic, then it wouldn't make sense to apply a lifetime to label references as that same magic can prevent it escaping with better error messages.

Perhaps just making the type unnameable would be enough, so long as it can't be captured into an existential.


Jumping to a label that uses a place that is not yet initialized is UB.

continue 'label;
let o = 5;
'label: { dbg!(&o); }

In a best case scenario, this accesses an uninitialized place and that's it. In a worst case scenario, this jumps over the stack pointer offset for the o place on the stack and gets the stack pointer out of sync.

This could be prevented by disallowing taking a reference to a label that has declarations between the point of reference and the label, but doing so is overly restrictive from the actual requirement of not having any declarations (or even temporaries, potentially?) between the labelled continue and the label it ends up jumping to.


It is UB to jump out of a scope with stack slots unless we teach the compiler to (run drop glue and) do stack deallocation before the labelled continue (which is fully possible, and moves it away from "just" a computed goto, but is worth mentioning here).

{
    let v = vec![0, 1, 2, 3];
    continue 'label;
}
'label { }

If we don't fix the stack to deallocate the v place, then it desyncs the stack.


TL;DR: I think that the complexities of when a labelled continue would be valid means that basically none other than what is equivalent to a match is actually sound, and we should just encourage that pattern instead and potentially make it easier (and make sure it optimizes well)

let j = compute_which_branch();
match j {
    Jump::One => { .. }
    Jump::Two => { .. }
    Jump::Three => { .. }
    Jump::Four => { .. }
    Jump::Five => { .. }
}

If that match can be implemented as a jump table rather than a series of jump-if-equal, it should be possible for the compiler to emit it, and probably should be possible for the developer to hint one way or another.

4 Likes

I understand the amount of possible UB, which is why I mentioned what I did about moving and references.

However, the optimization is in a loop surrounding a match. This is an optimization because each match branch would become a location for the branch predictor to branch from instead of always just one jump location.

Basically it would only be safe if you have a bunch of blocks that you jump between.

This seems to cover such a niche that if you actually need this, you should probably write some goto-ridden C or assembly instead. It is way too dangerous and not at all common enough to be worth the burden of adding it to the language.

(And even if many instances of potential UB can be protected against, as CAD97 pointed out, the restrictions make the Label type hopelessly special-cased and just outright awkward to learn and use.)

3 Likes

However, the optimization is in a loop surrounding a match. This is an optimization because each match branch would become a location for the branch predictor to branch from instead of always just one jump location.

I, too, have seen this CppCon conference recording and consider this a defect in llvm (the optimization by the compiler would be perfectly fine, it does not modify the semantics of the program afterall) and not one that we need to fix at the language level. Particularly not by introducing new constructs that are higly prone to UB. Also, any new feature introducing extra complexities with regards to control flow analysis effectively hinders other compiler optimizations, either by allowing for less invariants to be established on the CFG or through added implementation effort. (It does make a modest amount of sense in C++/C where goto/longjmp are already a thing that isn't going to be removed).

7 Likes

So what you basically want is EBBs at the source level. It sounds like this is handled by explicit tco (often called become).

Still, it definitely makes sense for the optimizer to be able to recognize the loop-and-match trampoline and choose to optimize it to multiple branch locations rather than one.

This Pre-RFC is digging into the technical details without actually motivating why we should do this.

5 Likes

I want to further point out that a match trampoline is probably more optimal (and optimizeable) than what is essentially setjmp/longjmp. I don't know much about optimizing longjmp, but I can't imagine that "single computed far jump" is going to perform any better than "computed near jump + static far jump" (which is how something like the match example would be naively compiled).

1 Like

Agreed, and I think the continuation passing style can be done completely with closure. There's no need to introduce those seems-unsafe feature.

And instead of using label as a variable, we can actually put Fn trait objects into variables, and that is exactly what the label variable used for (I mean jump/dispatch dynamically).

1 Like

I have tried setting up a rudimentary example using a array of closures but I couldn't seem to define the type required, even if each of the closures where capturing from the outside scope.

This is essentially what I tried: https://rust.godbolt.org/z/IVMNnx

But you problem isn't anything related to Rust itself. It simply because you way to represent continuation. Any statically typed language can't assign type for it, unless the number of steps is statically know.

However, if you wrap your continuation with a enum, there won't be any problem, right?