[Pre-RFC] Early exit from any block


#1

Update: now a pull request

  • Feature Name: label_break_value

Summary

Allow break to exit not only a loop, but also a labelled block that contains no loop. As with loop, this break can carry a value.

This depends on RFC 1624 landing. I proposed this here. An identical proposal was part of the explanation for trait based exception handling.

Motivation

In its simplest form, this allows you to terminate a block early, the same way that return allows you to terminate a function early.

'block: {
    do_thing();
    if condition_not_met() {
        break 'block;
    }
    do_next_thing();
    if condition_not_met() {
        break 'block;
    }
    do_last_thing();
}

Following RFC 1624, this, like return, can also carry a value:

let result = 'block: {
    if foo() { break 'block 1; }
    if bar() { break 'block 2; }
    3
};

RFC 1624 opted not to allow values to be returned from for or while loops, since no good option could be found for the syntax, and it was hard to do it in a natural way. This proposal gives us a natural way to handle such loops with no changes to their syntax:

let result = 'block: {
    // `&v` copies each element out, so `v > 0` compares integers, not references
    for &v in container.iter() {
        if v > 0 { break 'block v; }
    }
    0
};

This extension handles searches more complex than loops in the same way:

let result = 'block: {
    // `&v` copies each element out, so `v > 0` compares integers, not references
    for &v in first_container.iter() {
        if v > 0 { break 'block v; }
    }
    for &v in second_container.iter() {
        if v > 0 { break 'block v; }
    }
    0
};
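For anyone who wants to try the two-container search, here is a minimal self-contained sketch. The function name first_positive and the slice parameters are made up for illustration, and it assumes a compiler that accepts labelled blocks (stable Rust does since 1.65).

```rust
// Hypothetical helper: returns the first positive value found in either
// slice, or 0 if neither contains one.
fn first_positive(first: &[i32], second: &[i32]) -> i32 {
    'block: {
        for &v in first {
            if v > 0 { break 'block v; }
        }
        for &v in second {
            if v > 0 { break 'block v; }
        }
        // Fallback value when both searches come up empty
        0
    }
}
```

Without the labelled block, this would typically need either a separate function or a mutable result variable plus a flag to skip the second loop.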

Detailed design

'BLOCK_LABEL: { EXPR }

would simply be syntactic sugar for

'BLOCK_LABEL: loop { break { EXPR } }

except that unlabelled breaks or continues which would bind to the implicit loop are forbidden inside the EXPR.

This is perhaps not a conceptually simpler thing, but it has the advantage that all of the wrinkles are already well understood as a result of the work that went into RFC 1624. If EXPR contains explicit break statements as well as the implicit one, the compiler must be able to infer a single concrete type from the expressions in all of these break statements, including the whole of EXPR; this concrete type will be the type of the expression that the labelled block represents.
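As a rough illustration of the equivalence, the two functions below should behave identically. The names with_block and with_loop and the values are invented for this sketch; it assumes a compiler that accepts labelled blocks (stable Rust does since 1.65). Every break value and the tail expression are i32, which is the single type the block ends up with.

```rust
// Labelled-block form: 'b: { EXPR }
fn with_block(x: i32) -> i32 {
    'b: {
        if x < 0 { break 'b 10; }
        if x == 0 { break 'b 20; }
        30
    }
}

// Desugared form: 'b: loop { break { EXPR } }
fn with_loop(x: i32) -> i32 {
    'b: loop {
        break {
            if x < 0 { break 'b 10; }
            if x == 0 { break 'b 20; }
            30
        }
    }
}
```

Changing one of the break values to, say, a string literal should make both versions fail to type-check for the same reason.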

Drawbacks

The proposal adds new syntax to blocks, requiring updates to parsers and possibly syntax highlighters.

Alternatives

This feature isn’t necessary; however in my own code, I often find myself breaking something out into a function simply in order to return early, and the accompanying verbosity of passing types and return values is often not worth it.

Another alternative would be to revisit one of the proposals to add syntax to for and while.

We have three options for handling an unlabelled break or continue inside a labelled block:

  • compile error on both break and continue
  • bind break to the labelled block, compile error on continue
  • bind break and continue through the labelled block to a containing loop/while/for

This RFC chooses the first option since it’s the most conservative, in that it would be possible to switch to a different behaviour later without breaking working programs. The second is the simplest, but makes a large difference between labelled and unlabelled blocks, and means that a label might be used even when it’s never referred to. The third is consistent with unlabelled blocks and with Java, but seems like a rich potential source of confusion.
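To make the first option concrete: under that rule, leaving an enclosing loop from inside a labelled block means the loop needs a label as well. A minimal sketch, with hypothetical names (count_until_stop, 'outer, 'check), assuming a compiler that accepts labelled blocks (stable Rust does since 1.65):

```rust
// Hypothetical example: count positive items, skipping zeros and
// stopping at the first negative item.
fn count_until_stop(items: &[i32]) -> usize {
    let mut count = 0;
    'outer: for &item in items {
        'check: {
            // A bare `break;` here would be rejected under option 1,
            // so the target loop has to be named explicitly.
            if item < 0 { break 'outer; }
            // Leave only the labelled block and move on to the next item.
            if item == 0 { break 'check; }
            count += 1;
        }
    }
    count
}
```

Every break names its target, so a reader does not need to know which of the three options the language picked.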

Unresolved questions

None outstanding that I know about.


#2

Open questions I came up with:

  • Should this only work with “vanilla” blocks, or do we also allow usage in if/match blocks? What about the proposed {} syntax for const generics params, does it introduce an ambiguity there?
  • What should this code do: loop { 'label: { break; } }? For consistency with normal unlabeled blocks (loop { { break; } }), which we can’t change due to backwards compatibility, I’d say that an unlabeled break inside a labeled non-loop block should break past it, so it should break the loop here, not the block. However, this would create an inconsistency between loops and labeled blocks, as break always breaks to the innermost breakage point. With that in mind, IMO we should emit a compilation error for the example above, to forestall any confusion.

#3

  • I’d propose this only for vanilla blocks.
  • You’re talking about const_generics? I’d be surprised if this made the parser unhappy, but we’ll only find out when we change it :slight_smile:
  • This is a very good point, thanks for raising it. I think breaking past it would be pretty strange and inconsistent; I think it’s OK for labelled and unlabelled blocks to behave differently, and I’d be happier with the unlabelled break going to the labelled block. However, that also seems strange, since it means a label might be declared but never referred to. A compile error is the conservative thing to do, then we can change our mind later :slight_smile: Have updated the proposal to match.

Thanks again!


#4

I think we should copy Java’s semantics here. If I recall correctly, the idea is that break with no label targets the innermost loop, but break with a label can target blocks.


#5

I am a big fan of allowing for labeled blocks. The compiler already internally supports them for use with catch { } (that is how the HIR represents catch). At the same time, I would like to enable the use of these labels as lifetimes – so that one could annotate lifetimes within a function body explicitly.


#6

I’d forgotten that Java has this (without the values) - thanks! But I think it made a slightly odd choice here and I’m hesitant to follow it.

Can the labels on loop blocks be used as lifetimes? If not, I think that allowing use of labels as explicit lifetimes should be a separate RFC.


#7

No, but they should be (this is why they were written with the notation 'a, in fact). I agree that perhaps this is a separate RFC. I would – as part of it – also want to extend the &expr borrow expressions to permit lifetimes: &'a expr.


#8

I don’t know, it doesn’t seem that odd to me. First off, I feel like if you give something a name, you should use that name later when referring to it. To that end, I’d actually prefer if code like this – which uses an anonymous break inside a labeled loop – got a lint warning:

'a: loop {
    break;
}

Secondly, breaking out of a block is quite unusual, so I think that it is a good idea to make it quite explicit what is happening.


#9

I think you give two very good reasons not to go for the second option, and indeed I agree about the lint warning. I’d be interested to know what you think about the choice between the first and third options.


#10

I think I like the compile error. I like the idea of avoiding confusion.


#11

The second option (bind to the innermost named block) has the consequence¹ that adding a label to a block may change its dynamic semantics (you have to inspect its code to update all unlabelled breaks to preserve the semantics). It also would not make sense for continue.

¹: if you agree that this is a problem, then I think that this also suggests that thinking of adding a label as implicitly inserting a loop statement is not a really good model (a direct definition of blocks and labels would be more reasonable). The loop-inserting translation explains this change-of-behavior, so if we find the change-of-behavior shocking it means the translation does not really match user expectations.


#12

Me too. If we chose option 2 (which I would not like, I prefer option 1), we should introduce anonymous labels like '_, that then don’t trigger an unused warning.


#13

About why I prefer option 1:

The entire point (for me) of this RFC is to make code more clear. E.g. take code like this:

loop {
    /*Some stuff here*/
    if /*something*/ {
        break;
    }
    /*More stuff*/
    // End the loop after one iteration
    // We only need the loop so that we can break above if needed
    break;
}

When you see a loop, you usually expect more than one iteration under normal circumstances. Of course, you can put a comment like // needed for break next to the loop for clarity, but you need to remember to paste it every single time, and it’s not very nice.
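With a labelled block, the same shape needs neither the trailing break nor the explanatory comment. A small sketch, where run_steps, the 'once label, and the step names are placeholders I made up, assuming a compiler that accepts labelled blocks (stable Rust does since 1.65):

```rust
// Hypothetical example: records which steps ran, bailing out early
// when `stop_early` is set.
fn run_steps(stop_early: bool) -> Vec<&'static str> {
    let mut log = Vec::new();
    'once: {
        log.push("some stuff");
        if stop_early {
            break 'once;
        }
        log.push("more stuff");
        // No trailing `break;` needed: the block simply ends here.
    }
    log
}
```

Nothing here looks like a loop, so there is no false hint that the body runs more than once.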

Options 2 and 3 allow unlabeled breaks at the top level of labeled blocks, with different semantics. I think they are bad because:

  • It’s obvious what the labeled blocks feature does; there is no bad surprise here. If you only know about the general concept of labeled breaks, your first guess about the behavior of labeled blocks would most likely be accurate when encountering code that uses it. With code that options 2 and 3 allow, this is less certain. As I’ve outlined above, the main theme of the RFC (for me) is clarity, and enabling options 2 or 3 would mean the RFC adds a (small but existent) clarity burden.
  • I think early exiting from non-loop blocks is already a niche feature, mostly justified by its high degree of clarity. Wanting to exit some higher-level loop from a labeled block is even more niche. Therefore, I’d argue that paying the “you have to label it” price in the few instances where you do need to break to a higher-level loop feels okay.

And last, option 1 keeps it open to us to implement options 2 or 3 if we want to at a later point in time, even though I think it would be a bad decision and a step backwards.


#14

I would use this feature for something similar to Duff’s device. The existing code is awkward.

#[inline(always)]
pub fn pause_times(spins: usize) {
    if 0 == spins {
        return;
    }
    let unroll = 8;
    let start_loops = spins % unroll;
    let outer_loops = spins / unroll;

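    // Implement Duff's device in Rust: jump into the unrolled sequence so
    // that exactly `start_loops` calls to pause() run before the main loop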
    'do_0: loop {
        'do_1: loop {
            'do_2: loop {
                'do_3: loop {
                    'do_4: loop {
                        'do_5: loop {
                            'do_6: loop {
                                match start_loops {
                                    0 => break 'do_0,
                                    1 => break 'do_1,
                                    2 => break 'do_2,
                                    3 => break 'do_3,
                                    4 => break 'do_4,
                                    5 => break 'do_5,
                                    6 => break 'do_6,
                                    7 => {},
                                    _ => unreachable!()
                                }
                                pause();
                                break;
                            }
                            pause();
                            break;
                        }
                        pause();
                        break;
                    }
                    pause();
                    break;
                }
                pause();
                break;
            }
            pause();
            break;
        }
        pause();
        break;
    }

    let mut counter = outer_loops;
    loop {
        match counter.checked_sub(1) {
            None => break,
            Some(newcounter) => {
                counter = newcounter;
            }
        }

        for _ in 0..unroll {
            pause();
        }
    }
}
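For comparison, here is roughly how I imagine the lead-in part would look with labelled blocks. This is only a sketch: pause is passed in as a closure so the example is self-contained (in the code above it is a free function), the names lead_in and count_lead_in are made up, and it assumes a compiler that accepts labelled blocks (stable Rust does since 1.65).

```rust
// Runs `pause` exactly `start_loops` times, for start_loops in 0..=7.
fn lead_in(start_loops: usize, mut pause: impl FnMut()) {
    'do_0: {
        'do_1: {
            'do_2: {
                'do_3: {
                    'do_4: {
                        'do_5: {
                            'do_6: {
                                // Breaking out of 'do_N leaves N pauses to run
                                match start_loops {
                                    0 => break 'do_0,
                                    1 => break 'do_1,
                                    2 => break 'do_2,
                                    3 => break 'do_3,
                                    4 => break 'do_4,
                                    5 => break 'do_5,
                                    6 => break 'do_6,
                                    7 => {}
                                    _ => unreachable!(),
                                }
                                pause();
                            }
                            pause();
                        }
                        pause();
                    }
                    pause();
                }
                pause();
            }
            pause();
        }
        pause();
    }
}

// Helper for experimenting: counts how many times `pause` was called.
fn count_lead_in(start_loops: usize) -> usize {
    let mut calls = 0;
    lead_in(start_loops, || calls += 1);
    calls
}
```

The nesting is still deep, but every trailing break; disappears, and nothing pretends to be a loop.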

#15

gasche, thank you for mentioning continue, I had totally forgotten to account for that! I have added that to the RFC.

The loop-inserting explanation is not the best way forward, you are right. I had the mathematician boiling water joke in mind when I wrote this - it’s not very natural, but it does reduce this to a previously solved problem.


#16

Now an RFC pull request.