Reconsidering semantic significance of line-breaks after `ExpressionWithBlock`s

Here is a case where a human reviewer might misinterpret how code is actually executed:

fn main() {
    { false } || { true };
}

unless the code is conventionally formatted with line-breaks:

fn main() {
    {
        false
    }
    || { true };
}

In this specific case, the ambiguity is inadvertently pointed out by rustc because the author intended a logical OR expression, while the compiler follows two specific rules:

  1. the compiler always prefers parsing a block as a separate statement in ambiguous cases;
  2. the result type of a block expression must be `()` when it's parsed as a statement with its trailing semicolon omitted.

Combined, these two rules trigger a confusing type error, which eventually points out the ambiguity (related issue: `{ expr1 } || expr2;` logical or / closure ambiguity · Issue #150552 · rust-lang/rust · GitHub).
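For contrast, here is a minimal sketch of the same tokens in positions where no statement can start, so the first rule does not apply and `||` is read as logical OR. The function names are made up for illustration, and rustc may still emit style lints about the braces.

```rust
// Hypothetical helper: to the right of `=` the parser is already inside an
// expression, so `{ false }` is an operand of `||` rather than a statement.
fn or_in_let() -> bool {
    let value = { false } || { true };
    value
}

// Hypothetical helper: the same holds inside an `if` condition.
fn or_in_condition() -> &'static str {
    if { false } || { true } {
        "taken"
    } else {
        "skipped"
    }
}
```

Calling `or_in_let()` yields `true` and `or_in_condition()` yields `"taken"`; only at the start of a statement does the block get split off on its own.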

But what if the author actually intended:

{
    expr1
}
|| expr2;

by writing `{ expr1 } || expr2;`?

If `expr1` evaluates to `()`, there is no compiler error. The `unused closure that must be used` warning could be turned off or lost in the noise. Meanwhile, a reviewer or collaborator might read the entire line before `;` as a single logical OR expression. What can we do to reduce this kind of ambiguity?
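A minimal sketch of that silent case, assuming the warning has been allowed. The counters and the function name `silent_closure` are hypothetical and exist only to make the behavior observable: the block runs as its own statement, and the part after `||` becomes a closure that is created and dropped without ever being called.

```rust
use std::cell::Cell;

// The lint is allowed here on purpose, to mimic a codebase where the
// `unused closure that must be used` warning is turned off.
#[allow(unused_must_use)]
fn silent_closure() -> (u32, u32) {
    let block_runs = Cell::new(0);
    let closure_runs = Cell::new(0);

    // Reads like a logical OR, but parses as a block statement of type `()`
    // followed by a zero-argument closure that is never invoked.
    { block_runs.set(block_runs.get() + 1) } || closure_runs.set(closure_runs.get() + 1);

    (block_runs.get(), closure_runs.get())
}
```

`silent_closure()` returns `(1, 0)`: the block body ran once, the closure body never ran.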

A Naive Proposal

We could restrict the conditions under which a trailing semicolon can be omitted. Specifically, we could require that an ExpressionWithBlock used as a statement can only have its trailing semicolon omitted when it is the last statement before a newline.

So

{
    expr1
}
|| expr2;

can only have alternate forms like this:

{ expr1 }
|| expr2;

or this:

{ expr1 }; || expr2;

but not this:

{ expr1 } || expr2;

However, this would be a breaking change and contradicts the following rule from Whitespace - The Rust Reference:

Rust is a “free-form” language, meaning that all forms of whitespace serve only to separate tokens in the grammar, and have no semantic significance.

A Rust program has identical meaning if each whitespace element is replaced with any other legal whitespace element, such as a single space character.

That said, the claim that "A Rust program has identical meaning if each whitespace element is replaced..." is already slightly inaccurate because of line comments: replacing the line-break that ends a line comment with a normal space character would pull the following code into the comment and certainly break the program.

Discussion

Even if `expr1` in `{ expr1 } || expr2;` evaluates to `()`, the `unused closure that must be used` warning provides a hint. But is that enough?

I'm not sure whether this should be a hard error, or whether a future edition should even change the default parsing preference so that `{ expr1 } || expr2;` is interpreted as a logical OR expression with its result implicitly discarded. Either option carries the cost of a breaking change and adds exceptions to existing rules. I'm curious to hear the community's thoughts on whether the "free-form" nature of Rust should be strictly preserved, or whether we should lean into line-break semantic significance in cases like this.

While neither implicitly discarding a boolean nor implicitly discarding a closure is good practice, I don't think people who tend to write awful code deserve less attention; if anything, they deserve more.

While I think that the problem is largely made up, since any human reviewer upon seeing { false } || { true }; will just tell you to format your code before requesting a merge, it is definitely inconsistent that you are allowed to omit the semicolon after a block, where any other expression would require one.

My proposal would be different, though: always require a semicolon in such cases, no matter what the expression is.

The following code:

fn main() {
    // `{ false } || { true }` below is an actual logical OR expression
    if { false } || { true } {
        todo!()
    }
}

is already in its formatted form and is perfectly viable with no compiler errors / warnings at all.

So no, seeing curly braces doesn't mean there must be line-breaks around them.

The impact of that change would be enormous; almost everyone would need to refactor a vast amount of existing code when migrating to the hypothetical new edition.

My proposal would only affect those who actually write multiple statements on the same line with separating semicolons omitted. Most codebases wouldn't need to change at all.

And my design aligns with the intuition of languages like JavaScript where you can omit semicolons at the end of a line, but not semicolons that separate multiple statements on the same line.

Inserting the semicolons is easily a machine-applicable fix.

It is in "technically rustfmt won't change anything" form, but I firmly believe that there isn't a single human reviewer that would let this be merged without factoring the blocks out.

It's obviously a simplified demo.

In real cases the curly braces may well be necessary, because the block scope affects when values are dropped.
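As a rough illustration of that point, the sketch below records drop order in a log. `Guard` and `drop_order` are hypothetical names invented for this example; the inner braces are what force `_inner` to be dropped before the rest of the outer block runs.

```rust
use std::cell::RefCell;

/// Hypothetical guard type that writes its name to a shared log when dropped.
struct Guard<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Guard { name: "outer", log: &log };
        {
            let _inner = Guard { name: "inner", log: &log };
            log.borrow_mut().push("end of inner block");
        } // `_inner` is dropped here, at the end of its block
        log.borrow_mut().push("after inner block");
    } // `_outer` is dropped here
    log.into_inner()
}
```

The returned log is `["end of inner block", "inner", "after inner block", "outer"]`. Removing the inner braces would move the drop of `_inner` to the end of the outer block, which matters when the value is something like a lock guard.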

Is this a real issue? Can you link some real life examples of where this caused a bug?

Every language can be written in obfuscated ways: you could just use single-letter variable and function names all over the place and make the code pretty confusing.

There is even a test in rustc of some of those: rust/tests/ui/expr/weird-exprs.rs at cc08b553b899821331ddbfb970e243a7dd0957a3 · rust-lang/rust · GitHub

No, none of that is reasonable to write in production code. But you can't make a fully featured language that doesn't permit this sort of thing.

Also see https://www.ioccc.org/ for C etc.
