Make (Some) Separators Optional

A single-function trait declaration requires a ; after the function signature. This particular case looks odd to me because fn {} must not be followed by a semicolon.
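A minimal sketch of that asymmetry, assuming hypothetical names (Greet, English) chosen only for illustration: a bodiless method declaration inside a trait ends in ;, while the same method with a body in an impl does not.

```rust
// Hypothetical trait and type names, for illustration only.
trait Greet {
    // A method declared without a body must end with `;`.
    fn greet(&self) -> String;
}

struct English;

impl Greet for English {
    // The same method with a body takes no trailing `;`.
    fn greet(&self) -> String {
        String::from("hello")
    }
}
```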

Lua was mentioned and is interesting in that whitespace isn’t significant. The EBNF also fits on approximately one screen: https://www.lua.org/manual/5.3/manual.html#9

The following code is (I think) the only ambiguity due to having optional semicolons:

local a = 0
local b = a
("%d"):format(b)

Here, a("%d") is parsed as a function call rather than as assigning a to b and then calling format on the string “%d”.

However, in this case, the result of the format function is being ignored, which makes it contrived. I can’t really think of a non-contrived ambiguity, though I’m sure they exist.

My opinion about Rust and optional semicolons is that I’d prefer the redundancy, thanks.


You might want to read some previous threads about the semicolons:

I've had to do a lot of Go programming lately and have been bothered by the semicolons in Rust :slight_smile:

Yes, and it turns out that style isn't just a recommendation, but actually necessary in some cases -- precisely due to the automatic semicolon insertion. That is, a program like this is illegal:

func main()
{
	fmt.Println("Hello, playground")
}

I happen to prefer the usual Go style, but this kind of gotcha can be confusing and annoying to people who are used to formatting their code however they like. Removing semicolons is not free, since it imposes restrictions that weren't there before.

So if Rust were to make some semicolons optional, please take great care to handle code like above nicely.

I keep forgetting ; after a large let x = { … } block.

if foo {
   bar
} else {
   baz
} // no ;
let x = if foo {
   bar
} else {
   baz
}; // ';' required!

So I’d vote to make that one optional :slight_smile:

But that is because it's actually a let (of course). This case, too, would suffer from potential ambiguity, depending on what follows:

let z = {
    let _x = if foo { bar } else { baz }
    *p
}

Contrived, perhaps, but syntactically ambiguous without semicolons (multiplication or result value).
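To make the two readings concrete, here is a rough sketch in today's Rust, where the semicolon (or explicit parentheses) picks one of them. The function names and the values 2 and 3 are assumptions made up for this illustration; they are not from the post above.

```rust
// Reading 1: the `if` finishes the `let` statement and `*p` is a
// dereference that becomes the value of the block.
fn deref_reading(foo: bool, p: &i32) -> i32 {
    let (bar, baz) = (2, 3); // hypothetical values
    let _x = if foo { bar } else { baz };
    *p
}

// Reading 2: the line break means nothing and `* p` continues the
// initializer as a multiplication. Parentheses make the grouping explicit.
fn multiply_reading(foo: bool, p: i32) -> i32 {
    let (bar, baz) = (2, 3); // hypothetical values
    let x = (if foo { bar } else { baz })
        * p;
    x
}
```

Without a required semicolon, a parser would need some extra rule (for example, one based on line breaks) to choose between these two.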

I personally don't like the proposal for the opposite reason: it prevents me from breaking long expressions into lines without having to worry that this might arbitrarily create an unintended split.

Hence I would argue the opposite, if anything: allow additional, superfluous semicolons where they don't hurt, such as at the end of structs:

struct Zorg;           // currently required
struct Zerg { ... };   // currently forbidden, could be allowed

Yes, absolutely this! I have been writing in rust for two years and still without fail make this mistake every time I turn a unit struct into a struct with fields. What's more is that this is allowed:

fn func() {
    fn inner_func() -> u32 {
        3
    };

    // the above is "allowed" because it actually means:
    //     fn inner_func() -> u32 { 3 }    // <-- an item...
    //     ;                               // <-- ...followed by a statement
}

which frequently appears as the result of turning a closure into a function, but causes errors once the inner function is pulled out to item level.


@ExpHP I would love it as well; I’ve also been bitten by struct Foo {}; many times.

However, there seem to be backwards-compat hazards: https://play.rust-lang.org/?gist=0db4692763e19e7c54c955f6bcd411f6&version=stable&mode=debug

Perhaps this is fixable by remembering that a ; was matched as an item, so that the macro treats that match as also covering the run of ; that follows in the matcher, however long it is?

EDIT: A more refined rule would be to count the number of consecutive ;s matched as an item, and count exactly that many ;s as matched in a macro matcher.

If we can think of a good way to keep macros working I’d love to work with you on an RFC.


I'm not sure what this is showing.

To be clear, my thought is for the single token ; to be a valid item. Is your example intended to show that having $a:item match the input ; is a backcompat hazard?

Yeah, that's my idea also.

Yes. If you naively just make ; match an item, then those currently working macros will break I think.

@Centril but they seem to work fine for :stmt matchers in statement context, so I’m not sure how they would be problematic for :item matchers in item context.

fn main() {
    macro_rules! i1 { ($($x:stmt);*) => {} }

    i1! { struct Foo {}; struct Bar {} }
    
    macro_rules! i2 { ($x:stmt; $y:stmt) => { } }
    
    i2! { struct Foo {}; struct Bar {} }
    
    macro_rules! i3 { ($x:stmt;; $y:stmt) => { } }

    i3! { struct Foo {};; struct Bar {} }
}

Rust currently gives a clear error for a semicolon at the end of the struct; it won’t pass silently. That makes it trivial to catch and remove.

error: expected item, found `;`
 --> src/main.rs:3:2
  |
3 | };
  |  ^ help: consider removing this semicolon

Along the same lines, we could and should detect empty statements inside a function, and lint against those. Clippy had a feature request to do exactly that, but punted it to rustfmt. Perhaps we should reconsider adding that?


My concern is that in macro_rules! i1 { ($($x:item);*) => {} } the token ; would be eaten as an item, and then there is no ; left to match as the separator. That is: given struct F {}; struct G {} it is interpreted as struct F {}; and struct G {}.

While the current diagnostic is better than nothing, I think this simple mistake occurs frequently enough to interrupt the flow of writing, so making struct Foo {}; legal would make life easier. We can normalize this via rustfmt, removing any redundant ;s.


Huh? I would think it matches just fine:

  • $x:item matches struct F {}, stopping before the ; because struct F {}; is not a valid item (or a prefix thereof)
  • $();* matches the semicolon nonterminal, and thus repeats
  • $x:item matches struct G {}
  • $();* does not match a semicolon, and thus ends
  • EOF; the expansion succeeds
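The matching steps above can be checked against current Rust with a small sketch. The macro name count_items and the function two_items are hypothetical; the macro only counts how many item fragments the repetition captured.

```rust
// Hypothetical helper: counts the items matched by `$($x:item);*`.
macro_rules! count_items {
    ($($x:item);*) => {
        // One string per captured item; the slice length is the count.
        <[&str]>::len(&[$(stringify!($x)),*])
    };
}

fn two_items() -> usize {
    // `struct F {}` is matched as an item, `;` as the separator,
    // and `struct G {}` as the second item.
    count_items!(struct F {}; struct G {})
}
```

This only shows today's behavior, where ; is not an item. It says nothing definite about how the matcher would behave under the proposed change.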

They are not redundant – they are necessary for unambiguous parsing. The designers didn't put them in the language out of pure passion.

That's a call for all sorts of pain. Rust is intentionally not a "convenient" whitespace-sensitive language. Those rules are basically heuristics, and they tend to introduce all sorts of hard-to-debug errors arising from discrepancies between what's "obvious" to the human eye and the rules, exceptions, and special cases the compiler actually handles.

I am a long time Swift user and I can tell you, life would be much simpler in Swift with semicolons. The language has all sorts of ugly assumptions about where expressions and statements end, and it's just extremely irritating.


But ; is now a valid item, so you’d get the following instead?

  1. $x:item matches struct F {} since it is a valid item
  2. $x:item matches ; since it is a valid item
  3. ; is expected as a non-terminal separator, but it has already been matched in 2. The matcher fails.

That would require it to attempt to parse two items right after the other, but $(<stuff>)<delim>* must match <delim> before it attempts to match <stuff> again.


Oh I see; that makes sense. Thanks.

Right now, we have a very consistent rule: you need semicolons after any construct that doesn't take braces, and you need to omit the semicolon after any construct that takes braces. For instance, if {}, for {}, loop {}, if {} else {}, struct {}, enum {}, union {}, and extern {} all don't take semicolons. And for a variety of reasons we can't change most of those. Making some subset of them ignore unnecessary semicolons seems like a trap for users; better to document the pattern consistently.
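A short sketch of that rule as it applies today, using hypothetical type and function names (Marker, Pair, Point, Shape, describe) picked only for this example:

```rust
struct Marker;                   // no braces: `;` required
struct Pair(u32, u32);           // no braces: `;` required
struct Point { x: u32, y: u32 }  // braces: no `;`
enum Shape { Dot, Line }         // braces: no `;`

fn describe(s: Shape) -> u32 {
    // `let` itself has no braces, so it needs `;` even when the
    // initializer is a braced `match`.
    let mut n = match s {
        Shape::Dot => 1,
        Shape::Line => 2,
    };
    // An `if` used as a statement ends in braces and needs no `;`.
    if n > 1 {
        n *= 10;
    }
    n
}
```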

I do think, however, we could make the error message much clearer. Right now, it just acts like any other parse error. We could recognize this specific case and very specifically say "don't put a ; after a struct" (and similarly for union and enum), to make the message more straightforward to understand and act on.


You can have ; after {} in a statement context, that is, inside a function body:

fn main() {
    struct Foo {};
    
    ;;;
    
    if true {} else {};
}

This is permitted since a lone ; inside a function body is accepted as an empty statement.

Why a subset? If ; is an item, then it is permitted after impl, extern, struct, union, enum, fn, trait consistently.

EDIT: @ExpHP want to join me in #rust-lang on IRC to work on an RFC perhaps?
