Semicolons and Statements

Havvy · June 29, 2018, 1:21am

I’ve a few questions on statements based on the following observations.

//! Statements and semicolons

// We can capture statements in macros.
macro_rules! stmts_semicolon { ($($stmt: stmt);*) => {}}
macro_rules! stmts_whitespace { ($($stmt: stmt)*) => {}}

// Single expression statements must not be terminated by semicolons when
// captured by a macro. This implies that the semicolon is **not** a part of
// the statement.
stmts_semicolon!{1}
stmts_whitespace!{1}

// Likewise with let statements.
stmts_semicolon!{let x = ()}
stmts_whitespace!{let x = ()}

// When we get to two expression statements in a row, we need the separator.
stmts_semicolon!{1; 1}
stmts_whitespace!{1 1}

// Items are also statements
stmts_semicolon!{struct X{}}
stmts_whitespace!{struct X{}}

// And the separator is needed between multiple item declaration statements.
stmts_semicolon!{struct X{}; struct X{}}
stmts_whitespace!{struct X{} struct X{}}

// Some items require an ending semicolon; it belongs to the item, not to the statement separator.
stmts_semicolon!{struct X;}
stmts_whitespace!{struct X;}

// So we need double semicolons when separating them.
stmts_semicolon!{struct X;; struct X;}
stmts_whitespace!{struct X; struct X;}

// But note, an empty statement is not valid for this macro.
// stmts_semicolon!{;} // ~err: expected a statement
//                  ^

// If the expression is a block with unit return, it must not end with a semicolon.
stmts_semicolon!{{}}

// And if the expression ends with non-unit, it must not either.
stmts_semicolon!{{0i32}}

// But the actual semantics in blocks requires semicolons terminating blocks
// with non-unit returns.
fn blocks_with_semis() {
    {} // End of statement
    
    // {0i32} ~err: mismatched types
    //  ^^^^ expected (), found i32
    
    {0i32} /* not end of statement */ ; // End of statement
    
    () // Explicit block expression.
}

// Furthermore, within blocks, extraneous semicolons are allowed and ignored.
// The Rust grammar that's unused by the compiler calls these statements while
// the compiler will just detect and discard them.
fn extraneous_semicolons() {
    ;;;;;;;;;;
}

// So, is the following an extraneous semicolon?
fn maybe_extraneous_semicolon() {
    {}; // Extraneous or ends the block expression?
    
    1;
    
    ()
}

// And in blocks, not every statement needs to end with a semicolon.
// Specifically items that aren't semicolon terminated don't need a semicolon
// after them in a block either.
fn statements_without_semicolons() {
    struct Foo {}
    struct Bar {}
    ()
}

Playpen

So this leaves me in a weird spot with semicolons. In non-macro-land, they appear to be required as part of an expression or let statement, while they are forbidden in macro-land. Should I just ignore the statement macro matcher?

Furthermore, in blocks, are semicolons that aren’t strictly necessary “extraneous” or “empty”?

And finally, is a semicolon after a block or control flow expression of unit type one of those extraneous/empty statements or is it actually a part of the expression statement?
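The observation that the semicolon acts as a separator rather than as part of the captured statement can be checked with a small counting macro. This is only a sketch: `count_stmts` and `demo_counts` are hypothetical names that do not appear in the post, and the macro uses the same `$($s:stmt);*` shape as `stmts_semicolon` above.

```rust
// Hypothetical helper: counts how many statements the `stmt` matcher captured
// when they are separated (not terminated) by semicolons.
macro_rules! count_stmts {
    ($($s:stmt);*) => {{
        // Each captured statement is only stringified, never executed.
        let captured: &[&str] = &[$(stringify!($s)),*];
        captured.len()
    }};
}

// Hypothetical demo function: one count per invocation style from the post.
fn demo_counts() -> [usize; 3] {
    [
        count_stmts!(1; 1),                     // two expression statements
        count_stmts!(let x = ()),               // one let statement, no semicolon
        count_stmts!(struct X {}; struct Y {}), // two item statements
    ]
}
```

Calling `demo_counts()` from `main` and printing the result is enough to try it out. Adding a trailing semicolon to the first two invocations should be rejected, because nothing in the matcher consumes it.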

ExpHP · June 29, 2018, 1:48am

$ cd ~/.cargo/registry/src/github.com-1ecc6299db9ec823/
$ ls
adler32-0.3.0                 fnv-1.0.6                     open-1.2.1                              serde-1.0.24                        wincolor-0.1.6
adler32-1.0.2                 font-loader-0.4.2             openblas-src-0.5.4                      serde-1.0.27                        ws2_32-sys-0.2.1
advapi32-sys-0.2.0            foreign-types-0.2.0           openblas-src-0.5.6                      serde-1.0.29                        x11-dl-2.12.0
aho-corasick-0.5.3            foreign-types-0.3.2           openssl-0.9.21                          serde-1.0.30                        xattr-0.1.11
                  ................................
                  ...... 1147 entries total ......
                  ................................
filetime-0.2.1                num_cpus-1.2.1                serde-1.0.16                            winapi-0.3.5                        
fixedbitset-0.1.9             num_cpus-1.4.0                serde-1.0.18                            winapi-build-0.1.1                  
flate2-0.2.20                 num_cpus-1.7.0                serde-1.0.19                            winapi-i686-pc-windows-gnu-0.4.0    
flate2-1.0.1                  num_cpus-1.8.0                serde-1.0.20                            winapi-x86_64-pc-windows-gnu-0.4.0  
fnv-1.0.5                     odds-0.2.25                   serde-1.0.21                            wincolor-0.1.4                      

$ rg '\$[a-zA-Z0-9_]+\s*:\s*stmt' --no-heading
libc-0.2.32/src/macros.rs:53:        $($body:stmt);*
libc-0.2.36/src/macros.rs:53:        $($body:stmt);*
libc-0.2.17/src/macros.rs:53:        $($body:stmt);*
libc-0.2.40/src/macros.rs:53:        $($body:stmt);*
libc-0.2.38/src/macros.rs:53:        $($body:stmt);*
libc-0.2.20/src/macros.rs:53:        $($body:stmt);*
libc-0.2.39/src/macros.rs:53:        $($body:stmt);*
libc-0.2.41/src/macros.rs:53:        $($body:stmt);*
libc-0.2.42/src/macros.rs:53:        $($body:stmt);*
libc-0.2.33/src/macros.rs:53:        $($body:stmt);*
libc-0.2.35/src/macros.rs:53:        $($body:stmt);*
libc-0.2.37/src/macros.rs:53:        $($body:stmt);*
libc-0.2.30/src/macros.rs:53:        $($body:stmt);*
libc-0.2.34/src/macros.rs:53:        $($body:stmt);*
combine-2.5.2/src/lib.rs:371:        $stmt: stmt; $($parser: tt)*
combine-2.5.2/src/lib.rs:387:        $stmt: stmt; $($parser: tt)*

The two appearances in combine are #[doc(hidden)] macros. The appearance in libc is in a non-exported macro.

Great scott, stmt is worthless!

Edit: Who likes tables?!

Total number of unique[^1] lines containing any given macro matcher in my crates.io source cache:

Fragment	Count	Fragment	Count
`ident`	2257	`pat`	73
`expr`	2098	`block`	43
`tt`	1457	`item`	30
`ty`	655	[^2]`vis`	5
`meta`	174	`stmt`	2
`path`	123

[^1]: Counted using a command of the form:

rg '\$[a-zA-Z0-9_]+\s*:\s*expr' --no-heading --no-filename --no-line-number | sort | uniq | wc -l

which cuts out exact textual duplicates that often (though not always) arise due to the cache having multiple versions of the same crate. For the `pat` fragment I used the pattern `pat[^h]`, so that `path` is not counted as well.

[^2]: vis isn’t even stable and yet it is still more common than stmt!
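For anyone without ripgrep at hand, roughly the same scan can be sketched in plain Rust. This is an approximation of the search above, not the command that produced the table: `matcher_fragments` is a hypothetical helper that only looks for `$name`, an optional run of whitespace, a colon, and then a fragment name.

```rust
/// Hypothetical helper: returns the fragment name of every `$name: frag`
/// matcher found in `src`, in order of appearance.
fn matcher_fragments(src: &str) -> Vec<&str> {
    let bytes = src.as_bytes();
    let is_ident = |b: u8| b.is_ascii_alphanumeric() || b == b'_';
    let mut found = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'$' {
            // Skip the metavariable name that follows `$`.
            let mut j = i + 1;
            while j < bytes.len() && is_ident(bytes[j]) {
                j += 1;
            }
            let has_name = j > i + 1;
            // Optional whitespace before the colon.
            while j < bytes.len() && bytes[j].is_ascii_whitespace() {
                j += 1;
            }
            if has_name && j < bytes.len() && bytes[j] == b':' {
                j += 1;
                // Optional whitespace after the colon.
                while j < bytes.len() && bytes[j].is_ascii_whitespace() {
                    j += 1;
                }
                let start = j;
                while j < bytes.len() && is_ident(bytes[j]) {
                    j += 1;
                }
                if j > start {
                    // Only ASCII bytes were consumed, so the slice is valid UTF-8.
                    found.push(&src[start..j]);
                }
            }
        }
        i += 1;
    }
    found
}
```

Counting the entries equal to `"stmt"` in the returned vector gives a per-file number; the deduplication done by `sort | uniq` would still need to be applied on top.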

petrochenkov · June 29, 2018, 9:04am

I recently did a similar investigation.
I have a possible explanation for the stmt matcher's behavior there.

Regarding statements outside of macros:

{}; // Extraneous or ends the block expression?

The answer is that it's unobservable and doesn't matter!
Right now the parser can immediately eat one or two semicolons after a "naked statement" even if they are not required, and then eat the remaining extraneous semicolons one by one, but that's an implementation detail.

Should I just ignore the statement macro matcher?

Quite probably.
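The "unobservable" point about `{};` can be illustrated with two functions that differ only in that semicolon. This is a minimal sketch with hypothetical function names; the `allow` attribute is there only because current compilers may warn about the redundant semicolons.

```rust
// Hypothetical example: the semicolon after `{}` may be read as ending the
// block statement or as an empty statement; the result is the same either way.
#[allow(redundant_semicolons)]
fn block_then_semicolon() -> i32 {
    {};
    ;;; // further extraneous semicolons are accepted as well
    1
}

// The same body without any semicolon after the unit-typed block.
fn block_without_semicolon() -> i32 {
    {}
    1
}
```

Both functions compile and return the same value, so nothing in the program's behavior reveals which role the semicolon was given.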

petrochenkov · June 29, 2018, 9:10am

I think the simplest mental model of the current situation would be that

stmt matcher represents the “true statement”.
Other semicolons are parts of the containing block rather than parts of statements.
However, some statements require a semicolon to follow them in a block.
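One way to picture this mental model is a macro that captures bare statements and lets the enclosing block supply the semicolons when it re-emits them. The sketch below assumes nothing beyond `macro_rules!`; `in_block` and `sum_in_block` are hypothetical names, and the shape is similar to the `$($body:stmt);*` use in libc shown earlier in the thread.

```rust
// Hypothetical macro: the matcher captures "true statements" separated by
// semicolons, and the transcriber adds a semicolon after each one inside a
// block of its own.
macro_rules! in_block {
    ($($s:stmt);*) => {{
        $($s;)*
    }};
}

// Hypothetical usage: expression statements and a let statement are passed
// without a trailing semicolon; the block adds the ones that are required.
fn sum_in_block() -> i32 {
    let mut total = 0;
    in_block!(total += 1; total += 2; let extra = 3; total += extra);
    total
}
```

The `let extra = 3` statement only becomes valid inside the block because the transcriber follows it with a semicolon, which matches the third point of the model.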

Soni · June 29, 2018, 12:40pm

can you make a macro to eliminate semicolons from most of the language?

fn foo() {
    f!{
        let a = 1 let b = 2 let c = 3
        return a + b + c
    }
}

kornel · June 29, 2018, 2:57pm

Probably not with a regular macro, because its matchers already follow the way Rust parses expressions. You could do it with a procedural macro, which gives you the freedom to use any grammar you want. However, you’d quickly find cases where it makes the language ambiguous, and writing a sensible parser for it gets pretty hard.

Soni · June 29, 2018, 3:09pm

can’t you take a list of space-separated statements and add semicolons between them?

you can! http://play.rust-lang.org/?gist=ded014ea5406b05886368fe595bb50e1&version=stable&mode=debug

imagine this:

#[macro_use]
extern crate semicolon_free_rust;

scfr! {
// your code here
}
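The playground code is not reproduced here. As a separate and much narrower sketch, a `macro_rules!` macro can accept the earlier `f!` example if each `let` initializer is restricted to a literal, which sidesteps the question of where an expression ends. The macro body and the function `foo` below are assumptions made for illustration, not the linked implementation.

```rust
// Hypothetical, restricted version of `f!`: only `let name = literal`
// statements are accepted, followed by one `return` expression.
macro_rules! f {
    ($(let $name:ident = $value:literal)* return $result:expr) => {{
        // The transcriber inserts the semicolons the input left out.
        $(let $name = $value;)*
        $result
    }};
}

fn foo() -> i32 {
    let sum = f! {
        let a = 1 let b = 2 let c = 3
        return a + b + c
    };
    sum
}
```

Here `return` is only a marker token consumed by the matcher; the macro yields the value of the final expression instead of returning from the function. Allowing arbitrary expressions as initializers would bring back the problem of deciding where each statement stops.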

kornel · June 29, 2018, 5:59pm

Oh nice, but if I add return () it shows how hacky it is:

warning: expected `;`, found `return`
  --> src/main.rs:10:9
   |
10 |         return ()
   |         ^^^^^^
   |
   = note: This was erroneously allowed and will become a hard error in a future release

And you run into ambiguities like:

let ref b = 1
*b = 2

because it parses as let ref b = ((1 * b) = 2);
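The underlying reason is that a line break does not end an expression, so a leading `*` on the next line is read as multiplication when that is possible. A minimal sketch of the same effect in ordinary Rust, with a hypothetical function name and made-up values:

```rust
// Hypothetical example: the newline before `*b` does not start a new
// statement, so the initializer below is parsed as `a * b`.
fn newline_does_not_end_expression() -> i32 {
    let a: i32 = 2;
    let b = &5;
    let product = a
        *b;
    product
}
```

The function returns the product of the two values rather than treating `*b` as a dereference on its own line, which is the same grouping that turns the two-line example above into a single `let`.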

Soni · June 29, 2018, 7:16pm

yes, you run into those ambiguities, but if this was a real macro you could just add a semicolon to disambiguate.

they’re fine, if you know what’s up.

don’t blame me for your misunderstandings.

system · March 25, 2019, 8:30am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
