Semicolons and Statements


#1

I’ve a few questions on statements based on the following observations.

//! Statements and semicolons

// We can capture statements in macros.
macro_rules! stmts_semicolon { ($($stmt: stmt);*) => {}}
macro_rules! stmts_whitespace { ($($stmt: stmt)*) => {}}

// Single expression statements must not be terminated by semicolons when
// captured by a macro. This implies that the semicolon is **not** a part of
// the statement.
stmts_semicolon!{1}
stmts_whitespace!{1}

// Likewise with let statements.
stmts_semicolon!{let x = ()}
stmts_whitespace!{let x = ()}

// When we get to two expression statements in a row, we need the separator.
stmts_semicolon!{1; 1}
stmts_whitespace!{1 1}

// Items are also statements
stmts_semicolon!{struct X{}}
stmts_whitespace!{struct X{}}

// And the separator is needed between multiple item declaration statements.
stmts_semicolon!{struct X{}; struct X{}}
stmts_whitespace!{struct X{} struct X{}}

// Some items require ending semicolons, and that's not a part of the statement.
stmts_semicolon!{struct X;}
stmts_whitespace!{struct X;}

// So we need double semicolons when separating them.
stmts_semicolon!{struct X;; struct X;}
stmts_whitespace!{struct X; struct X;}

// But note, an empty statement is not valid for this macro.
// stmts_semicolon!{;} // ~err: expected a statement
//                  ^

// If the expression is a block with unit return, it must not end with a semicolon.
stmts_semicolon!{{}}

// And if the expression ends with non-unit, it must not either.
stmts_semicolon!{{0i32}}

// But the actual semantics in blocks requires semicolons terminating blocks
// with non-unit returns.
fn blocks_with_semis() {
    {} // End of statement
    
    // {0i32} ~err: mismatched types
    //  ^^^^ expected (), found i32
    
    {0i32} /* not end of statement */ ; // End of statement
    
    () // Explicit block expression.
}

// Furthermore, within blocks, extraneous semicolons are allowed and ignored.
// The Rust grammar that's unused by the compiler calls these statements while
// the compiler will just detect and discard them.
fn extraneous_semicolons() {
    ;;;;;;;;;;
}

// So, is the following an extraneous semicolon?
fn maybe_extraneous_semicolon() {
    {}; // Extraneous or ends the block expression?
    
    1;
    
    ()
}

// And in blocks, not every statement needs to end with a semicolon.
// Specifically items that aren't semicolon terminated don't need a semicolon
// after them in a block either.
fn statements_without_semicolons() {
    struct Foo {}
    struct Bar {}
    ()
}

So this leaves me in a weird spot with semicolons. In non-macro-land, they appear to be required as part of an expression or let statement, while they are forbidden in macro-land. Should I just ignore the statement macro matcher?

Furthermore, in blocks, are semicolons that aren’t strictly necessary “extraneous” or “empty”?

And finally, is a semicolon after a block or control flow expression of unit type one of those extraneous/empty statements or is it actually a part of the expression statement?


#2
$ cd ~/.cargo/registry/src/github.com-1ecc6299db9ec823/
$ ls
adler32-0.3.0                 fnv-1.0.6                     open-1.2.1                              serde-1.0.24                        wincolor-0.1.6
adler32-1.0.2                 font-loader-0.4.2             openblas-src-0.5.4                      serde-1.0.27                        ws2_32-sys-0.2.1
advapi32-sys-0.2.0            foreign-types-0.2.0           openblas-src-0.5.6                      serde-1.0.29                        x11-dl-2.12.0
aho-corasick-0.5.3            foreign-types-0.3.2           openssl-0.9.21                          serde-1.0.30                        xattr-0.1.11
                  ................................
                  ...... 1147 entries total ......
                  ................................
filetime-0.2.1                num_cpus-1.2.1                serde-1.0.16                            winapi-0.3.5                        
fixedbitset-0.1.9             num_cpus-1.4.0                serde-1.0.18                            winapi-build-0.1.1                  
flate2-0.2.20                 num_cpus-1.7.0                serde-1.0.19                            winapi-i686-pc-windows-gnu-0.4.0    
flate2-1.0.1                  num_cpus-1.8.0                serde-1.0.20                            winapi-x86_64-pc-windows-gnu-0.4.0  
fnv-1.0.5                     odds-0.2.25                   serde-1.0.21                            wincolor-0.1.4                      

$ rg '\$[a-zA-Z0-9_]+\s*:\s*stmt' --no-heading
libc-0.2.32/src/macros.rs:53:        $($body:stmt);*
libc-0.2.36/src/macros.rs:53:        $($body:stmt);*
libc-0.2.17/src/macros.rs:53:        $($body:stmt);*
libc-0.2.40/src/macros.rs:53:        $($body:stmt);*
libc-0.2.38/src/macros.rs:53:        $($body:stmt);*
libc-0.2.20/src/macros.rs:53:        $($body:stmt);*
libc-0.2.39/src/macros.rs:53:        $($body:stmt);*
libc-0.2.41/src/macros.rs:53:        $($body:stmt);*
libc-0.2.42/src/macros.rs:53:        $($body:stmt);*
libc-0.2.33/src/macros.rs:53:        $($body:stmt);*
libc-0.2.35/src/macros.rs:53:        $($body:stmt);*
libc-0.2.37/src/macros.rs:53:        $($body:stmt);*
libc-0.2.30/src/macros.rs:53:        $($body:stmt);*
libc-0.2.34/src/macros.rs:53:        $($body:stmt);*
combine-2.5.2/src/lib.rs:371:        $stmt: stmt; $($parser: tt)*
combine-2.5.2/src/lib.rs:387:        $stmt: stmt; $($parser: tt)*

The two appearances in combine are #[doc(hidden)] macros. The appearance in libc is in a non-exported macro.

Great scott, stmt is worthless!


Edit: Who likes tables?!

Total number of unique[^1] lines containing any given macro matcher in my crates.io source cache:

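| Fragment | Count | Fragment | Count |
|----------|-------|----------|-------|
| ident    | 2257  | pat      | 73    |
| expr     | 2098  | block    | 43    |
| tt       | 1457  | item     | 30    |
| ty       | 655   | vis[^2]  | 5     |
| meta     | 174   | stmt     | 2     |
| path     | 123   |          |       |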

[^1]: Counted using a command of the form:

rg '\$[a-zA-Z0-9_]+\s*:\s*expr' --no-heading --no-filename --no-line-number | sort | uniq | wc -l

which cuts out exact textual duplicates that often (though not always) arise because the cache holds multiple versions of the same crate. For the pat fragment I searched for pat[^h] so that path would not be counted as well.

[^2]: vis isn’t even stable and yet it is still more common than stmt!
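
For anyone without ripgrep at hand, the same kind of count can be approximated in plain Rust. This is only a sketch: the helper names mentions_matcher and count_unique_lines are invented here, the input is assumed to be lines already read from the source files, and the hand-written scan stands in for the regex rather than reproducing it exactly.

```rust
use std::collections::HashSet;

/// Returns true if `line` contains something shaped like `$name: frag`,
/// with optional whitespace around the colon. Like the regex, this is a
/// prefix match, so "pat" also matches "path".
fn mentions_matcher(line: &str, frag: &str) -> bool {
    let bytes = line.as_bytes();
    for i in 0..bytes.len() {
        if bytes[i] != b'$' {
            continue;
        }
        // Skip over the metavariable name (ASCII letters, digits, underscore).
        let mut j = i + 1;
        while j < bytes.len() && (bytes[j].is_ascii_alphanumeric() || bytes[j] == b'_') {
            j += 1;
        }
        if j == i + 1 {
            continue; // `$` was not followed by a name, e.g. `$(`
        }
        // Expect optional whitespace, a colon, optional whitespace, then the fragment.
        if let Some(after_colon) = line[j..].trim_start().strip_prefix(':') {
            if after_colon.trim_start().starts_with(frag) {
                return true;
            }
        }
    }
    false
}

/// Counts distinct lines that mention the given fragment specifier,
/// mirroring the `sort | uniq | wc -l` part of the pipeline.
fn count_unique_lines(lines: &[&str], frag: &str) -> usize {
    let unique: HashSet<&str> = lines
        .iter()
        .copied()
        .filter(|line| mentions_matcher(line, frag))
        .collect();
    unique.len()
}

fn main() {
    // Sample lines taken from the search output earlier in this post.
    let lines = [
        "        $($body:stmt);*",
        "        $($body:stmt);*",
        "        $stmt: stmt; $($parser: tt)*",
    ];
    println!("stmt: {}", count_unique_lines(&lines, "stmt"));
}
```

Deduplicating on the full line text is what collapses the many libc versions into a single hit.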


#3

I’ve recently done a similar investigation.
I have a possible explanation for the stmt matcher’s behavior there.

Regarding statements outside of macros:

{}; // Extraneous or ends the block expression?

The answer is that it’s unobservable and doesn’t matter!
Right now the parser can immediately eat one or two semicolons after a “naked statement” even if they are not required, and then eat any remaining extraneous semicolons one by one, but that’s an implementation detail.
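
A minimal sketch of what “unobservable” means for a unit-typed block. The function names are made up for illustration, and the allow attribute is only there to silence the redundant_semicolons lint:

```rust
// Hypothetical helpers, not taken from the thread.

fn block_without_semicolon() -> i32 {
    let mut n = 0;
    { n += 1 } // block-like expression statement, no semicolon needed
    n
}

#[allow(redundant_semicolons)]
fn block_with_semicolons() -> i32 {
    let mut n = 0;
    { n += 1 };;; // however the parser attributes these, the result is the same
    n
}

fn main() {
    // Both variants compile and compute the same value.
    assert_eq!(block_without_semicolon(), block_with_semicolons());
}
```

Whether the first semicolon belongs to the block statement or is an empty statement of its own makes no difference to what the function does.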

Should I just ignore the statement macro matcher?

Quite probably.


#4

I think the simplest mental model of the current situation would be that

  • The stmt matcher represents the “true statement”.
  • Other semicolons are parts of the containing block rather than parts of statements.
  • However, some statements require a semicolon to follow them in a block.
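
One way to picture this model is a macro that captures statements without their semicolons and then supplies the semicolons itself when it places them in a block. This is a sketch only: in_block and sum_via_macro are hypothetical names, and it sticks to let statements and unit-typed expression statements.

```rust
// Hypothetical macro for illustration: the matcher captures bare,
// semicolon-separated statements, and the expansion puts each one into a
// block followed by a semicolon that the block, not the statement, provides.
macro_rules! in_block {
    ($($s:stmt);*) => {{
        $($s;)*
    }};
}

fn sum_via_macro() -> i32 {
    let mut total = 10;
    // None of the three captured statements carries its own semicolon.
    in_block!(let a = 1; let b = 2; total += a + b);
    total
}

fn main() {
    println!("{}", sum_via_macro());
}
```

The semicolons at the call site act purely as separators for the matcher, which fits the idea that they are not part of the statements themselves.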

#5

can you make a macro to eliminate semicolons from most of the language?

fn foo() {
    f!{
        let a = 1 let b = 2 let c = 3
        return a + b + c
    }
}

#6

Probably not with a regular macro, because it already follows the way Rust parses expressions. You could do it with a procedural macro, which gives you the freedom to use any grammar you want. However, you’d quickly find cases where it makes the language ambiguous, and writing a sensible parser for it would be pretty hard.


#7

can’t you take a list of space-separated statements and add semicolons between them?

you can! http://play.rust-lang.org/?gist=ded014ea5406b05886368fe595bb50e1&version=stable&mode=debug

imagine this:

#[macro_use]
extern crate semicolon_free_rust;

scfr! {
// your code here
}

#8

Oh nice, but if I add return () it shows how hacky it is:

warning: expected `;`, found `return`
  --> src/main.rs:10:9
   |
10 |         return ()
   |         ^^^^^^
   |
   = note: This was erroneously allowed and will become a hard error in a future release

And you run into ambiguities like:

let ref b = 1
*b = 2

because it parses as let ref b = ((1 * b) = 2);
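
The same effect can be seen in ordinary Rust, without any macro. This is a small sketch with invented function names: a line break never ends a statement, so a leading * on the next line is read as multiplication unless a semicolon comes first.

```rust
// Hypothetical examples, not from the playground link.

fn newline_is_not_a_terminator() -> i32 {
    let b = 3;
    let a = 2
    * b; // parsed as `2 * b`, the line break changes nothing
    a
}

fn semicolon_disambiguates() -> i32 {
    let mut slot = 1;
    let b = &mut slot;
    let a = 2;
    *b = 5; // after the semicolon, `*b` is a dereference being assigned to
    a + slot
}

fn main() {
    println!("{}", newline_is_not_a_terminator());
    println!("{}", semicolon_disambiguates());
}
```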


#9

yes, you run into those ambiguities, but if this were a real macro you could just add a semicolon to disambiguate.

they’re fine, if you know what’s up.

don’t blame me for your misunderstandings.