Semicolons and Statements

I’ve a few questions on statements based on the following observations.

//! Statements and semicolons

// We can capture statements in macros.
macro_rules! stmts_semicolon { ($($stmt: stmt);*) => {}}
macro_rules! stmts_whitespace { ($($stmt: stmt)*) => {}}

// Single expression statements must not be terminated by semicolons when
// captured by a macro. This implies that the semicolon is **not** a part of
// the statement.
stmts_semicolon!{1}
stmts_whitespace!{1}

// Likewise with let statements.
stmts_semicolon!{let x = ()}
stmts_whitespace!{let x = ()}

// When we get to two expression statements in a row, we need the separator.
stmts_semicolon!{1; 1}
stmts_whitespace!{1 1}

// Items are also statements
stmts_semicolon!{struct X{}}
stmts_whitespace!{struct X{}}

// And the separator is needed between multiple item declaration statements.
stmts_semicolon!{struct X{}; struct X{}}
stmts_whitespace!{struct X{} struct X{}}

// Some items require ending semicolons, and that's not a part of the statement.
stmts_semicolon!{struct X;}
stmts_whitespace!{struct X;}

// So we need double semicolons when separating them.
stmts_semicolon!{struct X;; struct X;}
stmts_whitespace!{struct X; struct X;}

// But note, an empty statement is not valid for this macro.
// stmts_semicolon!{;} // ~err: expected a statement
//                  ^

// If the expression is a block with unit return, it must not end with a semicolon.
stmts_semicolon!{{}}

// And if the expression ends with non-unit, it must not either.
stmts_semicolon!{{0i32}}

// But the actual semantics in blocks requires semicolons terminating blocks
// with non-unit returns.
fn blocks_with_semis() {
    {} // End of statement
    
    // {0i32} ~err: mismatched types
    //  ^^^^ expected (), found i32
    
    {0i32} /* not end of statement */ ; // End of statement
    
    () // Explicit block expression.
}

// Furthermore, within blocks, extraneous semicolons are allowed and ignored.
// The reference grammar (which the compiler does not actually use) calls these
// statements, while the compiler just detects and discards them.
fn extraneous_semicolons() {
    ;;;;;;;;;;
}

// So, is the following an extraneous semicolon?
fn maybe_extraneous_semicolon() {
    {}; // Extraneous or ends the block expression?
    
    1;
    
    ()
}

// And in blocks, not every statement needs to end with a semicolon.
// Specifically items that aren't semicolon terminated don't need a semicolon
// after them in a block either.
fn statements_without_semicolons() {
    struct Foo {}
    struct Bar {}
    ()
}

So this leaves me in a weird spot with semicolons. In non-macro-land, they appear to be required as part of an expression or let statement, while they are forbidden in macro-land. Should I just ignore the statement macro matcher?

Furthermore, in blocks, are semicolons that aren’t strictly necessary “extraneous” or “empty”?

And finally, is a semicolon after a block or control flow expression of unit type one of those extraneous/empty statements or is it actually a part of the expression statement?

$ cd ~/.cargo/registry/src/github.com-1ecc6299db9ec823/
$ ls
adler32-0.3.0                 fnv-1.0.6                     open-1.2.1                              serde-1.0.24                        wincolor-0.1.6
adler32-1.0.2                 font-loader-0.4.2             openblas-src-0.5.4                      serde-1.0.27                        ws2_32-sys-0.2.1
advapi32-sys-0.2.0            foreign-types-0.2.0           openblas-src-0.5.6                      serde-1.0.29                        x11-dl-2.12.0
aho-corasick-0.5.3            foreign-types-0.3.2           openssl-0.9.21                          serde-1.0.30                        xattr-0.1.11
                  ................................
                  ...... 1147 entries total ......
                  ................................
filetime-0.2.1                num_cpus-1.2.1                serde-1.0.16                            winapi-0.3.5                        
fixedbitset-0.1.9             num_cpus-1.4.0                serde-1.0.18                            winapi-build-0.1.1                  
flate2-0.2.20                 num_cpus-1.7.0                serde-1.0.19                            winapi-i686-pc-windows-gnu-0.4.0    
flate2-1.0.1                  num_cpus-1.8.0                serde-1.0.20                            winapi-x86_64-pc-windows-gnu-0.4.0  
fnv-1.0.5                     odds-0.2.25                   serde-1.0.21                            wincolor-0.1.4                      

$ rg '\$[a-zA-A0-9_]+\s*:\s*stmt' --no-heading
libc-0.2.32/src/macros.rs:53:        $($body:stmt);*
libc-0.2.36/src/macros.rs:53:        $($body:stmt);*
libc-0.2.17/src/macros.rs:53:        $($body:stmt);*
libc-0.2.40/src/macros.rs:53:        $($body:stmt);*
libc-0.2.38/src/macros.rs:53:        $($body:stmt);*
libc-0.2.20/src/macros.rs:53:        $($body:stmt);*
libc-0.2.39/src/macros.rs:53:        $($body:stmt);*
libc-0.2.41/src/macros.rs:53:        $($body:stmt);*
libc-0.2.42/src/macros.rs:53:        $($body:stmt);*
libc-0.2.33/src/macros.rs:53:        $($body:stmt);*
libc-0.2.35/src/macros.rs:53:        $($body:stmt);*
libc-0.2.37/src/macros.rs:53:        $($body:stmt);*
libc-0.2.30/src/macros.rs:53:        $($body:stmt);*
libc-0.2.34/src/macros.rs:53:        $($body:stmt);*
combine-2.5.2/src/lib.rs:371:        $stmt: stmt; $($parser: tt)*
combine-2.5.2/src/lib.rs:387:        $stmt: stmt; $($parser: tt)*

The two appearances in combine are #[doc(hidden)] macros. The appearance in libc is in a non-exported macro.

Great scott, stmt is worthless!


Edit: Who likes tables?!

Total number of unique[^1] lines containing any given macro matcher in my crates.io source cache:

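| Fragment | Count | Fragment | Count |
|----------|-------|----------|-------|
| ident    | 2257  | pat      | 73    |
| expr     | 2098  | block    | 43    |
| tt       | 1457  | item     | 30    |
| ty       | 655   | vis[^2]  | 5     |
| meta     | 174   | stmt     | 2     |
| path     | 123   |          |       |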

[^1]: Counted using a command of the form:

rg '\$[a-zA-A0-9_]+\s*:\s*expr' --no-heading --no-filename --no-line-number | sort | uniq | wc -l

which cuts out exact textual duplicates that often (though not always) arise due to the cache having multiple versions of the same crate. For the pat fragment I searched for the regex pat[^h] so that path would not be counted as well.

[^2]: vis isn’t even stable and yet it is still more common than stmt!


I recently made a similar investigation.
I have a possible explanation for the stmt matcher's behavior there.

Regarding statements outside of macros:

{}; // Extraneous or ends the block expression?

The answer is that it's unobservable and doesn't matter!
Right now the parser can immediately eat one or two semicolons after a "naked statement" even if they are not required, and then eat the remaining extraneous semicolons one by one, but that's an implementation detail.
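
A small sketch of why this is unobservable: the three functions below (hypothetical names, made up for illustration) differ only in how many semicolons follow the empty block, and all of them compile and return the same value.

```rust
// Hypothetical functions for illustration only: they differ solely in the
// number of semicolons that follow the empty block.

fn block_then_nothing() -> i32 {
    {} // a block-like expression statement needs no semicolon
    1
}

fn block_then_semicolon() -> i32 {
    {}; // the same statement followed by a single semicolon
    1
}

#[allow(redundant_semicolons)]
fn block_then_many_semicolons() -> i32 {
    {};;; // further semicolons are accepted as well
    1
}

fn main() {
    assert_eq!(block_then_nothing(), 1);
    assert_eq!(block_then_semicolon(), 1);
    assert_eq!(block_then_many_semicolons(), 1);
}
```

Whether the first semicolon "belongs" to the block statement or is an empty statement of its own makes no difference to the result.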

Should I just ignore the statement macro matcher?

Quite probably.

I think the simplest mental model of the current situation would be that

  • The stmt matcher represents the “true statement”.
  • Other semicolons are parts of the containing block rather than parts of statements.
  • However, some statements require a semicolon to follow them in a block.
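
One way to poke at this mental model is a macro that only counts what the stmt matcher captures, without expanding the statements into code. The sketch below assumes a hypothetical helper named `count_stmts`; it is not taken from any crate.

```rust
// Hypothetical helper: counts the `stmt` fragments in a `;`-separated list.
// The statements are only stringified, never expanded into real code.
macro_rules! count_stmts {
    ($($s:stmt);*) => {
        <[&str]>::len(&[$(stringify!($s)),*])
    };
}

fn main() {
    // The `;` between statements is consumed as the separator,
    // not as part of either statement.
    assert_eq!(count_stmts!(1; 1), 2);
    assert_eq!(count_stmts!(let x = (); 1; struct X {}), 3);

    // A unit struct carries its own `;`, so separating two of them takes `;;`.
    assert_eq!(count_stmts!(struct X;; struct Y;), 2);

    // A block expression is a single statement, whatever its type.
    assert_eq!(count_stmts!({0i32}), 1);
}
```

Because the fragments are only passed to `stringify!`, nothing here depends on how the compiler later treats a captured statement inside a block.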

can you make a macro to eliminate semicolons from most of the language?

fn foo() {
    f!{
        let a = 1 let b = 2 let c = 3
        return a + b + c
    }
}

Probably not a regular macro, because it already follows the way Rust parses expressions. You could do it using a procedural macro, which gives you freedom to use any language grammar you want. However, you’d quickly find cases where it makes the language ambiguous and writing a sensible parser for it pretty hard.

can’t you take a list of space-separated statements and add semicolons between them?

you can! http://play.rust-lang.org/?gist=ded014ea5406b05886368fe595bb50e1&version=stable&mode=debug

imagine this:

#[macro_use]
extern crate semicolon_free_rust;

scfr! {
// your code here
}

Oh nice, but if I add return () it shows how hacky it is:

warning: expected `;`, found `return`
  --> src/main.rs:10:9
   |
10 |         return ()
   |         ^^^^^^
   |
   = note: This was erroneously allowed and will become a hard error in a future release

And you run into ambiguities like:

let ref b = 1
*b = 2

because it parses as let ref b = ((1 * b) = 2);
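
The ambiguity can be made visible without expanding anything, by counting how many statements a whitespace-separated stmt matcher actually sees. This is only a sketch; `count_ws_stmts` is a hypothetical name, and the statements are stringified rather than compiled.

```rust
// Hypothetical helper: counts whitespace-separated `stmt` fragments.
// The fragments are only stringified, so they are never type-checked.
macro_rules! count_ws_stmts {
    ($($s:stmt)*) => {
        <[&str]>::len(&[$(stringify!($s)),*])
    };
}

fn main() {
    // Each `let` keyword ends the previous initializer, so this is three statements.
    assert_eq!(count_ws_stmts!(let a = 1 let b = 2 let c = 3), 3);

    // Two literals in a row cannot form one expression, so this is two statements.
    assert_eq!(count_ws_stmts!(1 1), 2);

    // Here `-` is read as a binary operator, so the parser sees one statement.
    assert_eq!(count_ws_stmts!(1 - 1), 1);

    // Likewise `*b` continues the initializer as a multiplication,
    // so the intended two statements collapse into one.
    assert_eq!(count_ws_stmts!(let ref b = 1 *b = 2), 1);
}
```

Any token that can continue an expression (`*`, `-`, `&`, `(`, `[` and so on) at the start of the next statement runs into the same problem.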

yes, you run into those ambiguities, but if this were a real macro you could just add a semicolon to disambiguate.

they’re fine, if you know what’s up.

don’t blame me for your misunderstandings.
