Make (Some) Separators Optional


#1

Every time I switch back to Rust from languages that finally have done away with the mostly redundant need for semicolons and commas everywhere, I get really annoyed.

As I see the Rust syntax, in most places they are really optional and not required for correctly parsing the code.

Here are the places I think they could just be dropped, without complicating the parser much:

  • extern crate xxx**;**
  • use std::fs**;**
  • struct X { some_struct_field: u64**,** }
  • enum…
  • trait T { fn f(); }

In all those examples, the parser has to detect a path or a (return) type, which should be perfectly possible without the terminator. No expressions are involved here, which would be a lot harder to detect without ambiguity / backtracking / more extensive lookahead.

Making those optional should be backwards compatible, and already would have a nice convenience impact. The parser would optionally accept a newline instead of a semicolon in those places.

The semicolon after expressions can be annoying too (at least for me), but it actually carries semantic meaning and probably can’t be touched.
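
To make that semantic role concrete, here is a minimal sketch (the function names are made up for illustration). The only difference between the two bodies is the trailing semicolon, and it changes what the block evaluates to:

```rust
// Hypothetical names, for illustration only.

// No trailing semicolon: `a + b` is the block's tail expression,
// so its value is returned.
fn sum(a: i32, b: i32) -> i32 {
    a + b
}

// Trailing semicolon: `sum(a, b);` is a statement, its value is dropped,
// and the block evaluates to `()`.
fn sum_discarded(a: i32, b: i32) {
    sum(a, b);
}

fn main() {
    let x: i32 = sum(1, 2);
    let y: () = sum_discarded(1, 2);
    println!("{} {:?}", x, y);
}
```

Dropping this semicolon silently would therefore change types, not just layout, which is why it is left out of the proposal.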

Swift is a good example here, because it is very close to Rust syntax/feature wise (traditional C family, but with generics, pattern matching, lambdas, …), and is perfectly fine without them.

What do you think?


#2

People make fun of JavaScript for automatic semicolon insertion, so I think we need to be careful here. I do wish I didn’t need semicolons, but I think that ship has sailed.

Go also does automatic semicolon insertion, but it’s somewhat tempered by embedding a style guide into the language and encouraging everyone to use an autoformatter.


#3

The problems with JavaScript are mostly historical: older browsers, especially Internet Explorer (pre 9), didn’t handle missing semicolons well, which often caused parsing / runtime errors. Nowadays it’s perfectly safe to write JS without them.

Go has very simple syntax, but Swift, for example, which is very close to Rust in syntax + features and quite complex (as in traditional C family, but also with generics, matching, …), also does perfectly well without them.

Rust is the only recent/relatively new language (that I know of) that hasn’t dropped them or made them optional.


#4

This one can be used near an expression context, e.g.

fn foo() {
    use std::fs
    ::std::process::exit(0)
}

It’s not presently acceptable code, but you’d have to parse all the way to the ( and then backtrack to decide the newline should be the separator instead.

I’m not generally a fan of significant whitespace, but I accept this in Python because the rules are consistently applied. Having optional newline separators sounds fraught with danger.


#5

Here’s a working use of use that would be ambiguous without a semicolon:

struct File;
impl Drop for File {
    fn drop(&mut self) {
        println!("Dropping");
    }
}
/// Prints "Dropping"
fn a() {
    use std::fs;
    ::File;
}
/// No-op
fn b() {
    use std::fs
    ::File;
}

#6

Aside from the ambiguity @jethrogb beat me to, the details of this change would be very tricky since Rust currently ignores all whitespace. A newline token would have to be introduced, and new tokens bring headaches with them. The most significant is a Rust-specific one, so analogies with other languages are of limited use: Should newlines be “real” tokens, able to be consumed, matched, and generated by macros?

  • If so, a number of rules need to be set down: Follow sets for the backwards compatibility checks on macro_rules arms, should macros be able to generate newline tokens (and if so, how), should they be preserved for pretty-printing (this affects the AST), etc.
  • If not, macros can’t mimic this aspect of Rust syntax, which is annoying for using macros that splice ordinary Rust syntax into a different context. For example, a macro can currently accept use statements using the path fragment and repetition, but if newlines aren’t real tokens then use items terminated by a newline can’t be accepted without also accepting invalid code (e.g., accepting the following use coming right after in the same line) or running into ambiguity errors (e.g., a path followed by an ident).
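
As a rough sketch of the macro pattern mentioned in the last bullet (the macro name `imports` and the helper `word_counts` are hypothetical), this is how a macro_rules macro can accept a run of use items today. The `;` after each path is what lets the matcher know where one item ends:

```rust
use std::collections::BTreeMap as _; // keeps the block self-explanatory: std only

// Hypothetical macro: accepts zero or more `use <path>;` items and
// re-emits them unchanged. A `path` fragment may only be followed by a
// small set of tokens in a matcher; `;` is one of them.
macro_rules! imports {
    ($(use $p:path;)*) => {
        $(use $p;)*
    };
}

imports! {
    use std::collections::BTreeMap;
    use std::collections::BTreeSet;
}

// Uses the names brought into scope by the macro invocation above.
fn word_counts(words: &[&str]) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for w in words {
        *counts.entry(w.to_string()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = word_counts(&["a", "b", "a"]);
    let unique: BTreeSet<&String> = counts.keys().collect();
    println!("{:?} {:?}", counts, unique);
}
```

If the `;` in the matcher were replaced by an invisible newline, the matcher above would have nothing to anchor on between one path and the next `use`.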

#7

@rkruppe Regarding the parser, one neat trick I have found to work nicely is to attach something like an after_newline flag to each token, which is set to true when a newline precedes the token. (I usually also add after_whitespace, but that shouldn’t be relevant here.) This is trivial to detect in the lexer, removes the need for separate newline/whitespace tokens, and doesn’t complicate the parser, yet you can always easily check whether a token comes after a newline / is indented.
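
A toy sketch of that idea, assuming a lexer that simply splits on whitespace (the names `Token`, `after_newline` and `lex` are made up here, and this is not how rustc's lexer is structured):

```rust
// Hypothetical token type: no separate newline token exists; the
// information is carried as a flag on the following token instead.
#[derive(Debug, PartialEq)]
struct Token {
    text: String,
    // True when at least one '\n' appeared between the previous token
    // (or the start of input) and this one.
    after_newline: bool,
}

// Splits the input into whitespace-separated words and records on each
// token whether a newline preceded it.
fn lex(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut saw_newline = false; // newline seen since the last token ended
    let mut pending_flag = false; // flag for the token currently being built

    for ch in src.chars() {
        if ch.is_whitespace() {
            if !current.is_empty() {
                tokens.push(Token {
                    text: std::mem::take(&mut current),
                    after_newline: pending_flag,
                });
            }
            if ch == '\n' {
                saw_newline = true;
            }
        } else {
            if current.is_empty() {
                // First character of a new token: latch the flag.
                pending_flag = saw_newline;
                saw_newline = false;
            }
            current.push(ch);
        }
    }
    if !current.is_empty() {
        tokens.push(Token { text: current, after_newline: pending_flag });
    }
    tokens
}

fn main() {
    for token in lex("use std::fs\n::File ;") {
        println!("{:?}", token);
    }
}
```

A parser built on top of this could treat a token with after_newline set as a possible item boundary, but it would still have to decide what to do with input like the `use std::fs` / `::File` case above.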

Regarding use of use in expression contexts like inside an fn, one could make the semicolon required here. (Assuming the parser is written in a way that the context is easily detectable).

This might be confusing though (why do I need a semicolon for use in a nested scope, but not in the top level…).


#8

macro_rules! yolo {
    ($($e:stmt)*) => {
        $($e);*
    }
}

fn main() {
    yolo! {
        println!("like this?")
        let result = 1 + 1
        println!("1 + 1 = {}", result)
    }
}

#10

I once read an excellent article by Walter Bright about this very subject (I don’t have a link unfortunately). The gist of it is that we need to consider what we are trying to solve in the human realm before trying to solve a technical/engineering problem.

Natural languages are not orthogonal. They have plenty of redundancies and evolved as such because we humans need these redundancies for error checking, sync points, etc. Programming languages are used by humans first and foremost, which is why we have plenty of “redundancies” that the compiler doesn’t care about, such as human-readable significant identifier names.

While it’s technically possible to remove the need for semicolons, they are actually very useful for the human user for the same reasons we have punctuation in English. Python is great for short scripts, presentation slides, etc, but for a big long-lived enterprise project I’d like to have my semicolons, thank-you-very-much.


#11

Perhaps this one?


#12

This is not a constructive comment. Many of the languages that don’t require ; have made great strides in language design & deserve respect, including some of the languages you were probably thinking of (e.g. Python, Ruby, Go) as well as languages I can’t imagine you meant, because of how clearly important their contributions are (e.g. ML, Haskell).


#13

ML is a good example of a usable & readable language, as opposed to Haskell, which has completely absurd syntax. Both are very important languages to know and have important lessons to teach us with regard to their semantics, but the terseness of Haskell makes it a soup of identifiers and completely unusable, as they decided to remove parentheses from function calls. Even handwritten mathematical functions have parentheses!

Clearly the most interesting languages were developed by very smart people that completely ignored human usability design factors. Those require a separate set of skills which I feel aren’t technically oriented.


#14

Moderator note: Everyone please remember to keep their comments constructive. The Rust forums aren’t the place to deliver unstructured critique of other projects.


#15

Apparently you apply two standards here: one for the OP, who sees languages without , and ; as generally something good, and one for me, who sees languages/code without , and ; as generally something bad. So I get the moderator notice? This is unfair!

Well either way, you are right @withoutboats, in functional languages like Haskell it indeed makes much sense and feels good. I really enjoy writing in Haskell! I was mostly referring to the trend to remove it from recent languages like Kotlin. Also, I think there is some merit to put it into beginner-focused languages like Lua that are simplified to the minimum.


#16

The moderator note was not directed at any one individual. If anyone has questions about moderation, please email the mods. :slight_smile:


#17

Speaking for myself (not a moderator, and my prior comment was not a moderator comment), the reason I found your comment unconstructive but not the OP’s was that the OP described their negative experience using Rust (they are “annoyed” they have to type semicolons), but you imputed negative attributes to the authors of other languages (“their authors weren’t creative enough”).

On the original subject, I’m not really in favor of eliminating these separators. I think all of these semicolons are to keep these items consistent with how they’d be used in an expression context. I think the commas between fields/variants are not strictly necessary, but including them feels more natural than not to me.


#18

As someone who is still learning Rust, but has written a great deal of code in both C and Python (with their diametrically opposed attitudes toward semicolons), and somewhat less JavaScript (with the infamous “semicolon insertion” rules):

  • A modest amount of syntactic redundancy is a Good Thing for both error recovery and human ability to scan through code quickly.

  • Newline being syntactically different from other whitespace is a Bad Thing unless the language was designed from the ground up with that in mind (as Python was, and even then there are still places where it causes problems).

  • Semicolons at the end of statements are a Good Thing. They eliminate an entire class of syntactic ambiguities that you can deal with in other ways, but not as cleanly. I am still at “eegh, not sure I like this” on implicit return by absence of semicolon, and part of why is that semicolon not being a mandatory terminator for all statements makes me worry that there is some nasty parser-ambiguity gotcha waiting for me the next time I need to put a complex expression in tail position.

  • Rust still trips me up sometimes with the places where C requires a semicolon, but Rust requires the absence of a semicolon, e.g. right after the close curly of struct Foo { ... } — this is because Rust omits the legacy C feature of being able to inline a struct declaration into a variable declaration, so it’s actually a good change in Rust, but it’s something to highlight for n00bs coming from the C family maybe.

  • Rust also still trips me up with the places where it requires a comma where C would take a semicolon, e.g. in between enum and match cases and struct fields. Allowing semicolon in those contexts would remove a rough edge. (Relatedly, wherever either semicolon or comma is used to separate a list of things, a trailing separator should always be allowed.)

  • At module scope only, I could argue that @theduke’s original proposal to make semicolons optional after extern crate foo and use foo would be a virtuous change, provided that we also make them optional after the <item>s where they are currently forbidden. (Specifically, we could change the grammar such that all <item>s may be followed by either zero or one semicolons, but only when parsing a module — not when parsing the body of anything other than a module.) I think this introduces no new grammar ambiguities, and it might indeed reduce visual clutter.

  • I do not like the idea of removing explicit separators in between structure fields, enum cases, or trait members.


#19

I’m not in favour of omitting semicolons in general. It hasn’t worked out well for JS: it caused quirks, and in the end the majority of developers write semicolons anyway.

However, I wouldn’t mind a bit of flexibility on whether semicolons are terminators {a;b;} or separators {a;b}. Both interpretations have pros and cons, and which one is required varies between languages and from construct to construct.

Coming from C I keep making the mistake of typing struct Foo {};.


#20

Personally, I keep forgetting when semicolons are required and when they’re forbidden in C++, Rust and Javascript. And I keep forgetting to get my indentation right in Python. Changing any of the above wouldn’t really reduce my fat fingering any more than making Rust use === for equality would make me stop accidentally writing === in my C++ code every day.

While Go can get away with ASI, I see that as the exception that proves the rule. ASI is acceptable in Go because it’s a very opinionated and simple language targeting a fairly specific type of project (web services) and thus can successfully get most of its userbase using the one true coding style. Plus, it’s still using newlines as statement terminators, even with ASI. Although Rust is more opinionated than C, it’s also trying to have a much broader appeal than Go, and is much more complicated because it tries to do a bunch of things Go (and C) simply can’t do.

tl;dr I think every language benefits from some kind of explicit “this is the end of the statement” character, be it semicolon or newline or whatever, except for languages that don’t have statements at all.


#21

I’ll probably never get used to commas separating fields, so :heart: to whoever put in “help: struct fields should be separated by commas”. But additionally allowing semicolons (or omitting them) and causing style guide arguments would be worse than the status quo, IMHO.

I definitely get tripped up after structs. (Apparently my brain filed Rust under C++-like, not C#-like.) It doesn’t help that struct Bar1; must have it but struct Bar2 {}; must not. Though "expected item, found ;" is clear enough that I don’t lose meaningful time to it.

@kornel Any particular places for terminators/separators? I know that arrays, (non-unit) tuples, structs, use, and enums all allow both. Is it just the end of a block? (Hmm, traits are terminator-only, though personally I like that.)
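
For reference, a small sketch of the cases listed above, with made-up names (`Point`, `Unit`, `Empty`, `Axis`, `pick`). Each list is written with a trailing comma to show that both the terminator and the separator reading are accepted, and the two struct forms show where the semicolon is required and where it is rejected:

```rust
// All type and function names here are hypothetical.
use std::cmp::{max, min,}; // trailing comma in a `use` group

struct Point { x: i32, y: i32, } // trailing comma after the last field
struct Unit;                     // unit struct: the semicolon is required
struct Empty {}                  // braced struct: a `;` after `}` is an error

enum Axis { X, Y, } // trailing comma after the last variant

fn pick(axis: Axis, p: &Point) -> i32 {
    match axis {
        Axis::X => p.x,
        Axis::Y => p.y, // trailing comma after the last match arm
    }
}

fn main() {
    let arr = [1, 2, 3,];               // array
    let pair = (max(1, 2), min(1, 2),); // (non-unit) tuple
    let p = Point { x: arr[0], y: pair.0, };
    let _markers = (Unit, Empty {});
    println!("{} {}", pick(Axis::X, &p), pick(Axis::Y, &p));
}
```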