Understanding decisions behind semicolons

I am new to Rust. Like most of you, I have worked with several languages, and I created and maintain a Lisp dialect for metaprogramming needs that are hard to meet without it.

Being new to the language, I am fascinated by the ownership system; it's really cool. Lifetimes are also a little peculiar but perfectly understandable.

I saw a few posts here but could not find what I was looking for. I understand the high-level stuff but am confused by a simple point.

I am trying to port a Kotlin project of mine to Rust. In Kotlin, everything is an expression, and the result of the last line executed in a block becomes the result of that block, which also determines its type. I can use semicolons to end expressions anywhere, but Ctrl+F will probably fail me.

I noticed that in Rust the situation is similar but opposite. The last line still has something to do with returning the result and determining the type of the block. But I couldn't understand what semicolons have to do with the whole process, if you get what I am trying to say.

As a newcomer to the language, I would appreciate any help I can get from the designers.

Questions like this are best asked in the users forum, not here in internals.

3 Likes

To give a short answer here, blocks in Rust look like

{
    statement;
    statement;
    // ···
    statement;
    expression
}

or

{
    statement;
    statement;
    // ···
    statement;
}

In case there is a final expression without a semicolon, that’s the return value of the block. Most of Rust’s syntax is expression-based; in fact, every expression can be used as a statement. The main kind of statement that isn’t an expression is the let statement (item declarations, such as a nested fn, are the other).

The return value of a block is the return value of the function if the block is the body of a function. Otherwise, the block is an expression evaluating to its return value, so you can e.g. assign it in an outer block:

{
    let value = {
        some_expression;
        the_value
    };
    // … etc
}

which will execute some_expression and then evaluate and assign the_value to value.

Blocks also appear as part of control-flow expressions, some of which also produce values, e.g. if.
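
For anyone who wants to try this out, here is a minimal sketch of both ideas: a nested block used as a value, and an if used as an expression. The function names classify and block_value are made up for illustration.

```rust
/// Hypothetical helper: uses `if` as an expression.
fn classify(n: i32) -> &'static str {
    // No semicolon after the `if`/`else`, so its value is the value of the
    // function body.
    if n > 40 { "big" } else { "small" }
}

/// Hypothetical helper: a nested block on the right-hand side of a `let`.
fn block_value() -> i32 {
    let value = {
        let base = 20; // a statement inside the inner block
        base + 1       // final expression: the value of the inner block
    };
    value * 2
}
```

Calling classify(42) picks the first branch, and block_value() first binds value to the result of the inner block and then doubles it.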

Coming back to semicolons, there’s one final rule: Certain kinds of expressions make semicolons optional when you use them as a statement, so you can have

{
    statement;
    certain_kind_expression // no semicolon
    statement;
    expression
}

This includes expressions such as if and match and for, and also blocks themselves, but only if they don’t evaluate to anything themselves.

{
    let mut a = 42_i32;
    if a > 40 { println!("big!"); } // no semicolon after the `if`
    a /= 2;
    a.pow(2)
}

Note that, technically, every expression or block evaluates to something. In the discussion above, “blocks without return values” or “expressions not evaluating to anything” actually means that they evaluate to “unit”, i.e. a value of type ().
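
A small sketch of that last point, assuming nothing beyond the standard language; answer and the two wrapper functions are hypothetical names. The only difference between the two inner blocks is the trailing semicolon, and it changes the type of the block.

```rust
/// Hypothetical function that just produces a number.
fn answer() -> i32 {
    42
}

/// The inner block ends in an expression without a semicolon,
/// so the block evaluates to an `i32`.
fn with_tail_expression() -> i32 {
    let n: i32 = {
        let x = answer();
        x
    };
    n
}

/// The trailing semicolon turns the call into a statement,
/// so the block evaluates to unit, `()`.
fn with_trailing_semicolon() {
    let unit: () = {
        answer();
    };
    unit
}
```

Swapping the two type annotations makes the compiler reject both functions with a mismatched-types error, which is how the semicolon mistake usually shows up in practice.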

3 Likes

Actually I did. They redirected me here. My desire to understand a design decision was closed in favor of RFC or something like that. And it's not like we can't get an RFC out of it in a distant timeline.

Well, your question above, e.g. this part

seems like you’re still trying to understand the design itself, not the decisions behind it.


In this case, you might want to include a link back to there in your post here, so people can understand the context of your question better :wink:

This doesn’t make too much sense to me, I’m curious as to what was actually said and if necessary would like to help your understanding of how and why “RFC” was mentioned. Please give us a link to this discussion :slight_smile:

2 Likes

Appreciate the help. But I know how this has been laid out, since it isn't that complex. My main concern is 'why?', since there are simple alternative design decisions like the ones made by Kotlin.

I’m not really familiar with Kotlin, so you’d have to present some explanation and/or good links for the “design decisions by Kotlin”. A quick Google search reveals that its syntax seems to be sensitive to newlines. Rust doesn’t do that: whitespace is just whitespace (outside of string literals), and it doesn’t matter how many spaces, tabs or newlines you do or don’t have. That’s clearly a trade-off between eliminating semicolons and avoiding whitespace-sensitive code. Design decisions in this space are ultimately rather arbitrary, and the design ends up being whatever the people in charge favored at the time they designed it.

If your intention isn’t to change the design of Rust in this regard, I’d somewhat question the relevance to this forum; the users forum seems totally appropriate for asking for help in understanding the design of Rust. But RFCs being mentioned can only mean that either you tried too hard to convince others that Rust should change, or someone found an old RFC that contains an answer to your question. If it was the former, then that’s a way of telling you why your proposal to change Rust’s usage of semicolons is futile (the argument being: it would need to be approved in an RFC). Rust’s design also includes stability, and changing syntax in a fundamental way would go against that. So if your intention is to change the design of Rust, this forum probably won’t get you there either.

1 Like

I did provide a simple explanation of how stuff works in Kotlin right at the top. I won't go off on tangents. I take it that the decision was just arbitrary, or that the reasoning behind it is lost in time and no one bothered to rethink it, since it's a minor inconvenience at the parser level and we have high-level features to worry about. I get that, and I appreciate your help, if there isn't anything else I missed.

1 Like

Note that Rust is rather old. For example Rust 0.3, from July 2012, comes with a tutorial that explains rules that seem pretty much identical to how things still are today. And who knows how many months or years blocks and expressions already worked that way up to that point.

I guess for many of us, it takes some time to become familiar with the language enough to stop seeing semicolons as an inconvenience and start interpreting it as a design that feels clear and simple and intuitive :slight_smile:

The semicolon basically has three advantages.

  • First, it allows for clear syntactical rules about where a statement ends without needing to assign meaning to line breaks. Admittedly, even in Rust there are still corner cases where things can become confusing due to the special cases for control-flow structures like if and match (but the compiler diagnostics help out in those cases). Not having to worry about formatting at all can simplify code generation and automatic code formatting.
  • Second, the semicolon as used in Rust can convey meaning: You can (and have to) use it to actively control whether or not the final expression of a block should be a “return value” of the block.
  • Third, the semicolons make Rust’s syntax seem a little more familiar to people coming from languages such as C or Java. Maybe this isn’t much of an advantage, though.

9 Likes

This part of Rust syntax is copied more or less directly from OCaml. (Note: The very first Rust compiler was written in OCaml.) It's been in Rust unchanged since fairly near the start of the language's development, so it didn't go through anything like the RFC process.

14 Likes

It's much more complex than you would think with a cursory investigation. (I would know; I've spent a good deal of effort trying to "solve" the lack of semicolons for my own toy language design experiments.)

The first question would be "how do you get a unit block in a world without semicolons?" Thankfully, that has an obvious (if somewhat clunky) solution: just use let _ = $expr (or drop($expr)).
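
The drop($expr) trick already works in today's Rust, so it can be sketched directly. The names below are hypothetical, and note that in real Rust the let _ = form still needs its own semicolon, so only the drop form is shown as a tail expression.

```rust
/// Hypothetical side-effecting function: records `x` and returns a message.
fn push_and_describe(log: &mut Vec<i32>, x: i32) -> String {
    log.push(x);
    format!("pushed {}", x)
}

/// The body is a single tail expression with no semicolon, yet the function
/// returns `()`: `drop` consumes the `String` and itself returns unit.
fn push_and_discard(log: &mut Vec<i32>, x: i32) {
    drop(push_and_describe(log, x))
}
```

The side effect on log still happens; only the returned String is thrown away.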

The real key problem is: if you don't have visible statement terminators, how do you know where statements are separated? My conclusion is that for an imperative, side-effectful programming language (one where you would want to evaluate a statement purely for its side effects, not just its return value), explicit statement terminators are more valuable than any benefit of not having to add terminators.

(Plus, it could be considered more proper to think of ; as a postfix operator that takes an expression and turns it into a statement.)

Keep in mind that Rust is not only about communicating to the compiler what the programmer wants, but also in communicating the program semantics clearly to the developer(s).

Some of the thorniest edge cases are:

a
[b]
// could be `a[b]` (index expression) or `a; [b]` (array/slice expression)
//                  ^ user overloadable

a
::b
// could be `a::b` (path expression) or `a; ::b` (two path expressions)

a
- b
// could be `a - b` (infix minus expression) or `a; -b` (prefix minus expression)
//                   ^ user overloadable                 ^ user overloadable

break
a
// could be `break a` or `break; a`

These are all human ambiguities that you can avoid just by having a statement terminator. Together with the benefit for unit-returning functions and the familiarity it offers to C-style braces language users (a primary target audience of Rust), it makes more sense for Rust to just use semicolons rather than spend complexity budget on handling these edge cases.
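
The minus case can be tried in current Rust, where a line break is plain whitespace. This is only a sketch with made-up function names; the allow attribute silences the lint that fires for the deliberately useless a; statement.

```rust
/// The line break is just whitespace, so this is the single
/// infix expression `a - b`.
fn newline_infix(a: i32, b: i32) -> i32 {
    a
    - b
}

/// With an explicit terminator, `a;` is a statement of its own and
/// `-b` is a separate tail expression using prefix minus.
#[allow(path_statements)]
fn semicolon_prefix(a: i32, b: i32) -> i32 {
    a;
    -b
}
```

The two bodies differ by one character, and the semicolon is the only thing telling the reader (and the parser) which meaning was intended.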

(I don't know how Kotlin handles the ambiguity. Prefix minus is the big one to look at here, as it's always present, and it is what settles my opinion that side-effectful languages want statement terminators.)

(Swift's approach here is interesting; newlines are still insignificant, but operator binding is based on being (not) spaced away from the expression(s) it applies to. a - b and a-b are always infix, a- b is always postfix, and a -b is always prefix. I believe indexing has to be "joint" on the left side as well. This is similar to how rustc glues together multicharacter operators from pieces, as opposed to the traditional (e.g. C++, Java) approach of operator splitting.)

18 Likes

I'd say that, regardless of how complex it is, we have strong evidence that semicolon-less syntax works just fine for imperative languages. Both Go and Kotlin lack semicolons, and I haven't heard people complaining about that.

Implementation in Kotlin doesn't seem complicated:

A unit-returning block is spelled out explicitly, with Unit as its last expression:

{
  92
  Unit
}

This is rarely needed (much more rarely than ; in Rust), because anything coerces to Unit at the end of the block:

fun main() {
    val x: String = run { "not coerced" }
    val y: Unit = run { "coerced, gets `unused expr` warning" }
}

Here's a rare real-life example where explicit Unit is required.

This approach does make \n significant (IIRC, it is significant in Go as well), which doesn't work well with Rust's token-tree-based macro expansion model. But then, the current trailing-semicolon rule creates some problems for macro expansion as well: the concept of "tail expression" becomes ambiguous when macros expand to nothing (can't immediately find a link to the relevant issue).

I would also like to note that the existing semicolon rules are not syntactically unambiguous either:

if true { 1 } else { 1 }
&2

What is & here -- a bitwise AND or a reference? The answer depends on the context:

{
  let _ = 
  // Single expression
  if true { 1 } else { 1 }
  &2
  ;
  // A statement and an expression
  if true { 1 } else { 1 }
  &2
}
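
Here is a compilable variation of the two cases, offered as a sketch rather than the exact snippet above. One assumption is changed on purpose: in the statement case the branches are left empty, because rustc's type checker requires an if used as a statement without a semicolon to have type (). The function names are hypothetical.

```rust
/// In expression position (a `let` initializer) the `if` is an operand,
/// so `&` is parsed as bitwise AND: `(if true { 1 } else { 1 }) & 2`.
fn as_single_expression() -> i32 {
    let x = if true { 1 } else { 1 }
    & 2;
    x
}

/// In statement position the `if` ends the statement on its own,
/// so `&2` is a new tail expression: a reference to the literal `2`.
fn as_statement_then_expression() -> &'static i32 {
    if true {} else {}
    &2
}
```

The same two lines of source therefore produce an i32 in one context and a reference in the other.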

All that being said, the discussion here is theoretical -- there's no way we can change Rust now to make semicolons optional. One actionable thing we can do is fix the tooling to make working with the existing language easier. Here's the relevant issue: Automatically add semicolon · Issue #3830 · rust-analyzer/rust-analyzer · GitHub.

12 Likes

Here's a bug in someone's program that could reasonably be considered a complaint. The author might still be pro-no-semicolon but I think it's useful to see real bugs introduced by the lack of semicolons.

I wanted to like your example, but the fact that let _ = and ; were "cut out" kinda ruins the point, IMO. Lots of subexpressions cannot be parsed (e.g., < and > and their interaction with generics) without their surrounding context.

11 Likes

Just to elucidate more here: IIRC, ; is actually an operator in OCaml that basically looks like this (in Rust-like pseudocode):

fn (;)<T, U>(_: T, next: U) -> U { next }

Thus, you can view a ; b as "execute a and b; discard a", similar to the , operator in C++. Now, this isn't exactly how it works in Rust, but it's a good approximation for why it works the way it does.
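
Since (;) is not a legal function name in Rust, here is a compilable approximation under a made-up name, seq. It is only an analogy: it relies on Rust evaluating function arguments left to right, and it says nothing about how OCaml or rustc actually implement sequencing.

```rust
/// Hypothetical stand-in for a sequencing operator: both arguments are
/// evaluated, the first result is discarded, the second is returned.
fn seq<T, U>(_first: T, next: U) -> U {
    next
}

/// Runs two side effects "in sequence" and returns the second value
/// together with the log of what happened.
fn demo() -> (i32, Vec<&'static str>) {
    let mut log = Vec::new();
    // Arguments are evaluated left to right: first "a" is pushed,
    // then the block pushes "b" and evaluates to 42.
    let result = seq(log.push("a"), {
        log.push("b");
        42
    });
    (result, log)
}
```

Read seq(a, b) as a; b: the value of a is thrown away and the value of b becomes the value of the whole expression.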

1 Like

That’s exactly the point: the fact that we allow dropping ; after if “statement” creates ambiguity that requires context. Even if in both cases rustc would prefer one parse, that would still be human-level ambiguity. The fact that the grammar (very deliberately) resolves the ambiguity in two different ways in different contexts just makes this example more fun.

The point is that an orthogonal, unambiguous and otherwise nice grammar doesn’t necessarily result in the most usable language. The absolute “just use semicolons to minimize compiler’s and human’s confusion” position would require semicolons after if, while and friends. Rust’s position is already more nuanced than that: we accept complexity and edge cases in the grammar to get rid of some of the semicolons. Getting rid of all the semis is just another step in that direction; it’s not a radical departure from today’s language.

2 Likes

I've always found this one confusing. To me, it should interpret the if true { 1 } else { 1 } as an expression, since there's a (potentially) binary operator between the if ... {<non_semicolon_terminated>} else {<non_semicolon_terminated>} and the 2.

Basically, if should only be a statement if all the branches end in ; or are empty.

1 Like

I just want to add that, knowing what rustc does to resolve these ambiguities and to carry enough metadata to provide smart suggestions (which I'm sure RA also does in one way or another), the auto-insertion of ; could work but is far from trivial, particularly because there are cases we don't really handle correctly even with those efforts.

3 Likes

I would honestly like Rust slightly better if a semicolon was required after the final statement/expression in a block, and

let val = {
  statement 1;
  statement 2;
  statement 3;
};

set val to the value of statement 3. Write an explicit (); as the last statement if you want the block's value to be unit.

2 Likes

I think that would be needlessly verbose

1 Like

Enh. It's a trivial amount of extra typing for greatly enhanced consistency and never again having to close-read for whether I / the person whose code I'm reviewing remembered to leave the semicolons out in just the right places. I'll take that action.