pre-RFC: Leading commas

The Pre-RFC process

Here I'd mostly like the following discussed:

  • Are there more motivations?
  • Do the drawbacks sufficiently cover the situation? Are there more?
  • Are any cases missing?

In other words: I'd like help with describing the situation as fairly, honestly, and accurately as possible.

The RFC

  • Feature Name: leading_comma
  • Start Date: 2018-04-<TODO>
  • RFC PR: (leave this empty)
  • Rust Issue: (leave this empty)

Summary

Permit leading commas everywhere trailing commas are permitted.

An example:

let array = [
    , 1
    , 2
    , 3
];

Motivation

This RFC is purely about style and accommodating the tastes and formatting habits of more people.

What this RFC is not for

This RFC does not suggest that everyone should adopt leading commas as their new one-true-style. In particular, this RFC absolutely does not change the current rustfmt style.

This RFC also does not prescribe how leading commas should be styled when they are used; that is left up to people or to a style RFC.

It is in line with Rust's philosophy to have a tolerant grammar

In particular, RFC 1925 was accepted to allow a leading | in match and then it was said that:

We tend to support a flexible grammar when possible, including things we don't consider idiomatic at all. Philosophically, this is an RFC we are inclined to accept, and there'd probably have to be a pretty strong, practical reason for us not to accept it.

- @withoutboats
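
For reference, a small sketch of the leading | that RFC 1925 made legal, which compiles on current stable Rust. The names Shape, corners and is_polygon are made up for illustration and do not come from that RFC.

```rust
enum Shape {
    Circle,
    Square,
    Triangle,
}

// Every arm starts with an optional leading `|`.
fn corners(s: &Shape) -> u32 {
    match s {
        | Shape::Circle => 0,
        | Shape::Square => 4,
        | Shape::Triangle => 3,
    }
}

// With several alternatives per arm, the `|` lines up like a list spine.
fn is_polygon(s: &Shape) -> bool {
    match s {
        | Shape::Square
        | Shape::Triangle => true,
        | Shape::Circle => false,
    }
}
```

This is the same "punctuation first" layout that the pre-RFC asks for with commas.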

We accept leading semicolons

In other words, the following is a legal Rust program:

fn foo() {}

fn main() {
    ; foo()
    ; foo()
}

This style of formatting is probably quite uncommon though.

It should however be noted that we accept an arbitrary number of semicolons anywhere in a block, so the following program is also valid:

fn main() { ; ; ; }

Leading commas read like bullet points

To answer the question "Why would anyone want leading commas?"

Consider the following type:

struct Foo {
    , alpha: A
    , beta: B
    , gamma: G
}

and compare that to:

  • alpha: Alpha conversion is the process of renaming variables.
  • beta: Beta reduction is a fancy term for reducing function application to a value.
  • gamma: An uppercase gamma (Γ) is often the typing environment.

A similarity here is that , acts like the spine of a list as in the bullet point list above.

The commas can also be, for some people, more visually clear compared to commas at the end. To some, this is however a drawback as they would see the list spine as noise.

Guide-level explanation

In places where trailing commas are permitted, leading commas are now also permitted.

Let's go through a few examples of such places. Note that some of these uses can look quite strange but are included for consistency and generality. We also illustrate some different possible styles even though they may not be optimal for leading commas.

Arrays and vec![..]

let arr = [
    , 1
    , 2
    , 3
];

let vec = vec![
    , "Alan Turing"
    , "Joan Clarke"
    , "Hugh Alexander"
];

Structs

As seen above in the motivation, we can use leading commas in structs:

struct Thing {
    , velocity: (f64, f64)
    , color: Color
    , autonomous: bool
}

and the associated pattern:

let Thing {
    , velocity
    , color
    , autonomous
} = foo;

Unions

union Foo {
    , as_i32: i32
    , as_u32: u32
}

Enums

enum Foo<T> {
    , Recv(
        , usize
        , T
      )
    , Send {
        , id: usize
        , msg: T
    }
}

Match

and the associated patterns:

match foo {
    , Foo::Recv(
        , a
        , b
    ) => recv(a, b)
    , Foo::Send
        {
        , id
        , msg
        } => send(id, msg)
}

Tuples

let (, b): (, usize) = (, 2);
// equivalent to:
let (b,): (usize,) = (2,);

let (, a, b, c): (, usize, usize, usize) =
    (
    , 42
    , 1337
    , 42
    );
// equivalent to:
let (a, b, c): (usize, usize, usize) = (42, 1337, 42);

Function arguments, universal quantification sites, and where clauses

fn foo<
    , 'a
    , Bar
    , Quux
>(
    , bar: Bar
    , quux: Quux
) -> Foo<
    , Bar
    , Quux
> where
    , Bar: Clone
    , Quux: Copy
{
}

Turbofish

let x = alpha::<, Foo, Bar>();

#[derive] and #[attribute] in general

#[derive(
    , Copy
    , Clone
)]
struct Foo(
    , u8
    , u8
);

Reference-level explanation

Anywhere in the grammar where a comma-separated list is accepted, a leading comma is also accepted.

Note in particular that this means that the following is accepted:

struct Foo { , a: T,  b: T, }  // Both leading and trailing comma.

Since the change is technically trivial, a full grammar diff is not given.

This also applies to macros in the standard library.

Drawbacks

Some people will not want to read this in the code of others

Simply put, the style introduced will not fit the taste buds of a majority of Rustaceans, and they will not wish to see this style in other people's code. Many people will find the syntax noisy.

To mitigate this concern, the fmt style guide will keep recommending the current formatting.

Learnability

It will take more time to learn the language as more grammatically valid forms can now be found in the wild. However, not all grammatical forms are known even by expert Rustaceans, and they get by anyway. We can mitigate this by teaching the form only in the reference or in a very late section, so that it does not have a cost for beginners.

It complicates the grammar

It complicates the grammar and therefore probably also syn.

These complications can make rustc's parser minutely less performant since it now has to look for an optional comma where a comma separated list is accepted.
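
To give a feel for the size of that extra work, here is a toy sketch of a list parser that eats one optional leading comma. It is not rustc's or syn's code; Tok and parse_list are invented for this example, and accepting a lone comma as an empty list is this sketch's own choice, which the pre-RFC does not address.

```rust
// Toy token type for a bracketed list of integers (hypothetical).
#[derive(Debug, PartialEq)]
enum Tok {
    Open,
    Close,
    Comma,
    Int(i64),
}

// Accepts `[`, one optional leading comma, comma-separated integers with an
// optional trailing comma, then `]`.
fn parse_list(toks: &[Tok]) -> Result<Vec<i64>, String> {
    let mut it = toks.iter().peekable();
    if it.next() != Some(&Tok::Open) {
        return Err("expected `[`".into());
    }
    // The only step added for leading commas: peek, and eat at most one comma.
    if it.peek() == Some(&&Tok::Comma) {
        it.next();
    }
    let mut out = Vec::new();
    loop {
        match it.next() {
            Some(Tok::Close) => break,
            Some(Tok::Int(n)) => {
                out.push(*n);
                match it.next() {
                    Some(Tok::Comma) => continue,
                    Some(Tok::Close) => break,
                    other => return Err(format!("expected `,` or `]`, found {:?}", other)),
                }
            }
            other => return Err(format!("expected an element or `]`, found {:?}", other)),
        }
    }
    if it.next().is_some() {
        return Err("unexpected tokens after `]`".into());
    }
    Ok(out)
}
```

In this sketch the extra cost is a single peek per list; whether that carries over to a real parser is an assumption, not a measurement.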

As part of the complication in supporting leading commas, source code parsers for other projects will have to be updated, and so will syntax highlighters, to support the modified grammar of Rust. However, this is nothing new and happens every time the grammar of Rust changes. Many highlighters, such as the one on the internals forum and the one in VSCode, already handle this fine.

The benefits to macros are very tenuous at best

Given that trailing commas are already supported, there does not seem to be an improvement to macros from allowing leading commas. Leading commas may lead to an expectation that comma-separated list-taking macros should support leading commas, which can make it harder to write macros. However, this is not a must, and so libraries can opt to not support leading commas.
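
As a rough illustration of that extra burden, here is a sketch of a list-taking macro that opts in to a leading comma. The macro name list! is hypothetical, and this is only one possible way to write it: the leading style gets its own arm, duplicated from the ordinary one.

```rust
// `list!` is a made-up macro for illustration, not part of any library.
macro_rules! list {
    // Extra arm needed only for the leading style: exactly one leading comma,
    // then at least one expression, then an optional trailing comma.
    (, $($x:expr),+ $(,)?) => {
        vec![$($x),+]
    };
    // Ordinary arm: comma-separated expressions with an optional trailing comma.
    ($($x:expr),* $(,)?) => {
        vec![$($x),*]
    };
}
```

With this, list![, 1, 2, 3] and list![1, 2, 3,] both build the same Vec. A macro author who does not want the leading style simply omits the first arm.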

Rationale and alternatives

The only alternative is to not introduce leading commas.

The impact of that would be to not accommodate the punctuation style desires of a minority of people.

Clean diffs

Trailing commas lead to clean diffs, and leading commas do so equally well.

Easier multi-cursor support

In many editors, such as Sublime Text, it is relatively easy to add a comma to the start of every line. In the editor mentioned, you only need to press <cmd> + shift + L + home (modulo the <cmd> key) to do that.

In VSCode (with the shunt extension), you can do the same with ctrl + shift + O.

It is often equally easy to multi-select the end of every line in a selection, so this is an argument in favor of both leading and trailing commas (the latter of which we already support).

Prior art

Haskell

In Haskell, it is quite common to style things, in particular lists, with a leading comma, but Haskell does not actually accept leading commas:

exceptions =
    [ InvalidStatusCode
    , MissingContentHeader
    , InternalServerError
    ]

however, @joshtriplett said:

I’ve heard multiple experienced Haskell programmers say “this is a thing we do because we can’t use trailing commas”.

but @ExpHP replied:

For me, that is certainly how my use of leading comma began. But to be entirely honest, if Haskell were to one day suddenly begin supporting both trailing and leading commas, I would start writing what @centril wrote. Leading punctuation makes structure more obvious, because it lines up.

Other languages

In general, it seems uncommon for languages to support leading commas.

Unresolved questions

  • Have any cases been missed / forgotten in the RFC?

I’m not a particular fan of leading commas, but, I can’t see any reason to favor allowing trailing and not allowing leading.


I think this will be a prevailing sentiment :wink:

I hope this will too; but I fear it might not be.


On this, if Rust is going to allow leading commas in lists, then I think part of this RFC should be to specify rules for Clippy/Rustfmt lints (that can be toggled on optionally) to prefer leading vs. trailing style and also to enforce the rules for leading style if that option is turned on. Ideally, the Clippy/Rustfmt lints should default to preferring/recommending the trailing style, so if you used the leading style, it would recommend that you change it to the trailing style. Also, it would be nice if the lint mentioned the other formatting option in case you wanted to change your preference. If you changed to the leading lint, then, of course, the opposite would be true.

To me, being that Clippy/Rustfmt has become such an important part of Rust, it would be nice that any additions to the language be accompanied by whatever Clippy Lints and/or Rustfmt options are appropriate to the change even if they are optional and not the default.

EDIT: To clarify why I think this is necessary: I'd hate to see a code-base/crate that mixed the two styles willy-nilly. I'd be fine looking at code in a crate that did all leading or all trailing, but, I'd hate to have every other instance be one or the other within a Crate (or at least within the same .rs file). For that reason, I think it would be best to have rustfmt/clippy support to enforce whichever style is chosen by the author. At the end of the day, I'm not that concerned that code is styled the way I prefer so much as it is styled consistently within a work.


I wanted to be less opinionated on style, but this is entirely reasonable :+1:. Additionally, rustfmt does not seem to have lints (other than checkstyle and diff), but only does formatting.

Does clippy have style lints? I checked https://rust-lang.github.io/rust-clippy/master and it didn't seem so.

Personally also, I would not use leading commas in some situations (and not trailing either), such as in the case of #[derive], turbofish, type constructors, universal quantification sites, match (before each match arms), before each variant of an enum. But I would use them for: arrays, vec!, structs, unions, inside the fields of an enum if it is sufficiently long. Therefore, I think styling should be case-by-case, i.e: you should be able to format arrays with leading commas, but not have leading commas in turbofish.

Do you have some suggestions on naming (of lints) and on what the styles should be when leading commas are enabled?

Definitely agree here.

No, unfortunately I don't. I'd have to put some thought into it. I'd really need to check and see what the current Rustfmt options are named for formatting/styling trailing commas style.

As far as what the style should be, I think the examples you showed should be good guidance. I would, however, tend to want to prefer not having leading (or trailing) commas if it is all on one line, so, to me (opinion only, which isn't worth much)

let (, b): (, usize) = (, 2);
// equivalent to:
let (b,): (usize,) = (2,);

let (, a, b, c): (, usize, usize, usize) =
    (
    , 42
    , 1337
    , 42
    );

would be discouraged, and I would instead prefer this:

let (b): (usize) = (2);
// equivalent to:
let (b): (usize) = (2);

let (a, b, c): (usize, usize, usize) =
    (
    , 42
    , 1337
    , 42
    );

Notice, no leading/trailing commas on the one-liners above.

That being said, it would be nice if this were also optional and could be toggled if the author so preferred.


Great :+1: I'll do the same.

It's an opinion I share :wink: I think it is a good rule of thumb. I think usually, you don't have a trailing comma on a single line either, so (a, b) and not (a, b,) (unless you have (a,) which you have to write that way because it is a different type than (a)).
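
A quick sketch of that last point, using today's trailing-comma syntax. The function names one_tuple and just_parens are made up for illustration.

```rust
// A one-element tuple: the comma is what makes it a tuple type.
fn one_tuple() -> (usize,) {
    (2,)
}

// Parentheses alone only group, so this is a plain `usize`.
#[allow(unused_parens)]
fn just_parens() -> usize {
    (2)
}
```

So let (b,) = one_tuple(); destructures a tuple, while just_parens() can be used directly in arithmetic; swapping the two return types would not compile.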

I’ve only seen leading commas used as a workaround in places where trailing commas aren’t supported. Since Rust does support trailing commas quite well, I’m against adding an ugly hack for a non-existent problem.


Why is it non-existent? There are people who use leading commas and prefer it that way, and not because of a lack of trailing commas.

"Ugly hack" is also quite subjective.

A few more drawbacks:

  • looks noisy to everyone but the tiny minority
  • supporting multiple syntaxes might make parsing code more difficult, slower and more error prone for humans because of unfamiliarity
  • more lax syntax increases risk of copy-paste errors and typing mistakes to go unnoticed
  • people that are new to the language might get confused and think that this means something different, or perhaps it will annoy them so much that they lose interest in Rust
  • it will take more time to learn the language
  • there’s a greater risk that source code parsers, highlighters will fail with this unusual syntax
  • it’s easier to fail to write correct parsers for Rust which might lead to lower support for Rust syntax in editors etc.

I’d be fine with relaxed syntax where it makes sense from a technical perspective, e.g. keeping diffs trim or helping macro expansion, but this proposal adds nothing on that front. Indeed, it makes it harder to write macros that accept a list of things. (Remember that some core macros didn’t accept trailing commas until recently.)

Instead, it caters to the preference of a small group, with little to no prior art in other languages.


I think there is a lack of strong motivation for this at the moment. Also, I think a number of the motivation points are misleading:

It is in line with Rust’s philosophy to have a tolerant grammar

I think this is not an argument for adding syntax which does nothing, rather it is an argument to allow such syntax when there is a strong motivation. It's not motivation in and of itself (It's easy to come up with syntax extensions that are utterly pointless, but allow more valid programs.)

Clean diffs

No cleaner than trailing commas, surely?

Easier multi-cursor support

Again, as long as you have either trailing comma or leading comma support, this is easy. You don't gain any benefit from editors from adding both.

JavaScript

JavaScript is not prior art. JavaScript does not allow leading commas with no semantic value. An "empty array element" is treated as undefined. This is entirely different from what this pre-RFC is proposing. (JavaScript does have proper trailing commas, however.)

The main advantage to such a feature is for people who prefer leading comma style. I really can't imagine such an opinion is shared among very many people at all. If this is not advantageous to a significant group of people, the disadvantages (as mentioned by others previously in the thread) would seem to me to outweigh the advantages considerably. If you're putting this forward as a "some people prefer this style", could you provide concrete examples? (Even just a few people posting in this thread would be enough to satisfy me.)


Assuming the macro wants to support the format. But I will elaborate on this in the drawbacks.

I think this falls under "Some people will not want to read this in the code of others"; I'll elaborate on it in the drawbacks.

Sure; that is "It complicates the grammar". But I doubt it will have any noticeable perf impact; all the compiler has to do is to check for a comma in front, eat it and then continue as before.

Elaborate on how this applies / example?

Somehow this sounds artificial to me. And people who don't use the style can't be annoyed with it; the only way they can be annoyed is if they read someone else's code; which I've already written about in the drawbacks.

It will take more time to learn all the valid forms of the grammar, yes, but you don't need to learn every strange syntactic form to be proficient at using Rust. But I'll include this.

Sure; parsers always have to be updated when the grammar of a language changes. However, as you can see from the highlighting above, the highlighter for the internals forum already handles this well, and so did VSCode. So there is a risk that some work will be required, but I'd say it is low and the work needed is also low.

For the linked RFC, it seemed sufficient motivation that a minority of people preferred this style. In fact, the quote says that there needs to be strong practical motivations not to do this (ambiguities, slower parsing (and not by a millisecond)).

Yes, as already mentioned in the motivation they have equally clean diffs.

Already mentioned.

This is also noted (the length comparison) but I will clarify this.

Myself and @ExpHP as per the quote in the Prior Art.

As far as I've seen, this is the preferred style of match statements in OCaml; even if, in Rust, this pattern made up a minority, there is clear precedent in a very similar language construct in another popular language.

This doesn't seem to be a motivation, then. You don't get cleaner diffs, or ease of use after this RFC that you wouldn't already have. So there's no advantage (just no disadvantage when it comes to these particular qualities).

Regardless of how you phrase it, I think this is misleading as it is an entirely different feature. It just looks syntactically similar — but the semantics are different, which surely makes it irrelevant.

Maybe it would be a good idea to hold a poll or something to see which style people prefer. I'd feel more comfortable knowing there was a reasonable proportion that preferred this. (Leading | in match has strong precedent in languages like OCaml, but I haven't seen a similar effect for leading commas in other similar languages.)

I've updated the pre-RFC now to address your comments with as much nuance as I could.

Fair enough; I moved those parts to the rationale.

Removed the section accordingly.

Linking what I said last time: Grammar liberalizations - #6 by scottmcm

I see no good reason to do this (in contrast to the vert change), as this is a ton of churn and doesn't actually make any scenarios easier over just using what's already supported.

(Also, if this is accepted, I don't think it's just about commas, and should also allow trailing verts, leading +, etc as well.)

Not really; we just allow ; as short form for (); --- ;bar() ;bar() is different from bar(); bar(); in general.
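
To make the difference concrete, here is a small sketch that compiles today. The functions bar, leading and trailing are made up for illustration.

```rust
fn bar() -> i32 {
    7
}

// Leading style: the first `;` is an empty statement, the second `;` ends the
// first call, and the last `bar()` is the block's tail expression.
#[allow(redundant_semicolons)]
fn leading() -> i32 {
    ; bar()
    ; bar()
}

// Trailing style: both calls are statements, so the block evaluates to `()`.
fn trailing() {
    bar();
    bar();
}
```

The two bodies therefore have different types (i32 versus ()), which is why the leading layout is not just a restyling of the trailing one.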


@varkor And here’s the poll you requested =)

  • I would use leading commas.
  • I would not use leading commas.
  • I would not use leading commas, but I don’t mind it either.


If we were doing this, wouldn’t it be strictly better to support “•” (Unicode 2022) as the leading separator, since that’s the correct symbol to use for an unordered list, instead of incorrectly abusing the comma?

Or no punctuation at all, which is also a reasonable style.


I would do that but I don't have that symbol on my keyboard :wink:

Not just some - most probably. And for what reason? The burden of motivation should be on the part introducing new syntax not the other way round.

I'm not talking about compilers but human readers. I experienced great difficulties when trying to read your examples. The spacings, alignments and groupings all seemed very off to what I'm used to. Not only did it take more time to read this for me, I'd easily make mistakes or fail to catch mistakes because of the unfamiliarity.

If you make a copy-paste error, for example fail to copy an element but only its comma this would still be valid. You might also plan to change the first item in a list, erase it but forget to write its replacement. The acceptance of the leading comma will not signify that an error was made.

That's not true at all. Code on forums, in blog posts, in libraries, on github... It's not possible to run rustfmt on all of that. You would have to deal with it. It puts an unnecessary burden on those who are just trying to learn or understand. It adds more friction on forums and in teams working together for no increase in expressiveness of the language. There are reasons standards exist after all.

You need to learn this to read Rust code. The language is already huge and takes a lot of time to learn. Opening the floodgates by accepting all kinds of creative styles is hardly a good idea, unless the goal is to make Rust look like an unrestricted art spectacle.

Yes, but this means more work for a lot of people. Each of them will have to implement this unusual syntax. We're not just talking about VSCode. Every single implementer that aspires to support Rust will have to deal with this oddity. Mistakes will be embarrassing. Apart from highlighters these mistakes might have more serious consequences.

In conclusion, energy, money and time will be wasted that could otherwise be used to solve real problems.
