RFC: Raw, Non-Nesting Comment Syntax

Block comment syntax is currently not omnipotent. There are two character sequences that can't be put into a comment: /* and */. The former starts a (surprising) nested comment. The latter of course terminates the comment. But both can have other meanings one may want to comment about, e.g. a Shell wildcard and other examples in the 2nd answer.

Illegal syntax can pop up when wrapping a block comment around valid code that contains /* or */ in a string, in a line comment, or in a procedural macro with some totally different syntax (see cron! below).

To remedy this I suggest not touching existing comment syntax, but adding a raw, non-nesting comment syntax. As with raw strings, the number of hashes has to be at least one more than any apparent contained comment end. So each of these lines is a single comment:

 r#/* …   */  …   /* …   r#/* …   */#
r##/* …   */# …   /* …  r##/* …   */##

Being a syntax error, r#/* doesn't seem to collide with anything currently possible. Whereas r/* could be r followed by a comment, so at least one hash must be mandatory, like it is for raw identifiers.

This would work for all three comment types, i.e. also r#/** and r#/*!.

They are non-nesting in the sense that there is no magic as with /* /* nested */ */. Only a close with as many hashes ends the raw comment r#/* r#/* not nested */#. That means with enough hashes you can put anything inside, including other classic and raw comments.
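The matching rule described above could be sketched roughly as follows. This is only an illustration of the proposal, not how rustc's lexer works or would work; the function name raw_comment_len is hypothetical, and the sketch assumes the opener is r, one or more hashes, then /*.

```rust
/// Hypothetical helper illustrating the proposed rule.
/// If `src` starts with a raw comment opener (`r`, one or more `#`, `/*`),
/// returns the byte length of the whole comment. Returns `None` if `src`
/// does not start with such an opener or the comment is never closed.
fn raw_comment_len(src: &str) -> Option<usize> {
    let rest = src.strip_prefix('r')?;
    // Count the hashes that follow the `r`.
    let hashes = rest.bytes().take_while(|&b| b == b'#').count();
    if hashes == 0 {
        // `r/*` is an identifier followed by a classic block comment.
        return None;
    }
    let body = rest[hashes..].strip_prefix("/*")?;
    // Only `*/` followed by the same number of hashes closes the comment.
    // Nothing else inside the body is interpreted, so nothing nests.
    let closer = format!("*/{}", "#".repeat(hashes));
    let end = body.find(&closer)?;
    Some(1 + hashes + 2 + end + closer.len())
}
```

For the input `r#/* a */ b */# tail` the sketch reports the comment as ending right after the first `*/#`, so the plain `*/` in the middle is just comment text.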

If we want these to nest within classical block comments, the 1st line would be one comment, not seeing the closing within the raw comment. But such a complication is not necessary, as any raw comment can instead be embedded in another with more hashes (2nd line.)

   /* …   r#/* …   */ …   */# …   */
r##/* …   r#/* …   */ …   */# …   */##

Two scenarios hint at accidental /* or */ as comment content:

  • a runaway comment containing a nested comment
  • an orphaned comment end after a block comment, except in a macro that allows it (see cron! answer)

The compiler should suggest raw syntax when either of these happens.

No, it would be r followed by the start of a block comment, but the conclusion should still be the same, anyways.

Indeed, r# followed by whitespace (or comment) currently results in an error, so it isn’t used.

I’d be curious about the use cases here. I imagine there are already workarounds, so we’d need to compare against those. At least for an ordinary comment, you could just open extra levels of block comment to cancel out occurrences of */ inside, or close extra levels to cancel out /* inside.
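As a small sketch of that workaround in today's Rust (the function name still_compiled is made up for the example): a stray opener inside the comment is cancelled by one extra closer at the end.

```rust
/* Commented-out notes, including a line that contains a stray opener:
   // rm -fr /*
   One extra closer at the end balances that stray opener. */ */

// Because the comment above is balanced, this function is still compiled.
fn still_compiled() -> u32 {
    42
}
```

This works, but only if you notice the stray opener in the first place, and the extra closer looks like a mistake to the next reader.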

For doc comments, I don’t know if there’s any form of escaping that generally works (e.g. including in code blocks), so maybe that’s a valid thing to pose as a use-case; either way, some concrete examples might be useful to prove the point :wink:

Thanks for pointing out that r/* is a comment after an identifier. Fixed in 1st text.

I actually got bitten by a Shell command. Not when I typed it, which might have made me understand. But much later when I wrapped code with that in the middle into a long comment. A comment that was puzzlingly impossible to close. I had never heard of nested comments, so I thought it's a compiler bug.

Some more cases, 2 Rusty and more that might be in a comment or string:

  • Regex for 0-n slashes or any character 0-n times followed by a slash

    Regex::new(r"/*|.*/")
    
  • Cron entry for every 5 minutes

    Cron::new("*/5 * * * *")
    
  • The zmq library has this syntax

    socket.bind("tcp://*:3333")
    
  • As per RFC 3986 * is a sub-delim (like &) in URLs and thus not to be url encoded. Sometimes just used randomly:

    web.archive.org/web/*/https://internals.rust-lang.org
    
  • Wildcard MIME type and HTTP header

    Accept: text/*, text/plain, text/plain;format=flowed, */*
    
  • Markdown **/** for bold slash /

  • Not to forget ASCII art *\o/*

I agree this is surprising.

If you select a line with a string containing "/*" and hit the shortcut to make a block comment from it in vscode w/rust-analyzer, you'll comment the rest of the file and get a syntax error.

Non-nestable comments are inconvenient because you can’t easily comment out stuff that already contains comments. This can of course be mitigated by following a convention that explanatory/doc comments use line comments and temporarily commenting out code uses block comments.

Edit: Nevermind, I actually misunderstood you a bit.

Unless the inner comments have a different syntax than the outer comments (by using the # syntax)

Non-nestable comments are inconvenient because you can’t easily comment out stuff that already contains comments.

Thanks for pointing out that my RFC is not clear. They are non-nesting only in the sense that nothing nests by magic. They can of course still nest, just by giving the outer comment one more hash than anything it contains. Will amend it.

I believe OCaml goes even further than Rust and actually interprets quotes inside comments with respect to comment nesting, thus fixing the cases you listed, but at the same time requiring that quotes are properly balanced.

Could you give an example?

// rm -fr /*

(real command not relevant here) becomes invalid by wrapping it (and probably many surrounding lines, so you don't notice what hit you)

/* …
// rm -fr /*
… */

In the 2nd answer above I also gave many other examples, where the same would happen.
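To make the failure concrete, here is a rough sketch of the current nesting rule as a depth counter. The helper open_depth is hypothetical and deliberately simplified: it assumes the input starts at a block comment opener and, as Rust does inside block comments, ignores line comments and strings.

```rust
/// Hypothetical helper: how many block comments are still open after
/// scanning `src`. Inside a block comment only `/*` and `*/` matter,
/// so a `/*` that sits in a former line comment still opens a new level.
fn open_depth(src: &str) -> usize {
    let bytes = src.as_bytes();
    let mut i = 0;
    let mut depth = 0usize;
    while i + 1 < bytes.len() {
        match (bytes[i], bytes[i + 1]) {
            // An opener increases the nesting level.
            (b'/', b'*') => {
                depth += 1;
                i += 2;
            }
            // A closer decreases it, but only while a comment is open.
            (b'*', b'/') if depth > 0 => {
                depth -= 1;
                i += 2;
            }
            _ => i += 1,
        }
    }
    depth
}
```

Feeding it the wrapped example leaves one level open, which is the runaway comment that swallows the rest of the file.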

Anyone who actually looks at the Rust grammar won't be even slightly surprised.

Right! Anyone who learned 100% of the medium complex grammar would have known. Too bad I started using Rust at maybe 70% :man_facepalming:

But it's beside the point. Nested comments have their pros and cons. Here I'm proposing a syntax in the spirit of Rust, that allows putting anything into block comments, which just isn't possible today. And as a benefit, I'm proposing that the compiler spot when people shoot themselves in the foot. The new syntax is a solution it could propose, where none really exists until now.

nesting comments is a behavior that I actually find useful and use regularly, and so do not want to lose.

You're not losing anything with this proposal. It doesn't touch existing syntax, so you can nest to your heart's delight. It's a new more powerful comment syntax that can comment anything (including nested comments.)

This example suggests to me that perhaps line comments should nest inside block comments. That is, in this case the inner /* would not start a nested block comment, because it's still considered to be inside a line comment, which is itself inside the outer block comment.

I'd be interested in seeing a crater run to this effect.

This example suggests to me that perhaps line comments should nest inside block comments.

You're opening up a rat's nest of possibilities. Then you'd have to have comments conform to essentially full Rust syntax, because /*, */ or // might be in a string, or in a procedural macro for some sub-language where it's an operator (as // is in Perl).

Even if we went that way, it would horribly limit what you can put inside a comment. Not being able to write the comment you want is a problem we already have to a small degree, because /* and */ are special and can't be escaped. The more magic we add inside comments, the more limiting it gets.

I'm not proposing something really new here. This is just how raw strings can turn off magic in strings. And they offer a super delimiter so you can put normal delimiters inside without a fuss.

I'd be interested in seeing a crater run to this effect.

This should of course be done before stabilizing the new syntax. But seeing as r#/ currently gives "Syntax Error: Invalid raw string literal", it's unlikely to break anything.

Not quite such a huge deal… all you’ll need to actually handle in order to support “full Rust syntax” is Rust comments and tokens. (Not even token trees, as parentheses don’t need to be bracketed.) This amounts to supporting nested block comments (already supported), nested line comments (the thing being said here), and additionally nested string literals [including raw string literals, and byte string literals] (the only other kind of Rust tokens that could contain /* or */).

I sometimes do insert a line like this

// */ // */ // */ // */ // */

in my code during development so I can comment out blocks up to that point easily without having to adjust the closing comment. Maybe someone else has done so in a crate before :slight_smile:

fn some_code() {}

fn more_code() {}

fn many_function() {}

fn wow_so_much() {}

// */ // */ // */ // */ // */

fn not_commented_out() {}

fn some_code() {}

fn more_code() {}
/*
fn many_function() {}

fn wow_so_much() {}

// */ // */ // */ // */ // */

fn not_commented_out() {}

/*
fn some_code() {}

fn more_code() {}
/*
fn many_function() {}

fn wow_so_much() {}

// */ // */ // */ // */ // */

fn not_commented_out() {}

/*
fn some_code() {}

fn more_code() {}
/*
fn many_function() {}
/*
fn wow_so_much() {}

// */ // */ // */ // */ // */

fn not_commented_out() {}

fn some_code() {}

fn more_code() {}
/*
fn many_function() {}
/*
fn wow_so_much() {}

// */ // */ // */ // */ // */

fn not_commented_out() {}

all you’ll need to actually handle in order to support “full Rust syntax” is Rust comments and tokens.

I'm not sure how free procedural macros are, but if this is possible, it must also be possible to nest it in a block comment:

cron!(
*/5 * * * * object_with_cron_callback_method
);

It’s not possible. Proc macro calls are processed like normal Rust code, up to tokenization into token trees, i.e. normal comment syntax, Rust tokens, and pairing up of brackets ((), [], {}) applies to their inputs and outputs.

Edit: Actually… I’m not sure how */ is handled outside of comments… one moment…

Edit2: Today I learned that */ outside of block comments is just * followed by /, and this compiles fine:

macro_rules! cron {
    ($($t:tt)*) => {}
}

cron!(
*/5 * * * * object_with_cron_callback_method
);

Well that kills any minimally invasive way of supporting arbitrary Rust code, without e.g. also going as far as starting to enforce proper nesting of parentheses.

It doesn't even need new sub-languages. Can happen in a macro definition too:

macro_rules! slash {
    ($($t:ident)*/) => {}
}

slash!(a b/);

In several editors you can select the code block and hit ctrl+/ (or some other shortcut) to insert // before every line, which does not have these issues. If all selected lines are comments then it also undoes it. So that makes it easy to toggle some blocks or subsections thereof.

(Not really tracking this thread so far but) I too sometimes line-comment out the beginnings and endings of block comments to make them kinda elastic.
