Unrestricted_attribute_tokens feature status


#1

(I sort of answered my own question while writing this up, consider this a pre-pre-rfc train of thought.)

Tracking issue: rust#44690

Reference: https://doc.rust-lang.org/reference/attributes.html (out of date)

With rfc#2539 merged, I’ve seen a semi-formal definition of what the current meta syntax is (in pest syntax):

attribute = { "#" ~ "[" ~ meta ~ "]"  }
meta = { meta_kv | meta_list | meta_word }
meta_word = { Ident }
// Ident := identifier
meta_list = { meta_word ~ "(" ~ TokenStream ~ ")" }
// TokenStream := sequence of tokens where delimiters match
meta_kv = { meta_word ~ "=" ~ TokenTree }
// TokenTree := identifier | literal | punctuation mark | delimited TokenStream
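For illustration, here are some attribute shapes this grammar admits (the attribute names are made up; actually consuming them would still require a proc macro or tool):

```rust
// Hypothetical attributes matching the unrestricted_attribute_tokens grammar.
#[word]
#[list(any tokens here, as long as (delimiters) [are] {balanced})]
#[key = single_token]
#[key = (a delimited group counts as one token tree)]
```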

This is the #![feature(unrestricted_attribute_tokens)] grammar, not the currently stable one. Currently stable is closer to:

attribute = { "#" ~ "[" ~ meta ~ "]"  }
meta = { meta_kv | meta_list | meta_word }
meta_word = { Path }
// Path := non generic path
meta_list = { meta_word ~ "(" ~ CommaSeparatedList(meta)? ~ ")" }
meta_kv = { meta_word ~ "=" ~ Literal }
// Literal := (raw) (byte) string literal | (byte) char literal | unsuffixed number literal
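For comparison, the shapes accepted on stable look like this (attribute names again made up):

```rust
// Hypothetical attributes matching the currently stable grammar.
#[word]
#[key = "string literal"]
#[list(word, key = "value", nested(inner, count = 3))]
```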

(Yes, #[namespaced(hey::jude = 1968)] is valid. This is likely a side effect of scoped lints.)


The tracking issue is focused on attributes for tools (e.g. #[rustfmt::skip]), so the remaining ungating of attribute syntax locked behind unrestricted_attribute_tokens probably deserves to be pulled out into a new tracking issue.

Recently, when writing derives, I’ve been running into a large number of cases where #[namespace(key = path::to::thing)] would be very useful. I would even prefer #[namespace(key = (path::to::thing))] over the current #[namespace(key = "path::to::thing")] or #[namespace(key(path::to::thing))].

Maintaining a consistent syntax for attributes, even in the face of attribute proc macros and derives, is probably desirable. Has there been an RFC for the desired end state of attribute syntax? If there hasn’t been, I would be interested in writing an RFC committing attributes to the following syntax:

attribute = { "#" ~ "[" ~ meta ~ "]" }
meta = { meta_kv | meta_list | meta_word }
meta_kv = { meta_word ~ "=" ~ (LiteralExpression | PathExpression) }
// *Expression := <https://doc.rust-lang.org/reference/expressions.html>
meta_list = { meta_word ~ "(" ~ CommaSeparatedList(meta)? ~ ")" }
meta_word = { Path }
// Path := non generic path

(This is… actually a very small expansion, I now realize.) The intent of restricting the expressions allowed in “value” position is to avoid inviting people to write code within attributes. Ideally we’d allow “small snippets” of code, while still disincentivizing large chunks of code.

At the very least, allowing (non-generic) paths in the value position of key-value pairs should be fairly uncontroversial and would benefit existing macros. The same values can already be written with () instead of =, and existing derives use = "path".

A full list of the code-in-attribute-string-literal patterns used by serde:

  • #[serde(bound(serialize = "T: MyTrait"))] – a full where clause
    • can contain `,`, so it needs to be bracketed somehow
  • #[serde(default = "path")] – a path

Side note observation: #[why = ,] is valid with #![feature(unrestricted_attribute_tokens)]. This should probably not be allowed, because it’s useless, and #[a = ,, b = ,, c = ,,] just looks silly. The feature gate also currently does not affect the $:meta macro matcher, so that would need to be fixed before stabilization.
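As a quick check of the matcher behavior mentioned above, this stable snippet shows that $:meta accepts exactly the classic meta-item shapes (a minimal sketch; the macro name is made up):

```rust
// A throwaway macro that stringifies whatever `$:meta` matched.
macro_rules! as_meta {
    ($m:meta) => {
        stringify!($m)
    };
}

fn main() {
    // Classic meta-item shapes match on stable:
    println!("{}", as_meta!(word));
    println!("{}", as_meta!(key = "value"));
    println!("{}", as_meta!(list(a, b = "c")));
    // Token soup like `as_meta!(key = ,)` fails to compile.
}
```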


#2

Thanks for raising this issue.
unrestricted_attribute_tokens is one of those “technical” feature gates that were introduced during the stabilization of Macros 1.2 for future compatibility, but they certainly need a path to stabilization and a tracking issue.

This post catalogs the issues with unrestricted_attribute_tokens well, but there are a couple more things on the implementation side (some behavior is underimplemented, some incorrectly implemented).

I’ll expand a bit later, but I think we could stabilize the delimited forms

PATH `(` TOKEN_STREAM `)`
PATH `[` TOKEN_STREAM `]`
PATH `{` TOKEN_STREAM `}`

for non-builtin attributes right now without any issues.
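For concreteness, the three delimited forms look like this (tool and macro names are made up):

```rust
// The three delimited forms; the token stream inside the delimiters is
// passed through verbatim to whoever interprets the attribute.
#[my_tool::check(these tokens => are not parsed by rustc)]
#[my_attr[an, arbitrary, token, stream]]
#[my_attr { key: "value", flag }]
```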


#3

Tracking issue: https://github.com/rust-lang/rust/issues/55208


There are three important aspects with regards to tokens in attributes:

  • Who parses and interprets the attribute tokens?
  • What tooling is used for parsing attribute tokens?
  • Do we care about this kind of attribute at all?

Let’s classify our attributes using these aspects:

  • Proc macro attributes. Tokens are interpreted by proc macro authors, using proc macro API and higher level libraries on top of it. These attributes are stable and we care about them.

  • Derive helper attributes. Tokens are interpreted by proc macro authors, using proc macro API and higher level libraries on top of it. These attributes are stable and we care about them.

  • Tool attributes. Tokens are interpreted by tools, using whatever they like (probably libsyntax/MetaItem APIs right now). These attributes are stable and we kinda care about them, but it’s mostly the tool’s responsibility to parse these attributes correctly and to define what “correctly” means; tools also have their own compatibility policies with regards to accepted attribute syntax.

  • Built-in attributes. Tokens are interpreted by the compiler, using libsyntax/MetaItem APIs. These attributes are stable and we care about them.

  • Legacy proc macro attributes. Tokens are interpreted by proc macro authors, using libsyntax/MetaItem APIs. These attributes are unstable, deprecated and we don’t care about them.

  • Legacy proc macro helper attributes. Tokens are interpreted by proc macro authors, using libsyntax/MetaItem APIs. These attributes are unstable, deprecated and we don’t care about them.

  • Custom attributes (feature(custom_attribute)). These are basically legacy proc macro helper attributes that are not whitelisted in any way. Tokens are interpreted by proc macro authors, using libsyntax/MetaItem APIs or perhaps the proc macro API. These attributes are unstable, deprecated, and we don’t care about them.

So, we see that the non-builtin attributes we care about are interpreted by macro authors using token streams and the proc macro API.
This means that they are already prepared to deal with anything that can appear in macro attributes, and also that any of the compiler’s internal issues with the libsyntax/MetaItem APIs won’t affect them.
Basically, for non-builtin attributes we can immediately extend the syntax to what proc macro attributes accept, namely:

PATH
PATH `(` TOKEN_STREAM `)`
PATH `[` TOKEN_STREAM `]`
PATH `{` TOKEN_STREAM `}`

with every inert macro-helper or tool-helper attribute defining its own little grammar restricted only by the macro or tool author’s imagination, similarly to macro attributes. (Legacy unstable stuff will be affected as well, but we don’t care about it.)

This leaves us with two issues: built-in attributes, and key-value attributes #[PATH = SOMETHING].


Built-in attributes are in a very sad state.

First, the MetaItem API is both outdated and buggy, and not prepared to deal with any tokens beyond meta-items in their “classic” Rust 1.0 flavor; this leads to issues like https://github.com/rust-lang/rust/issues/55168. Second, even if all the footguns of the MetaItem API are avoided, most built-in attributes are simply not validated in any way beyond checking their name!
So, accepting more nonsensical tokens in addition to the already accepted nonsensical meta-items is not something we should strive for.

I suggest simply issuing an error for built-in attributes with non-meta-item tokens, instead of a feature gate; none of these attributes should accept those tokens anyway.


Now, regarding #[PATH = SOMETHING], this is an open question.
I think we do need an RFC to discuss what we want from them.

What we strictly need for backward compatibility with stable code is 1) literals and 2) NtExpr, for stuff like #[doc = $my_doc_str] in macros. I think that’s all, but we can check with crater.
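The NtExpr case is already relied on by stable code; a minimal runnable sketch of the #[doc = $my_doc_str] pattern (macro and function names made up):

```rust
// A macro that substitutes an expression fragment into a key-value
// attribute, i.e. the NtExpr-in-value-position case.
macro_rules! with_doc {
    ($doc:expr, $item:item) => {
        #[doc = $doc]
        $item
    };
}

with_doc!("Adds one to its argument.", pub fn add_one(x: u32) -> u32 { x + 1 });

fn main() {
    assert_eq!(add_one(41), 42);
}
```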

We can certainly accept #[PATH = IDENT]; perhaps we may want to extend the value syntax to arbitrary expressions #[PATH = EXPR], or arbitrary types #[PATH = TYPE]; some people have also mentioned macro invocations #[PATH = my_macro!(tokens)].
On the other hand, all this is already available with delimited attributes - #[PATH(IDENT)], #[PATH(EXPR)], #[PATH(TYPE)], #[PATH(my_macro!(tokens))], so perhaps we don’t need to extend key-value attributes at all?
I honestly don’t know.


@CAD97
In the near future I’m going to submit a PR relaxing the rules for non-builtin attributes and restricting the rules for built-in and key-value attributes, as described above.
After that you’ll be able to proceed with an RFC deciding the fate for key-value attributes.


#4

Oh, one more thing: the meta matcher needs to be updated to accept the new attribute syntax.
I think it’s a bug that it doesn’t accept it; the “meta item” syntax currently used for meta and for a lot of other internal stuff in rustc is really no longer a part of the language.
But that’s an orthogonal issue.