Macro Keyword

I was so excited about Rust procedural macros (https://blog.rust-lang.org/2018/12/21/Procedural-Macros-in-Rust-2018.html) ...

#[proc_macro_attribute]
pub fn hello(attr: TokenStream, item: TokenStream) -> TokenStream {
    let input = syn::parse_macro_input!(item as syn::ItemFn);
    let name = &input.ident;
    let abi = &input.abi;
    // ...
}

Then I thought: what if it were extended with a procedural macro keyword feature?!

Consider the following example:

#[proc_macro_keyword]
pub fn strange_struct(attr: TokenStream, item: TokenStream) -> impl Iterator {
    // ...
}

Then it could be used in user code like this:

strange_struct! HelloStruct {
   k: u32,
   l: u8,
}

or even in the following way:

javascript! {
   function do_something() {

   }
}

where javascript! is a procedural keyword macro like this:

#[proc_macro_keyword]
pub fn javascript(attr: TokenStream, item: CharStream) -> impl Iterator {
    // ...
}

In this way it would be possible to embed code or snippets from other languages and simplify library usage across languages!!

It sounds like you're asking for "function-like procedural macros", which have been stable for a while: https://doc.rust-lang.org/reference/procedural-macros.html#function-like-procedural-macros

No, function-like procedural macros work with the Rust token lexer ... What I am asking for is a way to create macros that can handle characters that are invalid in Rust (even ones the lexer does not know at that moment) until the macro completes; then Rust handles its own tokens again.

The iterator will signal when the keyword macro procedure has completed!! Rust just needs to provide it with the character stream and will get valid Rust tokens back, while the code inside could be completely broken as Rust!!

This won't currently work because after a macro invocation there can only be a single token tree (i.e. stuff in parentheses). You can write it like this:

strange_struct! {
    HelloStruct {
        k: u32,
        l: u8,
    }
}

If you don't like the extra indentation that's what attribute-macros are for ^^
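
As a minimal sketch of the braced form, assuming a declarative `macro_rules!` macro stands in for the procedural macro (the matcher and the derived traits are illustrative choices, not part of the proposal), something like this compiles on stable Rust:

```rust
// Hypothetical stand-in for `strange_struct!`: it accepts a name followed by a
// braced field list and expands to an ordinary struct definition.
macro_rules! strange_struct {
    ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[derive(Debug, Default)]
        struct $name {
            $($field: $ty),*
        }
    };
}

// The whole input sits inside one pair of braces, i.e. a single token tree.
strange_struct! {
    HelloStruct {
        k: u32,
        l: u8,
    }
}
```

After expansion, `HelloStruct` behaves like any hand-written struct with fields `k` and `l`.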

Please check the answer that I gave to lxrec.

The current Rust proc_macro works with valid tokens in the Rust system, but with this proposal it would be possible to work even with symbols that break the lexer.

All of the examples in your first post seem like they'd be fine with the Rust lexer, modulo that extra pair of braces.

Going outside the Rust lexer is fundamentally impossible because the macro invocation is still in a Rust source file. If Rust can't lex and parse the macro invocations in Rust source files, it can't even tell where the macro invocations begin and end, so there's no way to do the rest of the things macros are supposed to do or even recognize the rest of the code in the file. It's pretty much a logical contradiction.

What proposal are you talking about? There's nothing in your other posts that suggests this was even intended, much less describes how you think it can be made possible.

Those examples are valid syntax, but consider a broken one, for example:

strange_broken_language! {
 "!" Some strange string for that strange language "!"
}

That works with today's macro system.
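
To illustrate why, here is a small sketch (the helper names `count_tokens!` and `strange_token_count` are made up for this example): the "broken" input is still a sequence of ordinary Rust tokens, namely string literals, identifiers, and a keyword, so a declarative macro can consume it one token tree at a time.

```rust
// Counts the token trees it is given by peeling one off per recursion step.
macro_rules! count_tokens {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tokens!($($tail)*) };
}

/// Number of token trees the lexer produces for the "broken" example.
fn strange_token_count() -> usize {
    let count: usize = count_tokens! {
        "!" Some strange string for that strange language "!"
    };
    count
}
```

The input lexes as two string literals plus seven words, so nothing here needs a different lexer.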

I'm going to suggest you take some extra time to become more familiar with Rust and its features before you start suggesting massive breaking changes.

As I've shown in example:

#[proc_macro_keyword]
pub fn some_strange_language(attr: TokenStream, item: CharStream) -> impl Iterator {
    // ...
}

Where:
attr is a TokenStream of the attributes (valid Rust) that precede this keyword;
item is a stream of characters, because lexical analysis for that language could differ from Rust's.

This would mean that no tool other than cargo (or rustc with the right command line arguments) would be able to parse Rust source files anymore, because to parse Rust you would first need to compile the dependencies to find and execute the macro definition that takes over the role of the parser.

Take rustfmt for example. It wouldn't be able to tell where a macro invocation ends anymore.

For stuff like this, you simply have to put the strange language in a string literal.

Thanks to raw string literals, you should be able to put any language whatsoever into one no matter how bizarre it is (as long as you can come up with a single delimiter string that won't occur in the other language, which is presumably true of anything you could plausibly call a "language" and fundamentally required for this to parse).
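
A minimal sketch of that approach, assuming a declarative macro named `strange_broken_language!` (hypothetical; a real implementation would more likely be a procedural macro that parses the string at compile time):

```rust
// Hypothetical macro: the foreign code arrives as a single string literal.
macro_rules! strange_broken_language {
    ($source:literal) => {
        // A real macro would parse or translate the source here;
        // this sketch only trims the surrounding whitespace.
        $source.trim()
    };
}

fn embedded_source() -> &'static str {
    let source: &'static str = strange_broken_language! {r#"
 "!" Some strange string for that strange language "!"
"#};
    source
}
```

The Rust lexer sees only one raw string token, so it never has to understand what is inside it.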

This only shows the macro implementation side. The "you fundamentally can't lex that" problem is about the source file invoking the macro.


Please, take more time to learn about Rust and programming languages in general before suggesting huge and vague changes like this. A lot of our responses in a lot of these threads are simply telling you what a few minutes of googling could've easily uncovered.

Yes, you are exactly right!! When Rust detects a macro keyword, it will try to find the proc_macro and compile the crate involved; the crate with this macro will then compile the tokens, and Rust will proceed to handle the remaining part of the file.

This is the price that we would have to pay for embedding in Rust any language that has been or ever will be created!!

Why do you want to embed languages in Rust that even raw string literals can't handle? Are there any such languages? Is there a single actual use case for this?

Note that this isn't just implementation effort. "Stopping the world" at every macro invocation would be hellish for build times.

You seem to have skipped this part:

You would also break every rust syntax highlighter.

I would've suggested this next, too:

To elaborate, you would just need to use it like this:

strange_broken_language! {r#"
 "!" Some strange string for that strange language "!"
"#}

Of course not. Everything can go inside raw string literals. You just need to start them with r###…#" where the number of # is at least 1 more than the largest number of # for which "###…# appears inside the string.
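
The rule above can be sketched as a small helper. The function names `hashes_needed` and `raw_literal` are invented for this illustration, and the sketch ignores other limits on raw strings (for example, the compiler's cap on the number of `#` characters):

```rust
/// Smallest number of `#` characters with which `text` can be wrapped in a
/// raw string literal: one more than the longest run of `#` that directly
/// follows a `"` in the text, or zero if the text contains no `"` at all.
fn hashes_needed(text: &str) -> usize {
    let bytes = text.as_bytes();
    let mut needed = 0;
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'"' {
            // Count the `#` characters immediately after this quote.
            let run = bytes[i + 1..].iter().take_while(|&&b| b == b'#').count();
            needed = needed.max(run + 1);
            i += run + 1;
        } else {
            i += 1;
        }
    }
    needed
}

/// Wraps `text` in a raw string literal using the minimal delimiter.
fn raw_literal(text: &str) -> String {
    let hashes = "#".repeat(hashes_needed(text));
    format!("r{0}\"{1}\"{0}", hashes, text)
}
```

For text without any `"` the result is a plain `r"..."` literal; each quote followed by a longer run of `#` pushes the delimiter one `#` further.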

Agh! I'm sorry but I think the replies to this thread are really disappointing, and consistent with a really frustrating trend I've seen in IRLO. Someone posts a reasonable proposal - unpolished, needing work, and not expressed with expertise, but still reasonable - and the response they get really isn't trying to work with them to narrow down and explore the space of possibility. Instead it's exactly what I see here - "that's impossible & you don't know enough." There's not much point to this forum if this is the way people posting new ideas are going to be treated. These responses are gatekeeping, and I find their tone condescending and passive-aggressive as well.

I don't think we can let the macro parse arbitrary tokens, but for example it would be totally possible that we have a macro form which takes an identifier and a tokentree, instead of just a tokentree, so you can write macros for creating things that look a lot like items, as in weird_struct! Foo { }. In fact, the original syntax extensions included a form like this I believe.

A good response from the community would've made this thread an exploration of the other syntactic forms proc macros could be allowed to take, the pros and cons of each of those from an implementation and tooling perspective, the value of the use cases they represent, etc.

EDIT: It's also telling that there have been 17 posts in this thread in one hour! I've seen this a lot in IRLO recently too. People in this community should take a lot more time to think about the ideas being proposed and how they could be constructively critiqued before immediately falling into a back and forth like what's happening here. It's a waste of everyone's time.

You are completely right!! The original idea was just a new keyword (one that looks like a Rust syntax keyword but, because of the !, is just a macro) that could add new functionality, either without updating the compiler each time or for a specific domain.

gc_object! GcStruct {
   k: u32,
   l: u8,
}

gc_object! would create an object that can be garbage collected and could also add other functionality with custom keywords ...

This collapsed part is off-topic, as is a huge part of the post I'm responding to!

I don't think this is entirely fair. At least speaking for myself I am just trying to help OP understand the shortcomings of their concrete proposal (letting other tokenizers take over) and presenting what's currently already possible. This does not mean there is no need for anything else.

Your interpretation of OP's idea seems like an entirely different proposal to me. OP was explicitly not only asking about more rust-like styles of macro-invocation like weird_struct! Foo { } where the parentheses only contain rust-tokens again. This only became clear because of additional questions answered by OP. Without questioning him/her, it would be impossible to really understand their point.

If there are problematic individual responses, point them out or flag them, but don't blame the whole thread.

Exploring other possibilities that are useful must start close to what is already possible. If we want to use macros for non-Rust-like languages, using some_macro!{r#" ... "#} is currently lexically possible but not really nice to read. Especially the parentheses seem superfluous; I would like at least some_macro!r#" ... "# to work.

Going further, one could think of syntax analogous to raw strings with a delimiter, compare to other languages and discuss ideas like parsing multiple token trees on macro-invocation in item-position until the first semicolon or braced token-tree to get more Rust-keyword-like behavior.

Speaking of other languages, for example Haskell has so-called Quasiquotes which look like [some_ident| ... |] where ... cannot contain any |] and where some_ident is some(thing like a) macro definition that gets passed the ... verbatim as a stream of characters.

For some context:

macro_rules! was originally "just" another macro (with compiler state altering powers), and the macro syntax allowed for $path! $ident $block; it's just that macro_rules! was the only macro with said syntax, and there was no way to define another "item like macro".

Between then and now, macro_rules! was made into its own syntactical form, and regular bang macros that aren't macro_rules! are required to take on the functionlike!() format.

I could potentially see "item like" macros returning (bitflags! for example could be more natural with it than wrapping a struct item), but we'd have to be careful not to regress error messages for existing functionlike macros. Note that when parsing $path!, there's no way of knowing whether that macro is "item like" or "function like".


@redradist

It certainly is an interesting idea to make parsing dependent on "syntax extension" macros. However, this is ill-advised for one main reason: it makes parsing undecidable.

Currently, Rust code can be fully parsed without having to execute any user code. Furthermore, it can be parsed in basically linear time with fixed lookahead†. If parsing requires running arbitrary code, parsing becomes undecidable, and it's impossible to even say if it's possible to determine if a file is valid Rust (let alone if it is).

Even furthermore, consider the following:

custom_keyword! /* inputs */

use library::custom_keyword;

Here it's impossible to resolve custom_keyword without first parsing it to determine how much input it consumes, which may or may not include consuming the import for the keyword.


As such, arbitrary input without brackets is infeasible. However, so long as your macro input has matched brackets ((), {}, and []), I think it should potentially be possible to do. (It isn't yet, today.) It's probably worth pursuing a TokenStream API that is fully lossless and can represent any sequence of tokens, error or not, to be processed by proc macros, so long as it has matched brackets.
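
To make the role of matched brackets concrete, here is a rough sketch (not how rustc is implemented) of finding where a delimited macro input ends without knowing anything about the macro. The function name `matching_close` is hypothetical, and the sketch deliberately ignores string literals and comments, which a real lexer must skip over:

```rust
/// Given the byte index of an opening delimiter in `src`, returns the byte
/// index of the delimiter that closes it, or `None` if `open` does not point
/// at an opening delimiter or the delimiters do not match up.
/// Simplification: brackets inside string literals or comments are not
/// skipped, so they would confuse this sketch.
fn matching_close(src: &str, open: usize) -> Option<usize> {
    let rest = src.get(open..)?;
    let first = rest.chars().next()?;
    if !matches!(first, '(' | '[' | '{') {
        return None;
    }
    // Stack of closing delimiters we still expect to see.
    let mut stack: Vec<char> = Vec::new();
    for (i, c) in rest.char_indices() {
        match c {
            '(' => stack.push(')'),
            '[' => stack.push(']'),
            '{' => stack.push('}'),
            ')' | ']' | '}' => {
                if stack.pop() != Some(c) {
                    return None; // mismatched closing delimiter
                }
                if stack.is_empty() {
                    return Some(open + i);
                }
            }
            _ => {}
        }
    }
    None // ran out of input before the group was closed
}
```

Because the end of the group is found by delimiter matching alone, a tool such as a formatter or highlighter can skip the macro body without resolving or running the macro.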

The important part to realize and make provisions for in any extension to the macro system is that the resolution of the macro may not yet be known when parsing an invocation of the macro.

† Yes, raw strings are context sensitive, but only "slightly" context sensitive in that they can still easily be parsed linearly with only a tiny amount of state. Fixed lookahead here refers to the syntactical grammar consuming tokens, which is roughly LL(3) (modulo some error handling). Disclaimer: none of this is formal.
