[Idea or pre-RFC] Arbitrary Token Stream Region

Motivation

Macros are powerful tools. However, they currently have a fundamental limit: the input code of the macro must first be correctly parsed (or at least lexed) as Rust.

By introducing "arbitrary token stream regions", programmers can mark regions of tokens that are interpreted and used solely by macros. Note that comments are not recognized by the lexer within such a region.

Guide-level explanation

arbitrary token stream expression region

They’re introduced by r#( ... ) regions. You can put any kind of token between the parentheses; the only restriction is that all parentheses within the region must be properly paired.

arbitrary token stream item region

They’re introduced by r#{ ... } regions. You can put any kind of token between the braces; the only restriction is that all braces within the region must be properly paired.
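To make the two forms concrete, a hypothetical use might look like this (proposed syntax, not valid Rust today; `my_expr_macro!` and `my_item_macro!` are made-up names):

```rust
// Expression position: everything inside r#( ... ) reaches the macro
// as raw tokens, with no Rust lexing rules beyond parenthesis pairing.
let x = my_expr_macro!(r#( 1 <=> 2 ~~> 3 ));

// Item position: everything inside r#{ ... } reaches the macro
// as raw tokens, with no Rust lexing rules beyond brace pairing.
my_item_macro! r#{
    rule greeting ::= "hello" | "hi" ;
}
```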

diagnostics

If such a region does not get consumed and replaced by a macro, the compiler will emit an error.

changes to proc-macro apis/syn crate

TBD

Example #1

See below.

Could a macro not interpret the contents of an include_bytes!() or a literal? The interpreted input would then likely not be located in the same file (or would be an awfully formatted b"" string), but that would encourage separating files by the syntax of their content, which I would regard as a feature.

2 Likes

That’s already possible: you can just define a proc macro that takes a file path as input. The proc macro loads the file and does whatever it wants with it.
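The proc-macro shell is omitted here, but the core of such a macro might look like the following sketch, written as an ordinary function (`expand_from_file` is a made-up name, and counting lines stands in for whatever custom interpretation the macro would actually do):

```rust
use std::fs;

// Sketch of the core of a file-loading macro: given the path from the
// macro's literal argument, read the file and interpret its contents
// with any custom grammar. In a real proc macro the returned string
// would be parsed into a TokenStream of generated code.
fn expand_from_file(path: &str) -> String {
    let source = fs::read_to_string(path).expect("macro input file not found");
    format!("const LINES: usize = {};", source.lines().count())
}
```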

It is not at all clear to me what is being suggested here and much is left as guesswork for a reader of this thread. Please try to incorporate the idea into some example of how it may be used in its proper context.

3 Likes

I agree you should be more precise on what you want to do with this feature and provide examples.

If I guess right, you want to accept code that does not match the Rust parsing rules. It might be useful for embedding other languages in Rust code. If I had to accept arbitrary code, I think this syntax would be even more useful:

my_macro!#{
    code that does not have any restriction. 
    not even matching } or )
}#

But if you accept code the Rust compiler cannot lex, I’m not sure it can be treated as a TokenStream. I think it would raise a lot of problems (particularly with hygiene).

1 Like

What is the advantage of a non-Rust-parsable token stream over a string?

What if we just made the Rust lexer available as a crate that you could invoke on a string?
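A minimal sketch of what such a string-lexing API might look like (purely illustrative: the real lexer distinguishes far more token kinds and handles string literals, comments, multi-character punctuation, spans, and so on):

```rust
// Toy lexer: splits a string into identifier, number, and punctuation
// tokens, skipping whitespace. A real "rust-lexer-as-a-crate" would
// return richer tokens with source spans.
#[derive(Debug, PartialEq)]
enum Tok {
    Ident(String),
    Num(String),
    Punct(char),
}

fn lex(src: &str) -> Vec<Tok> {
    let mut toks = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_alphabetic() || c == '_' {
            // Identifiers: a letter or underscore, then alphanumerics.
            let mut s = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_alphanumeric() || c == '_' { s.push(c); chars.next(); } else { break; }
            }
            toks.push(Tok::Ident(s));
        } else if c.is_ascii_digit() {
            // Numbers: a run of ASCII digits.
            let mut s = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_ascii_digit() { s.push(c); chars.next(); } else { break; }
            }
            toks.push(Tok::Num(s));
        } else {
            // Everything else is a single punctuation token.
            toks.push(Tok::Punct(c));
            chars.next();
        }
    }
    toks
}
```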

2 Likes

Yes, I admit that I should have given an example first; let me try to make one now. The problem the example itself addresses is not meant to be discussed in this thread; I only show it as a potentially useful mechanism. So this functionality is kind of meta.

Example #1 Delegation

Imagine I want to write a macro to do some method forwarding (not to change Rust the language, but only to meet my own needs in my own little project). I’ve chosen the following syntax:

struct Button;
impl Button {
    fn set_text(&mut self, text: &str) {}
    fn set_state(&mut self, state: ButtonState) {}
}

struct ImageButton(Button, Image);

#[delegation]
impl ImageButton r#{
    delegate self => self.0 {
         fn set_text(&mut self, text: &str);
         fn set_state(&mut self, state: ButtonState);
    }
    fn set_image(&mut self, image: Image) {}
}

So under the current design, this code needs to be correctly parsed before the delegation macro can process it. (I don’t know whether it will parse; let’s assume it won’t. And even if it can be parsed now, maybe it won’t parse in Rust 1.45, who knows.)

Without this feature, I’ll have to modify the syntax somehow to make it more like ordinary Rust, using more attributes and reducing the keyword-like structures.

However, with this feature, as the r#{ usage in the example shows, the parser will simply pack whatever is between the braces together and send it to the macro. The delegation macro can then reparse it and generate whatever code it likes to implement this functionality.

As for the surface syntax, I think strings would give others the wrong impression about the text within them (it looks like data!), whereas “bare” tokens with braces give the impression that this is some nonstandard extension to the language. And IDEs could provide some basic support here.

Rust macros are already overpowered as it is. I cannot think of a situation where readability of a proc macro is improved by being able to process something that doesn’t lex as Rust.

1 Like

I’d much rather see a delimiter that gets treated as a string, without the presumption of lexing. IDEs would give counterproductive help if they assume the contents should look like valid Rust.

Any change to not require matching brackets of all three types would require major changes to the Rust lexer.

Function-like macros can already contain any syntax that Rust can lex.
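For instance, a declarative macro happily accepts token soup that lexes as Rust but is nowhere near valid Rust syntax (a small demonstration, not from the thread):

```rust
// A function-like macro that accepts any sequence of token trees and
// turns it back into a string. The input only has to lex as Rust
// tokens and have balanced delimiters; it need not parse as Rust.
macro_rules! swallow {
    ($($t:tt)*) => {
        stringify!($($t)*)
    };
}
```

Here `swallow!(delegate self => self.0)` expands to a string even though its argument is not a valid Rust expression or item.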

Attribute macros specifically take valid Rust item syntax as input because they don’t wrap their contents, and it needs to be clear both to the compiler and the human what the scope of the macro is.

The ability to lex external files and get proper spans for it would be invaluable for macros.

For your specific use case, it might be more productive to argue for function-like macros in more positions, for syntax like:

#[delegation]
impl ImageButton {
    delegate! { self => self.0;
         fn set_text(&mut self, text: &str);
         fn set_state(&mut self, state: ButtonState);
    }
    fn set_image(&mut self, image: Image) {}
}

Even better, and potentially valid today (didn’t check) (but definitely close):

#[delegation]
impl ImageButton {
    #[delegate(self => self.0)]
    fn set_text(&mut self, text: &str);
    #[delegate(self => self.0)]
    fn set_state(&mut self, state: ButtonState);
    fn set_image(&mut self, image: Image) {}
}
3 Likes