What would it take to create a "tokenize!" macro?

To convert a series of tokens into a string, there's a stringify! macro.

To convert a string into a series of tokens, there's nothing.

An example of a situation where it might be useful came up on the users' forum.

How hard would it be to create it, and where would it make most sense to start?
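
For illustration, such a macro would be the inverse of stringify! (this snippet is purely illustrative; no tokenize! exists in std today):

let s = stringify!(1 + 2);    // expands to the string literal "1 + 2"
assert_eq!(s, "1 + 2");
let n = tokenize!("1 + 2");   // would expand back into the expression `1 + 2`
assert_eq!(n, 3);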

I'm not too much into macros but I suspect that something like this can be achieved in a crate with a procedural macro.

3 Likes

You can do this in a proc macro via syn (it's a bit of a heavyweight dependency for something like this, but oh well). You should be able to do syn::parse_str::<TokenStream>(some_string) to turn the string into a TokenStream (or some other type if you want, including a custom type), which you can then manipulate.
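
A minimal sketch of that approach (assuming syn and proc-macro2 as dependencies; the helper name is just for illustration):

use proc_macro2::TokenStream;

// Turn a source string into a token stream; parse errors come back as syn::Error.
fn tokens_from_str(source: &str) -> syn::Result<TokenStream> {
    syn::parse_str::<TokenStream>(source)
}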

2 Likes

I've just come across the proc_macro crate, documented here, with an example:

extern crate proc_macro;
use proc_macro::TokenStream;

#[proc_macro]
pub fn make_answer(_item: TokenStream) -> TokenStream {
    "fn answer() -> u32 { 42 }".parse().unwrap()
}

Would this work as well, if instead of the string provided we used the stringify! macro?

1 Like

I think you want quote!?
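
(For reference, a minimal sketch of the quote! approach, which builds the tokens directly instead of round-tripping through a string; it assumes the quote and proc-macro2 crates:)

use proc_macro::TokenStream;
use quote::quote;

#[proc_macro]
pub fn make_answer(_item: TokenStream) -> TokenStream {
    // Build the tokens directly; no string parsing involved.
    quote! {
        fn answer() -> u32 { 42 }
    }
    .into()
}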

1 Like

Why would you stringify! something, only to re-parse it back into tokens?

I think this has started venturing back into URLO territory. stringify! primarily exists in Rust's standard library because there wasn't any other way to stringify tokens back in Rust 1.0. But now that we have proc macros it's possible for users to implement their own version of stringify, including the reverse operation. Since it's already possible for a user to implement tokenize! as a proc macro, I think people would want to see more justification for the existence of a tokenize! macro in Rust's standard library. IRLO is a good place to discuss justifications for such a macro in std, and URLO is a good place to discuss ways to implement parsing/tokenizing/stringifying.

1 Like

Procedural macros require creating separate crates, totally segregated from the rest of the code, in order to perform a seemingly trivial operation. I'm not familiar with the history of stringify!, but I know that it can be used anywhere - which can't be said of procedural macros. Hence the suggestion of tokenize!

Procedural macros are annoying to write. I agree that the requirement to split things into separate crates is far from ideal.

That said, I think the right way to work this out is to fix proc macros so they don't require separate crates, rather than introducing new macros into std. I can't find the discussion, but there have been some ideas in the past about how to fix proc macros so they're easier to write without overcomplicating the compilation process (it was something like putting proc macros in their own submodule without requiring them to be in their own crate).

I personally think tokenize! isn't quite useful enough for enough people to be in the std library. But I'm just some random user, so my opinions don't mean much.

That could work as well - although I don't see how introducing a new macro to std is going to hurt anyone. We're not changing any of the existing code, nor introducing any breaking changes. It's just another feature to make the lives of developers who interact with macros and Rust's metaprogramming features on a daily basis that much easier.

There is a general goal to make std lean and push users to use external crates. The concern isn't that this would break anyone, but rather that it would be yet another thing added to std. Of course the ultimate decision is up to the libs team, but usually they ask for justification for any new items in std that show it's such a useful feature that it needs to be in std instead of an external crate. Usually things start out as a user crate, where they're iterated on and gain traction amongst users. Once the design has been proven, and the usage is high enough that it shows many people want/need this, then it is considered for adoption in std.

That makes sense, and it brings us back to the original question: where should one start to create an alpha version of a tokenize! macro that could be used inline, without the hassle of procedural macros?

Here's a totally untested implementation of tokenize!, which you could publish in a proc macro crate:

extern crate proc_macro;

use proc_macro::TokenStream;

#[proc_macro]
pub fn tokenize(tokens: TokenStream) -> TokenStream {
    // Expect a single string literal, e.g. tokenize!("fn answer() -> u32 { 42 }").
    let string: syn::LitStr = syn::parse(tokens).unwrap();
    // Re-parse the literal's contents as tokens; LitStr::parse goes through proc_macro2.
    let tokens: proc_macro2::TokenStream = string.parse().unwrap();
    tokens.into()
}
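
Usage from a downstream crate would then look something like this (the crate name tokenize here is hypothetical):

use tokenize::tokenize;

// The macro expands the string back into real items at the call site.
tokenize!("fn answer() -> u32 { 42 }");

fn main() {
    assert_eq!(answer(), 42);
}
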
2 Likes

FWIW, I've just published

unstringify!

It does what the OP asked for, implemented as a procedural macro with zero dependencies (for optimal compile times), and it implements the "preprocessor pattern" (cf. ::paste) for enhanced usability.

What for?

No idea; the best use case I could come up with was evaluating the code inside doc comments, since that's the only plausible situation where one may encounter already-stringified code :sweat_smile:

It's more of an educational crate :nerd_face: than anything else, since it showcases:

  • a non-trivial proc macro (featuring well-spanned error reporting) with zero dependencies;

  • the callback pattern;

  • an inlined implementation of stringify! and concat!, to support them being used in argument position.

7 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.