What would it take to create a "tokenize!" macro?

To convert a series of tokens into a string, there's a stringify! macro.

To convert a string into a series of tokens, there's nothing.

An example of a situation where it might be useful came up on the users' forum.

How hard would it be to create it, and where would it make most sense to start?
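
For illustration, such a macro would be the inverse of stringify! (this snippet is purely illustrative; no tokenize! exists in std today):

let s = stringify!(1 + 2);    // expands to the string literal "1 + 2"
assert_eq!(s, "1 + 2");
let n = tokenize!("1 + 2");   // would expand back into the expression `1 + 2`
assert_eq!(n, 3);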

I'm not too much into macros but I suspect that something like this can be achieved in a crate with a procedural macro.

3 Likes

You can do this in a proc macro via syn (it's a bit of a heavyweight dependency for something like this, but oh well). You should be able to do syn::parse_str::<TokenStream>(some_string) to turn the string into a TokenStream (or some other type if you want, including a custom type), which you can then manipulate.
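
A minimal sketch of that approach (assuming syn and proc-macro2 as dependencies; the helper name is just for illustration):

use proc_macro2::TokenStream;

// Turn a source string into a token stream; parse errors come back as syn::Error.
fn tokens_from_str(source: &str) -> syn::Result<TokenStream> {
    syn::parse_str::<TokenStream>(source)
}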

2 Likes

I've just come across the proc_macro crate, documented here, with an example:

extern crate proc_macro;
use proc_macro::TokenStream;

#[proc_macro]
pub fn make_answer(_item: TokenStream) -> TokenStream {
    "fn answer() -> u32 { 42 }".parse().unwrap()
}

Would this work as well, if instead of the string provided we used the stringify! macro?

1 Like

I think you want quote!?
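
(For reference, a minimal sketch of the quote! approach, which builds the tokens directly instead of round-tripping through a string; it assumes the quote and proc-macro2 crates:)

use proc_macro::TokenStream;
use quote::quote;

#[proc_macro]
pub fn make_answer(_item: TokenStream) -> TokenStream {
    // Build the tokens directly; no string parsing involved.
    quote! {
        fn answer() -> u32 { 42 }
    }
    .into()
}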

1 Like

Why would you stringify! something, only to re-parse it back into tokens?

I think this has started venturing back into URLO territory. stringify! primarily exists in Rust's standard library because there wasn't any other way to stringify tokens back in Rust 1.0. But now that we have proc macros it's possible for users to implement their own version of stringify, including the reverse operation. Since it's already possible for a user to implement tokenize! as a proc macro, I think people would want to see more justification for the existence of a tokenize! macro in Rust's standard library. IRLO is a good place to discuss justifications for such a macro in std, and URLO is a good place to discuss ways to implement parsing/tokenizing/stringifying.

1 Like

Procedural macros require creating separate crates, totally segregated from the rest of the code, in order to perform a seemingly trivial operation. I'm not familiar with the history of stringify!, but I know that it can be used anywhere - which can't be said of procedural macros. Hence the suggestion of tokenize!

Procedural macros are annoying to write. I agree that the requirement to split things into separate crates is far from ideal.

That said, I think the right way to work this out is to fix proc macros so they don't require separate crates, rather than introducing new macros into std. I can't find the discussion, but there have been some ideas in the past about how to fix proc macros so they're easier to write without overcomplicating the compilation process (it was something like putting proc macros in their own submodule without requiring them to be in their own crate).

I personally think tokenize! isn't quite useful enough for enough people to be in the std library. But I'm just some random user, so my opinions don't mean much.

That could work as well - although I don't see how introducing a new macro to std is going to hurt anyone. We're not changing any of the existing code, nor introducing any breaking changes. It's just another feature to make the lives of developers who interact with macros and Rust's metaprogramming features on a daily basis that much easier.

There is a general goal to make std lean and push users to use external crates. The concern isn't that this would break anyone, but rather that it would be yet another thing added to std. Of course the ultimate decision is up to the libs team, but usually they ask for justification for any new items in std that show it's such a useful feature that it needs to be in std instead of an external crate. Usually things start out as a user crate, where they're iterated on and gain traction amongst users. Once the design has been proven, and the usage is high enough that it shows many people want/need this, then it is considered for adoption in std.

That makes sense, and it brings us back to the original question: where should one start to create an alpha version of a tokenize! macro that could be used inline, without the hassle of procedural macros?

Here's a totally untested implementation of tokenize!, which you could publish in a proc macro crate:

extern crate proc_macro;

use proc_macro::TokenStream;

#[proc_macro]
pub fn tokenize(tokens: TokenStream) -> TokenStream {
    // Expect a single string literal, e.g. tokenize!("fn answer() -> u32 { 42 }").
    let string: syn::LitStr = syn::parse(tokens).unwrap();
    // Re-parse the literal's contents as tokens; LitStr::parse goes through proc_macro2.
    let tokens: proc_macro2::TokenStream = string.parse().unwrap();
    tokens.into()
}
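
Usage from a downstream crate would then look something like this (the crate name tokenize here is hypothetical):

use tokenize::tokenize;

// The macro expands the string back into real items at the call site.
tokenize!("fn answer() -> u32 { 42 }");

fn main() {
    assert_eq!(answer(), 42);
}
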
2 Likes

FWIW, I've just published

unstringify!

It does what the OP asked for, implemented as a procedural macro with zero dependencies (for optimal compile times), and it implements the "preprocessor pattern" (cf. ::paste) for enhanced usability.

What for?

No idea; the best use case I could come up with was evaluating the code inside doc comments, since that's the only plausible situation where one may encounter already-stringified code :sweat_smile:

It's more of an educational crate :nerd_face: than anything else, since it showcases:

  • a non-trivial proc macro (featuring well-spanned error reporting) with zero dependencies;

  • the callback pattern;

  • an inlined implementation of stringify! and concat!, to support them being used in argument position.

7 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.