Getting value out of `proc_macro::Literal`

matklad · February 28, 2021, 4:05pm

Quick question: would it be a terrible idea to add the following impls:

impl TryFrom<proc_macro::Literal> for $ty {
    type Error = $tyValueError;
    fn try_from(lit: proc_macro::Literal) -> Result<$ty, Self::Error> {
        ...
    }
}

where $ty ranges over all numeric types, char and String?

Right now, if you want to extract the value out of a string literal, you need to do your own string unescaping, which is not great.
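Concretely, with today's API a macro author takes the literal's token text (for example via `Literal::to_string()`, quotes included) and decodes it by hand. Below is a minimal sketch of that decoding for string literals. `parse_str_literal` is a hypothetical name. The sketch rejects suffixed literals and ignores CRLF normalization and other corner cases, so it is not a complete implementation of Rust's escape rules.

```rust
/// Hypothetical helper: decode the token text of a string literal (as
/// produced by `proc_macro::Literal::to_string()`, quotes included) into
/// its value. Returns `None` for malformed input or a suffixed literal.
pub fn parse_str_literal(repr: &str) -> Option<String> {
    // Raw strings (r"...", r#"..."#, ...) contain no escapes at all.
    if let Some(rest) = repr.strip_prefix('r') {
        let hashes = rest.len() - rest.trim_start_matches('#').len();
        let body = rest[hashes..].strip_prefix('"')?;
        let closing = format!("\"{}", "#".repeat(hashes));
        return body.strip_suffix(closing.as_str()).map(str::to_owned);
    }

    let body = repr.strip_prefix('"')?.strip_suffix('"')?;
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next()? {
            'n' => out.push('\n'),
            'r' => out.push('\r'),
            't' => out.push('\t'),
            '\\' => out.push('\\'),
            '0' => out.push('\0'),
            '\'' => out.push('\''),
            '"' => out.push('"'),
            // \xNN: exactly two hex digits, at most 0x7F in `str` literals.
            'x' => {
                let hex: String = chars.by_ref().take(2).collect();
                if hex.len() != 2 || !hex.chars().all(|d| d.is_ascii_hexdigit()) {
                    return None;
                }
                let v = u8::from_str_radix(&hex, 16).ok()?;
                if v > 0x7F {
                    return None;
                }
                out.push(char::from(v));
            }
            // \u{...}: 1 to 6 hex digits (underscores allowed after the
            // first one), and the result must be a valid `char`.
            'u' => {
                if chars.next()? != '{' {
                    return None;
                }
                let mut hex = String::new();
                loop {
                    match chars.next()? {
                        '}' => break,
                        '_' if !hex.is_empty() => {}
                        d if d.is_ascii_hexdigit() => hex.push(d),
                        _ => return None,
                    }
                }
                if hex.is_empty() || hex.len() > 6 {
                    return None;
                }
                out.push(char::from_u32(u32::from_str_radix(&hex, 16).ok()?)?);
            }
            // Line continuation: skip the newline and any following whitespace.
            '\n' => {
                while matches!(chars.peek(), Some(&(' ' | '\t' | '\n' | '\r'))) {
                    chars.next();
                }
            }
            _ => return None,
        }
    }
    Some(out)
}
```

For example, the token text `"a\tb"` (with a literal backslash) decodes to a string containing a tab. `"\x80"` is rejected, because `\x` escapes above `0x7F` are only valid in byte strings.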

jhpratt · February 28, 2021, 5:43pm

While this would be a massive improvement in and of itself, I think it would be better to expose more of the internal proc macro structure. You certainly know more about this than me, but I believe rustc already knows the actual type — along with things like numeric suffixes and whatnot.

These two could be done separately, of course.

matklad · February 28, 2021, 5:50pm

The idea here is exactly that we can avoid exposing the internal structure, which is quite fiddly. What is the type of the value of 92? It can be any integral type! Expressing this with enums gets awkward. With `TryFrom`, we can simply make more than one conversion succeed.
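Here is a small sketch of the "more than one conversion succeeds" idea. `proc_macro::Literal` can only be used inside a running procedural macro, so the sketch uses a hypothetical stand-in `Lit` that just wraps the token text, plus a made-up `IntValueError`. It handles only unsuffixed decimal integers with optional `_` separators. The macro plays the role of `$ty` in the proposal.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for `proc_macro::Literal`: just the token text.
#[derive(Clone, Debug)]
pub struct Lit(pub String);

/// Hypothetical error type for a failed conversion.
#[derive(Debug, PartialEq)]
pub struct IntValueError;

// One impl per integer type, mirroring `$ty` in the proposal.
macro_rules! impl_int_try_from {
    ($($ty:ty),*) => {$(
        impl TryFrom<Lit> for $ty {
            type Error = IntValueError;

            fn try_from(lit: Lit) -> Result<$ty, Self::Error> {
                // Drop `_` separators; anything else non-decimal (suffixes,
                // base prefixes, signs) is rejected in this sketch.
                let digits: String = lit.0.chars().filter(|&c| c != '_').collect();
                if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
                    return Err(IntValueError);
                }
                // Fails if the value does not fit in this particular type.
                digits.parse::<$ty>().map_err(|_| IntValueError)
            }
        }
    )*};
}

impl_int_try_from!(u8, u16, u32, u64, u128, i8, i16, i32, i64, i128);
```

With this, `Lit("92")` converts to both `u8` and `i64`. `Lit("300")` fails for `u8` but succeeds for `u16`. No enum of candidate types is needed.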

jhpratt · February 28, 2021, 5:51pm

That's certainly true. To be clear, I'm by no means opposed to what you're suggesting; I think it would be great.

dhm · February 28, 2021, 7:37pm

FWIW, here is what the code to handle all the cases correctly looks like:

github.com/dtolnay/syn/blob/69148aa2ff558bb4f10322ecc9ab505c4b835aba/src/lit.rs#L907-L1556

mod value {
    use super::*;
    use crate::bigint::BigInt;
    use proc_macro2::TokenStream;
    use std::char;
    use std::ops::{Index, RangeFrom};

    impl Lit {
        /// Interpret a Syn literal from a proc-macro2 literal.
        pub fn new(token: Literal) -> Self {
            let repr = token.to_string();

            match byte(&repr, 0) {
                b'"' | b'r' => {
                    let (_, suffix) = parse_lit_str(&repr);
                    return Lit::Str(LitStr {
                        repr: Box::new(LitRepr { token, suffix }),
                    });
                }
                b'b' => match byte(&repr, 1) {

This file has been truncated.

I personally agree that it could be legitimate to bundle some of these helpers into the "standard" proc-macro library, if only because it could perform some of these operations more efficiently (avoiding unnecessary additional stringifications), and also because most macro authors may not be aware of these caveats.

On the efficiency point, I think a `with_str<R>(&self, _: impl FnOnce(&str) -> R) -> R` kind of API would be a simple and cheap improvement over the current `Display`-based API of `Literal` (using `Display` to inspect a value seems like a hack).
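Here is a rough sketch of the `with_str` shape. The first function shows what callers effectively do today: go through `Display`, which allocates a fresh `String` on every inspection. The second uses a hypothetical `StoredLit` type that keeps its source text, as the compiler already does internally. That type can simply lend out a `&str`. Both names are made up for illustration.

```rust
use std::fmt::Display;

/// Today's approach: inspecting a literal goes through `Display`,
/// allocating a temporary `String` each time.
pub fn with_str_via_display<T: Display + ?Sized, R>(value: &T, f: impl FnOnce(&str) -> R) -> R {
    let text = value.to_string();
    f(text.as_str())
}

/// Hypothetical stand-in for a literal type that keeps its source text.
pub struct StoredLit {
    text: Box<str>,
}

impl StoredLit {
    pub fn new(text: &str) -> Self {
        StoredLit { text: text.into() }
    }

    /// Proposed shape: lend the stored text to the closure, no allocation.
    pub fn with_str<R>(&self, f: impl FnOnce(&str) -> R) -> R {
        f(&*self.text)
    }
}
```

The closure-based signature lets the implementation hand out a borrow of whatever internal storage it uses without committing to returning a `&str` with a particular lifetime.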

That being said, I don't think this logic belongs in a `TryFrom` impl; it should involve some `ParseLiteral` trait, akin to `FromStr` (which incidentally relates to the `with_str` API). For instance, with integer literals, we shouldn't assume either the underlying type or the underlying base. So I'd expect things like

let n = lit.get_value::<parse_base_10::u16>()?;
let s = lit.get_value::<String>()?;

rather than:

let n = u16::try_from(lit)?;
let s = String::try_from(lit)?;
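One way to read the `get_value` suggestion is a `FromStr`-like trait implemented per target type or parsing strategy, so the caller picks both the type and the accepted syntax. In this sketch, `ParseLiteral`, `get_value`, `Lit`, and the `Base10` marker (standing in for `parse_base_10::u16`) are all hypothetical. The plain `u16` impl accepts any base prefix, while `Base10<u16>` only accepts decimal. The `String` case is left out for brevity.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for `proc_macro::Literal` (token text only).
pub struct Lit(pub String);

/// Hypothetical `FromStr`-like trait: parse a value from literal token text.
pub trait ParseLiteral {
    type Output;
    fn parse_literal(repr: &str) -> Option<Self::Output>;
}

impl Lit {
    /// The caller chooses the parsing strategy via the type parameter.
    pub fn get_value<P: ParseLiteral>(&self) -> Option<P::Output> {
        P::parse_literal(&self.0)
    }
}

/// Any base: accepts `0x`, `0o`, `0b` prefixes and `_` separators.
impl ParseLiteral for u16 {
    type Output = u16;
    fn parse_literal(repr: &str) -> Option<u16> {
        let digits: String = repr.chars().filter(|&c| c != '_').collect();
        let (radix, body) = match digits.get(..2) {
            Some("0x") => (16, &digits[2..]),
            Some("0o") => (8, &digits[2..]),
            Some("0b") => (2, &digits[2..]),
            _ => (10, &digits[..]),
        };
        if body.is_empty() || !body.chars().all(|c| c.is_digit(radix)) {
            return None;
        }
        u16::from_str_radix(body, radix).ok()
    }
}

/// Hypothetical marker type: only plain decimal literals are accepted.
pub struct Base10<T>(PhantomData<T>);

impl ParseLiteral for Base10<u16> {
    type Output = u16;
    fn parse_literal(repr: &str) -> Option<u16> {
        let digits: String = repr.chars().filter(|&c| c != '_').collect();
        if digits.is_empty() || !digits.chars().all(|c| c.is_ascii_digit()) {
            return None;
        }
        digits.parse().ok()
    }
}
```

Here `0x5C` yields `Some(92)` through the base-agnostic `u16` impl but `None` through `Base10<u16>`. That is the kind of explicit choice the `get_value::<parse_base_10::u16>()` example asks for.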

jhpratt · March 5, 2021, 1:07am

Is this something that could be done with a simple PR? I'm actually running into a situation right now where even handling seemingly basic cases is stupidly difficult, and I would certainly prefer something better.

matklad · March 5, 2021, 8:06am

My gut feeling is that this is more RFC material, sadly, as the API surface is significant and trait impls are insta-stable. @jhpratt, are you by any chance volunteering to write an RFC?

jhpratt · March 5, 2021, 12:17pm

Can't say I've ever written an RFC, but I'll look into the process for doing so.

Aloso · March 5, 2021, 6:29pm

Yes! This would speed up cargo run and cargo check, because by default these commands compile procedural macros in debug mode. Literal parsing implemented in the standard library (which is built in release mode) should be much faster.

jhpratt · March 5, 2021, 7:04pm

Not to mention that rustc has already parsed them, so currently they're effectively being parsed twice.

upsuper · March 8, 2021, 10:03am

I'd be very happy to see this. syn is quite heavyweight, and it's sometimes undesirable to add it as a dependency just to parse something as simple as a literal. My crate cstr (for generating static CStr references) added a simplified implementation of string and byte string parsing to avoid the syn dependency, at a user's request.

Given that, I'd also like to ask that `Vec<u8>` be supported for byte strings, not just `String`.
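For byte strings the natural target is `Vec<u8>`. Here is a minimal sketch, assuming the input is the token text of a non-raw byte string literal such as `b"\xFF\n"`. `parse_byte_str_literal` is a hypothetical name. The main differences from `str` literals are:

* `\x` escapes may use the full `0x00..=0xFF` range.
* `\u{...}` is not allowed.
* Only ASCII characters may appear unescaped.

```rust
/// Hypothetical helper: decode `b"..."` token text into its bytes.
/// Raw byte strings (`br"..."`) and line continuations are not handled.
pub fn parse_byte_str_literal(repr: &str) -> Option<Vec<u8>> {
    let body = repr.strip_prefix("b\"")?.strip_suffix('"')?;
    let mut out = Vec::with_capacity(body.len());
    let mut bytes = body.bytes();
    while let Some(b) = bytes.next() {
        // Byte string literals may only contain ASCII characters directly.
        if !b.is_ascii() {
            return None;
        }
        if b != b'\\' {
            out.push(b);
            continue;
        }
        let escaped = match bytes.next()? {
            b'n' => b'\n',
            b'r' => b'\r',
            b't' => b'\t',
            b'\\' => b'\\',
            b'0' => 0,
            b'\'' => b'\'',
            b'"' => b'"',
            // Unlike in `str` literals, \x80..\xFF are allowed here.
            b'x' => {
                let hi = (bytes.next()? as char).to_digit(16)?;
                let lo = (bytes.next()? as char).to_digit(16)?;
                (hi * 16 + lo) as u8
            }
            // Includes `\u{...}`, which byte strings do not accept.
            _ => return None,
        };
        out.push(escaped);
    }
    Some(out)
}
```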

upsuper · March 8, 2021, 10:08am

Alternatively, it might be useful to extract certain parsing code from rustc into small crates published on crates.io, and have both syn and rustc (and other third-party tools) depend on them. The literal format seems reasonably stable, so it is probably a good candidate. This would also avoid extending the API surface.

matklad · March 8, 2021, 10:29am

Your wish is granted:

upsuper · March 9, 2021, 8:57am

Thanks.

The versioning is a bit hostile, and it still requires unquoting the literal, which needs some extra code, although maybe not as much. But the versioning makes me hesitant.

What I was trying to suggest is a proper isolated crate that rustc depends on for parsing literals, rather than a crate auto-published from rustc that isn't really maintained for use outside the compiler. But it seems it may not be easy to design a simple interface that satisfies both rustc and proc-macro uses anyway, so okay...

matklad · March 9, 2021, 9:20am

rustc_lexer is such a properly isolated crate. Its interface is finicky, but it is explicitly designed to be independent of the compiler. It is used by rust-analyzer as well.

The interface is not as straightforward as one might expect because it needs to express more than just "what's the value of this string". It also needs to point to the locations of escape sequences within the string, and to allow error resilience.
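To illustrate why such an interface is more involved than "string in, value out", here is a hypothetical callback-based unescaper along the lines described above. It reports the byte range of every unit it decodes, and it keeps going after an invalid escape instead of stopping at the first error. The names `unescape_with_ranges` and `EscapeError` are made up for this sketch and are not `rustc_lexer`'s actual API. Only a handful of escapes are handled.

```rust
use std::ops::Range;

/// Hypothetical error kinds (a real lexer distinguishes many more).
#[derive(Debug, PartialEq)]
pub enum EscapeError {
    UnknownEscape,
    LoneBackslash,
}

/// Hypothetical API: walk the *contents* of a string literal (quotes already
/// stripped), calling `callback` with the byte range of each source unit and
/// either the decoded char or an error. Errors do not abort the scan.
pub fn unescape_with_ranges(
    src: &str,
    callback: &mut impl FnMut(Range<usize>, Result<char, EscapeError>),
) {
    let mut iter = src.char_indices();
    while let Some((start, c)) = iter.next() {
        if c != '\\' {
            callback(start..start + c.len_utf8(), Ok(c));
            continue;
        }
        match iter.next() {
            // A trailing backslash has nothing to escape.
            None => callback(start..src.len(), Err(EscapeError::LoneBackslash)),
            Some((i, e)) => {
                // The range covers the backslash and the escaped character.
                let range = start..i + e.len_utf8();
                let res = match e {
                    'n' => Ok('\n'),
                    't' => Ok('\t'),
                    '\\' => Ok('\\'),
                    '"' => Ok('"'),
                    _ => Err(EscapeError::UnknownEscape),
                };
                callback(range, res);
            }
        }
    }
}
```

The ranges let an IDE highlight individual escapes or put an error squiggle on exactly `\q`, while valid parts of the literal are still decoded.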

EDIT: the versioning could indeed be improved a bit; the crate could follow proper semver. However, there would be little benefit, and it would require new infra in rustc to deal with "in-tree, but from crates.io" deps.

system · June 7, 2021, 9:21am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
