Getting value out of `proc_macro::Literal`

Quick question: would it be a terrible idea to add the following impls

impl TryFrom<proc_macro::Literal> for $ty {
    type Error = $tyValueError;
    fn try_from(lit: proc_macro::Literal) -> Result<$ty, Self::Error> {
        ...
    }
}

where $ty ranges over all numeric types, char and String?

Right now, if you want to extract the value out of a string literal, you need to do your own string unescaping, which is not great.
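
To give a feel for the unescaping a macro author ends up writing by hand, here is a rough sketch. It works on the literal's source text (what you get from stringifying it), and `unescape_str_literal` is a hypothetical helper name, not an existing API. It assumes a plain, non-raw string literal and deliberately ignores raw strings and backslash-newline continuations.

```rust
/// Hypothetical helper: turn the source text of a plain (non-raw) string
/// literal, such as `"a\nb"`, into the value it denotes.
/// Returns `None` for anything this sketch does not understand.
fn unescape_str_literal(src: &str) -> Option<String> {
    // Strip the surrounding double quotes.
    let inner = src.strip_prefix('"')?.strip_suffix('"')?;
    let mut out = String::with_capacity(inner.len());
    let mut chars = inner.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next()? {
            'n' => out.push('\n'),
            't' => out.push('\t'),
            'r' => out.push('\r'),
            '0' => out.push('\0'),
            '\\' => out.push('\\'),
            '"' => out.push('"'),
            '\'' => out.push('\''),
            'x' => {
                // \xNN: exactly two hex digits, at most 0x7F in a string literal.
                let hi = chars.next()?.to_digit(16)?;
                let lo = chars.next()?.to_digit(16)?;
                let v = hi * 16 + lo;
                if v > 0x7F {
                    return None;
                }
                out.push(char::from_u32(v)?);
            }
            'u' => {
                // \u{...}: one to six hex digits inside braces.
                if chars.next()? != '{' {
                    return None;
                }
                let mut v: u32 = 0;
                let mut digits = 0;
                loop {
                    match chars.next()? {
                        '}' => break,
                        '_' => continue,
                        d => {
                            v = v * 16 + d.to_digit(16)?;
                            digits += 1;
                            if digits > 6 {
                                return None;
                            }
                        }
                    }
                }
                if digits == 0 {
                    return None;
                }
                out.push(char::from_u32(v)?);
            }
            // Unknown escape: give up.
            _ => return None,
        }
    }
    Some(out)
}
```

Even this reduced version has to get the `\x` range check and the `\u{...}` digit count right, which is the kind of detail that is easy to miss.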

While this would be a massive improvement in and of itself, I think it would be better to expose more of the internal proc macro structure. You certainly know more about this than me, but I believe rustc already knows the actual type — along with things like numeric suffixes and whatnot.

These two could be done separately, of course.

The idea here is exactly that we can avoid exposing internal structure, which is quite fiddly. What is the type of the value of 92? It can be any integral type! Expressing this with enums gets awkward. With TryFrom, we can just make more than one conversion succeed.
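
A rough sketch of what "more than one conversion succeeds" could look like. `IntLit` and `IntValueError` are hypothetical stand-ins (the real `proc_macro::Literal` is only usable inside a proc-macro crate), the impls take `&IntLit` rather than the literal by value so one literal can be converted several times, and only decimal literals with an optional type suffix are handled.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for `proc_macro::Literal`: just the source text
/// of an integer literal, e.g. "92" or "92u8".
struct IntLit(String);

/// Hypothetical error type for a failed conversion.
#[derive(Debug, PartialEq)]
struct IntValueError;

macro_rules! impl_try_from_int_lit {
    ($($ty:ident),*) => {$(
        impl TryFrom<&IntLit> for $ty {
            type Error = IntValueError;
            fn try_from(lit: &IntLit) -> Result<$ty, Self::Error> {
                // Drop digit separators such as the one in `1_000`.
                let text: String = lit.0.chars().filter(|c| *c != '_').collect();
                // Split the decimal digits from an optional type suffix.
                let split = text
                    .find(|c: char| !c.is_ascii_digit())
                    .unwrap_or(text.len());
                let (digits, suffix) = text.split_at(split);
                // An explicit suffix must name exactly this type.
                if !suffix.is_empty() && suffix != stringify!($ty) {
                    return Err(IntValueError);
                }
                // Out-of-range values fail here.
                digits.parse::<$ty>().map_err(|_| IntValueError)
            }
        }
    )*};
}

impl_try_from_int_lit!(u8, u16, u32, u64, i8, i16, i32, i64);
```

With this, an unsuffixed `92` converts to `u8`, `u16`, `i64` and so on, `300` converts to `u16` but not `u8`, and `92u8` converts only to `u8`.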

That's certainly true. I'm by no means opposed to what you're suggesting to be clear; I think it would be great.

FWIW, here is what the code to handle all the cases correctly looks like:

:dizzy_face:

I personally agree that it could be legitimate to bundle some of these helpers into the "standard" proc-macro library, if only because it could perform some of these operations in a more performant fashion (avoiding unnecessary additional stringifications),

  • In that regard, I think that a with_str<R>(&self, _: impl FnOnce(&str) -> R) -> R kind of API would be a simple and performant improvement over the current Display-based API of Literals (using Display to inspect a value seems like a hack).

and also because most macro authors may not be aware of these caveats.
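
To make the with_str idea concrete, here is a minimal sketch. `Lit` is a hypothetical stand-in that simply owns its text; the real `proc_macro::Literal` does not expose a field like this, and `with_str` is the API being proposed here, not something assumed to exist today.

```rust
use std::fmt;

/// Hypothetical stand-in for `proc_macro::Literal`.
struct Lit {
    repr: String,
}

impl fmt::Display for Lit {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.repr)
    }
}

impl Lit {
    /// Lends the literal's text to the closure without allocating.
    fn with_str<R>(&self, f: impl FnOnce(&str) -> R) -> R {
        f(&self.repr)
    }
}

/// Display-based inspection: allocates a fresh `String` on every call.
fn is_string_literal_via_display(lit: &Lit) -> bool {
    lit.to_string().starts_with('"')
}

/// Closure-based inspection: borrows the text that already exists.
fn is_string_literal_via_with_str(lit: &Lit) -> bool {
    lit.with_str(|s| s.starts_with('"'))
}
```

The closure form also avoids committing to how the text is stored internally, since the `&str` only has to live for the duration of the call.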

That being said, I don't think this logic belongs in a TryFrom impl; it should involve some ParseLiteral trait, which could be akin to FromStr (which incidentally relates to the with_str API). For instance, with integer literals, we shouldn't be assuming the underlying type or the underlying base. So I'd expect things like

let n = lit.get_value::<parse_base_10::u16>()?;
let s = lit.get_value::<String>()?;

rather than:

let n = u16::try_from(lit)?;
let s = String::try_from(lit)?;
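
One way such a `get_value` API could be wired up, as a sketch only. `Lit`, `ParseLiteral`, `ParseLiteralError` and the `parse_base_10` / `parse_base_16` marker modules are all hypothetical names for illustration, and the parsing itself is deliberately simplistic (no suffix handling).

```rust
/// Hypothetical stand-in for `proc_macro::Literal`, holding source text.
struct Lit(String);

#[derive(Debug, PartialEq)]
struct ParseLiteralError;

/// Hypothetical trait, akin to `FromStr`, where the implementor names a
/// parsing strategy and `Output` is the value that strategy produces.
trait ParseLiteral {
    type Output;
    fn parse_literal(src: &str) -> Result<Self::Output, ParseLiteralError>;
}

impl Lit {
    fn get_value<P: ParseLiteral>(&self) -> Result<P::Output, ParseLiteralError> {
        P::parse_literal(&self.0)
    }
}

/// Marker types that encode both the base and the target type.
#[allow(non_camel_case_types, dead_code)]
mod parse_base_10 {
    pub struct u16;
}

#[allow(non_camel_case_types, dead_code)]
mod parse_base_16 {
    pub struct u16;
}

impl ParseLiteral for parse_base_10::u16 {
    type Output = u16;
    fn parse_literal(src: &str) -> Result<u16, ParseLiteralError> {
        // Drop `_` separators, then parse as decimal.
        src.replace('_', "")
            .parse::<u16>()
            .map_err(|_| ParseLiteralError)
    }
}

impl ParseLiteral for parse_base_16::u16 {
    type Output = u16;
    fn parse_literal(src: &str) -> Result<u16, ParseLiteralError> {
        // Require the `0x` prefix so a decimal literal is rejected.
        let digits = src.strip_prefix("0x").ok_or(ParseLiteralError)?;
        u16::from_str_radix(&digits.replace('_', ""), 16).map_err(|_| ParseLiteralError)
    }
}
```

The caller then states the expected base explicitly, e.g. `lit.get_value::<parse_base_16::u16>()`, instead of the conversion guessing it.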

Is this something that could be done with a simple PR? I'm actually just running into a situation where I realize that even handling seemingly basic cases is stupidly difficult, and I would certainly prefer something better.

My gut feeling is that that's more RFC material, sadly, as the API surface is significant, and the trait impls are insta-stable :frowning: @jhpratt are you by any chance volunteering to write an RFC?

Can't say as I've ever written an RFC, but I'll look into the process for doing so.

Yes! This would speed up cargo run and cargo check, because these commands compile procedural macros in debug mode. Implementing literal parsing in the standard library (which is built in release mode) will undoubtedly be much faster.

Not to mention that they're being parsed already, so they're really being parsed twice currently.

I'd be very happy to see this. syn is quite heavyweight, and it's sometimes undesirable to add it as a dependency just for parsing things as simple as a literal. My crate cstr (for generating static CStr references) added a simplified implementation of string and byte string parsing to avoid the syn dependency, per a request from a user.

Given that, I'd also like to ask that Vec<u8> be added for byte strings, not just String :slight_smile:

Alternatively, it might be useful to extract certain parsing code from rustc into small crates published on crates.io, and have syn, rustc, and other third-party tools depend on them. The literal format seems to be reasonably stable, so it could probably be such a target. This also avoids extending the API surface.

Your wish is granted:

Thanks.

The versioning is a bit hostile, and it still requires unquoting the literal, which needs some extra code, although maybe not as much. But the versioning makes me hesitant.

What I was trying to suggest is to have a proper isolated crate that rustc depends on for parsing literals, rather than a crate auto-published from rustc that isn't really maintained for use outside the compiler circle. But it seems that it may not be easy to design an easy-to-use interface that satisfies both rustc and proc macro uses anyway, so okay...

rustc_lexer is such a proper isolated crate. Its interface is finicky, but it is explicitly designed to be independent of the compiler. It is used by rust-analyzer as well.

The interface is not as straightforward as one might expect because it needs to express more than just "what's the value of this string". It also needs to be able to point to the locations of escape sequences within the string, and to be resilient to errors.
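
To illustrate why those two requirements push the interface towards something callback-shaped, here is a small sketch. It is not the actual rustc_lexer API; `scan_escapes` and `EscapeError` are hypothetical names, and only a handful of escapes are recognised. The point is that each character or escape is reported together with its byte range, and an error does not stop the scan.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
enum EscapeError {
    UnknownEscape,
    LoneBackslash,
}

/// Hypothetical scanner over the contents of a string literal (quotes
/// already removed). For every character or escape sequence it reports
/// the byte range it occupies and either the decoded char or an error,
/// then keeps going.
fn scan_escapes(
    inner: &str,
    mut callback: impl FnMut(Range<usize>, Result<char, EscapeError>),
) {
    let mut iter = inner.char_indices();
    while let Some((start, c)) = iter.next() {
        if c != '\\' {
            callback(start..start + c.len_utf8(), Ok(c));
            continue;
        }
        match iter.next() {
            // A backslash at the very end has nothing to escape.
            None => callback(start..inner.len(), Err(EscapeError::LoneBackslash)),
            Some((i, e)) => {
                let end = i + e.len_utf8();
                let res = match e {
                    'n' => Ok('\n'),
                    't' => Ok('\t'),
                    '\\' => Ok('\\'),
                    '"' => Ok('"'),
                    _ => Err(EscapeError::UnknownEscape),
                };
                // Report the error with its location, but do not abort.
                callback(start..end, res);
            }
        }
    }
}
```

A caller that only wants the value can collect the `Ok` chars and bail on the first `Err`, while an IDE can use the ranges to highlight each escape or underline just the broken one.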

EDIT: the versioning can indeed be improved a bit: the crate could follow proper semver. However, there would be little benefit, and it would require new infra in rustc to deal with "in-tree, but from crates.io" deps.