[Moderator note: General discussion about non-ASCII identifiers (not specifically related to custom literals) should go in a separate thread or in the RFC PR. Note that there are over 500 comments on that PR already, so please try not to repeat comments that have already been made.]
A few days and an RFC draft later (and many excellent comments from @hanna-kruppe poking holes in it!), I'd like to propose an alternative to i__ and f__ for feedback.
Introduce two lang items (note that the names are strawmen; they clash with the traits I've defined above!)
#[lang = "int_literal"]
struct IntLit(str);
impl IntLit {
    /// Creates a new IntLit from a string,
    /// checking to make sure it's syntactically valid!
    const fn new<'a>(lit: &'a str) -> &'a IntLit { .. }
    /// The literal, exactly like it appears in
    /// source code.
    const fn verbatim(&self) -> &str { &self.0 }
    /// The canonicalized form of the literal, with
    /// extraneous '0' and '_' characters removed,
    /// and converted to base 10.
    ///
    /// assert_eq!(IntLit::new("9001").canonical(), "9001")
    /// assert_eq!(IntLit::new("0x00_ff_ff").canonical(), "65535")
    const fn canonical(&self) -> &str { .. }
}
#[lang = "float_literal"]
struct FloatLit(str);
impl FloatLit {
    /// Creates a new FloatLit from a string,
    /// checking to make sure it's syntactically valid!
    const fn new<'a>(lit: &'a str) -> &'a FloatLit { .. }
    /// The literal, exactly like it appears in
    /// source code.
    const fn verbatim(&self) -> &str { &self.0 }
    /// The canonicalized form of the literal, with
    /// extraneous '0', '_', and '.' characters removed, and
    /// with the exponent removed if the exponent is 0. The
    /// 'e' in the exponent is also lowercased.
    ///
    /// assert_eq!(FloatLit::new("42.42").canonical(), "42.42")
    /// assert_eq!(FloatLit::new("1.").canonical(), "1")
    /// assert_eq!(FloatLit::new("0.2").canonical(), ".2")
    ///
    /// assert_eq!(FloatLit::new("10e100").canonical(), "10e100")
    /// assert_eq!(FloatLit::new("10E-5").canonical(), "10e-5")
    /// assert_eq!(FloatLit::new("1e0").canonical(), "1")
    const fn canonical(&self) -> &str { .. }
    // the following are optional convenience methods for
    // splitting a float literal. it is not clear if we will
    // be including these or not
    /// The decimal part of this literal.
    const fn decimal_part(&self) -> &str { .. }
    /// The integral part of this literal, if it exists.
    const fn integral_part(&self) -> Option<&str> { .. }
    /// The fractional part of this literal, if it exists.
    const fn fractional_part(&self) -> Option<&str> { .. }
    /// The exponent of this literal, if it exists.
    const fn exponent(&self) -> Option<&str> { .. }
}
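To make the canonicalization rule in the IntLit doc comments concrete, here is a rough sketch in today's Rust. The free function canonical_int is a hypothetical stand-in for IntLit::canonical (it is not part of the proposal), it returns an owned String rather than &str, and it assumes the value fits in a u128, which a real implementation would not want to assume.

```rust
/// Hypothetical helper, not part of the proposal: canonicalize an integer
/// literal to base 10 with no '_' separators and no extraneous zeros.
/// Assumption: the value fits in a u128. Returns None for invalid syntax
/// or overflow.
fn canonical_int(verbatim: &str) -> Option<String> {
    // Pick the radix from the prefix, as in Rust's own integer literals.
    let (radix, digits) = if let Some(rest) = verbatim.strip_prefix("0x") {
        (16, rest)
    } else if let Some(rest) = verbatim.strip_prefix("0o") {
        (8, rest)
    } else if let Some(rest) = verbatim.strip_prefix("0b") {
        (2, rest)
    } else {
        (10, verbatim)
    };

    // Drop the '_' separators.
    let cleaned: String = digits.chars().filter(|&c| c != '_').collect();

    // Reject anything that is not a digit in the chosen radix
    // (this also rules out a leading sign).
    if !cleaned.chars().all(|c| c.is_digit(radix)) {
        return None;
    }

    // Parsing and re-printing removes leading zeros and converts to base 10.
    // An empty digit string fails to parse and yields None.
    u128::from_str_radix(&cleaned, radix)
        .ok()
        .map(|value| value.to_string())
}
```

With this sketch, canonical_int("0x00_ff_ff") gives Some("65535"), matching the second doc example above.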
Alternatively, we could imagine something like
// imagine for simplicity that you can have multiple DSTs in the same
// struct... we'll just store the metadata at the top
struct FloatLit {
    integral_part: [u8],
    fractional_part: [u8],
    negative_exponent: bool,
    exponent: [u8],
}
where the byte arrays represent the parsed components in the target endianness, i.e.
12.34e56
// produces
FloatLit {
    integral_part: transmute(12),
    fractional_part: transmute(34),
    negative_exponent: false,
    exponent: transmute(56),
}
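As a rough illustration of the split itself, the sketch below pulls a float literal apart into the same four components. It is only an approximation of the idea: FloatParts and split_float are hypothetical names, and owned digit strings stand in for the byte arrays, since a struct with several unsized fields cannot be written in today's Rust.

```rust
/// Hypothetical parsed view of a float literal. Field names mirror the
/// struct above, but digit strings replace the [u8] fields.
#[derive(Debug, PartialEq)]
struct FloatParts {
    integral_part: String,
    fractional_part: String,
    negative_exponent: bool,
    exponent: String,
}

/// Hypothetical helper: split a literal such as "12.34e56" into its parts.
/// Returns None if a component contains anything other than ASCII digits.
/// Note: it does not check that a '.' or exponent is actually present.
fn split_float(verbatim: &str) -> Option<FloatParts> {
    // Drop the '_' separators first.
    let cleaned: String = verbatim.chars().filter(|&c| c != '_').collect();

    // Separate the exponent, if any, at the first 'e' or 'E'.
    let (mantissa, exp) = match cleaned.find(|c: char| c == 'e' || c == 'E') {
        Some(i) => (&cleaned[..i], Some(&cleaned[i + 1..])),
        None => (cleaned.as_str(), None),
    };

    // Separate the integral and fractional digits at the '.'.
    let (integral, fractional) = match mantissa.split_once('.') {
        Some((int_digits, frac_digits)) => (int_digits, frac_digits),
        None => (mantissa, ""),
    };

    // Record the exponent sign separately from its digits.
    let (negative_exponent, exp_digits) = match exp {
        Some(e) => match e.strip_prefix('-') {
            Some(digits) => (true, digits),
            None => (false, e.strip_prefix('+').unwrap_or(e)),
        },
        None => (false, ""),
    };

    let all_digits = |s: &str| s.chars().all(|c| c.is_ascii_digit());
    // If an 'e' is present, it must be followed by at least one digit.
    let exponent_ok = exp.is_none() || (!exp_digits.is_empty() && all_digits(exp_digits));
    if integral.is_empty() || !all_digits(integral) || !all_digits(fractional) || !exponent_ok {
        return None;
    }

    Some(FloatParts {
        integral_part: integral.to_string(),
        fractional_part: fractional.to_string(),
        negative_exponent,
        exponent: exp_digits.to_string(),
    })
}
```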
I have a question. Can arithmetic operators be applied to literals?
In other words, assume that we have two literals, m for meters and s for seconds. To build a speed value m/s for meters per second, is it possible to apply a kind of "division" to them, as in let velocity = 1.5m/s? This "division" could be different from the one we write between two values.
A lot of people really want this, actually! I don't want to include it in my current proposal though, since I think there's more value in simple literals before compound literals, and, somewhat more sharply, I think it's really hard to get right! For example, it's not clear that 1.5m/s should parse as (1.5m)/(s) or 1.5(m/s)! Somewhere upthread I suggested a verbose syntax like 1.5::[m/s] which doesn't clash with current syntax.
I just came up with an idea. What if we provide a unit value for some literals, especially those used in physics and math? A unit value is, for example, 1.0m for m as meters or 1.0s for s as seconds. Then when we parse 1.5m/s, we first parse s into 1.0s, giving 1.5m/1.0s, which is a normal division between two values and can produce a special struct, or something else, for the speed value we expect. That way both readings would mean the same thing: (1.5m)/(s) first becomes (1.5m)/(1.0s), and 1.5(m/s) becomes 1.5*(1.0m/1.0s).
There could be a trait for unit values, so that when a multiplication or division is applied as a*b or a/b where b is a unit value, the calculation is never executed and a*b or a/b is directly parsed into a. This unit value trait could carry further restrictions.
So, I'm not sure how much you know about Rust's parser; sorry if I'm repeating stuff you already know =P
Rust, unlike e.g. C++, parses all of the code in one go before analyzing it. This means that when it parses a symbol, it has no idea what kind of entity it is (e.g. type, value) except from local context (only types go in <>, only expressions can go after a let, etc). Thus, there's no way to tell from just looking at the expression 10m/s whether s should be a variable, or a unit, like you describe... so we're stuck with parsing ambiguities, because Rust is context-free*.
* Not actually but that's not relevant to this post.
Thank you for reminding me. I should have armed myself with more systematic knowledge of the Rust compiler. The feature is indeed ambiguous if designed the way I described.
Don't worry! Learning how rustc works is a major undertaking, so I don't think anyone will fault you for not knowing some detail or other.
Of note is that in this proposal s is a unit struct, though. So 5m/s has a meaning that could be made to do what's expected by having Div::div(5m, s) do the right thing, purely as library code.
Maybe not the purest solution but it works. At least if you have some way of unifying units in the type system.
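A minimal sketch of that library-only approach, assuming 5m evaluates to a value of some Meters type and bare s names a unit struct. The type names Meters, S, and MetersPerSecond are made up for illustration; nothing in the proposal fixes them.

```rust
use std::ops::Div;

/// Hypothetical type that a literal like `5m` might evaluate to.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);

/// Hypothetical unit struct that a bare `s` would name.
#[derive(Debug, Clone, Copy, PartialEq)]
struct S;

/// Hypothetical result type for a length divided by the unit `S`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MetersPerSecond(f64);

// Dividing a length by the unit struct does no arithmetic:
// it only re-tags the number with the compound unit.
impl Div<S> for Meters {
    type Output = MetersPerSecond;

    fn div(self, _unit: S) -> MetersPerSecond {
        MetersPerSecond(self.0)
    }
}
```

With these definitions, Meters(5.0) / S type-checks and yields MetersPerSecond(5.0), so 5m/s parsed as (5m)/(s) would already work without any parser changes. Every unit pair needs its own impl here, which is where some way of unifying units in the type system would come in.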
I'm guessing in addition to these IntLit and FloatLit types there would also be StrLit, ByteStrLit, etc. types for the other kinds of literal? With backslash-escapes interpreted and all that stuff handled.
Also, the "Summary" section of the first post in this thread mentions "Compile-time checks of simple embedded languages", so I'd hope the trait method for handling a given literal would return Result<Self::Output, Self::Error> so that a custom literal can give a helpful error message.
About the const thing: perhaps there could be two traits, a super-trait with a method that will be invoked at runtime, and a sub-trait that adds a method that will be invoked at compile time to produce a value that can be embedded in the .rodata section. If your custom literal is a simple newtype around a primitive, you can safely make one at compile time and so you implement both traits; if your custom literal is complex it must be handled at runtime so you only implement the super-trait.
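Roughly, the two-trait split might look like the sketch below. All names here (RuntimeLiteral, CompileTimeLiteral, Millimeters) are hypothetical, the methods return Result<Self, Self::Error> rather than a separate Output type to keep things short, and since stable Rust has no const trait methods, the "compile-time" method is an ordinary fn that the compiler would be assumed to evaluate during compilation.

```rust
/// Hypothetical super-trait: the conversion invoked at runtime.
trait RuntimeLiteral: Sized {
    type Error;

    /// Returning a Result lets the literal report a helpful error message.
    fn from_literal(text: &str) -> Result<Self, Self::Error>;
}

/// Hypothetical sub-trait: adds a conversion that the compiler is assumed
/// to run at compile time. It is a plain fn here because trait methods
/// cannot be `const fn` on stable Rust.
trait CompileTimeLiteral: RuntimeLiteral {
    fn from_literal_const(text: &str) -> Result<Self, Self::Error>;
}

/// A simple newtype around a primitive, so it can implement both traits.
#[derive(Debug, PartialEq)]
struct Millimeters(u32);

impl RuntimeLiteral for Millimeters {
    type Error = String;

    fn from_literal(text: &str) -> Result<Self, String> {
        text.parse::<u32>()
            .map(Millimeters)
            .map_err(|e| format!("invalid millimeters literal `{}`: {}", text, e))
    }
}

impl CompileTimeLiteral for Millimeters {
    fn from_literal_const(text: &str) -> Result<Self, String> {
        // For a newtype the same logic is safe in both settings.
        Self::from_literal(text)
    }
}
```

A complex literal type would implement only RuntimeLiteral and leave CompileTimeLiteral out.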
No; all those literals already have types (e.g. &'static str, &'static [u8]). See the list of traits in my proposal; the new types I'm proposing in my recent post are merely to fill in i__ and f__ in my proposal.