Pre-RFC: Custom Literals via Traits

[Moderator note: General discussion about non-ASCII identifiers (not specifically related to custom literals) should go in a separate thread or in the RFC PR. Note that there are over 500 comments on that PR already, so please try not to repeat comments that have already been made.]


A few days and an RFC draft later (and many excellent comments from @hanna-kruppe poking holes in it!), I’d like to propose, for feedback, an alternative to i__ and f__.

Introduce two lang items (note that the names are strawmen; they clash with the traits I’ve defined above!)

#[lang = "int_literal"]
struct IntLit(str);

impl IntLit {

    /// Creates a new IntLit from a string,
    /// checking to make sure it's syntactically valid!
    const fn new<'a>(s: &'a str) -> &'a IntLit { .. }

    /// The literal, exactly like it appears in
    /// source code.
    const fn verbatim(&self) -> &str { &self.0 }

    /// The canonicalized form of the literal, with
    /// extraneous '0' and '_' characters removed,
    /// and converted to base 10.
    ///
    /// assert_eq!(IntLit::new("9001").canonical(), "9001")
    /// assert_eq!(IntLit::new("0x00_ff_ff").canonical(), "65535")
    const fn canonical(&self) -> &str { .. }
}

#[lang = "float_literal"]
struct FloatLit(str);

impl FloatLit {

    /// Creates a new FloatLit from a string,
    /// checking to make sure it's syntactically valid!
    const fn new<'a>(s: &'a str) -> &'a FloatLit { .. }

    /// The literal, exactly like it appears in
    /// source code.
    const fn verbatim(&self) -> &str { &self.0 }

    /// The canonicalized form of the literal, with
    /// extraneous '0', '_', and '.' characters removed, and
    /// with the exponent removed if the exponent is 0. The
    /// 'e' in the exponent is also lowercased.
    ///
    /// assert_eq!(FloatLit::new("42.42").canonical(), "42.42")
    /// assert_eq!(FloatLit::new("1.").canonical(), "1")
    /// assert_eq!(FloatLit::new("0.2").canonical(), ".2")
    ///
    /// assert_eq!(FloatLit::new("10e100").canonical(), "10e100")
    /// assert_eq!(FloatLit::new("10E-5").canonical(), "10e-5")
    /// assert_eq!(FloatLit::new("1e0").canonical(), "1")
    const fn canonical(&self) -> &str { .. }

    // the following are optional convenience methods for
    // splitting a float literal. it is not clear if we will
    // be including these or not

    /// The decimal part of this literal.
    const fn decimal_part(&self) -> &str { .. }

    /// The integral part of this literal, if it exists.
    const fn integral_part(&self) -> Option<&str> { .. }

    /// The fractional part of this literal, if it exists.
    const fn fractional_part(&self) -> Option<&str> { .. }

    /// The exponent of this literal, if it exists.
    const fn exponent(&self) -> Option<&str> { .. }
}
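
To make the canonicalization rules in the doc comments above concrete, here is a rough runtime sketch. It is not the proposed lang-item API: canonical_int and canonical_float are hypothetical free functions that return a String instead of being const fn methods returning &str, the integer version assumes the value fits in a u128, and type suffixes such as u8 or f32 are not handled.

```rust
/// Hypothetical helper: canonical base-10 form of an integer literal.
/// Assumes the value fits in a u128; returns None otherwise or on bad input.
fn canonical_int(lit: &str) -> Option<String> {
    // Drop digit separators first.
    let cleaned: String = lit.chars().filter(|&c| c != '_').collect();
    // Pick the radix from the prefix, if any.
    let (radix, digits) = if let Some(rest) = cleaned.strip_prefix("0x") {
        (16, rest)
    } else if let Some(rest) = cleaned.strip_prefix("0o") {
        (8, rest)
    } else if let Some(rest) = cleaned.strip_prefix("0b") {
        (2, rest)
    } else {
        (10, cleaned.as_str())
    };
    // Parsing and re-printing removes leading zeros and converts to base 10.
    u128::from_str_radix(digits, radix).ok().map(|n| n.to_string())
}

/// Hypothetical helper: canonical form of a decimal float literal.
/// Does no validation; the input is assumed to be a well-formed literal.
fn canonical_float(lit: &str) -> String {
    // Drop separators and lowercase the exponent marker.
    let cleaned: String = lit
        .chars()
        .filter(|&c| c != '_')
        .map(|c| c.to_ascii_lowercase())
        .collect();
    // Split off the exponent, then split the mantissa at the decimal point.
    let (mantissa, exponent) = match cleaned.split_once('e') {
        Some((m, e)) => (m, e),
        None => (cleaned.as_str(), ""),
    };
    let (integral, fractional) = match mantissa.split_once('.') {
        Some((i, f)) => (i, f),
        None => (mantissa, ""),
    };
    let integral = integral.trim_start_matches('0');
    let fractional = fractional.trim_end_matches('0');
    // Separate the exponent's sign from its digits.
    let (sign, exp_digits) = match exponent.strip_prefix('-') {
        Some(rest) => ("-", rest),
        None => ("", exponent.strip_prefix('+').unwrap_or(exponent)),
    };
    let exp_digits = exp_digits.trim_start_matches('0');

    let mut out = String::from(integral);
    if !fractional.is_empty() {
        out.push('.');
        out.push_str(fractional);
    }
    if out.is_empty() {
        // A literal such as "0.0" would otherwise canonicalize to nothing.
        out.push('0');
    }
    if !exp_digits.is_empty() {
        out.push('e');
        out.push_str(sign);
        out.push_str(exp_digits);
    }
    out
}
```

Falling back to "0" when everything has been stripped is a choice made for this sketch; the doc comments above do not say what a literal like 0.0 should canonicalize to.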

Alternatively, we could imagine something like

// imagine for simplicity that you can have multiple DSTs in the same
// struct... we'll just store the metadata at the top
struct FloatLit {
    integral_part: [u8],
    fractional_part: [u8],
    negative_exponent: bool,
    exponent: [u8],
}

where the byte arrays represent the parsed components in the target endianness, i.e.

12.34e56
// produces
FloatLit {
    integral_part: transmute(12),
    fractional_part: transmute(34),
    negative_exponent: false,
    exponent: transmute(56),
}
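
As a rough illustration of that layout, the sketch below splits a literal into its components and stores each one as native-endian bytes. Since a struct cannot really hold several unsized fields, it substitutes fixed-size [u8; 8] arrays (via u64::to_ne_bytes) for the [u8] fields. ParsedFloatLit, component, and parse_components are hypothetical names, and each component is assumed to fit in a u64.

```rust
/// Hypothetical fixed-size stand-in for the unsized FloatLit layout above.
#[derive(Debug, PartialEq)]
struct ParsedFloatLit {
    integral_part: [u8; 8],
    fractional_part: [u8; 8],
    negative_exponent: bool,
    exponent: [u8; 8],
}

/// Parses a run of digits into native-endian bytes.
/// An absent (empty) component is treated as 0 in this sketch.
fn component(digits: &str) -> Option<[u8; 8]> {
    if digits.is_empty() {
        return Some(0u64.to_ne_bytes());
    }
    digits.parse::<u64>().ok().map(u64::to_ne_bytes)
}

/// Splits a decimal float literal into integral, fractional and exponent
/// components. Returns None if a component is not a valid u64.
fn parse_components(lit: &str) -> Option<ParsedFloatLit> {
    // Drop digit separators.
    let cleaned: String = lit.chars().filter(|&c| c != '_').collect();
    // Split off the exponent at 'e' or 'E'.
    let (mantissa, exponent) = match cleaned.split_once(|c: char| c == 'e' || c == 'E') {
        Some((m, e)) => (m, e),
        None => (cleaned.as_str(), ""),
    };
    // Split the mantissa at the decimal point.
    let (integral, fractional) = mantissa.split_once('.').unwrap_or((mantissa, ""));
    // Record the exponent's sign separately from its digits.
    let (negative_exponent, exp_digits) = match exponent.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, exponent.strip_prefix('+').unwrap_or(exponent)),
    };
    Some(ParsedFloatLit {
        integral_part: component(integral)?,
        fractional_part: component(fractional)?,
        negative_exponent,
        exponent: component(exp_digits)?,
    })
}
```

One limitation of this sketch: storing the fractional digits as an integer drops leading zeros, so 1.05 and 1.5 would both end up with a fractional part of 5. A real design would presumably need to keep the digit count as well.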

I have a question. Can arithmetic operators be applied to literals?

In other words, assume that we have two literals, m for meters and s for seconds. To build a speed value in m/s (meters per second), is it possible to apply a kind of ‘division’ to them, as in let velocity = 1.5m/s? This ‘division’ could be different from the one we write between two variables.

A lot of people really want this, actually! I don't want to include it in my current proposal though, since I think there's more value in simple literals before compound literals, and, somewhat more sharply, I think it's really hard to get right! For example, it's not clear whether 1.5m/s should parse as (1.5m)/(s) or 1.5(m/s)! Somewhere upthread I suggested a verbose syntax like 1.5::[m/s], which doesn't clash with current syntax.


I just came up with an idea. What if we provide a unit value for some literals, especially those used in physics and maths? A unit value would be, for example, 1.0m for m (meters) or 1.0s for s (seconds). Then, when parsing 1.5m/s, we first turn s into 1.0s, giving 1.5m/1.0s, which is an ordinary division between two values and can produce a special struct (or something else) for the speed value we expect. That way both readings come out the same: (1.5m)/(s) first becomes (1.5m)/(1.0s), and 1.5(m/s) becomes 1.5*(1.0m/1.0s).

There could be a trait for unit values, so that when a multiplication or division is written as a*b or a/b and b is a unit value, the calculation is never executed and a*b or a/b is reduced directly to a. Further restrictions could be attached to this unit value trait.
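
The unit-value idea can be tried out in today's Rust with ordinary operator overloading. The sketch below is only an illustration: Meters, Seconds, MetersPerSecond, and the constants M and S are hypothetical names, and since there is no custom literal syntax, Meters(1.5) stands in for 1.5m. It shows both groupings of 1.5m/s producing the same value.

```rust
use std::ops::{Div, Mul};

#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);
#[derive(Debug, Clone, Copy, PartialEq)]
struct Seconds(f64);
#[derive(Debug, Clone, Copy, PartialEq)]
struct MetersPerSecond(f64);

/// Hypothetical "unit values": what a bare m or s would stand for.
const M: Meters = Meters(1.0);
const S: Seconds = Seconds(1.0);

/// Ordinary division between a length and a duration yields a speed.
impl Div<Seconds> for Meters {
    type Output = MetersPerSecond;
    fn div(self, rhs: Seconds) -> MetersPerSecond {
        MetersPerSecond(self.0 / rhs.0)
    }
}

/// Scaling a speed by a plain number.
impl Mul<MetersPerSecond> for f64 {
    type Output = MetersPerSecond;
    fn mul(self, rhs: MetersPerSecond) -> MetersPerSecond {
        MetersPerSecond(self * rhs.0)
    }
}

/// (1.5m)/(s) read as (1.5m)/(1.0s).
fn grouped_left() -> MetersPerSecond {
    Meters(1.5) / S
}

/// 1.5(m/s) read as 1.5 * (1.0m / 1.0s).
fn grouped_right() -> MetersPerSecond {
    1.5 * (M / S)
}
```

Unlike the trait described above, this version still performs the division by 1.0 at runtime (or leaves it to the optimizer); skipping the calculation entirely would need a separate mechanism.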

So, I’m not sure how much you know about Rust’s parser; sorry if I’m repeating stuff you already know =P

Rust, unlike e.g. C++, parses all of the code in one go before analyzing it. This means that when it parses a symbol, it has no idea what kind of entity it is (e.g. type, value) except from local context (only types go in <>, only expressions can go after a let, etc). Thus, there’s no way to tell from just looking at the expression 10m/s whether s should be a variable, or a unit, like you describe… so we’re stuck with parsing ambiguities, because Rust is context-free*.

* Not actually but that’s not relevant to this post.

Thank you for reminding me. I should have armed myself with systematic knowledge of the Rust compiler. The feature is indeed ambiguous if designed the way I described.

Don't worry! Learning how rustc works is a major undertaking, so I don't think anyone will fault you for not knowing some detail or other.


Of note is that in this proposal s is a unit struct, though. So 5m/s has a meaning, and it could be made to do what’s expected by implementing Div::div(5m, s) accordingly, purely as library code.

Maybe not the purest solution but it works. At least if you have some way of unifying units in the type system.
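
A minimal sketch of that library-only approach, assuming a zero-sized unit struct named s. Meters and MetersPerSecond are hypothetical types, and Meters(5.0) stands in for the literal 5m, which has no syntax today.

```rust
use std::ops::Div;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);
#[derive(Debug, Clone, Copy, PartialEq)]
struct MetersPerSecond(f64);

/// Zero-sized unit struct standing in for the unit "second".
/// The lowercase name is deliberate so that it reads like a unit.
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy)]
struct s;

impl Div<s> for Meters {
    type Output = MetersPerSecond;
    fn div(self, _unit: s) -> MetersPerSecond {
        // Dividing by the unit only changes the type; the number is untouched.
        MetersPerSecond(self.0)
    }
}

fn five_meters_per_second() -> MetersPerSecond {
    // Meters(5.0) stands in for the proposed literal 5m.
    Meters(5.0) / s
}
```

This covers only the one Meters-over-s combination; the "unifying units in the type system" part would need a more general scheme than one impl per pair of units.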


I’m guessing in addition to these IntLit and FloatLit types there would also be StrLit, ByteStrLit, etc. types for the other kinds of literal? With backslash-escapes interpreted and all that stuff handled.

Also, the “Summary” section of the first post in this thread mentions “Compile-time checks of simple embedded languages”, so I’d hope the trait method for handling a given literal would return Result<Self::Output, Self::Error> so that a custom literal can give a helpful error message.
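
To illustrate what such a fallible method might look like, here is a small sketch. The trait FromIntLiteral, its method, and the Percent example are all made up for this illustration and are not the traits from the proposal; the literal is passed as a plain &str for simplicity.

```rust
use std::fmt;

/// Hypothetical trait; the name and shape are illustrative only.
trait FromIntLiteral {
    type Output;
    type Error;
    fn from_int_literal(lit: &str) -> Result<Self::Output, Self::Error>;
}

/// Example custom literal type: a percentage between 0 and 100.
#[derive(Debug, PartialEq)]
struct Percent(u8);

#[derive(Debug, PartialEq)]
enum PercentError {
    NotANumber(String),
    OutOfRange(u64),
}

impl fmt::Display for PercentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PercentError::NotANumber(text) => {
                write!(f, "`{}` is not an integer literal", text)
            }
            PercentError::OutOfRange(value) => {
                write!(f, "{} is out of range for a percentage (0 to 100)", value)
            }
        }
    }
}

impl FromIntLiteral for Percent {
    type Output = Percent;
    type Error = PercentError;

    fn from_int_literal(lit: &str) -> Result<Percent, PercentError> {
        // Reject anything that is not a plain base-10 integer.
        let value: u64 = lit
            .parse()
            .map_err(|_| PercentError::NotANumber(lit.to_string()))?;
        // Reject values outside the valid range with a descriptive error.
        if value > 100 {
            return Err(PercentError::OutOfRange(value));
        }
        Ok(Percent(value as u8))
    }
}
```

With this shape, a compiler invoking from_int_literal on a literal could surface the Display text of the error as the diagnostic, though how errors would actually be reported is not specified here.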

About the const thing: perhaps there could be two traits, a super-trait with a method that will be invoked at runtime, and a sub-trait that adds a method that will be invoked at compile time to produce a value that can be embedded in the .rodata section. If your custom literal is a simple newtype around a primitive, you can safely make one at compile time and so you implement both traits; if your custom literal is complex it must be handled at runtime so you only implement the super-trait.

No; all those literals already have types (e.g. &'static str, &'static [u8]). See the list of traits in my proposal; the new types I’m proposing in my recent post are merely to fill in i__ and f__ in my proposal.
