Pre-RFC: Custom Literals via Traits

Riateche · July 25, 2018, 1:24am

I propose the following solution to the i__ input type problem:

// marker
trait LiteralInput {}
impl LiteralInput for i32 {}
//impl LiteralInput for u32, i64, ..., f32, f64, &'static str, ...
// possibly add big integer literals in some form in the future

trait Literal<T: LiteralInput> {
    type Output;
    /*const*/ fn apply_literal(input: T) -> Self::Output;
}

struct Meter(i32);
enum m {}
impl Literal<i32> for m {
    type Output = Meter;
    /*const*/ fn apply_literal(input: i32) -> Meter {
        Meter(input)
    }
}

With this, a custom literal can use any of the available input types, and it’s possible to support more input types (e.g. i256 or [u8; N]) in the future if needed. If the custom literal requires i32, the compiler will try to interpret the preceding integer literal as i32 (and fail if it’s not a valid i32 literal).

It will also be possible to implement the trait for multiple input types, e.g. impl Literal<i32> for m and impl Literal<i64> for m. This is useful if multiple input types make sense for this literal (for example, it may even produce a generic Meter<T: Num>). The compiler will pick a suitable implementation for invocation. This can work in a very similar way to how operator traits like Add<Rhs> or Index<Idx> work now.

If there are multiple implementations and the supplied literal can be interpreted as any of them, it’s not clear what the compiler chooses. It’s not very good but it’s the same situation we already have with normal integer literals. You may not know what type is selected for vec![1] if the context doesn’t place additional limitations. The type inference should do its magic. If the context requires a Meter<i64>, the impl Literal<i64> for m will be chosen over impl Literal<i32> for m.
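A minimal sketch of the two-impl case in today's Rust, with no compiler support for suffixes. The generic `Meter<T>` and the helper `lit` are assumptions made for illustration; `lit` only stands in for whatever the compiler would generate for a suffixed literal. Here the input type has to be pinned by a typed literal or an explicit type argument, since the inference behaviour described above is part of the proposal, not something current Rust provides.

```rust
/// Marker for types that may be passed to a literal suffix (as in the post).
trait LiteralInput {}
impl LiteralInput for i32 {}
impl LiteralInput for i64 {}

trait Literal<T: LiteralInput> {
    type Output;
    fn apply_literal(input: T) -> Self::Output;
}

/// Hypothetical generic quantity, standing in for the post's `Meter<T: Num>`.
#[derive(Debug, PartialEq)]
struct Meter<T>(T);

/// The suffix itself: an uninhabited type used only as a name.
#[allow(non_camel_case_types)]
enum m {}

impl Literal<i32> for m {
    type Output = Meter<i32>;
    fn apply_literal(input: i32) -> Meter<i32> {
        Meter(input)
    }
}

impl Literal<i64> for m {
    type Output = Meter<i64>;
    fn apply_literal(input: i64) -> Meter<i64> {
        Meter(input)
    }
}

/// Hypothetical stand-in for the desugaring of `<input>m`.
fn lit<S, T>(input: T) -> S::Output
where
    T: LiteralInput,
    S: Literal<T>,
{
    S::apply_literal(input)
}
```

With these definitions, `lit::<m, _>(5i32)` selects the `i32` impl and `lit::<m, i64>(5)` selects the `i64` impl, much as `Add<Rhs>` impls are selected by the right-hand operand type.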

kennytm · July 25, 2018, 1:40am

mcy:

The following impls generate a lint (yet unnamed, please bikeshed):
impl IntLit for e<numbers> { .. }
impl IntLit for E<numbers> { .. }

impl FloatLit for e<numbers> { .. }
impl FloatLit for E<numbers> { .. }
This is to point out a parse ambiguity. Float literals in scientific notation are always lexed as literals, since having 10e100 possibly parse as a custom literal is extremely confusing.

If this one is linted, there should also be lints for:

- impl IntLit for ** where the name ** starts with any of [a-fA-F], since 0x7fff_EiB (intended for exbibyte) will be interpreted as 0x7fffE with suffix iB.
  - This could be problematic as it covers a lot of units of measure (A, cd, Bq, C, F, au, day, dB, eV, ft, fl_oz, prefixes exa-/deca-/deci-/centi-/femto-/atto-)
  - We could avoid this if there were separate HexIntLit/OctIntLit/BinIntLit traits, though that means u256 will need to implement 4 traits (should be fine?).
  - Speaking of which, should we also lint prefix x and o because of 0x6? I suggest no since 0_x6 is unambiguously parsed as 0 with suffix x6.
- impl *Lit for ** where the name ** starts with _. 1234____s is equivalent to 1234s so there shouldn't be any leading underscores.

Or maybe these don't need to be linted at declaration site since you could unambiguously call it like 10::<e100> or 0x7fff::<EiB> (I'm not sure if using turbofish for this syntax is a good idea though).
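The ambiguity in the list above can be reproduced with a toy splitter. This is only a sketch of the "digits first, then suffix" rule, not rustc's lexer: the function name `split_int_literal` is made up, and it handles just decimal and `0x` integers, ignoring floats, exponents and the other radix prefixes.

```rust
/// Splits an integer literal into (digits without underscores, suffix).
/// Digits and underscores are consumed greedily; whatever is left is the suffix.
fn split_int_literal(src: &str) -> (String, &str) {
    // Only `0x` is recognised as a radix prefix in this sketch.
    let (radix, start): (u32, usize) = if src.starts_with("0x") { (16, 2) } else { (10, 0) };
    let body = &src[start..];
    // First character that is neither `_` nor a digit in the current radix.
    let end = body
        .find(|c: char| !(c == '_' || c.is_digit(radix)))
        .unwrap_or(body.len());
    let digits: String = body[..end].chars().filter(|&c| c != '_').collect();
    (digits, &body[end..])
}
```

For `0x7fff_EiB` this yields the digits `7fffE` and the suffix `iB`, because `E` is a valid hex digit. For `0_x6` it yields `0` and `x6`, and for `1234____s` it yields `1234` and `s`.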

scottmcm · July 25, 2018, 3:03am

In C++, milliseconds is a type: see std::chrono::duration on cppreference.com. (As is milli, but that's a different discussion.)

That solves all these problems, and as a bonus allows more efficient uses in places that don't want or need 1ns resolution (like sleeping a thread, which can sleep neither for 1ns nor for 136 years).

(Of course, it needs a bunch of stuff that rust doesn't have yet to have it not also cause a ton of manual grunt work creating all these types and to avoid the need for .into() all over the place to actually use them.)

mcy · July 25, 2018, 5:15am

Yeah... this is something of a mess. I'm actually in favor of HexIntLit/OctIntLit/BinIntLit, all of which have IntLit as a supertrait. This lets us lint a little bit less aggressively. core certainly won't have to put up with it, since the implementations for the primitive literals will get macro-generated anyways. Unit literals really shouldn't be written in hex or octal.

10::[e100]? I think further abusing :: for parsing disambiguation is fine. Even in the presence of a disambiguation syntax, we should still lint, because people will be really confused when 10e100 doesn't do what they want and it takes them a second to remember about scientific notation literals.

fintelia · July 25, 2018, 2:53pm

Other than the type checked SI units proposal which currently seems to have a bunch of other open questions, couldn’t these use cases be supported by either:

A (proc?) macro:

time!(2s + 3m)
complex!(3 + 4i)
quaternion!(2i + 3j + 4k)

A normal const fn:

Regex::from_static_str("foo(.+)")
Utf16String::from_static_str("Hello, World!")

Or, a const fn + literal types:

BigInt::from_literal(192346712347913279231461927356)
HalfFloat::from_literal(1.23)
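For comparison, here is roughly what the const fn and macro routes look like in Rust as it exists today. `Meter`, `Meter::from_literal` and `meters!` are hypothetical names modelled on the examples above; none of them come from a real crate.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
struct Meter(i32);

impl Meter {
    /// Hypothetical constructor, named after `from_literal` in the post.
    const fn from_literal(value: i32) -> Meter {
        Meter(value)
    }
}

/// Hypothetical macro giving a suffix-like spelling: `meters!(5)`.
macro_rules! meters {
    ($value:literal) => {
        Meter::from_literal($value)
    };
}

/// Evaluated at compile time because it initialises a `const`.
const TRACK: Meter = meters!(400);
```

The call site is longer than `400m`, but nothing here needs a language change.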

mcy · July 25, 2018, 7:32pm

Sure, this is the implicit “or we could just, not do this” in any RFC. Also, things like time!(2s + 3m) are hilariously hard to implement, because + can’t be used as a separator currently.

Soni · July 25, 2018, 8:47pm

I have to ask, what about 1e10e10?

mcy · July 25, 2018, 8:48pm

Parses as 1e10::[e10] and desugars to <e10 as FloatLit>::float_lit(1e10);. I think I pointed this out elsewhere.

Soni · July 25, 2018, 9:03pm

or should it be e10e10::int_lit(1)?

mcy · July 25, 2018, 9:18pm

The RFC is very clear on this. We always parse a literal as far as it will go before parsing the suffix, because doing otherwise will confuse users. See @kennytm’s replies.
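The "parse the literal as far as it will go" rule can be sketched as a toy splitter for decimal float literals. This is an illustration of the rule only, not rustc's lexer, and `split_float_literal` is a made-up name.

```rust
/// Splits a decimal literal into (longest numeric prefix, suffix).
fn split_float_literal(src: &str) -> (&str, &str) {
    let bytes = src.as_bytes();
    let is_digit_or_underscore = |b: u8| b.is_ascii_digit() || b == b'_';
    let mut i = 0;
    // Integer part.
    while i < bytes.len() && is_digit_or_underscore(bytes[i]) {
        i += 1;
    }
    // Optional fraction: a dot followed by at least one digit.
    if i + 1 < bytes.len() && bytes[i] == b'.' && bytes[i + 1].is_ascii_digit() {
        i += 1;
        while i < bytes.len() && is_digit_or_underscore(bytes[i]) {
            i += 1;
        }
    }
    // Optional exponent: `e` or `E`, optional sign, at least one digit.
    if i < bytes.len() && (bytes[i] == b'e' || bytes[i] == b'E') {
        let mut j = i + 1;
        if j < bytes.len() && (bytes[j] == b'+' || bytes[j] == b'-') {
            j += 1;
        }
        if j < bytes.len() && bytes[j].is_ascii_digit() {
            while j < bytes.len() && is_digit_or_underscore(bytes[j]) {
                j += 1;
            }
            i = j; // the exponent belongs to the literal, not to the suffix
        }
    }
    src.split_at(i)
}
```

Under this rule `1e10e10` splits into the literal `1e10` and the suffix `e10`, while `10e100` is a plain float literal with an empty suffix.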

Centril · July 26, 2018, 1:17am

So we have...

and if we have:

1.23 < m / s >

(or some other two set of tokens for disambiguation) we desugar to:

<<m as Div<s>>::Output as IntLit>::int_lit(1.23)

I guess that makes sense... Basically, we interpret < m / s > as a type. However, translating / to Div this way is a bit weird.
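The type-level reading of `m / s` already type-checks in stable Rust if the units are ordinary types. In the sketch below `MetrePerSecond`, `Velocity` and `velocity_literal` are hypothetical names, and the trait is `FloatLit` taking `f64` rather than `IntLit`, because the value is 1.23 and the proposed `f__` type does not exist.

```rust
use std::ops::Div;

#[allow(non_camel_case_types)]
struct m;
#[allow(non_camel_case_types)]
struct s;

/// Hypothetical type naming the unit "metres per second".
struct MetrePerSecond;

impl Div<s> for m {
    type Output = MetrePerSecond;
    fn div(self, _rhs: s) -> MetrePerSecond {
        MetrePerSecond
    }
}

/// Stand-in for the thread's literal trait.
trait FloatLit {
    type Output;
    fn float_lit(lit: f64) -> Self::Output;
}

#[derive(Debug, PartialEq)]
struct Velocity(f64);

impl FloatLit for MetrePerSecond {
    type Output = Velocity;
    fn float_lit(lit: f64) -> Velocity {
        Velocity(lit)
    }
}

/// What `1.23 < m / s >` would expand to under this reading.
fn velocity_literal() -> Velocity {
    <<m as Div<s>>::Output as FloatLit>::float_lit(1.23)
}
```

Only the associated type `Div::Output` is used; `div` itself is never called, which is part of why mapping `/` to `Div` this way feels odd.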

Wild speculation: Another mechanism could be to interpret m / s as a const expression and so you get something like Unit<{ m / s }>. However, then you may get into coherence problems wrt. the type constructor Unit and so it might need to be #[fundamental] so that you can write something like:

impl IntLit for Unit<{m / s}> {
    type Output = Qty<Quot<Meter, Second>>;
    const fn int_lit(lit: i__) -> Self::Output {
        ...
    }
}

kennytm · July 26, 2018, 2:12am

Such Unit type is probably not definable in Rust.

struct Unit<const suffix: ?????> { ... }

BTW,

/ has higher precedence than ^, this would be desugared to

(<m as IntLit>::int_lit(1.0).div(s)).bitxor(2);

you'll need an exponential operator for this to be ordered properly (which cannot be ** since x**y already means x*(*y)).
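Both precedence claims can be checked with plain integers. The function names here are made up for the example.

```rust
/// `/` binds tighter than `^`, so this is `(8 / 2) ^ 3`, not `8 / (2 ^ 3)`.
fn div_then_xor() -> i32 {
    8 / 2 ^ 3
}

/// `**` is two `*` tokens: a multiplication followed by a dereference.
fn star_star(x: i32, y: &i32) -> i32 {
    x**y
}
```

`div_then_xor()` evaluates to `4 ^ 3`, which is 7, whereas `8 / (2 ^ 3)` would be 8. `star_star(2, &3)` multiplies 2 by the dereferenced 3.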

Centril · July 26, 2018, 2:25am

Oh dear... Hmm... the only thing I can think of is:

struct Unit<const suffix: TypeId> { ... }

and so you translate to Unit<{ make_type_id( m / s ) }> but at this stage you might be better off translating to:

Unit<typeof(m / s)>

or directly via:

<typeof(m / s) as IntLit>::int_lit(1.23)

instead.

EDIT: actually, this last snippet is not so bad?

iliekturtles · July 27, 2018, 2:02pm

I haven’t been able to review this proposal, or the custom suffixes proposal, to the depth I would like. I apologize for anything already covered or discussed.

In my biased opinion (I’m the author of uom) units of measure are one of the more important uses of custom suffixes and the RFC shouldn’t discount that. Using abbreviations works for simple units when required to be a valid rust identifier. What about more complex abbreviations? m/s? m/s^2? m/s²? Å? Fall back to using full descriptions written as valid identifiers? What about supporting quantities with no explicitly defined units? Can existing units be combined (kg m/s^2)?

My other concern is around handling of literals. All of the examples except the first three would require code running at compile time. e.g. in the BigInt example, when the number can’t be represented by the largest built-in integer type, how do you convert that into an instance of BigInt at compile time? If delaying the conversion until run time, is 12345..._BigInt really that big of an improvement over "12345...".bigint() or BigInt::new("12345...") that are already supported today?

You could ignore the problem and just include the built-in literal type in the trait definition:

pub trait CustomLiteral<T> {
    type Output;
    const fn custom_literal(literal: T) -> Self::Output;
}

impl CustomLiteral<i32> for meter {
    type Output = Meter;
    const fn custom_literal(literal: i32) -> Meter { Meter::new(literal) }
}

mcy · July 27, 2018, 7:02pm

Ah, excellent! I was going to try to track you down!

The discussion so far seems to have settled on picking some hitherto unused syntax that simple literals desugar to. The current syntax strawman is 42i32 -> 42::[i32]. It is unclear what the grammar inside ::[] (which one might be tempted to call a "turbolit", as in "turbofish") is going to be, but the idea is that expressions like 123::[m/s] will be desugared into something like m::int_lit(123).lit_div::<s>(). I think that this is far more than I want to try to figure out in an initial RFC, so I'm punting on any literals that aren't Rust identifiers. (Also, as nice as having a symbol for angstroms sounds, I'm not going to be in favor of allowing non-ascii outside of str/char lits.)
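One way the target of that desugaring could look in current Rust is sketched below. `Qty`, `Per` and `speed_literal` are hypothetical, `lit_div` is written as an inherent method only for brevity, and `i64` stands in for the proposed `i__`.

```rust
use std::marker::PhantomData;

/// Hypothetical quantity tagged with a unit type `U`.
struct Qty<U> {
    value: i64,
    unit: PhantomData<U>,
}

/// Hypothetical type-level quotient of two units.
struct Per<N, D>(PhantomData<(N, D)>);

impl<U> Qty<U> {
    /// Re-tags the quantity as "U per D" without touching the value.
    fn lit_div<D>(self) -> Qty<Per<U, D>> {
        Qty { value: self.value, unit: PhantomData }
    }
}

#[allow(non_camel_case_types)]
struct m;
#[allow(non_camel_case_types)]
struct s;

trait IntLit {
    type Output;
    fn int_lit(lit: i64) -> Self::Output;
}

impl IntLit for m {
    type Output = Qty<m>;
    fn int_lit(lit: i64) -> Qty<m> {
        Qty { value: lit, unit: PhantomData }
    }
}

/// What `123::[m/s]` might expand to under the strawman.
fn speed_literal() -> Qty<Per<m, s>> {
    m::int_lit(123).lit_div::<s>()
}
```

The open question from the post remains: this covers a single division, and says nothing about what grammar `::[]` would accept in general.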

Right, this is a problem my proposal is blocked on. In order to happen, we need to sort out my literal types proposal (which I linked in the OP), which adds the unsized types i__ and f__. In principle it might be ok to have &'static i__ around at runtime, but it'd need to be cast to i32 or whatever first (which would be mediated by a compiler-provided shim, leaving the representation of such pointers unspecified).

Your alternate proposal is interesting, but has a few problems. First, we'd probably want this:

// core::marker
pub unsafe trait FromLiteral {}
unsafe impl FromLiteral for i32 {}
unsafe impl FromLiteral for &'static str {}

to have as a bound for T in your trait. There's also the problem that you need to implement it all at once for all of the number types if you want reasonable behavior, and there's no good way to enforce this. We also can't enforce that you need e.g. T = i64 if you want T = f64. My FloatLit has IntLit as a supertrait.
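A sketch of the two shapes being compared, written for stable Rust. The `i64` and `f64` parameters are stand-ins for the proposed `i__` and `f__`, and `Meter`/`meter` are illustrative names. The marker bound limits which input types can appear, while the supertrait makes it a compile error to provide a float suffix without the integer one.

```rust
/// Marker from the post; `unsafe` so that only vetted types opt in.
pub unsafe trait FromLiteral {}
unsafe impl FromLiteral for i32 {}
unsafe impl FromLiteral for f64 {}
unsafe impl FromLiteral for &'static str {}

/// The alternate proposal, with the marker as a bound on `T`.
pub trait CustomLiteral<T: FromLiteral> {
    type Output;
    fn custom_literal(literal: T) -> Self::Output;
}

/// Supertrait variant: every float suffix must also be an integer suffix.
pub trait IntLit {
    type Output;
    fn int_lit(lit: i64) -> Self::Output;
}
pub trait FloatLit: IntLit {
    fn float_lit(lit: f64) -> Self::Output;
}

#[derive(Debug, PartialEq)]
pub struct Meter(f64);
#[allow(non_camel_case_types)]
pub struct meter;

impl CustomLiteral<i32> for meter {
    type Output = Meter;
    fn custom_literal(literal: i32) -> Meter {
        Meter(f64::from(literal))
    }
}

impl IntLit for meter {
    type Output = Meter;
    fn int_lit(lit: i64) -> Meter {
        Meter(lit as f64)
    }
}

// Removing the `IntLit` impl above makes this impl fail to compile.
impl FloatLit for meter {
    fn float_lit(lit: f64) -> Meter {
        Meter(lit)
    }
}
```

Nothing in the `CustomLiteral<T>` version forces `meter` to also implement `CustomLiteral<i64>` or `CustomLiteral<f64>`, which is the enforcement gap described above.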

There isn't. The rationale for adding this is the same rationale as for any operator overloading: ergonomics and readability. (Of course, like all operators, it's a massive readability footgun if overused, but that ship sailed with the rest of the ops traits.)

mcy · July 27, 2018, 8:24pm

After mulling it over for a few days, I’m going to postpone this proposal for now. There are just too many unresolved questions, and I’m also kind of waiting for @varkor to come up with an opinion on exactly what a prefix is in the type system, as well as figuring out the story with arbitrary-size compile-time literals.

Tom-Phinney · July 28, 2018, 1:26am

That's not going to be a major issue when the Allow non-ASCII identifiers (#2457) RFC finally gets implemented. (An earlier version is already implemented in nightly.) Many Unicode scientific symbols will become generally available at that point.

mcy · July 28, 2018, 1:33am

Well, that's certainly not my problem. I disagree with that proposal (and, in fact, was unaware of it!), but if Rust gets non-ASCII idents, this proposal will get them as corollary.

use .. as ..; can be used to rename them to something that doesn't require finger-hurting chording, in any case!

Tom-Phinney · July 28, 2018, 1:45am

Nothing requires use of non-ASCII characters. However, since the limited ASCII subset of the Roman alphabet is not the native alphabet for most people in the world, at some point Rust needs to be inclusive and not relegate everyone else to secondary-citizen status.

It should be possible to write variable and type names in one's own native language, even when it has umlauts or cedillas or accents on vowels (and that's just for people native to western Europe). Non-Roman alphabets depart even more from ASCII, and are used by many more people worldwide than those who use variants of the Roman alphabet.

Note that Rust already supports Unicode identifiers as a nightly feature. So it's more than a proposal; it's already implemented and (somewhat) in use. However the final implementation is expected to be somewhat different, which is being discussed in that linked RFC.

Tom-Phinney · July 28, 2018, 1:37pm

Are you aware that the Rust characters {, }, [, ] require chording on essentially all non-English keyboards in Western Europe (e.g., Danish, Swedish, Finnish, Portuguese, Spanish, French, German, Italian), and that many of those keyboards also require chording of |, @, #, and ^? In some cases the ` character itself is unavailable even with chording, so must be copied or entered by other arcane keyboard magic.

In its choice of often-used glyphs Rust has already mandated chording for people with non-English keyboards. What's a tiny bit of similar inconvenience for someone who wishes to write µ or Å on an English keyboard (both of which characters are easily entered from most non-English Western European keyboards)?
