This is a formal collection of my counter proposal from https://internals.rust-lang.org/t/pre-rfc-custom-suffixes-for-integer-and-float-literals/.
Summary
This RFC proposes syntax sugar for custom literals, implemented as
std::ops
traits. A custom literal is a suffix added to a numeric,
string, or character literal to produce a user-defined type. For example:
- Complex numbers:
1 + 2i
- Time literals:
42ms + 120ns
- Typechecked units:
12m / 55s
- Compile-time checks of simple embedded languages:
"foo(.+)"Regex
"18.4.0.1"IPv4
"yyyy-mm-dd"DateFormat
- Literals for custom integers:
192346712347913279231461927356_BigInt
,1.23f16
- Literals for binary blobs:
"c29tZSBiaW5hcnkgZGF0YQ=="base64
- Literals for dealing with non-utf8:
"Hello, World!"utf16
,"Hello, World!"latin1
This proposal attempts to define custom literals in the least ad-hoc way
possible, in analogy with C++'s operator ""
and Rust's operator overloading.
An explicit non-goal of this RFC is suffixes that are not valid Rust identifiers.
Motivation
The above examples provide ample motivation: shortening calls like
Complex::new(1, 2)
to 1 + 2i
, where the syntax is sufficiently well-known
that the shortened sugary form is "generally known", or for introducing
literal syntax for new kinds of compile-time constructs.
In the small cases where the short form is common outside of the context of the defining library, this is an ergonomic win and a readability win.
However, custom literals have the potential (like other overloadable operators) to produce horribly unreadable code. This proposal does not try to prevent misuse of custom literals, and leaves that up to the users' judgement.
Note: I am in principle against the use of custom literals, since they can be easily abused. However, I think it is inevitable that Rust will get them, since their niche uses are justifiable.
Guide-level explanation
A custom literal is an expression consisting of a numeric (123, 42.3, 12e-5), string ("foo", b"bar"), or character ('k', b'\0') literal followed by a path. For example,
let _ = 10i32; // explicit i32 literal
let _ = 2.45f32; // explicit f32 literal
let _ = 5i; // imaginary number literal
let _ = 102ms; // millisecond duration literal
let _ = ".+"regex; // regular expression literal
let _ = '?'char16; // UTF-16 codepoint literal
let _ = 0xff_ff_ff_ff_ab_cd_ef_00m8x8; // SIMD mask literal
// though this last one does not strike me as a very good idea...
Custom literals are defined by implementing a trait, like the following
example from core
:
impl IntLit for i32 {
type Output = i32;
const fn int_lit(lit: i__) -> Self::Output {
lit as i32
}
}
(Note: i__
is a "literal type" described here).
The Self
type of the impl is the type used in the literal expression.
Thus, it can be chosen to be a dummy type that only exists to provide a
symbol:
enum ms {}
impl IntLit for ms {
type Output = Duration;
// ..
}
Thus, all custom literals have a simple desugaring:
let _ = 123i32;
// becomes
let _ = <i32 as IntLit>::int_lit(123);
let _ = "foo"regex;
// becomes
let _ = <regex as StrLit>::str_lit("foo");
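The desugared form can already be tried out on stable Rust. The following is only a minimal sketch under loud assumptions: `u128` stands in for the not-yet-existing literal type `i__`, the method is a plain `fn` rather than a `const fn`, and the trait is declared locally instead of living in `core::ops`.

```rust
use std::time::Duration;

// Local stand-in for the proposed `IntLit` trait.
// Assumption: `u128` replaces the literal type `i__`, and the method is not `const`.
trait IntLit {
    type Output;
    fn int_lit(lit: u128) -> Self::Output;
}

// The "symbol type": uninhabited, it only exists to carry the impl.
#[allow(non_camel_case_types)]
enum ms {}

impl IntLit for ms {
    type Output = Duration;
    fn int_lit(lit: u128) -> Duration {
        Duration::from_millis(lit as u64)
    }
}

fn main() {
    // What `102ms` would desugar to under this proposal.
    let d = <ms as IntLit>::int_lit(102);
    assert_eq!(d, Duration::from_millis(102));
}
```

Only the last step, turning `102ms` into the path call, needs compiler support.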
To use a literal, you'll need to import the "symbol type" into scope. For example,
use std::time::literals::ms;
let millis = 23ms;
let nanos = 45ns; // ERROR: can't find type `ns`
It's even possible to use the whole path,
let millis = 23std::time::literals::ms;
or rename them with imports or type aliases
use std::time::literals::{ ms, ns as nanos };
type millis = ms;
let millis = 32millis;
let nanos = 45nanos;
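Because the suffix is an ordinary type, renaming needs no new machinery. Here is a rough sketch using the desugared calls; the `literals` module is a hypothetical stand-in for `std::time::literals`, and `u128` again stands in for `i__`.

```rust
use std::time::Duration;

// Local stand-in for the proposed trait (assumption: `u128` replaces `i__`).
pub trait IntLit {
    type Output;
    fn int_lit(lit: u128) -> Self::Output;
}

// Hypothetical module playing the role of `std::time::literals`.
#[allow(non_camel_case_types)]
pub mod literals {
    use super::IntLit;
    use std::time::Duration;

    pub enum ms {}
    pub enum ns {}

    impl IntLit for ms {
        type Output = Duration;
        fn int_lit(lit: u128) -> Duration {
            Duration::from_millis(lit as u64)
        }
    }

    impl IntLit for ns {
        type Output = Duration;
        fn int_lit(lit: u128) -> Duration {
            Duration::from_nanos(lit as u64)
        }
    }
}

// Rename on import, or through a type alias.
use literals::{ms, ns as nanos};

#[allow(non_camel_case_types)]
type millis = ms;

fn main() {
    // `32millis` and `45nanos` in the proposed syntax.
    let a = <millis as IntLit>::int_lit(32);
    let b = <nanos as IntLit>::int_lit(45);
    assert_eq!(a, Duration::from_millis(32));
    assert_eq!(b, Duration::from_nanos(45));
}
```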
Note that there are some parsing ambiguities. 10e100
always parses
as a single float literal. Thus, the impl
impl FloatLit for e100 { .. }
will generate a lint warning. It is still possible to call it with UFCS.
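To make the ambiguity concrete, here is a small sketch with stand-in traits (`u128` and `f64` replace `i__` and `f__`, which is an assumption of the example). The type `e100` and its scaling behaviour are invented purely for illustration.

```rust
// Local stand-ins for the proposed traits.
trait IntLit {
    type Output;
    fn int_lit(lit: u128) -> Self::Output;
}

trait FloatLit: IntLit {
    fn float_lit(lit: f64) -> Self::Output;
}

// Hypothetical suffix type whose name collides with exponent syntax.
#[allow(non_camel_case_types)]
enum e100 {}

impl IntLit for e100 {
    type Output = f64;
    fn int_lit(lit: u128) -> f64 {
        lit as f64 * 1e100
    }
}

impl FloatLit for e100 {
    fn float_lit(lit: f64) -> f64 {
        lit * 1e100
    }
}

fn main() {
    // Lexed as one float literal in scientific notation, never as `10` + `e100`.
    let plain: f64 = 10e100;
    assert_eq!(plain, 1e101);

    // The impl stays reachable through the explicit path form.
    let explicit = <e100 as FloatLit>::float_lit(10.0);
    assert_eq!(explicit, 10.0 * 1e100);
}
```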
We can teach this by comparison with C++'s operator ""
. Custom literals
are intended to be defined analogously to Rust's usual operator
overloading.
Reference-level explanation
This RFC changes the grammar as follows:
Literal := (IntLit | FloatLit | StrLit | CharLit | ByteStrLit | ByteCharLit) Path?
Whenever a Literal
which includes a path after it is encountered, it is
desugared to
<$path as FooLit>::foo_lit($lit)
where FooLit
is the relevant literal trait for that literal type. These
traits are as follows, defined in core::ops
. Each one comes with
an attendant lang item.
#[lang = "int_literal"]
pub trait IntLit {
type Output;
const fn int_lit(lit: i__) -> Self::Output;
}
#[lang = "float_literal"]
pub trait FloatLit: IntLit {
const fn float_lit(lit: f__) -> Self::Output;
}
#[lang = "string_literal"]
pub trait StrLit {
type Output;
const fn str_lit(lit: &'static str) -> Self::Output;
}
#[lang = "char_literal"]
pub trait CharLit {
type Output;
const fn char_lit(lit: char) -> Self::Output;
}
#[lang = "byte_string_literal"]
pub trait ByteStrLit {
type Output;
const fn byte_str_lit(lit: &'static [u8]) -> Self::Output;
}
#[lang = "byte_char_literal"]
pub trait ByteCharLit {
type Output;
const fn byte_char_lit(lit: u8) -> Self::Output;
}
Furthermore, the "obvious" implementations for the 42i32
et al. literals
will be added to core
.
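As a sketch of what those implementations might look like (not the actual `core` code), here are stand-in impls for `i32` and `f32`, with `u128` and `f64` replacing `i__` and `f__`. My reading of the `FloatLit: IntLit` supertrait is that it is what lets an integer literal take a float suffix, as in `2f32`.

```rust
// Local stand-ins for the proposed traits.
trait IntLit {
    type Output;
    fn int_lit(lit: u128) -> Self::Output;
}

trait FloatLit: IntLit {
    fn float_lit(lit: f64) -> Self::Output;
}

impl IntLit for i32 {
    type Output = i32;
    fn int_lit(lit: u128) -> i32 {
        lit as i32
    }
}

// `f32` implements both traits, so `2f32` and `2.45f32` are both covered.
impl IntLit for f32 {
    type Output = f32;
    fn int_lit(lit: u128) -> f32 {
        lit as f32
    }
}

impl FloatLit for f32 {
    fn float_lit(lit: f64) -> f32 {
        lit as f32
    }
}

fn main() {
    assert_eq!(<i32 as IntLit>::int_lit(123), 123i32); // 123i32
    assert_eq!(<f32 as IntLit>::int_lit(2), 2f32); // 2f32
    assert_eq!(<f32 as FloatLit>::float_lit(2.45), 2.45f32); // 2.45f32
}
```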
All of the usual import and name shadowing rules apply, as corollaries of this being implemented as a trait.
The following impl
s generate a lint (yet unnamed, please bikeshed):
impl IntLit for e<numbers> { .. }
impl IntLit for E<numbers> { .. }
impl FloatLit for e<numbers> { .. }
impl FloatLit for E<numbers> { .. }
This is to point out a parse ambiguity. Float literals in scientific notation
are always lexed as literals, since having 10e100
possibly parse as
a custom literal is extremely confusing.
One problem this solves is having the same symbol for different literals. Consider, for example, complex numbers and quaternions:
let z = 1 + 2i;
let q = 3 + 4i + 5j + 6k;
The fact that literals must be imported, and have a unique implementation, completely sidesteps this confusion. To use the first syntax you might need to use num::complex::i;
, but for the second, you might need use my_gfx::quat::literals::*;
. Now, if somehow num::complex::i
and my_gfx::quat::literals::j
are in the same scope, we get a type error:
let q = 4i + 5j;
^^ ^^
| |
| of type Quat<i32>
of type Complex<i32>
= Cannot find impl Add<Quat<i32>> for Complex<i32>
Thus, the contents of the scope determine the type output of a particular literal. Also, we avoid the bikeshed of "do we use mathematics or EE notation for complex numbers?" We can stick with i
(which num
already uses); our electrical engineer brethren can just use num::complex::i as j;
! At the end of the day, this boils down the the usual use
name-clash problems and solutions weâre all used to.
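The renaming trick can be sketched with desugared calls. Everything here is illustrative: the `Complex` struct and the `complex` module are hypothetical stand-ins for whatever `num` would provide, and `u128` stands in for `i__`.

```rust
use std::ops::Add;

// Local stand-in for the proposed trait.
trait IntLit {
    type Output;
    fn int_lit(lit: u128) -> Self::Output;
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Complex<T> {
    re: T,
    im: T,
}

// Lets a bare integer appear on the left of `+`, as in `1 + 2i`.
impl Add<Complex<i32>> for i32 {
    type Output = Complex<i32>;
    fn add(self, rhs: Complex<i32>) -> Complex<i32> {
        Complex { re: self + rhs.re, im: rhs.im }
    }
}

// Hypothetical module standing in for `num::complex`.
#[allow(non_camel_case_types)]
mod complex {
    use super::{Complex, IntLit};

    pub enum i {}

    impl IntLit for i {
        type Output = Complex<i32>;
        fn int_lit(lit: u128) -> Complex<i32> {
            Complex { re: 0, im: lit as i32 }
        }
    }
}

// The EE spelling is nothing more than a renamed import.
use complex::i as j;

fn main() {
    // `1 + 2j` in the proposed syntax.
    let z = 1 + <j as IntLit>::int_lit(2);
    assert_eq!(z, Complex { re: 1, im: 2 });
}
```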
Drawbacks
Custom literals open the door to write-only code. For example, introducing literals that have meaning unique to a crate will confuse readers. Luckily, the fact that literals need to be imported by name (or by glob) makes it somewhat easier to track them down.
I argue that compiler complexity is not a drawback: it actually makes a particular parser rule somewhat simpler, since it no longer has to care about a list of primitive types, and can let the type system deal with it. The only other compiler addition is six lang items and a simple desugaring rule.
Rationale and alternatives
I think this is the best design because it makes use of the trait system. Not only are literals first class types, but it's also possible to write them as trait bounds. The import and shadowing story are both already part of Rust and thus familiar to both current users and new users learning about advanced operators.
We could define the traits instead as
trait IntLit {
const fn int_lit(lit: i__) -> Self { .. }
}
This would go a long way to make custom literals less confusing (and still works with 0i32
and friends!). However, unless we want to write type aliases like type i = Complex;
or define a type that
coerces via addition into Complex
, we lose out on a large class of useful literals.
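A quick sketch of this alternative shape, with `u128` standing in for `i__` and a toy `Complex` type invented for the example. As I read it, the cost shows up as soon as the suffix and the produced type should differ: the short name has to be an alias for the real type, and two suffixes such as ms and ns could not both produce `Duration`, since a type gets only one impl.

```rust
// Alternative shape: no associated `Output`; the suffix type is the result type.
trait IntLit {
    fn int_lit(lit: u128) -> Self;
}

// Primitive suffixes still work unchanged.
impl IntLit for i32 {
    fn int_lit(lit: u128) -> Self {
        lit as i32
    }
}

#[derive(Debug, PartialEq)]
struct Complex {
    re: i32,
    im: i32,
}

impl IntLit for Complex {
    fn int_lit(lit: u128) -> Self {
        Complex { re: 0, im: lit as i32 }
    }
}

// The short suffix `i` now has to be an alias for the real type.
#[allow(non_camel_case_types)]
type i = Complex;

fn main() {
    assert_eq!(<i32 as IntLit>::int_lit(0), 0i32); // 0i32
    assert_eq!(<i as IntLit>::int_lit(2), Complex { re: 0, im: 2 }); // 2i
}
```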
The alternative proposed design (which this RFC started as a counter proposal to) is by attribute. For (an abridged) example:
#[int_literal("s", "ms", "ns")]
fn time_lits(lit: i64, suffix: &'static str) -> Duration { .. }
This is a problematic proposal for a few reasons:
- Attributes do not appear in documentation, which makes them hard to document and not discoverable (unlike std::ops).
- Attributes are pretty magical, so we need bespoke importing, shadowing, and naming rules, and this can't be used as a type constraint.
- Using strings instead of identifiers invites use of non-identifiers, which will make parsing more difficult and code generally more confusing.
- It can't leverage existing use syntax for renames without a lot of magic name generation.
Another alternative is a macro approach, either making the literal call a procedural macro (which is overkill for most uses) or a postfix macro.
We could also just not do this at all and rely on extension traits:
trait DurationLit {
fn s(self) -> Duration;
fn ms(self) -> Duration;
fn ns(self) -> Duration;
}
impl DurationLit for u64 {
// ..
}
let _ = 34.ms() + 65.ns();
I think that custom literals, if specified carefully, can make this code slightly more natural (I'm also slightly opposed to doing it like this anyways, since extension traits are a bit [eyebrow raise] in my opinion).
Prior art
The sole language with custom literals that I know of is C++. They are defined as follows:
Duration operator ""_ms(uint64_t lit) { .. }
C++ does not seem to have a good import story for custom literals, or a shadowing story.
C++ also requires user-defined custom literals to start with an underscore, which is to avoid parsing ambiguity. The STL is, however, allowed to define things like
constexpr complex<double> operator ""i(long double arg) { .. }
This is not good for Rust for a few reasons:
- Rust does not have this parsing issue.
- Rust allows underscores to appear anywhere in numeric literals, which mostly defeats this STL/user code distinction. See 0xdeadbeef_u64.
- For readability purposes, this is about as useful as Hungarian notation. At the point that custom literals are in play, it is clear which are std's: the ones that are named after primitive types (though these may themselves be shadowed, since the primitive types' names are not reserved).
Unresolved questions
It is not clear what the type of lit
should be for IntLit
and FloatLit
.
See my literal types proposal. It is tempting to use u128/i128
and f64
in place of i__
and f__
, but it leaves things like a big integer type,
that can consume arbitrary-size literals, a bit in the dark.
We need a name for the scientific notation parse lint.
The enum ms {}
idiom emits a lint; I don't think we should encourage this.
It would be nice to be able to write something like
impl IntLit for newtype ms { .. }
to indicate that ms
is a single-use uninhabited type. We could also
special-case the linter to ignore non_camel_case_types
if the sole use
of the type is a FooLit
trait, but this seems too baroque.
It is not clear if the following should be valid syntax for invoking a literal:
let _ = 42 ms;
I don't think the grammar minds, and it would probably make things simpler. Am I correct in this assumption? Should we allow it and warn by default?
This RFC explicitly does not consider the following issues:
- Literals that are not Rust identifiers, e.g. µs and m/s**2.
- Macros like 42.si![m].
- Generic literals. In principle I could imagine 12foo::<T>, but for now we should probably not allow things like impl<T> IntLit for ... (I'd suggest a warning and a "may be added in the future").