Pre-RFC: Custom suffixes for integer and float literals

This proposal originates from the “time units” PR.

Motivation

The goal is to make creating values with an associated unit of measure more natural and ergonomic. Currently we have to write:

let dt1: Duration = Duration::new(1, 200*1_000_000);
let dt2: Duration = Duration::from_micros(10);
let dt3: Duration = Duration::from_micros(10) + Duration::from_nanos(200);

// Units are from simple_units crate
let distance1: Meter = Meter(1.0);
let distance2: Meter = Meter(5e-9);
let accel1: MeterPerSecond2 = MeterPerSecond2(2.5*G); // here G = 9.80665
let accel2: MeterPerSecond2 = MeterPerSecond2(3.0);

This proposal would instead allow us to write:

use std::time::duration_literals;
let dt1: Duration = 1s + 200ms; // or just 1.2s, see further
let dt2: Duration = 10us;
let dt3: Duration = 10.2µs; // µs and us are equivalent, proposal can support both

use simple_units::literals::*;
let distance1: Meter = 1m;
let distance2: Meter = 5nm;
let accel1: MeterPerSecond2 = 2.5g;
let accel2: MeterPerSecond2 = 3[m/s^2]; // see further on how square brackets work

This feature should make Rust a bit more suitable for writing scientific and engineering code, as well as more approachable for the target audience from those fields.

Design

To define custom suffixes for integer and float literals, a crate will have to define functions decorated with #[float_literal(..)] or #[int_literal(..)], for example:

pub struct Meter(pub f64);

// unfortunately we can't make it `const fn` right now
// more variants can be supported, e.g. `meter`, `meters`, `feet`, etc.
#[float_literal("au", "km", "m", "cm", "mm", "um", "µm", "nm")]
pub fn float_meter_literals(n: f64, sfx: &str) -> Meter {
    Meter(match sfx {
        "au" => METERS_PER_AU*n,
        "km" => 1e3*n,
        "m" => n,
        "cm" => 1e-2*n,
        "mm" => 1e-3*n,
        "um" | "µm" => 1e-6*n, // note that "um"  and "µm" are equivalent
        "nm" => 1e-9*n,
        _ => unreachable!(),
    })
}

// this will allow us to write `1m` instead of `1.0m`
#[int_literal("au", "km", "m", "cm", "mm", "um", "µm", "nm")]
pub fn int_meter_literals(n: i64, sfx: &str) -> Meter {
    float_meter_literals(n as f64, sfx)
}

Such functions should accept two arguments:

  • an integer or float with a concrete type (floats for float_literal and integers for int_literal)
  • the suffix string

If the signature is not compatible, a compilation error will be issued.

To use custom literals, int_meter_literals and/or float_meter_literals must be in the current scope. Functions which define suffixes shouldn’t conflict with each other: e.g. if m is used for both minutes and meters, the functions which handle the two conversions must not be in the same scope.

When the compiler encounters a literal such as 1m whose suffix is not one of the built-in ones (u8, u16, f32, etc.), it searches the functions in scope that carry the #[int_literal(..)] attribute. If none of them lists the suffix, an “unknown suffix” compilation error is issued. If a function is found, 1m is desugared into int_meter_literals(1, "m"). Note that the usual overflowing-literal check applies here as well.
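The desugaring described above can be approximated on current Rust by calling the conversion function by hand. This is only a sketch: the #[int_literal(..)] attribute does not exist, so it appears here as a comment, and the suffix list is shortened to two entries for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Meter(pub f64);

// Under the proposal this function would carry `#[int_literal("km", "m")]`.
// The attribute is hypothetical, so it is left out of this sketch.
pub fn int_meter_literals(n: i64, sfx: &str) -> Meter {
    Meter(match sfx {
        "km" => 1e3 * n as f64,
        "m" => n as f64,
        // The compiler would only ever pass suffixes listed in the attribute.
        _ => unreachable!(),
    })
}

fn main() {
    // Proposed source: `let d = 1m;`
    let d = int_meter_literals(1, "m");
    // Proposed source: `let far = 2km;`
    let far = int_meter_literals(2, "km");
    println!("{:?} {:?}", d, far);
}
```

Because the suffix reaches the function as a plain string, one function can serve a whole family of related units.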

Some units can’t be used as suffixes, e.g. m/s^2. To overcome this restriction, square brackets can be used to explicitly denote a custom suffix, i.e. 1m and 1[m] are equivalent. 5.3[m/s^2] in turn will be desugared into float_acceleration_literals(5.3, "m/s^2"). Square brackets are traditionally used to denote units of measure in scientific literature (though this is not strictly correct from the point of view of ISO 31-0). This functionality shouldn’t conflict with indexing, as it can be applied only to {integer} and {float} literals.

With the advancement of const fn capabilities, custom literals could also be used in constant definitions.

In addition to units, this approach can also be used to define complex numbers and quaternions.

Drawbacks and alternatives

The main drawbacks of the proposal are the added language complexity and the potential ambiguity of custom literals, especially when suffixes from several sources are mixed.

The main alternative is to use extension traits over primitive types without any syntactic sugar. In this approach one or more traits are defined (e.g. SiTimeUnits) and implemented for, say, u32 and f32. The example from the beginning of the proposal would look like this:

use std::time::SiTimeUnits;
let dt1: Duration = 1.s() + 200.ms();
let dt2: Duration = 10.us();
let dt3: Duration = 10.2.us(); // `µs` methods can be supported in future

use simple_units::literals::*;
let distance1: Meter = 1.m();
let distance2: Meter = 5.nm();
let accel1: MeterPerSecond2 = 2.5.g();
let accel2: MeterPerSecond2 = 3.m_per_s2(); // m/s^2 can not be supported
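For comparison, the extension-trait alternative already works on stable Rust. A minimal sketch follows, assuming a trait named SiTimeUnits as in the text; the method set and the choice of u64 and f64 as implementing types are illustrative, and no such trait exists in std.

```rust
use std::time::Duration;

// Hypothetical extension trait; the name comes from the proposal text.
pub trait SiTimeUnits {
    fn s(self) -> Duration;
    fn ms(self) -> Duration;
    fn us(self) -> Duration;
}

impl SiTimeUnits for u64 {
    fn s(self) -> Duration { Duration::from_secs(self) }
    fn ms(self) -> Duration { Duration::from_millis(self) }
    fn us(self) -> Duration { Duration::from_micros(self) }
}

impl SiTimeUnits for f64 {
    // `from_secs_f64` panics on negative or non-finite input.
    fn s(self) -> Duration { Duration::from_secs_f64(self) }
    fn ms(self) -> Duration { Duration::from_secs_f64(self * 1e-3) }
    fn us(self) -> Duration { Duration::from_secs_f64(self * 1e-6) }
}

fn main() {
    // Typed literals keep the receiver type explicit.
    let dt1 = 1_u64.s() + 200_u64.ms();
    let dt2 = 10_u64.us();
    println!("{:?} {:?}", dt1, dt2);
}
```

Every unit needs one method per implementing type, which is the duplication cost mentioned in this section.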

Arguably this approach is noisier, less flexible, and more surprising for new Rustaceans. While a custom suffix immediately makes it apparent that you are not dealing with primitive types, the use of (trait) methods on primitive types will cause a certain amount of confusion. Considering that the perceived “noisiness” of Rust is already a problem for its adoption, proliferation of this approach will in my opinion only worsen the situation. Also, if we support many variants (m, meter, meters), this approach will result in more code duplication.

Another alternative solution is to rely on constants:

use std::time::units::*;
let dt1: Duration = 1*SECOND + 200*MILLISECOND;
let dt2: Duration = 10*MICROSECOND;
let dt3: Duration = 10*MICROSECOND + 200*NANOSECOND; // `10.2` can't be supported

use simple_units::constants::*;
let distance1: Meter = METER;
let distance2: Meter = 5.*NANOMETER;
let accel1: MeterPerSecond2 = 2.5*G; // `G` here has type `MeterPerSecond2`
let accel2: MeterPerSecond2 = 3.*METER_PER_SECOND2;
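The constants alternative also works today. The sketch below assumes a Meter newtype and two unit constants; the names METER and KILOMETER are illustrative rather than taken from an existing crate.

```rust
use std::ops::Mul;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Meter(pub f64);

// Unit constants: each one is the unit expressed in meters.
pub const METER: Meter = Meter(1.0);
pub const KILOMETER: Meter = Meter(1e3);

// Allows `scalar * UNIT`; the orphan rules permit this impl
// because `Meter` is a local type.
impl Mul<Meter> for f64 {
    type Output = Meter;
    fn mul(self, rhs: Meter) -> Meter {
        Meter(self * rhs.0)
    }
}

fn main() {
    let d = 5.0 * KILOMETER;
    println!("{:?} {:?}", d, METER);
}
```

Supporting integer scalars as well would take a further Mul impl for each integer type.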

Prior art


To overcome this restriction, square brackets can be used to explicitly denote a custom suffix, i.e. 1m and 1[m] are equivalent ... This functionality shouldn’t conflict with indexing, as it can be applied only to {integer} and {float} literals.

No, this does conflict with existing syntax.

use std::ops::Index;
#[allow(non_camel_case_types)]
struct m;
impl Index<m> for i32 {
    type Output = Self;
    fn index(&self, _: m) -> &Self { self }
}
fn main() {
    println!("{:?}", 1[m]);
}

Though this also means we don't need custom suffixes for this syntax at all.


It was assumed that the square bracket syntax would be used mostly for units which would be ambiguous in the suffix position, e.g. m^2 or m/s, but I guess theoretically you can have m and s variables which after division produce a Range or something. Is there a serious risk of breaking someone's code if we forbid indexing on literals? I guess we could add the square bracket extension in the next edition. In other words, you'll have to write 1i32[m] in cases like your example.

Can you elaborate?

You're introducing a breaking change to one obscure feature for the sake of another obscure feature; I don't think that is worth it. Consider a different bracket instead, or maybe just postpone this feature.


If we had the IndexMove/IndexGet trait or some sort of custom DST, we could write

impl<U: Unit> IndexMove<U> for f64 { 
    type Output = Measure<f64, U>; 
    fn index_move(self, _: U) -> Self::Output {
        Measure::new(self)
    }
}

and 3.0e8[m/s] would call 3.0e8.index_move(m.div(s)).
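Since IndexMove is not part of std, the idea can only be tried with a locally defined trait and an explicit method call. In the sketch below the trait, the Unit marker trait, the Measure wrapper and the single MeterPerSecond unit type are all assumptions made for illustration; the unit is a plain marker type instead of the result of m.div(s).

```rust
use std::marker::PhantomData;

// Hypothetical by-value counterpart of `std::ops::Index`.
pub trait IndexMove<Idx> {
    type Output;
    fn index_move(self, index: Idx) -> Self::Output;
}

// Marker trait for unit types (illustrative).
pub trait Unit {}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MeterPerSecond;
impl Unit for MeterPerSecond {}

// A value tagged with its unit at the type level.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Measure<T, U> {
    pub value: T,
    unit: PhantomData<U>,
}

impl<T, U> Measure<T, U> {
    pub fn new(value: T) -> Self {
        Measure { value, unit: PhantomData }
    }
}

impl<U: Unit> IndexMove<U> for f64 {
    type Output = Measure<f64, U>;
    fn index_move(self, _: U) -> Self::Output {
        Measure::new(self)
    }
}

fn main() {
    // Stands in for the imagined `3.0e8[m/s]`.
    let c = 3.0e8_f64.index_move(MeterPerSecond);
    println!("{}", c.value);
}
```

The call only attaches a type-level tag to the number, so any scale factor (for g or au, say) would have to come from the unit type itself.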


Considering the existence of ISO 31-0 and the scientific community's familiarity with it, I don't think it's worth searching for different brackets. As I wrote in the previous message, we can introduce it in the next edition, while Rust 2018 would get the bracketless suffix form and a warning for indexing on literals.

Hm, an interesting approach, and in some sense even more correct from the ISO viewpoint (but not quite, since [g] equals m/s^2, not 9.8 m/s^2). It would allow us to use square brackets on runtime values as well. It's a shame we can't use the existing Index trait because of the &Output return type. The problems I can name are:

  • Absence of IndexMove and it's not quite clear how it can be added. Plus this operation is essentially equivalent to multiplication, so I am not sure if it pulls its weight in such formulation.
  • It does not include bracketless variant, which should be enough for most of the cases. (i.e. you'll have to use brackets, whereas in the OP proposal you will not)
  • If m and s are constants, then it will conflict with naming conventions.
  • We will have to check how g, au and other units will be handled by this functionality. (i.e. factors should be always correctly propagated)
  • You will not be able to use convenience shortcuts like m/s^2 and instead you'll have to write m/(s*s) or m_per_s2.
  • It will probably result in some crazy trait bounds down the line, e.g. will you be able to explain to the type system that m/s*m is equivalent to m*m/s?

I think this approach is definitely worth considering.

Which is funny, because this is a wrong (but often seen) use of the bracket.


I find it unusual that we would use a function, but then have conflict detection between these other abstract things that are only listed in an annotation. It seems that the names we import should be the suffixes themselves, and that they should live in a new namespace (accompanying the existing namespaces of macros, modules, and values) and would be imported with use.

This might appear to make them more annoying to import, but I think that in my experience it is not common to require working with multiple different units of the same dimension; e.g. I can’t think of situations where I have needed both cm and m. The only real counterexample I can think of is for units of time, for which non-scientific code may have reasons to specify durations of a variety of magnitudes.


Custom suffix is definitely not just a feature wanted by dimensioned types, for instance it was wanted by fixed point types and primitives of other bitness (f16, f80, u256, u31).

Expanding a bit, custom suffix can also be found outside of integer or floating point, e.g. strings ("foo"s), chars ('c'_ascii), bytes (b'@'u16), and byte strings (b"foo\xff"_big_endian).

These literals may also be needed as a pattern, not just expressions:

match some_256bit_number {
    0u256 => None,
    c => Some(c),
}

IMO a literal suffix should be a procedural macro instead of a const fn, so interpolated strings ("log {date:?}: {msg}"_fmt) can be supported.

use proc_macro::Span;

#[proc_macro_int_lit_suffix]
pub fn km(literal: &str, span: Span) -> TokenTree {
    let value = literal.parse::<u128>().unwrap();
    let value = Literal::f64_unsuffixed_with_span((value * 1000) as f64, span);
    quote!{ mylib::si::Meter(#value) }
}

I have two problems with this design. The ad-hoc attribute should be replaced by a reasonable trait. The best we can get without a coherence disaster is

impl FloatLit for Meter {
    type SUFFIX = "m";
    const fn float_lit(lit: f?) -> Self { .. }
}

What f? is is my second problem. I detest the fact that C++ uses a builtin, fixed-width type, which is nonsense when you realize your literal constructor should be const anyways. I've actually already proposed the necessary types for this, [Pre-RFC] Integer/Float literal types, for this exact purpose.

The other problem is that you'll get a coherence disaster anyway if several such types make their way into scope. An alternative definition which sidesteps this is

enum m {}
impl FloatLit for m {
    type Output = Meter;
    // ..
}

This ensures that literals have an identifier name and have a path; their path is that of the type that provides the name. Plus, you get to do silly things like

impl FloatLit for Meter {
    type Output = Self;
    // ..
}

Now, to import the literal you simply import my_units::meter::m, and name collision will take care of the rest. This even interacts nicely with the vanilla literals! Though this seems like a bit of a syntactical foot gun, since people will expect to implement FloatLit on the output type rather than the symbol type.

Finally, I think we should table the brackets syntax for now, since it goes quite far afield; far more than the proposal should be allowing at this point in time.
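The marker-type variant can be mocked up on stable Rust by calling the trait method explicitly. The sketch below assumes a FloatLit trait with an Output associated type and takes the literal as f64, since the literal types mentioned above do not exist; const is dropped because trait methods cannot be const fn on stable.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Meter(pub f64);

// Hypothetical trait; `f64` stands in for the proposed literal type `f?`.
pub trait FloatLit {
    type Output;
    fn float_lit(lit: f64) -> Self::Output;
}

pub mod meter {
    use super::{FloatLit, Meter};

    // The suffix is an uninhabited type, so it has an ordinary path.
    #[allow(non_camel_case_types)]
    pub enum m {}

    impl FloatLit for m {
        type Output = Meter;
        fn float_lit(lit: f64) -> Meter {
            Meter(lit)
        }
    }
}

// Importing the suffix is an ordinary `use`, so clashes are name clashes.
use meter::m;

fn main() {
    // Stands in for the imagined literal `1.5m`.
    let d = <m as FloatLit>::float_lit(1.5);
    println!("{:?}", d);
}
```

Two crates that both export a suffix m would collide at the use site like any other pair of same-named items, and could be told apart with use ... as.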

Can both features just exist alongside each other? I think most uses of a custom literal are better suited as const fns, and I'd like to have to call the format prefix as "..."!fmt or something.

This counterproposal looks extremely complicated. I don't think that the type system should be responsible for resolving what looks like an arbitrary identifier embedded in a literal.

I think you misread my proposal... or I left a detail out? I think the first trait is pretty horrible, and it was the best I could come up with. Then I came up with the second one, which makes things far less painful (after all, it's just asking the local scope "is FooLit implemented for this type?").

In any case, anything involving custom literals should be extremely complicated, because it's an enormous readability footgun, and needs to be defined very delicately. Just saying "tack on this attribute" seems counter to how Rust defines all other operator-type things.

It should be noted that built-in support for units of measure has been proposed before. It’s probably worth taking a look there first, because this covers similar (if not the same) ground, perhaps with a stronger emphasis on syntax.

Sorry, my first response was made too hastily. I wouldn't normally think of literal suffixes as an "operator,"[^1] but I see now that this is essentially the basis of your counterproposal.

Here are some thoughts:

  • The second trait does look somewhat reasonable to live in std::ops.
  • Modules and types live in the same namespace. Normally this is not a huge issue, because modules are conventionally lowercase and types are conventionally uppercase. However, this counter-proposal invites defining types with lowercase names, which may exacerbate the issue.
  • If suffixes are "just types," it might feel frustrating if you are required to import them in order to use them. (i.e. support for paths in the syntax might be desirable; However, I feel this could lead to a complicated syntax.)

[^1] Yes, I know, C++ considers them "operators," but C++ considers all sorts of bizarre things (like casts) to be operators. I don't think there is much precedent for this in Rust.

Is this a problem that ever comes up in practice? I think suffixes would typically be shorter, which is pretty rare for modules. In the worst, worst, worst case, you can import suffixes with use as or 123qualified::path.

Apparently this disappeared from a previous draft, but yes, I think the syntax we want is

Literal := (IntLit | FloatLit | StrLit | CharLit | BStrLit | BCharLit) Path?

I also imagine that any library making extensive use of suffixes would have a prelude, which I think is a wide-spread enough practice that we needn't worry about prelude proliferation.

I actually think that C++'s use of the term "operator" is reasonable, outside of maybe casts (which Rust will never have, because of Into). ULL (for converting to unsigned long long) is definitely a suffix operator. I think this is all taste, though; my main goal is preventing the addition of more ad-hoc attributes, which are way too magical.

I've considered introducing a separate category for units, but in the end decided against it. To me it looks like a much bigger and more impactful change, e.g. you'll have to add "unit aliases" and describe how they will interact with the usual types and values. Plus your proposal will not allow us to use and create convenience values like m/s^2 or use Greek letters, at least in the near future. (Yes, there are purists who prefer µm to um.)

Meanwhile the proposed approach introduces a relatively small addition to the language, everything else is handled by existing developments. And it does not pollute namespace with short names.

If the language team is ready to invest in the development of a more complicated system, then your proposal could make sense.

Yes, and the proposal handles such cases as well, since you can define a function which returns such types (though it could be a bit problematic to write a generic uN type). BTW, I've mentioned complex numbers and quaternions in the proposal.

And this approach can be straightforwardly extended to work with string suffixes and prefixes.

Hm, a good point.

I didn't want to use procedural macros because they can produce arbitrary code, while a const fn must return a value of a single type. Plus I thought that in the future const fns should be able to handle string parsing, no? Can't the literal attribute be applied to both functions and macros? Though it would make the feature more complex...

My initial thoughts were about using traits as well, but in the end I don't think it's the right tool for the problem, as it makes feature significantly more complex without much benefit. And use of FloatLit and co types (if we'll get them) can be added to the functional approach without any problems.

From a readability perspective, I think this is incorrect. Ad-hoc literals are very magical, and make the importing story a disaster (like in C++). A feature which requires you to be aware of an attribute is a bad idea, since attributes are often poorly documented (what the hell does #[fundamental] do, right? no docs for that!). Traits have accessible documentation and fit into the rest of how Rust works. It also avoids the absolute coherence nightmare that we would have otherwise... what happens when two literals with the same name get imported?

I will exhort you to consider that this is a bad idea. Custom literals are already a readability nightmare (most C++ style guides ban them outright because of overuse), and the syntactical contortions required for such convenience should be avoided. If your application really really really needs this, you can come up with a short identifier, like m_per_s2. It is not the Rust way to allow what looks like operator voodoo.

Most people will not want to use your library if it requires typing things outside of ASCII. Non-ascii in any sort of identifier is an invitation to unreadability.


How does it become a disaster if you have an explicit use std::time::duration_literals;? Yes, we'll need a guideline that the names of functions which define literals should e.g. end with _literals. IMO it's much better than having a separate trait or enum for each literal. Usually you'll have only a handful of literals in scope, so finding the origin of a literal shouldn't be a problem. And the "magical" property is solvable with documentation, be it in std or in the crate which defines the units.

I am not sure if you have a scientific background, but personally, for me 9.8m_per_s2 is a readability disaster, while 9.8[m/s^2] is a very readable, familiar and shorter syntax. And note that in my proposal you can use both.

Please let people decide what they do or don't want to use. Don't make the decision for them.


With the advancement of const fn capabilities, custom literals could also be used in constant definitions.

But so could standard const fn functions on value types.

Arguably this approach is [...] less flexible, and more surprising for new Rustaceans.

Maybe for those coming from C++. But in many other languages, including Python, JavaScript, etc., calling methods on primitive types is perfectly normal. As in Rust, there is no inherent difference between a primitive type and any other type. I would argue this makes teaching easier, as there is no special case here but rather an application showing generalization.

As for noisiness, one might say that additional into or similar calls are extremely noisy, but in that example everything has a clear use to me.

10.us();

So I take a value of integer type 10, I call a function on it ., it's called us (which I look up in my reference, or grep for), and it does not take any arguments. Whereas in the design, I have something of unknown type (with possibly different type deduction rules?) calling a function whose name does not match anything stated there.


What if I don't want to import the entire suite of literals because I only need one, and the rest clash with literals from a different library? In your proposal I can't just write std::time::duration_literals::s, because suffixes might not be identifiers.

My background is topology, specifically computational homotopy theory, where we use similar notation. However, I am a strong believer that in non-typeset communication, notation should be kept minimal. In fact, a lot of my research involves writing computer code, and I prefer to not have to deal with operators when I could be dealing with named functions.

That said, I think people's backgrounds are irrelevant when discussing the merits of readability, a crucial feature of any programming language. Remember that code is read more often than it is written, so the language should try to make code readable for the average programmer, not someone with domain-specific knowledge.

Also, having more than one way to do the exact same thing should be avoided and never explicitly introduced. This results in dialectification of the language.

I'm not making the decision for other people, I'm pointing out that having to type characters not on one's keyboard (which is essentially anyone who isn't using a fancy Symbolics keyboard or similar chording arrangements) hurts productivity. I say this after working with Scala for years, where this problem is endemic. Simply allowing such identifiers is an invitation to use them exclusively, which will result in further dialectification.


Seems much too complicated to me. How about this:

let a = si!{3m/s^2};
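A rough macro_rules approximation of this idea is possible today, with the caveat that the number and the unit are written as separate tokens here: an unspaced 3m is lexed as a single literal with an unknown suffix, which would presumably call for a procedural macro. The si! arms and the two unit types below are illustrative assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Meter(pub f64);

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MeterPerSecond2(pub f64);

// Each arm matches one spelled-out unit expression, token by token.
macro_rules! si {
    ($v:literal m) => {
        Meter($v as f64)
    };
    ($v:literal m / s ^ 2) => {
        MeterPerSecond2($v as f64)
    };
}

fn main() {
    let a = si!(3 m / s ^ 2);
    let d = si!(1.5 m);
    println!("{:?} {:?}", a, d);
}
```

Each supported unit expression needs its own arm, so this scales poorly compared with a real unit parser inside a procedural macro.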