Pre-RFC: Custom suffixes for integer and float literals

I find it unusual that we would use a function, yet have conflict detection operate on these other abstract things that are only listed in an annotation. It seems that the names we import should be the suffixes themselves, and that they should live in a new namespace (alongside the existing namespaces of macros, modules, and values).

This might appear to make them more annoying to import, but in my experience it is uncommon to need multiple different units of the same dimension; e.g. I can’t think of a situation where I have needed both cm and m. The only real counterexample I can think of is units of time, where non-scientific code may have reasons to specify durations of a variety of magnitudes.


Custom suffixes are definitely not wanted only for dimensioned types; for instance, they have also been requested for fixed-point types and primitives of other bit widths (f16, f80, u256, u31).

Expanding a bit, custom suffixes are also useful beyond integers and floating point, e.g. for strings ("foo"s), chars ('c'_ascii), bytes (b'@'u16), and byte strings (b"foo\xff"_big_endian).

These literals may also be needed as a pattern, not just expressions:

match some_256bit_number {
    0u256 => None,
    c => Some(c),
}
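For comparison, the closest equivalent on stable Rust today is matching against a named constant. A minimal sketch, assuming a hypothetical two-limb `U256` type (invented here for illustration, not a real library type); a constant can be used as a pattern as long as its type derives both `PartialEq` and `Eq`:

```rust
/// Hypothetical 256-bit integer stored as two 128-bit limbs (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct U256 {
    pub hi: u128,
    pub lo: u128,
}

impl U256 {
    /// Stands in for the proposed `0u256` literal.
    pub const ZERO: U256 = U256 { hi: 0, lo: 0 };

    /// Builds a value that fits in the low limb.
    pub const fn from_u128(lo: u128) -> Self {
        U256 { hi: 0, lo }
    }
}

/// Returns `None` for zero, mirroring the `match` in the post above.
pub fn non_zero(n: U256) -> Option<U256> {
    match n {
        // A constant path is accepted as a pattern because `U256`
        // derives both `PartialEq` and `Eq`.
        U256::ZERO => None,
        c => Some(c),
    }
}
```

A custom suffix would only replace `U256::ZERO` with `0u256` here; the matching semantics could presumably stay the same.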

IMO a literal suffix should be a procedural macro instead of a const fn, so interpolated strings ("log {date:?}: {msg}"_fmt) can be supported.

use proc_macro::Span;

#[proc_macro_int_lit_suffix]
pub fn km(literal: &str, span: Span) -> TokenTree {
    let value = literal.parse::<u128>().unwrap();
    let value = Literal::f64_unsuffixed_with_span((value * 1000) as f64, span);
    quote!{ mylib::si::Meter(#value) }
}

I have two problems with this design. The ad-hoc attribute should be replaced by a reasonable trait. The best we can get without a coherence disaster is

impl FloatLit for Meter {
    type SUFFIX = "m";
    const fn float_lit(lit: f?) -> Self { .. }
}

What f? is is my second problem. I detest the fact that C++ uses a builtin, fixed-width type, which is nonsense when you realize your literal constructor should be const anyways. I've actually already proposed the necessary types for this, [Pre-RFC] Integer/Float literal types, for this exact purpose.

The other problem is that you'll get a coherence disaster anyway if several such types make their way into scope. An alternative definition which sidesteps this is

enum m {}
impl FloatLit for m {
    type Output = Meter;
    // ..
}

This ensures that literals have an identifier name and have a path; their path is that of the type that provides the name. Plus, you get to do silly things like

impl FloatLit for Meter {
    type Output = Self;
    // ..
}

Now, to import the literal you simply import my_units::meter::m, and name collision will take care of the rest. This even interacts nicely with the vanilla literals! Though this seems like a bit of a syntactic footgun, since people will expect to implement FloatLit on the output type rather than the symbol type.
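To make the marker-type idea concrete, here is a rough sketch of how it could be emulated in today's Rust. Everything in it is an assumption for illustration: `f64` stands in for the unspecified `f?` type, the method is not `const` (trait methods cannot be `const fn` on stable), and `apply_suffix` is a hypothetical stand-in for whatever the compiler would desugar `2.5m` into.

```rust
/// Sketch of the proposed trait; `f64` stands in for the unspecified `f?` type.
pub trait FloatLit {
    type Output;
    fn float_lit(lit: f64) -> Self::Output;
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Meter(pub f64);

pub mod suffixes {
    use super::{FloatLit, Meter};

    /// Uninhabited marker type whose *name* is the suffix.
    #[allow(non_camel_case_types)]
    pub enum m {}

    impl FloatLit for m {
        type Output = Meter;
        fn float_lit(lit: f64) -> Meter {
            Meter(lit)
        }
    }
}

/// The "silly" variant: the output type doubles as its own suffix.
impl FloatLit for Meter {
    type Output = Self;
    fn float_lit(lit: f64) -> Self {
        Meter(lit)
    }
}

/// Hypothetical desugaring target: `2.5m` would become `apply_suffix::<m>(2.5)`.
pub fn apply_suffix<S: FloatLit>(lit: f64) -> S::Output {
    S::float_lit(lit)
}
```

Because the suffix is an ordinary type name, the usual `use` rules apply: two libraries that both export `m` would collide at import time, and `use ... as` could rename one of them.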

Finally, I think we should table the brackets syntax for now, since it goes quite far afield; far further than the proposal should be going at this point in time.

Can both features just exist alongside each other? I think most uses of a custom literal are better suited as const fns, and I'd like to have to call the format prefix as "..."!fmt or something.

This counterproposal looks extremely complicated. I don't think that the type system should be responsible for resolving what looks like an arbitrary identifier embedded in a literal.

I think you misread my proposal... or I left a detail out? I think the first trait is pretty horrible, and it was the best I could come up with. Then I came up with the second one, which makes things far less painful (after all, it's just asking the local scope "is FooLit implemented for this type?").

In any case, anything involving custom literals should be extremely complicated, because it's an enormous readability footgun, and needs to be defined very delicately. Just saying "tack on this attribute" seems counter to how Rust defines all other operator-type things.

It should be noted that built-in support for units of measure has been proposed before. It’s probably worth taking a look there first, because this covers similar (if not the same) ground, perhaps with a stronger emphasis on syntax.

Sorry, my first response was made too hastily. I wouldn't normally think of literal suffixes as an "operator,"[^1] but I see now that this is essentially the basis of your counterproposal.

Here are some thoughts:

  • The second trait does look somewhat reasonable to live in std::ops.
  • Modules and types live in the same namespace. Normally this is not a huge issue, because modules are conventionally lowercase and types are conventionally uppercase. However, this counter-proposal invites defining types with lowercase names, which may exacerbate the issue.
  • If suffixes are "just types," it might feel frustrating if you are required to import them in order to use them. (i.e. support for paths in the syntax might be desirable; However, I feel this could lead to a complicated syntax.)

[^1] Yes, I know, C++ considers them "operators," but C++ considers all sorts of bizarre things (like casts) to be operators. I don't think there is much precedent for this in rust.

Is this a problem that ever comes up in practice? I think suffixes would typically be shorter, which is pretty rare for modules. In the worst, worst, worst case, you can import suffixes with use as or 123qualified::path.

Apparently this disappeared from a previous draft, but yes, I think the syntax we want is

Literal := (IntLit | FloatLit | StrLit | CharLit | BStrLit | BCharLit) Path?

I also imagine that any library making extensive use of suffixes would have a prelude, which I think is a wide-spread enough practice that we needn't worry about prelude proliferation.

I actually think that C++'s use of the term "operator" is reasonable, outside of maybe casts (which Rust will never have, because of Into). ULL (for converting to unsigned long long) is definitely a suffix operator. I think this is all taste, though; my main goal is preventing the addition of more ad-hoc attributes, which are way too magical.

I've considered introducing a separate category for units, but in the end decided against it. To me it looks like a much bigger and more impactful change, e.g. you'll have to add "unit aliases" and describe how they interact with the usual types and values. Plus your proposal will not allow us to create and use convenience values like m/s^2 or use Greek letters, at least in the near future. (Yeah, there are purists who prefer µm to um.)

Meanwhile, the proposed approach introduces a relatively small addition to the language; everything else is handled by existing developments. And it does not pollute the namespace with short names.

If the language team is ready to invest in developing a more complicated system, then your proposal could make sense.

Yes, and the proposal handles such cases as well, since you can define a function which returns such types (though it could be a bit problematic to write a generic uN type). BTW, I've mentioned complex integers and quaternions in the proposal.

And this approach can be straightforwardly extended to work with string suffixes and prefixes.

Hm, a good point.

I didn't want to use procedural macros because they can expand to arbitrary code, while a const fn must return a value of a single type. Plus I thought that in the long run const fns should be able to handle string parsing, no? Couldn't the literal attribute be applied to both functions and macros? Though that would make the feature more complex...

My initial thoughts were about using traits as well, but in the end I don't think it's the right tool for the problem, as it makes feature significantly more complex without much benefit. And use of FloatLit and co types (if we'll get them) can be added to the functional approach without any problems.

From a readability perspective, I think this is incorrect. Ad-hoc literals are very magical, and make the importing story a disaster (like in C++). A feature which requires you to be aware of an attribute is a bad idea, since attributes are often poorly documented (what the hell does #[fundamental] do, right? no docs for that!). Traits have accessible documentation and fit into the rest of how Rust works. It also avoids the absolute coherence nightmare that we would have otherwise... what happens when two literals with the same name get imported?

I will exhort you to consider that this is a bad idea. Custom literals are already a readability nightmare (most C++ style guides ban them outright because of overuse), and the syntactical contortions required for such convenience should be avoided. If your application really really really needs this, you can come up with a short identifier, like m_per_s2. It is not the Rust way to allow what looks like operator voodoo.

Most people will not want to use your library if it requires typing things outside of ASCII. Non-ascii in any sort of identifier is an invitation to unreadability.


How does it make it a disaster if you have an explicit use std::time::duration_literals;? Yes, we'll need a guideline that functions which define literals should, e.g., end with _literals. IMO that's much better than having a separate trait or enum for each literal. Usually you'll have only a handful of literals in scope, so finding the origin of a literal shouldn't be a problem. And the "magical" property is solvable with documentation, be it in std or in the crate which defines the units.

I am not sure if you have a scientific background, but for me personally 9.8m_per_s2 is a readability disaster, while 9.8[m/s^2] is a very readable, familiar, and shorter syntax. And note that in my proposal you can use both.

Please, let people decide what they want to use. Don't make the decision for them.


With advancement of const fn capabilities custom literals could be used for constants definitions.

But so could standard const fn functions on value types.
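As a small illustration of that point, here is a sketch using only what is stable today. The `duration_literals` module and its `ms`/`s` functions are hypothetical names borrowed from the thread, not real `std` items; `Duration::from_millis` and `Duration::from_secs` are real and already `const fn`.

```rust
use std::time::Duration;

/// Hypothetical module of "literal" constructors (not part of `std`).
pub mod duration_literals {
    use std::time::Duration;

    /// Milliseconds, callable in constant context.
    pub const fn ms(value: u64) -> Duration {
        Duration::from_millis(value)
    }

    /// Whole seconds, callable in constant context.
    pub const fn s(value: u64) -> Duration {
        Duration::from_secs(value)
    }
}

/// A constant defined through a plain `const fn`, no custom suffix required.
pub const TIMEOUT: Duration = duration_literals::ms(200);
```

With a suffix feature the last line might read `200ms` or `200_ms`; the constant evaluation itself would not need anything new.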

Arguably this approach is [...] less flexible, and more surprising for new Rustaceans.

Maybe coming from C++. But in many other languages, including Python, JavaScript, etc., calling methods on primitive types is perfectly normal, because, as in Rust, there is no inherent difference between a primitive type and any other type. I would argue this makes teaching easier, as there is no special case here but rather an application showing generalization.

As for "noisy": one might say that additional into or similar calls are extremely noisy, but in that example everything has a clear purpose to me.

10.us();

So I take a value of integer type 10, I call a function on it ., it's called us (which I look up in my reference, or grep for), and it does not take any arguments. Whereas in the design, I have something of unknown type (with possibly different type deduction rules?) calling a function whose name does not match anything stated there.
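For reference, `10.us()` can already be written on stable Rust with an extension trait. A minimal sketch, where `DurationExt` and its method names are made up for this example:

```rust
use std::time::Duration;

/// Hypothetical extension trait adding unit constructors to `u64`.
pub trait DurationExt {
    fn us(self) -> Duration;
    fn ms(self) -> Duration;
}

impl DurationExt for u64 {
    /// Interprets the integer as microseconds.
    fn us(self) -> Duration {
        Duration::from_micros(self)
    }

    /// Interprets the integer as milliseconds.
    fn ms(self) -> Duration {
        Duration::from_millis(self)
    }
}
```

With the trait in scope, an unsuffixed literal such as `10` is inferred as `u64`, since that is the only integer type implementing `DurationExt`, so `10.us()` works without a type annotation.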


What if I don't want to import the entire suite of literals because I only need one, and the rest clash with literals from a different library? In your proposal I can't just write std::time::duration_literals::s, because suffixes might not be identifiers.

My background is topology, specifically computational homotopy theory, where we use similar notation. However, I am a strong believer that in non-typeset communication, notation should be kept minimal. In fact, a lot of my research involves writing computer code, and I prefer to not have to deal with operators when I could be dealing with named functions.

That said, I think people's backgrounds are irrelevant when discussing the merits of readability, a crucial feature of any programming language. Remember that code is read more often than it is written, so the language should try to make code readable for the average programmer, not someone with domain-specific knowledge.

Also, having more than one way to do the exact same thing should be avoided and never explicitly introduced. This results in dialectification of the language.

I'm not making the decision for other people; I'm pointing out that having to type characters not on one's keyboard (which applies to essentially anyone who isn't using a fancy Symbolics keyboard or a similar chording arrangement) hurts productivity. I say this after working with Scala for years, where this problem is endemic. Simply allowing such identifiers is an invitation to use them exclusively, which will result in further dialectification.


Seems much too complicated to me. How about that:

let a = si!{3m/s^2};
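A declarative macro can get reasonably close to this today. The sketch below is an approximation under stated assumptions: the `Meter` and `MeterPerSecond2` types are invented for the example, only two unit spellings are handled, and a space is required between the number and the unit, because `3m` lexes as a single literal token with suffix `m` that `macro_rules!` cannot take apart (a procedural macro could).

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Meter(pub f64);

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MeterPerSecond2(pub f64);

/// Hypothetical `si!` macro covering two hard-coded unit spellings.
macro_rules! si {
    // Acceleration: `si!(3 m / s ^ 2)`
    ($value:literal m / s ^ 2) => {
        MeterPerSecond2($value as f64)
    };
    // Distance: `si!(2.5 m)`
    ($value:literal m) => {
        Meter($value as f64)
    };
}
```

Usage would look like `let a = si!(3 m / s ^ 2);`. A general solution would need to parse arbitrary unit expressions rather than enumerate them as rules.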

Since the parallel is drawn to C++, I’d like to draw attention to the fact that, to avoid conflicts between standard literal suffixes and user-defined literal suffixes, the C++ Standard mandates that user-defined literal suffixes begin with _.

Since similar issues could occur in Rust, it might be worth applying the same rule:

use std::time::duration_literals;
let dt1: Duration = 1_s + 200_ms; // or just 1.2s, see further
let dt2: Duration = 10_us;
let dt3: Duration = 10.2_µs; // µs and us are equivalent, proposal can support both

use simple_units::literals::*;
let distance1: Meter = 1_m;
let distance2: Meter = 5_nm;
let accel1: MeterPerSecond2 = 2.5_g;
let accel2: MeterPerSecond2 = 3[m/s^2]; // see further on how square brackets work

So, the difference really is 1_s vs 1.s().


Disclaimer: the following is a shot in the dark; it is unclear to me whether the proposed syntax could conceivably cause ambiguities. I’d expect not, since methods are available on primitives and data-members are available on expressions, but I have not proved it.

The noisy () could potentially be reduced by considering the addition of Properties to the language. Specifically, by making it so that 1.s evaluates to 1.s() when s is a property-able method.

This would let us rewrite the example above as:

use std::time::TimeProperties;

let dt1: Duration = 1.s + 200.ms; // or just 1.2s, see further
let dt2: Duration = 10.us;
let dt3: Duration = 10.2.µs; // µs and us are equivalent, proposal can support both

use si::DistanceProperties;

let distance1: Meter = 1.m;
let distance2: Meter = 5.nm;
let accel1: MeterPerSecond2 = 2.5.g;
let accel2: MeterPerSecond2 = 3.m / (1.s * 1.s); // not the greatest, I'll admit.

This does not support compound units; however, it should be familiar to most users:

  • It uses method calls on primitives, which are widely available in other languages, either through monkey-patching or extension methods.
  • It uses properties, which are widely available.
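Without properties, the compound-unit line can be approximated today with method calls plus operator overloading. This is only a sketch: the unit types, the `UnitExt` trait, and the two operator impls are assumptions made up for the example, and a real units library would derive such impls generically instead of writing them by hand.

```rust
use std::ops::{Div, Mul};

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Meter(pub f64);
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Second(pub f64);
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Second2(pub f64);
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MeterPerSecond2(pub f64);

/// Hypothetical extension trait standing in for the "properties".
pub trait UnitExt {
    fn m(self) -> Meter;
    fn s(self) -> Second;
}

impl UnitExt for f64 {
    fn m(self) -> Meter {
        Meter(self)
    }
    fn s(self) -> Second {
        Second(self)
    }
}

/// s * s = s^2
impl Mul for Second {
    type Output = Second2;
    fn mul(self, rhs: Second) -> Second2 {
        Second2(self.0 * rhs.0)
    }
}

/// m / s^2 = acceleration
impl Div<Second2> for Meter {
    type Output = MeterPerSecond2;
    fn div(self, rhs: Second2) -> MeterPerSecond2 {
        MeterPerSecond2(self.0 / rhs.0)
    }
}
```

The expression then becomes `3.0.m() / (1.0.s() * 1.0.s())`, which the compiler type-checks as `MeterPerSecond2`; dividing a `Meter` by a plain `Second` would be rejected because no such impl exists in this sketch.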

I would like to voice support for the leading underscore _ in any such custom suffix to a literal. The SI standard mandates that a space separate a quantity from the corresponding SI unit. The _ provides an equivalent to the mandated space while still lexically binding the suffix to the preceding numeric quantity. I find 5_ns completely acceptable, whereas for me 5ns is a typo rather than an SI-based measure.

Disclosure: I’ve personally edited over 15k pages of IEC standards, so I am more sensitive to this issue than most people (though not more so than the central office editors who work for ISO, IEC, ECMA, CCITT, etc).

Edits: For those in the States who find references to multinational standards organizations not so compelling, I’ve also edited over 3k pages of IEEE and ISA standards, including the first edition of IEEE 802.11, the WiFi standard. Their central office editors impose the same requirements.


If we are content with a macro-based solution, then post-fix macros would make this slightly nicer perhaps:

let a = 3.si![m/s^2];

@iliekturtles That syntax appears to expand naturally to annotating compile-time and run-time expressions with units.

I like the _ prefix as well, but would like to note that it can’t serve to distinguish builtin/user suffixes, since 42_u16 is already valid Rust.

C++ can do it this way because their literal grouping character is the apostrophe, if I’m not mistaken.


Of course it can't be used to make that distinction. I always write long u32, u64, and u128 literals that way, as a _-separated suffix. (E.g., 0x_dead_beef_u32.) To me the advantage of requiring the _ prefix on non-standard suffixes is that it calls reader attention to the suffix. That's particularly true when the suffix starts with a confusable character such as lower-case l.
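For readers who have not seen this style, the snippet below shows that `_` is already an ordinary digit separator in Rust and may sit directly before a built-in suffix. The function name is made up for the example; the literals themselves are plain stable Rust.

```rust
/// Returns a few literals written with `_` separating the value from a built-in suffix.
pub fn builtin_suffix_examples() -> (u16, u32, u64) {
    let a = 42_u16; // same value and type as `42u16`
    let b = 0x_dead_beef_u32; // underscores may also group hex digits
    let c = 1_000_000_u64; // digit grouping and suffix separator together
    (a, b, c)
}
```

This is why a leading `_` alone cannot tell the lexer whether a suffix is built in or user-defined; it can only serve as a visual convention.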