Pre-RFC: Custom suffixes for integer and float literals

Unfortunately, leading-underscore identifiers are already allowed, so this produces churn. Plus, separating namespaces by charset (e.g. Haskell, where types must be capitalized and variables (term and type) are lower case) is unprecedented in Rust, and maybe now is not a great time to start. (Ok, we sort of do this with macros, but the ! is a sigil, not part of the identifier. panic!() lexes as "panic", "!", "(", ")".)

If we want any reasonable import story for literals, they'll probably need to live in either the type or free-function-and-constant namespace (the OP proposal probably should put them in the free-function namespace, while mine dumps them into the type namespace). Creating yet another namespace just for a comparatively small feature is more baroque than we really need.

Indeed; I don't think that there is any hope to be able to pull what are ultimately typesetting standards into a plaintext format! Source code is not a document for publication, in any traditional sense.


Of course typesetting standards don’t apply directly. However the expectations of professionals who use SI units in their daily work should be part of the considerations. Do recall that code is read many more times than it is written, so intelligibility to the reader should be a significant factor.

Just directly concatenating a unit with a literal value, using as the rationale that C++ did so, is not compelling; if it were, Rust would have a lot less divergence from C++. Or perhaps I’ve misunderstood the intent of Rust to be a language usable by people of science who are not professional programmers.

I'm not sure that this is an issue? If a project really believes in adhering to the standard, they can always use 123_unit form and enforce it by linter or code review? It is already the case that this form is the predominant one, and Rust does not attempt to enforce any sort of style, beyond warnings that can be turned off.

I also would like to be able to write let z = 1 + 2i; for complex numbers. Complex numbers are defined in the num crate, which is not privileged like core, alloc, or std.

This is not my rationale, and, in fact, C++ requires the leading underscore for anything defined with operator "" outside of std. My rationale is that current Rust behavior does not require the underscore for existing builtin literals, and this behavior should remain uniform across all literals, discarding the notion of "builtin literals" entirely.


I actually just thought of a really, really annoying problem, which deserves mention.

How does 10e100 lex? Clearly, it should just lex as a single float literal, since the alternative is vanishingly rare in comparison, but… what does the following code do?

enum e100 {}
impl FloatLit for e100 { .. }

I think we should just lint against this, since in principle you could write

<e100 as FloatLit>::float_lit(10);

and even more technically 10e0e100 would produce the expected result. We should still warn though, since this is just going to confuse people.
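To make the collision concrete, here is a minimal sketch. The FloatLit trait does not exist in Rust; its name, the float_lit method, the Output associated type, and the Tagged stand-in type are all assumptions made for illustration. The sketch only shows that 10e100 is a plain f64 literal today and that the explicit call stays unambiguous.

```rust
// Hypothetical trait from the discussion above. The name `FloatLit`, the
// method `float_lit`, and the `Output` associated type are assumptions.
trait FloatLit {
    type Output;
    fn float_lit(value: f64) -> Self::Output;
}

// An uninhabited "symbol type" whose name collides with exponent notation.
#[allow(non_camel_case_types)]
enum e100 {}

// Stand-in result type (hypothetical) so the two readings are distinguishable.
#[derive(Debug, PartialEq)]
struct Tagged(f64);

impl FloatLit for e100 {
    type Output = Tagged;

    fn float_lit(value: f64) -> Tagged {
        Tagged(value)
    }
}

fn main() {
    // In current Rust this is a single float literal: 10 times 10^100.
    let plain: f64 = 10e100;
    assert!(plain > 1e100);

    // The explicit call is unambiguous no matter how `10e100` lexes.
    let tagged = <e100 as FloatLit>::float_lit(10.0);
    assert_eq!(tagged, Tagged(10.0));
}
```

A lint on suffix types whose names look like exponents would flag the enum declaration itself, before anyone writes a confusing literal.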


Agreed about the linting. I’m also concerned about using ASCII or Unicode confusables as the initial character of suffixes that are not underscore-separated from the preceding Rust literal. My earlier example of that was lower-case l. Bad actors will enter the Rust ecosystem; I’d like to be proactive against them by linting against direct juxtaposition of confusables.

Aside from specific cases with deep traditional use, such as complex numbers (suffix i) and quaternions (suffixes i, j, and k), which latter are also used to represent rotations in 3-space, I’d like to lint against any suffix that is not separated from the preceding literal string by an underscore (_). Omitting the underscore saves typing one character, at the expense of readability.

Honestly, this all sounds like more pain in the neck than it gains “ergonomics”. How about we just don’t add more magic literal suffix syntax? I would hate to read code that relied on them apart from the (obvious) ones that we have today.

Even with such simple things as complex numbers and quaternions, conventions vary, so we couldn’t be sure if 2 - 3j means 2 - 3i the complex number (because i and j are used interchangeably, e.g. physicists and electrical engineers tend to prefer j, while mathematicians usually use i), or it is the quaternion 2 + 0i - 3j + 0k.

I still think reading ergonomics is an overwhelmingly more important question than writing, and Complex { re: 2, im: -3 } and Meter::new(1.0) or Length::new::<Meter>(1.0) seem so much easier to digest at first glance than random suffixes.
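For readers unfamiliar with the explicit style, the following is a rough sketch of how a Length::new::<Meter>(1.0) constructor can be built in current Rust. The LengthUnit trait, its METERS_PER_UNIT constant, the Kilometer type, and the get method are hypothetical names chosen for this example and are not taken from any particular crate.

```rust
// Hypothetical unit trait: each unit knows its conversion factor to meters.
trait LengthUnit {
    const METERS_PER_UNIT: f64;
}

struct Meter;
struct Kilometer;

impl LengthUnit for Meter {
    const METERS_PER_UNIT: f64 = 1.0;
}

impl LengthUnit for Kilometer {
    const METERS_PER_UNIT: f64 = 1000.0;
}

// A length stored internally in meters, whatever unit it was built from.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Length {
    meters: f64,
}

impl Length {
    // The unit is chosen with a turbofish: `Length::new::<Meter>(1.0)`.
    fn new<U: LengthUnit>(value: f64) -> Length {
        Length { meters: value * U::METERS_PER_UNIT }
    }

    // Read the value back in a unit of the caller's choice.
    fn get<U: LengthUnit>(self) -> f64 {
        self.meters / U::METERS_PER_UNIT
    }
}

// Plain struct literal, as in `Complex { re: 2, im: -3 }`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Complex {
    re: i32,
    im: i32,
}

fn main() {
    let z = Complex { re: 2, im: -3 };
    assert_eq!((z.re, z.im), (2, -3));

    let d = Length::new::<Kilometer>(1.5);
    assert_eq!(d.get::<Meter>(), 1500.0);
}
```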

Basically, this boils down to the same argument as the question of why we don’t program in plain English but in English-like, more structured artificial languages. It’s nice to have some degree of similarity, but it has to stop at the point where it becomes more ambiguous than understandable. And I think the proliferation of arbitrary suffixes crosses that line.


Constructing things the long way is fine if you only do it occasionally, but from personal, recent experience: the more you have to do it, the more tempted you are to just start taking shortcuts. Conveniences that encourage stricter type safety can be valuable, even if they sacrifice some explicitness.

As an aside: succinctness and explicitness are both aspects of ergonomics. Meter::new(1.0) semantically adds nothing over (say) 1.0{m}, provided the reader is aware of what that syntax means, but it does take longer to visually parse and consumes more space, making more complex expressions harder to read.


Thank you everyone for your response and critique!

I’ll wait a bit more for this discussion to unfold further and at the end of this week will try to write “take 2” for this proposal. Currently my thoughts are:

  • We need this feature: Duration::from_secs(2) + Duration::from_millis(200) is annoying both to write and to read.
  • We need clear imports for custom literals.
  • But we probably don’t want to create a separate namespace for suffixes.
  • The feature should be extensible to string literal prefixes/suffixes.
  • Custom literals should be usable in match statements and ideally with runtime values.
  • Regarding 1s vs 1_s, I think the feature should support both, maybe with a recommendation to use 1_s in guidelines (excluding exceptions like complex numbers and quaternions).
  • For compound units (e.g. “m/s^2”), it looks like the best approach will be to use suffixes which accept additional arguments, something like 2.3_si[m/s^2] (I am not sure if we need an explicit ! here, see further). Here you’ll import a custom si suffix which will be able to support various units.

So here are my rough thoughts: I think the solution is to use macros for custom literals. The reasons are:

  • they will not conflict with variable names and naming conventions
  • macro output can be used in match statements without any problems
  • proc macros will allow more flexibility in the future

In other words, 1s will be desugared into s!(1) and 2.3_si[m/s^2] into si!(2.3, "m/s^2"). Because in this design custom literals can only be defined by a macro, we can omit the !. If the user wants to use those suffixes on runtime values, they can write s!(var) or, with postfix macros, var.s!(). Macros which define custom suffixes should be marked with a #[custom_suffix] attribute. We will need a proper macro import system, though.
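As a rough illustration of the desugaring target, here are hand-written s! and ms! macros in current Rust. The macro names and bodies are assumptions for this sketch; the suffix syntax itself (1s, 200ms), the marker attribute, and postfix macros do not exist today, so only the explicit macro calls are shown.

```rust
// Hand-written stand-ins for what `1s` and `200ms` might desugar to.
// Full paths are used so the macros work wherever they are invoked.
macro_rules! s {
    ($value:expr) => {
        std::time::Duration::from_secs($value)
    };
}

macro_rules! ms {
    ($value:expr) => {
        std::time::Duration::from_millis($value)
    };
}

fn main() {
    // `2s + 200ms` would become `s!(2) + ms!(200)` under the proposal.
    let total = s!(2) + ms!(200);
    assert_eq!(total, std::time::Duration::from_millis(2200));

    // Because the macros take an expression, runtime values work too.
    let n: u64 = 3;
    assert_eq!(s!(n), std::time::Duration::from_secs(3));
}
```

Accepting expr keeps runtime values working; a dedicated literal fragment, as raised in the unresolved questions, would restrict the macros to literals only.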

Unresolved questions:

  • Details on how macros will look internally. Should they accept expr or something like int_literal/float_literal?
  • Do we want a generic way to define a set of suffixes without code duplication? (e.g. u1-u256)

Excuse me, but for me the explicit form makes more sense, and it's the literal suffix that takes more time and effort to parse visually.

In addition, the question of ambiguity is a real problem even from the compiler's point of view.

The second part of this sentence is true but does not imply the first. See Time units by newpavlov · Pull Request #52556 · rust-lang/rust · GitHub for a much more lightweight proposal (already mentioned in this thread) that makes your example look pretty reasonable already: 2*S + 200*MS.
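The constant-based form can be tried in current Rust. In this sketch the constants S and MS are defined locally as an assumption; the linked PR is about providing such constants, and nothing here claims they exist in std under these names. The multiplication relies on the standard library's u32 * Duration operator.

```rust
use std::time::Duration;

// Locally defined unit constants (assumed names, not std items).
const S: Duration = Duration::from_secs(1);
const MS: Duration = Duration::from_millis(1);

fn main() {
    // The integer literals are inferred as u32, which implements Mul<Duration>.
    let timeout = 2 * S + 200 * MS;
    assert_eq!(timeout, Duration::from_millis(2200));
}
```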

So, you'll need a much stronger motivation to argue that going from there to 2s + 200ms is worth all this machinery. (I am not saying I am against it, I am just saying your argument is not making a fair comparison.)


Bikeshedding: I don’t think ^ should be used for “exponent”, since it already means “XOR”. If you’re going to introduce an exponentiation operator, that should probably be conditional on at least reserving the same symbol in the language proper.
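A short check of what ^ means in Rust today, and how exponentiation is currently spelled:

```rust
fn main() {
    // `^` is bitwise XOR: 0b010 ^ 0b011 == 0b001.
    assert_eq!(2 ^ 3, 1);

    // Exponentiation is written as a method call instead.
    assert_eq!(2_i32.pow(3), 8);
    assert_eq!(2.0_f64.powi(2), 4.0);
}
```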

Exactly: we already have operator overloading, and multiplication is perfectly fine to express the notion of… multiplication, which is what units of measure are. So indeed, if terseness is regarded as a value, then using a well-fitting, existing language feature (which is unambiguous for humans and doesn’t conflict with anything else in the language) is far superior to cramming in a new, potentially very confusing and ambiguous one.
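A minimal sketch of units expressed through operator overloading. The types Meters, Seconds, and MetersPerSecond and the constants M and S are hypothetical names invented for this example. A real units library would generalize the dimension handling rather than writing one impl per combination.

```rust
use std::ops::{Div, Mul};

#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Seconds(f64);

#[derive(Debug, Clone, Copy, PartialEq)]
struct MetersPerSecond(f64);

// Unit constants, so that `10.0 * M` reads as "10 meters".
const M: Meters = Meters(1.0);
const S: Seconds = Seconds(1.0);

impl Mul<Meters> for f64 {
    type Output = Meters;
    fn mul(self, rhs: Meters) -> Meters {
        Meters(self * rhs.0)
    }
}

impl Mul<Seconds> for f64 {
    type Output = Seconds;
    fn mul(self, rhs: Seconds) -> Seconds {
        Seconds(self * rhs.0)
    }
}

// Dividing a length by a time yields a distinct speed type.
impl Div<Seconds> for Meters {
    type Output = MetersPerSecond;
    fn div(self, rhs: Seconds) -> MetersPerSecond {
        MetersPerSecond(self.0 / rhs.0)
    }
}

fn main() {
    let speed = (10.0 * M) / (4.0 * S);
    assert_eq!(speed, MetersPerSecond(2.5));
}
```

Because no Div<Meters> for Seconds impl is defined, writing (4.0 * S) / (10.0 * M) is rejected at compile time.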

Note that this PR was written by me. :wink: And it's listed in the alternatives. You can see the origins of this proposal in the PR thread. Though note that the scope of this proposal is a bit larger than what can be covered by constants.

It's just an example, ^ is not used as an operator there, but as part of the string "m/s^2". We probably could use 1_si["m/s^2"] to make it more explicit.

Likewise, mathematical notations meant for humans are short, somewhat ambiguous, quick to write, and use visual metaphors.

oops okay :slight_smile:

Absolutely. But I think the comparison should be made with the most ergonomic thing that's possible without the proposal, not with whatever happens to be the status quo. That's all :slight_smile:

In my view, a macro-based solution is not optimal from the perspective of units (putting aside the complexity of other approaches). If you write, say, expr.si![unit], where expr is some expression and unit is some unit format the macro understands, then it is not particularly extensible to other forms of unit. This means that all units must be defined by the macro up front, and thus you can’t compose units other than by using different macros.

At this stage, I think syntax is secondary. You first need to figure out the story around unification at the type level. As @varkor previously noted, this topic was further discussed over at Any RFC for Units of Measure?.


I am certain that the most efficient and Rust-like way to side-step this issue is to

  • Associate each suffix with a type (possibly an uninhabited one, like enum m {})
  • Implement a trait on this “symbol type”.

This sidesteps all of the weird ad-hoc problems related to using attributes and introducing new entities and introducing substantial grammar changes to the language, and gives an object that (in a given scope) refers to a suffix uniquely.
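A sketch of the symbol-type idea in current Rust, written by hand because no suffix syntax exists. The IntLit trait, its int_lit method, the Output associated type, and the Meters type are assumed names for illustration and are not taken from any written-up proposal.

```rust
mod units {
    // Hypothetical trait that a suffix's symbol type would implement.
    pub trait IntLit {
        type Output;
        fn int_lit(value: u64) -> Self::Output;
    }

    #[derive(Debug, PartialEq)]
    pub struct Meters(pub u64);

    // Uninhabited symbol type standing for the suffix `m`.
    #[allow(non_camel_case_types)]
    pub enum m {}

    impl IntLit for m {
        type Output = Meters;

        fn int_lit(value: u64) -> Meters {
            Meters(value)
        }
    }
}

// Importing the symbol type is what would bring the suffix into scope,
// using the ordinary `use` machinery of the type namespace.
use units::{m, IntLit, Meters};

fn main() {
    // What `5m` might desugar to under this design.
    let five = <m as IntLit>::int_lit(5);
    assert_eq!(five, Meters(5));
}
```

Since m is an ordinary item in the type namespace, two crates defining an m suffix are disambiguated the same way as any other name clash, by path or by renaming on import.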

I think at this point I’m going to just write up my own Pre-RFC with my counterproposal in its own thread, since my idea is too broken up across replies to be understandable.


I’ve collected my various replies about a trait-based counter proposal here. Let me know what y’all think.

I feel like I'm missing something. What is your use case that you frequently need to build up a bunch of Durations with very specific lengths of time? Also, why couldn't you just write Duration::from_millis(2200)?

