Scientific Notation for Integers

elbaro · April 26, 2021, 6:45am

The project I am working on frequently converts integer values between different units. It is very common to write:

pub const DURATION_1MIN: i64 = 60 * 1_000_000_000;
let timestamp_ns = timestamp_s * 1_000_000_000;
let fract = (d.fract() * Decimal::new(1000000000000000000, 0)).round();

Underscore separators keep literals readable up to the millions (7_000_000), but they do not help much when reading or writing anything longer (1_000_000_000_000_000). Such literals are error-prone and hard to code-review.

However, the compiler does not allow scientific notation for integers:

let kilo: i64 = peta * 1e12; // compile error: 1e12 is a float literal

I am reluctant to write an explicit cast

let a: i64 = 1e15 as i64; // is this safe?

because I am not sure whether the value is subject to floating-point error, e.g. stored as 0.999...e15 and then truncated to 999...i64.

An even more useful notation would be let a: i64 = 123.45e15. This is still a whole number as long as the number of fractional digits before the 'e' does not exceed the exponent (here, 2 fractional digits against an exponent of 15).

// integer literal for various types
let a: i64 = 10;
let a: i32 = 10;

// scientific notation for various types
let a: f64 = 123.45e15;
let a: i64 = 123.45e15;
let a: i64 = 123000e-3;
let a: i64 = 123.456e2; // compile error: 12345.6 is not a whole number

const TIME_LIMIT_NS: i64 = 1.5e9; // proposed: 1.5 seconds
const TIME_LIMIT_NS: i64 = 1_500_000_000; // today: 1.5 seconds

tspiteri · April 26, 2021, 7:21am

Something like this works for removing a very long string of zeros.

const PETA: i64 = 10i64.pow(15);
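A minimal sketch of how this might be applied to the constants from the first post, assuming a toolchain where integer pow can be used in constants (Rust 1.50 or later, as far as I know). NANOS_PER_SEC is an illustrative name that does not appear in the thread.

```rust
// Powers of ten computed at compile time instead of spelled out digit by digit.
const PETA: i64 = 10i64.pow(15);
const NANOS_PER_SEC: i64 = 10i64.pow(9); // illustrative name, not from the thread

// The constant from the original post, rewritten without a long literal.
const DURATION_1MIN: i64 = 60 * NANOS_PER_SEC;
```

With these definitions, PETA equals 1_000_000_000_000_000 and DURATION_1MIN equals 60_000_000_000.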

jdahlstrom · April 26, 2021, 8:39am

f64 can represent exactly all integers up to 2^53, so e.g. 1e15 as i64 is correct. However, 1e16 no longer has an exact representation, so relying on an as conversion is rather fragile.
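The 2^53 limit can be checked with a small round-trip test. This is only a sketch; the helper round_trips_through_f64 and the constant F64_EXACT_LIMIT are names made up for illustration.

```rust
/// 2^53: every integer with a magnitude up to this value is exactly representable in f64.
const F64_EXACT_LIMIT: i64 = 1 << 53;

/// Returns true if `n` is unchanged by a conversion to f64 and back (hypothetical helper).
fn round_trips_through_f64(n: i64) -> bool {
    (n as f64) as i64 == n
}

fn check_exact_range() {
    // 1e15 is below 2^53, so the cast yields exactly 10^15.
    assert_eq!(1e15 as i64, 10i64.pow(15));
    assert!(round_trips_through_f64(F64_EXACT_LIMIT - 1));
    assert!(round_trips_through_f64(F64_EXACT_LIMIT));
    // 2^53 + 1 needs 54 significant bits and rounds to a neighbouring value.
    assert!(!round_trips_through_f64(F64_EXACT_LIMIT + 1));
}
```

Calling check_exact_range() should pass without panicking.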

Yoric · April 26, 2021, 11:05am

I like this idea very much. Doesn't look hard to spec as it's only syntax, plus it can easily be prototyped as a macro.
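One way such a macro prototype might look is sketched below. It is not a proposal from the thread: the names sci! and parse_sci are invented here, only non-negative i64 values are handled, and it assumes a toolchain where panicking in const fn is allowed (Rust 1.57 or later, as far as I know). The macro turns the literal into text with stringify! and parses it during constant evaluation, so no floating-point arithmetic is involved.

```rust
/// Parses text such as "1.5e9" into an i64 (hypothetical helper).
/// Panics if the value is not a whole number or does not fit in i64;
/// inside a const that panic surfaces as a compile error.
const fn parse_sci(s: &str) -> i64 {
    let b = s.as_bytes();
    let mut i = 0;
    let mut mantissa: i64 = 0; // assumes the mantissa digits themselves fit in i64
    let mut frac_digits: i32 = 0;
    let mut seen_dot = false;

    // Mantissa: digits, an optional '.', and optional '_' separators.
    while i < b.len() && b[i] != b'e' && b[i] != b'E' {
        match b[i] {
            b'0'..=b'9' => {
                mantissa = mantissa * 10 + (b[i] - b'0') as i64;
                if seen_dot {
                    frac_digits += 1;
                }
            }
            b'.' => seen_dot = true,
            b'_' => {}
            _ => panic!("unsupported character in mantissa"),
        }
        i += 1;
    }

    // Exponent: an optional sign followed by digits.
    let mut exp: i32 = 0;
    let mut exp_negative = false;
    if i < b.len() {
        i += 1; // skip the 'e'
        if i < b.len() && (b[i] == b'+' || b[i] == b'-') {
            exp_negative = b[i] == b'-';
            i += 1;
        }
        while i < b.len() {
            match b[i] {
                b'0'..=b'9' => exp = exp * 10 + (b[i] - b'0') as i32,
                b'_' => {}
                _ => panic!("unsupported character in exponent"),
            }
            i += 1;
        }
    }
    if exp_negative {
        exp = -exp;
    }

    // Scale the mantissa by 10^(exp - frac_digits), rejecting fractions and overflow.
    let mut shift = exp - frac_digits;
    while shift > 0 {
        mantissa = match mantissa.checked_mul(10) {
            Some(v) => v,
            None => panic!("value does not fit in i64"),
        };
        shift -= 1;
    }
    while shift < 0 {
        if mantissa % 10 != 0 {
            panic!("value is not a whole number");
        }
        mantissa /= 10;
        shift += 1;
    }
    mantissa
}

/// Hypothetical macro: evaluates a scientific-notation literal as an i64 constant.
macro_rules! sci {
    ($lit:literal) => {{
        const VALUE: i64 = parse_sci(stringify!($lit));
        VALUE
    }};
}

const TIME_LIMIT_NS: i64 = sci!(1.5e9); // 1.5 seconds
```

Under these assumptions sci!(123000e-3) evaluates to 123, while sci!(123.456e2) should fail to compile because 12345.6 is not a whole number.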

josh · April 26, 2021, 4:22pm

To avoid ambiguity, perhaps we could support this by allowing integer type suffixes on scientific notation: 1e9u64. Right now, scientific notation only allows a suffix of f32 or f64. We could start allowing integer suffixes as well, and then enforce at compile time that the number expressed via the scientific notation is a whole integer (so 1.5e3u32 would work but 1.234567e3u32 would not) and fits in the specified type (so 1e6u32 would work but 1e6u8 would not).
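The two compile-time checks described here (the value is a whole number, and it fits in the suffix type) can be illustrated at run time. This is a rough sketch only: sci_to is a made-up helper, and it takes the literal already split into mantissa digits, the count of fractional digits, and a non-negative exponent.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: evaluates `mantissa * 10^(exp - frac_digits)` and returns
/// None if the result is not a whole number or does not fit in `T`.
fn sci_to<T: TryFrom<i128>>(mantissa: i128, frac_digits: u32, exp: u32) -> Option<T> {
    let value = if exp >= frac_digits {
        // Enough exponent to absorb every fractional digit.
        mantissa.checked_mul(10i128.checked_pow(exp - frac_digits)?)?
    } else {
        // Remaining fractional digits must all be zero for a whole number.
        let divisor = 10i128.checked_pow(frac_digits - exp)?;
        if mantissa % divisor != 0 {
            return None;
        }
        mantissa / divisor
    };
    // Range check against the requested integer type.
    T::try_from(value).ok()
}
```

For the examples in the post: sci_to::<u32>(15, 1, 3) models 1.5e3u32 and returns Some(1500), sci_to::<u32>(1234567, 6, 3) models 1.234567e3u32 and returns None, and 1e6 is accepted as u32 but rejected as u8.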

programmerjake · April 26, 2021, 6:58pm

Maybe we could also add scientific notation for hexadecimal, octal, and binary literals:

const A: u64 = 0x12p5u64; // the same as 0x12_00000u64
const B: u64 = 0o34p5u64; // the same as 0o34_00000u64
const C: u64 = 0b101p5u64; // the same as 0b101_00000u64

It uses p instead of e for the exponent-part symbol because e is a hex digit and because p is used for nearly the same purpose in hexadecimal float syntax for C/C++ (I can't recall whether Rust supports hex floats).

scottmcm · April 26, 2021, 7:14pm

Note that allowing this would be an inference-breaking change: if a trait is implemented for i64 and f32, today foo(123.45e15) infers f32 because the argument is a floating-point literal, whereas if the literal could be either type the call would become an ambiguity error.
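The current behaviour can be seen in a small sketch. The trait Describe and the function foo are placeholders chosen for illustration; the point is only that a float literal selects the single floating-point impl.

```rust
/// Placeholder trait implemented for exactly one integer and one float type.
trait Describe {
    fn type_name(&self) -> &'static str;
}

impl Describe for i64 {
    fn type_name(&self) -> &'static str {
        "i64"
    }
}

impl Describe for f32 {
    fn type_name(&self) -> &'static str {
        "f32"
    }
}

/// Generic function whose argument type is left to inference.
fn foo<T: Describe>(x: T) -> &'static str {
    x.type_name()
}
```

Today foo(123.45e15) returns "f32" and foo(10) returns "i64". If 123.45e15 could also be an integer literal, both impls would be candidates for the first call.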

(But as Josh says we could consider it an integer literal only if suffixed.)

Obligatory comment: these examples of "frequently convert[ing] between different units" sound to me like the best solution would be to have units of measure types (let x = Femtoseconds::from(Seconds(1));) instead of having these large constants in many places...
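A minimal sketch of what such unit types might look like, assuming plain i64 newtypes. Only the names Seconds and Femtoseconds come from the comment above; the derives and the unchecked multiplication are choices made for this example.

```rust
/// A duration in whole seconds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Seconds(i64);

/// A duration in whole femtoseconds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Femtoseconds(i64);

impl From<Seconds> for Femtoseconds {
    fn from(s: Seconds) -> Self {
        // 1 s = 10^15 fs. The scale factor lives in one place only.
        // Overflow handling is omitted in this sketch.
        Femtoseconds(s.0 * 10i64.pow(15))
    }
}
```

With this in place, Femtoseconds::from(Seconds(1)) yields Femtoseconds(1_000_000_000_000_000), and call sites never mention the constant.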

josh · April 26, 2021, 8:01pm

While I think there may be value in supporting this for literals in different bases, I think that may potentially be on the other side of the complexity tradeoff. 1e9u32 already has two internal letters, and when you add a numeric base prefix, you end up with three internal letters. The result doesn't feel especially readable, even with some _s thrown in, leaving aside the need to avoid the familiar e.

On top of that, I personally think that binary, hex, and octal literals look more clear written with shifts than with exponents. For instance, 0b101 << 15.
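For comparison, the three p-notation examples from the earlier post can be written with shifts today. In this sketch the shift amount is the number of appended digits times the bits per digit (4 for hex, 3 for octal, 1 for binary).

```rust
// Equivalents of the hypothetical 0x12p5, 0o34p5, and 0b101p5 literals.
const A: u64 = 0x12 << (4 * 5); // five hex digits = 20 bits
const B: u64 = 0o34 << (3 * 5); // five octal digits = 15 bits
const C: u64 = 0b101 << 5; // five binary digits = 5 bits
```

These equal 0x12_00000, 0o34_00000, and 0b101_00000 respectively.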

I would propose that we start by just considering the decimal case.

skysch · April 26, 2021, 9:07pm

You don't even need UOM for this, just declare scale constants:

const K: i64 = 1_000;
const M: i64 = 1_000_000;
const G: i64 = 1_000_000_000;

pub const DURATION_1MIN: i64 = 60 * G;

felix.s · April 27, 2021, 4:07pm

While this option does seem appealing, it will still be suboptimal if one would instead prefer to leave the precise integer type up to inference.

josh · April 27, 2021, 4:20pm

Today, existing code can assume that scientific notation always indicates a floating-point type. Requiring a suffix for integers in scientific notation would preserve that property.

scottmcm · April 27, 2021, 5:02pm

I think I'd like to hear more about the places that need both power-of-10 scientific notation and integers. When I see scientific notation I think sigfigs and relative error, where floats work great. Sure, N_A is exactly 602214076000000000000000, but I can't imagine any situation where computing with that in an i128 is better than an f64. File sizes perhaps, but then I'm just as likely to want 1 << 30, for which power-of-10 scientific notation doesn't help at all.

Spitballing: the talk of suffixes makes me ponder a slight abuse of SI to allow things like 1M == 1000 * 1000 or 1Ki == 1024. (I think these could be supported for integers and floats too, with 1.0G == 1.0e9.)

tspiteri · April 27, 2021, 7:09pm

(Nitpicking alert:) While 1.000_000_000_000_000_1e16 (10¹⁶+1) is not representable exactly as f64, 1e16 does have an exact representation in f64, as it is 2¹⁶×5¹⁶, and the 2¹⁶ part is handled completely by the exponent. Since 5¹⁶ is exactly representable, so is 10¹⁶.
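This can be checked directly. The function below is only an illustration (check_1e16 is a made-up name); it relies on integer-to-float casts rounding to the nearest representable value.

```rust
fn check_1e16() {
    let ten_pow_16: i64 = 10i64.pow(16);
    let as_float: f64 = 1e16;

    // 10^16 is exactly representable, so the cast recovers it.
    assert_eq!(as_float as i64, ten_pow_16);

    // 5^16 is well below 2^53, so it survives a round trip unchanged.
    let five_pow_16: i64 = 5i64.pow(16);
    assert_eq!(five_pow_16 as f64 as i64, five_pow_16);

    // 10^16 + 1 is not representable and rounds back to 10^16.
    assert_eq!((ten_pow_16 + 1) as f64, as_float);
}
```

Calling check_1e16() should pass without panicking.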

michalsrb · May 4, 2021, 7:24am

Then we could go all the way and add user-defined literals like in C++, maybe as syntax sugar for a macro invocation.

system · August 2, 2021, 7:25am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
