Scientific Notation for Integers

The project I am working on frequently converts integers between different units. It is very common to write code like this:

pub const DURATION_1MIN: i64 = 60 * 1_000_000_000;
let timestamp_ns = timestamp_s * 1_000_000_000;
let fract = (d.fract() * Decimal::new(1000000000000000000, 0)).round();

In terms of readability, the underscore separator works well up to millions (7_000_000), but it does not help much with reading or writing longer literals (1_000_000_000_000_000). Counting groups of zeros like this is error-prone and hard to code-review.

However, the compiler does not allow scientific notation for integers:

let kilo: i64 = peta * 1e12; // error: 1e12 is a floating-point literal

I am reluctant to write an explicit cast:

let a: i64 = 1e15 as i64; // is this safe?

because I am not sure whether it is subject to floating-point error, for example the literal being stored as 0.999...e15 and then truncated to 999...i64.

An even more useful notation would be let a: i64 = 123.45e15, which is still a whole number as long as the number of fractional digits before the 'e' does not exceed the exponent (here, 15).

// integer literal for various types
let a: i64 = 10;
let a: i32 = 10;

// scientific notation for various types (proposed; today only the f64 line compiles)
let a: f64 = 123.45e15;
let a: i64 = 123.45e15;
let a: i64 = 123000e-3; // 123
let a: i64 = 123.456e2; // compile error: 12345.6 is not a whole number

const TIME_LIMIT_NS: i64 = 1.5e9;         // proposed: 1.5 seconds
const TIME_LIMIT_NS: i64 = 1_500_000_000; // today: 1.5 seconds

Something like this works for avoiding a very long string of zeros:

const PETA: i64 = 10i64.pow(15);
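The same approach covers the examples from the original post. A minimal sketch, assuming the hypothetical constant names below: i64::pow is a const fn, so these are evaluated at compile time, and an overflow during const evaluation is a compile error rather than a silent wraparound.

```rust
// Scale constants built with the const fn i64::pow, so no long runs of
// zeros appear in the source. Names are illustrative.
const PETA: i64 = 10i64.pow(15);
const NANOS_PER_SEC: i64 = 10i64.pow(9);

// 1.5 seconds in nanoseconds, without going through a float literal.
const TIME_LIMIT_NS: i64 = 15 * 10i64.pow(8);

// The original DURATION_1MIN, expressed via the scale constant.
const DURATION_1MIN: i64 = 60 * NANOS_PER_SEC;
```

The downside is that 1.5 seconds has to be rewritten as 15 * 10^8 by hand, which is exactly the mental step scientific notation would remove.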

f64 can exactly represent every integer up to 2^53 in magnitude, so e.g. 1e15 as i64 is correct. However, 1e16 no longer has an exact representation, so relying on an as conversion is rather fragile.
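To make the 2^53 boundary concrete, here is a small sketch with a hypothetical helper, exact_i64, that only accepts an f64 to i64 conversion when nothing is lost. It relies on the fact that a Rust as cast from float to integer truncates toward zero and saturates at the target's bounds, so the helper checks range and wholeness explicitly instead of trusting the cast.

```rust
/// 2^53: every integer with magnitude up to this value is exactly
/// representable as an f64.
const F64_EXACT_LIMIT: i64 = 1 << 53;

/// Hypothetical helper (not a std API): returns Some only if x is a whole
/// number inside i64's range, so the `as` cast below cannot truncate or
/// saturate. NaN fails both comparisons and yields None.
fn exact_i64(x: f64) -> Option<i64> {
    // i64::MIN as f64 is exactly -2^63; the upper bound 2^63 is excluded.
    let in_range = x >= i64::MIN as f64 && x < -(i64::MIN as f64);
    if in_range && x.fract() == 0.0 {
        Some(x as i64)
    } else {
        None
    }
}
```

Note that such a check only guards the float to integer step. Rounding that already happened when a decimal literal was parsed into an f64 is invisible to it: exact_i64(9007199254740993.0) returns Some(9007199254740992), because the literal itself rounds to 2^53.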


I like this idea very much. It doesn't look hard to specify, as it's purely syntax, and it can easily be prototyped as a macro.
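As a rough illustration of such a prototype, here is a sketch of a hypothetical sci! macro (neither it nor parse_sci_i64 exists in std or any crate referenced here). It stringifies the float literal token and parses it in a const fn, so the value is computed exactly from the decimal digits without any floating-point step; a literal that is not whole or does not fit in i64 triggers a panic during const evaluation, which surfaces as a compile error. It assumes non-negative, unsuffixed literals and only targets i64.

```rust
/// Hypothetical helper: parses a decimal literal in scientific notation,
/// such as "1.5e9" or "123000e-3", into an i64. Panics if the value is not
/// a whole number or does not fit in i64 (the intermediate mantissa must
/// fit as well). In a const context the panic becomes a compile error.
const fn parse_sci_i64(s: &str) -> i64 {
    let b = s.as_bytes();
    let mut i = 0;
    let mut mantissa: i64 = 0;
    let mut frac_digits: i32 = 0;
    let mut seen_dot = false;

    // Mantissa: decimal digits with an optional '.' and '_' separators.
    while i < b.len() && b[i] != b'e' && b[i] != b'E' {
        let c = b[i];
        if c == b'.' {
            if seen_dot {
                panic!("more than one '.' in literal");
            }
            seen_dot = true;
        } else if c >= b'0' && c <= b'9' {
            mantissa = match mantissa.checked_mul(10) {
                Some(m) => match m.checked_add((c - b'0') as i64) {
                    Some(m) => m,
                    None => panic!("mantissa does not fit in i64"),
                },
                None => panic!("mantissa does not fit in i64"),
            };
            if seen_dot {
                frac_digits += 1;
            }
        } else if c != b'_' {
            panic!("unexpected character in literal");
        }
        i += 1;
    }

    // Exponent: optional sign followed by digits and '_' separators.
    let mut exp: i32 = 0;
    if i < b.len() {
        i += 1; // skip 'e' or 'E'
        let mut negative = false;
        if i < b.len() && (b[i] == b'+' || b[i] == b'-') {
            negative = b[i] == b'-';
            i += 1;
        }
        while i < b.len() {
            let c = b[i];
            if c >= b'0' && c <= b'9' {
                exp = exp * 10 + (c - b'0') as i32;
            } else if c != b'_' {
                panic!("unexpected character in exponent");
            }
            i += 1;
        }
        if negative {
            exp = -exp;
        }
    }

    // Scale the mantissa by 10^(exp - frac_digits), rejecting fractions.
    let mut shift = exp - frac_digits;
    let mut value = mantissa;
    while shift > 0 {
        value = match value.checked_mul(10) {
            Some(v) => v,
            None => panic!("value does not fit in i64"),
        };
        shift -= 1;
    }
    while shift < 0 {
        if value % 10 != 0 {
            panic!("value is not a whole number");
        }
        value /= 10;
        shift += 1;
    }
    value
}

/// Hypothetical macro: forces compile-time evaluation via a const item.
macro_rules! sci {
    ($lit:literal) => {{
        const VALUE: i64 = parse_sci_i64(stringify!($lit));
        VALUE
    }};
}

const TIME_LIMIT_NS: i64 = sci!(1.5e9); // 1.5 seconds
```

With this sketch, const X: i64 = sci!(123.456e2); fails to compile, because const evaluation reaches the "not a whole number" panic.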

To avoid ambiguity, perhaps we could support this by allowing integer type suffixes on scientific notation: 1e9u64. Right now, scientific notation only allows a suffix of f32 or f64. We could start allowing integer suffixes as well, and then enforce at compile time that the number expressed via the scientific notation is a whole integer (so 1.5e3u32 would work but 1.234567e3u32 would not) and fits in the specified type (so 1e6u32 would work but 1e6u8 would not).
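The two compile-time checks described here (the value must be whole, and it must fit the suffix type) can be modeled in a few lines. This is only a sketch of the rule, with hypothetical functions whole_value and fits; it is not how rustc would implement it. A literal such as 1.5e3 is described by its digits with the '.' removed (15), the count of digits after the '.' (1), and the exponent (3).

```rust
use std::convert::TryFrom;

/// Hypothetical model of the "must be a whole integer" check.
/// Returns None if the literal would leave a fractional part
/// or the intermediate value overflows u128.
fn whole_value(digits: u128, frac_digits: u32, exp: i32) -> Option<u128> {
    let shift = exp - frac_digits as i32;
    if shift >= 0 {
        // Scale up by 10^shift.
        digits.checked_mul(10u128.checked_pow(shift as u32)?)
    } else {
        // Scale down by 10^-shift, rejecting a non-zero remainder.
        let divisor = 10u128.checked_pow(shift.unsigned_abs())?;
        if digits % divisor == 0 {
            Some(digits / divisor)
        } else {
            None
        }
    }
}

/// Hypothetical model of the "fits in the specified type" check.
fn fits<T: TryFrom<u128>>(value: u128) -> Option<T> {
    T::try_from(value).ok()
}
```

Under this model, 1.5e3u32 yields 1500, 1.234567e3u32 is rejected as not whole, and 1e6u32 is accepted while 1e6u8 is rejected as out of range.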


Maybe we could also add scientific notation for hexadecimal, octal, and binary literals:

const A: u64 = 0x12p5u64; // the same as 0x12_00000u64
const B: u64 = 0o34p5u64; // the same as 0o34_00000u64
const C: u64 = 0b101p5u64; // the same as 0b101_00000u64

It uses p instead of e for the exponent symbol because e is a hex digit, and because p serves nearly the same purpose in the hexadecimal float syntax of C/C++, where it introduces a power-of-two exponent. (I can't recall whether Rust supports hex floats.)


Note that allowing this would be an inference-breaking change: if a trait is implemented for i64 and f32, today foo(123.45e15) will treat it as f32 because that's a floating-point literal, whereas if it could be either then it'd be an ambiguity error.

(But as Josh says we could consider it an integer literal only if suffixed.)
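The inference concern can be reproduced today. In this sketch the trait Describe and the function foo are hypothetical stand-ins for "a trait implemented for i64 and f32": because 123.45e15 is currently always a floating-point literal, the only applicable impl is the f32 one, and inference picks it. If the same literal could also be an integer, both impls would apply and the call would become ambiguous.

```rust
/// Hypothetical trait implemented for one integer and one float type.
trait Describe {
    fn describe(self) -> &'static str;
}

impl Describe for i64 {
    fn describe(self) -> &'static str {
        "i64"
    }
}

impl Describe for f32 {
    fn describe(self) -> &'static str {
        "f32"
    }
}

/// Generic function whose argument type is chosen by inference.
fn foo<T: Describe>(x: T) -> &'static str {
    x.describe()
}
```

Calling foo(123.45e15) selects the f32 impl, and foo(10) selects the i64 impl. Requiring an explicit integer suffix, as suggested, keeps the first call unambiguous.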


Obligatory comment: these examples of "frequently convert[ing] between different units" sound to me like the best solution would be to have units of measure types (let x = Femtoseconds::from(Seconds(1));) instead of having these large constants in many places...
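A minimal sketch of that idea, using hypothetical Seconds and Nanoseconds newtypes rather than any particular units-of-measure crate: the scale factor is written once, in the From impl, and call sites never see the raw constant.

```rust
/// Hypothetical unit newtypes wrapping an i64 count.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Seconds(i64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Nanoseconds(i64);

impl Nanoseconds {
    /// The only place the 10^9 scale factor appears.
    const PER_SECOND: i64 = 1_000_000_000;
}

impl From<Seconds> for Nanoseconds {
    fn from(s: Seconds) -> Self {
        // Panics on overflow in debug builds; a TryFrom impl using
        // checked_mul would be the stricter choice.
        Nanoseconds(s.0 * Nanoseconds::PER_SECOND)
    }
}
```

For durations specifically, the standard library's std::time::Duration already offers this kind of conversion, e.g. Duration::from_secs(60).as_nanos(), which returns a u128.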

While I think there may be value in supporting this for literals in different bases, I think that may potentially be on the other side of the complexity tradeoff. 1e9u32 already has two internal letters, and when you add a numeric base prefix, you end up with three internal letters. The result doesn't feel especially readable, even with some _s thrown in, leaving aside the need to avoid the familiar e.

On top of that, I personally think that binary, hex, and octal literals look more clear written with shifts than with exponents. For instance, 0b101 << 15.
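For comparison, the three constants from the radix-exponent suggestion can already be written as shifts. Each hex digit is 4 bits, each octal digit 3 bits, and each binary digit 1 bit, so "append five zero digits" becomes a shift by 5 times the digit width.

```rust
// Today's equivalents of the proposed 0x12p5, 0o34p5 and 0b101p5,
// written as shifts by (digit width in bits) * 5.
const A: u64 = 0x12 << (4 * 5); // five hex zeros = 20 bits
const B: u64 = 0o34 << (3 * 5); // five octal zeros = 15 bits
const C: u64 = 0b101 << 5;      // five binary zeros = 5 bits
```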

I would propose that we start by just considering the decimal case.


You don't even need UOM for this, just declare scale constants:

const K: i64 = 1000;
const M: i64 = 1_000_000;
const G: i64 = 1_000_000_000;

pub const DURATION_1MIN: i64 = 60 * G;

While this option does seem appealing, it will still be suboptimal if one would instead prefer to leave the precise integer type up to inference.

Today, existing code can assume that scientific notation always indicates a floating-point type. Requiring a suffix for integers in scientific notation would preserve that property.

I think I'd like to hear more about the places that need both power-of-10 scientific notation and integers. When I see scientific notation I think sigfigs and relative error, where floats work great. Sure, N_A is exactly 602214076000000000000000, but I can't imagine any situation where computing with that in an i128 is better than an f64. File sizes perhaps, but then I'm just as likely to want 1 << 30, for which power-of-10 scientific notation doesn't help at all.
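To put numbers on the Avogadro example: the exact value fits comfortably in an i128, while the f64 literal 6.02214076e23 is rounded, because the value is 2^17 × (150553519 × 5^15) and that odd factor needs 62 bits, more than f64's 53-bit significand. The rounding error is still far below anything that matters for sigfig-style work, which supports the point being made. The constant names below are illustrative.

```rust
// N_A (SI 2019 definition) written exactly as an integer.
const AVOGADRO_EXACT: i128 = 602_214_076 * 10i128.pow(15);

// The same value as an f64 literal; it is rounded to the nearest f64.
const AVOGADRO_F64: f64 = 6.02214076e23;

/// Relative error introduced by storing N_A as an f64.
fn avogadro_relative_error() -> f64 {
    // Converting the (integer-valued) f64 to i128 is exact at this magnitude.
    let diff = AVOGADRO_F64 as i128 - AVOGADRO_EXACT;
    (diff as f64 / AVOGADRO_EXACT as f64).abs()
}
```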

Spitballing: the talk of suffixes makes me ponder a slight abuse of SI to allow things like 1M == 1000 * 1000 or 1Ki == 1024. (I think these could be supported for integers and floats too, with 1.0G == 1.0e9.)
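Until something like that exists, the scale-constant approach from earlier in the thread extends naturally to binary (IEC) prefixes, which also covers the 1 << 30 case. A sketch with illustrative names:

```rust
// Decimal (SI-style) scale constants.
const K: u64 = 1_000;
const M: u64 = 1_000 * K;
const G: u64 = 1_000 * M;

// Binary (IEC-style) scale constants: Ki = 2^10, Mi = 2^20, Gi = 2^30.
const KI: u64 = 1 << 10;
const MI: u64 = 1 << 20;
const GI: u64 = 1 << 30;
```

Unlike a literal suffix, these need an explicit multiplication and a cast for floats (1.0 * G as f64), but they work with every integer type today.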


(Nitpick alert:) While 1.000_000_000_000_000_1e16 (10^16 + 1) is not exactly representable as an f64, 1e16 does have an exact representation, as it is 2^16 × 5^16 and the 2^16 factor is handled entirely by the exponent. Since 5^16 is exactly representable, so is 10^16.
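The nitpick is easy to check: 5^16 = 152587890625, far below 2^53, so the factorization argument holds, and a 1e16 as i64 cast round-trips exactly. The neighbouring integer 10^16 + 1 lies in a range where f64 values are spaced 2 apart, so its literal rounds (ties to even) back to 10^16.

```rust
// 10^16 = 2^16 * 5^16: the power of two goes into the f64 exponent and
// 5^16 must fit in the 53-bit significand for 1e16 to be exact.
const FIVE_POW_16: i64 = 5i64.pow(16);
const TWO_POW_53: i64 = 1 << 53;
```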


Then we could go all the way and add user-defined literals like in C++, perhaps as syntactic sugar for a macro invocation.