 # Scientific Notation for Integers

The project I am working on frequently converts integers between different units. It is very common to write code like this:

``````
pub const DURATION_1MIN: i64 = 60 * 1_000_000_000;
let timestamp_ns = timestamp_s * 1_000_000_000;
let fract = (d.fract() * Decimal::new(1000000000000000000, 0)).round();
``````

The underscore works well for readability up to millions (`7_000_000`). It helps much less when reading or writing a longer literal (`1_000_000_000_000_000`), which is error-prone and hard to code-review.

However, the compiler does not allow scientific notation for integers:

``````
let kilo: i64 = peta * 1e12; // does not compile: `1e12` is a float literal
``````

I am reluctant to write an explicit cast

``````
let a: i64 = 1e15 as i64; // is this safe?
``````

because I am not sure whether it is subject to floating-point error, for example being stored as `0.999..e15` and then truncated to `999...i64`.

An even more useful notation would be `let a: i64 = 123.45e15`. This is still a valid whole number as long as the number of fractional digits before the `e` is not greater than the exponent (15 here).

``````
// integer literals for various types (valid today)
let a: i64 = 10;
let a: i32 = 10;

// scientific notation for various types
let a: f64 = 123.45e15; // valid today
let a: i64 = 123.45e15; // proposed: a whole number, accepted
let a: i64 = 123000e-3; // proposed: equals 123, accepted
let a: i64 = 123.456e2; // compile error: 12345.6 is not a whole number

const TIME_LIMIT_NS: i64 = 1.5e9; // 1.5 seconds (proposed)
const TIME_LIMIT_NS: i64 = 1_500_000_000; // 1.5 seconds (today)
``````

Something like this works for removing a very long string of zeros.

``````
const PETA: i64 = 10i64.pow(15);
``````

`f64` can represent exactly all integers up to 2^53, so e.g. `1e15 as i64` is correct. However, `1e16` no longer has an exact representation, so relying on `as` conversion is rather fragile.
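
The 2^53 boundary mentioned above can be checked directly. The following is only a minimal sketch; the names `F64_EXACT_LIMIT` and `round_trips_through_f64` are made up for illustration and are not part of any library.

```rust
/// 2^53: every integer up to and including this value is exactly
/// representable in an f64 (hypothetical constant name).
const F64_EXACT_LIMIT: u64 = 1 << 53; // 9_007_199_254_740_992

/// Returns true if `n` survives a round trip through `f64` unchanged
/// (hypothetical helper, for illustration only).
fn round_trips_through_f64(n: u64) -> bool {
    (n as f64) as u64 == n
}

/// 10^15 is below 2^53, so the cast from the question yields the exact value.
fn peta_from_float() -> i64 {
    1e15 as i64
}
```

With these definitions, `round_trips_through_f64(F64_EXACT_LIMIT)` holds, while `round_trips_through_f64(F64_EXACT_LIMIT + 1)` does not, because 2^53 + 1 is rounded to a neighbouring representable value.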

I like this idea very much. Doesn't look hard to spec as it's only syntax, plus it can easily be prototyped as a macro.

To avoid ambiguity, perhaps we could support this by allowing integer type suffixes on scientific notation: `1e9u64`. Right now, scientific notation only allows a suffix of `f32` or `f64`. We could start allowing integer suffixes as well, and then enforce at compile time that the number expressed via the scientific notation is a whole integer (so `1.5e3u32` would work but `1.234567e3u32` would not) and fits in the specified type (so `1e6u32` would work but `1e6u8` would not).
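
The two compile-time checks described above (the value is a whole number, and it fits in the target type) can be approximated today with a `const fn`. This is only a rough sketch of the idea, not the proposed syntax: `sci_u32_checked` and `sci_u32` are hypothetical helpers that take the mantissa digits, the number of fractional digits, and a non-negative exponent as separate arguments. The check is conservative (it would reject `1.0e0`) and does not handle negative exponents.

```rust
/// Hypothetical helper: computes `digits * 10^(exp - frac_digits)`.
/// Returns `None` if there are more fractional digits than the exponent
/// can absorb, or if the result does not fit in `u32`.
const fn sci_u32_checked(digits: u32, frac_digits: u32, exp: u32) -> Option<u32> {
    if frac_digits > exp {
        return None; // e.g. 1.234567e3 is not a whole number
    }
    let scale = match 10u32.checked_pow(exp - frac_digits) {
        Some(s) => s,
        None => return None, // 10^n itself overflows u32
    };
    digits.checked_mul(scale)
}

/// Same computation, but panics on failure. When called in a `const`
/// item, the panic becomes a compile-time error.
const fn sci_u32(digits: u32, frac_digits: u32, exp: u32) -> u32 {
    match sci_u32_checked(digits, frac_digits, exp) {
        Some(v) => v,
        None => panic!("not a whole number, or does not fit in u32"),
    }
}

// Stands in for the proposed literal `1.5e3u32`.
const ONE_AND_A_HALF_K: u32 = sci_u32(15, 1, 3);
// Stands in for `1.234567e3u32`; uncommenting this line fails the build.
// const NOT_WHOLE: u32 = sci_u32(1_234_567, 6, 3);
```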

Maybe we could also add scientific notation for hexadecimal, octal, and binary literals:

``````
const A: u64 = 0x12p5u64; // the same as 0x12_00000u64
const B: u64 = 0o34p5u64; // the same as 0o34_00000u64
const C: u64 = 0b101p5u64; // the same as 0b101_00000u64
``````

It uses `p` instead of `e` for the exponent-part symbol because `e` is a hex digit and because `p` is used for nearly the same purpose in hexadecimal float syntax for C/C++ (I can't recall whether Rust supports hex floats).

Note that allowing this would be an inference-breaking change: if a trait is implemented for `i64` and `f32`, today `foo(123.45e15)` will treat it as `f32` because that's a floating-point literal, whereas if it could be either then it'd be an ambiguity error.

(But as Josh says we could consider it an integer literal only if suffixed.)
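
The inference behaviour described above can be reproduced with a small sketch. The trait name `Describe` is hypothetical; `foo` follows the name used in the post.

```rust
/// Hypothetical trait implemented for exactly one integer type
/// and one floating-point type.
trait Describe {
    fn describe(self) -> &'static str;
}

impl Describe for i64 {
    fn describe(self) -> &'static str {
        "i64"
    }
}

impl Describe for f32 {
    fn describe(self) -> &'static str {
        "f32"
    }
}

/// Generic entry point: the literal's type is chosen by inference.
fn foo<T: Describe>(value: T) -> &'static str {
    value.describe()
}

/// Today a literal in scientific notation is a float literal, so only
/// the `f32` impl is a candidate and inference selects it.
fn describe_scientific_literal() -> &'static str {
    foo(123.45e15)
}
```

If `123.45e15` could also be an integer literal, both impls would be candidates for that call, which is the ambiguity the post warns about.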

Obligatory comment: these examples of "frequently convert[ing] between different units" sound to me like the best solution would be to have units of measure types (`let x = Femtoseconds::from(Seconds(1));`) instead of having these large constants in many places...
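
A units-of-measure approach along these lines might look like the sketch below. The `Seconds` and `Femtoseconds` names come from the comment above; the field types and the conversion are assumptions made for illustration (`i128` is chosen for femtoseconds so that any `i64` number of seconds converts without overflow).

```rust
/// Whole seconds (assumed representation: i64).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Seconds(i64);

/// Whole femtoseconds (assumed representation: i128, wide enough to
/// hold any i64 second count scaled by 10^15).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Femtoseconds(i128);

impl From<Seconds> for Femtoseconds {
    fn from(s: Seconds) -> Self {
        // The large constant lives in exactly one place.
        const FS_PER_S: i128 = 10i128.pow(15);
        Femtoseconds(s.0 as i128 * FS_PER_S)
    }
}
```

With this in place, `Femtoseconds::from(Seconds(1))` yields `Femtoseconds(1_000_000_000_000_000)` and call sites never spell out the scale factor.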

While I think there may be value in supporting this for literals in different bases, I think that may potentially be on the other side of the complexity tradeoff. `1e9u32` already has two internal letters, and when you add a numeric base prefix, you end up with three internal letters. The result doesn't feel especially readable, even with some `_`s thrown in, leaving aside the need to avoid the familiar `e`.

On top of that, I personally think that binary, hex, and octal literals look clearer written with shifts than with exponents. For instance, `0b101 << 15`.
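
As a small illustration of the shift style, the three literals from the earlier `p`-exponent example can already be written with shifts today. The constant names are made up; each hex digit is 4 bits, each octal digit 3 bits, and each binary digit 1 bit.

```rust
// Five trailing hex digits = 5 * 4 bits.
const HEX_SHIFTED: u64 = 0x12 << (4 * 5);
// Five trailing octal digits = 5 * 3 bits.
const OCT_SHIFTED: u64 = 0o34 << (3 * 5);
// Five trailing binary digits = 5 bits.
const BIN_SHIFTED: u64 = 0b101 << 5;
```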

I would propose that we start by just considering the decimal case.

You don't even need UOM for this, just declare scale constants:

``````
const K: i64 = 1_000;
const M: i64 = 1_000_000;
const G: i64 = 1_000_000_000;

pub const DURATION_1MIN: i64 = 60 * G;
``````

While this option does seem appealing, it will still be suboptimal if one would instead prefer to leave the precise integer type up to inference.

Today, existing code can assume that scientific notation always indicates a floating-point type. Requiring a suffix for integers in scientific notation would preserve that property.

I think I'd like to hear more about the places that need both power-of-10 scientific notation and integers. When I see scientific notation I think sigfigs and relative error, where floats work great. Sure, Avogadro's constant N_A is exactly 602214076000000000000000, but I can't imagine any situation where computing with that in an `i128` is better than an `f64`. File sizes perhaps, but then I'm just as likely to want `1 << 30`, for which power-of-10 scientific notation doesn't help at all.

Spitballing: the talk of suffixes makes me ponder a slight abuse of SI to allow things like `1M == 1000 * 1000` or `1Ki == 1024`. (I think these could be supported for integers and floats too, with `1.0G == 1.0e9`.)

(Nit picking alert:) While `1.000_000_000_000_000_1e16` (10^16 + 1) is not representable exactly as `f64`, `1e16` does have an exact representation in `f64`, as it is 2^16 × 5^16, and the 2^16 part is handled completely by the exponent. Since 5^16 is exactly representable, so is 10^16.
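
The factorisation argument can be spelled out in code. This is a sketch with a hypothetical function name: it builds 10^16 as an `f64` from its two factors and relies on the fact that multiplying by a power of two only changes the exponent.

```rust
/// Builds 10^16 as an f64 from 5^16 and 2^16 (hypothetical helper).
fn ten_pow_16_via_factors() -> f64 {
    // 5^16 = 152_587_890_625 is below 2^53, so this conversion is exact.
    let five_pow_16 = 5u64.pow(16) as f64;
    // 2^16 is a power of two, so it is exact as well.
    let two_pow_16 = (1u64 << 16) as f64;
    // The product is representable, so the multiplication does not round.
    five_pow_16 * two_pow_16
}

/// 10^16 + 1 has no exact f64 representation; the conversion rounds it
/// to the nearest representable value.
fn ten_pow_16_plus_one_as_f64() -> f64 {
    (10u64.pow(16) + 1) as f64
}
```

Both functions return the same `f64` as the literal `1e16`: the first because 10^16 is exact, the second because 10^16 + 1 rounds back down to it.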

Then we could go all the way and add user-defined literals like in C++. Maybe as syntax sugar for macro invocation.