Scientific notation when formatting floating point numbers

Hello,

I strongly feel Rust should use scientific notation to print floating point numbers that would otherwise have an excessive number of digits.

For example, this program:

fn main() {
    println!("{}", 1e300);
    println!("{}", 1e-300);
}

I feel it should print:

1e300
1e-300

Instead, it prints:

1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001

The many zeros make the number impossible to read. They prevent printing numbers in readable tabular form, and they make it impossible to reserve a reasonable amount of space for a number if we want to guarantee that it will fit.

In science, it is very common to store data as tab-separated or comma-separated text files, and having these long numbers makes such files unreadable.

(Almost?) any language I know supports printing numbers in scientific notation, usually in the standard library and usually by default. It really bothers me that Rust does not do this.

My suggestion would be: use scientific notation by default whenever it produces a shorter printout.
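
As a rough sketch of that rule (a hypothetical helper for illustration, not a proposed API):

// Hypothetical helper illustrating the proposed rule: render the value
// both ways and pick whichever string is shorter.
fn shortest(x: f64) -> String {
    let plain = format!("{}", x);
    let exp = format!("{:e}", x);
    if exp.len() < plain.len() { exp } else { plain }
}

fn main() {
    assert_eq!(shortest(1.5), "1.5");     // plain form is shorter
    assert_eq!(shortest(1e300), "1e300"); // exponential form is shorter
}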

Thoughts? Thank you for your consideration!

Best, Oliver

7 Likes

I think Rust's floating point printing has a rule that, among all the decimal strings that would parse back to the same floating point value (because they are within one ULP of each other), the shortest one gets printed. I think it's very reasonable to keep that rule but to take exponential notation into account too.
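
For example, 0.1 is not exactly representable, yet it prints as "0.1" because that is the shortest string that parses back to the same bits:

fn main() {
    // "0.1" is the shortest decimal string that round-trips to this f64.
    println!("{}", 0.1f64); // 0.1
    // The sum lands on a neighboring value, so more digits are needed.
    println!("{}", 0.1f64 + 0.2f64); // 0.30000000000000004
}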

And I'd like to point out that ryu, which is state of the art, already uses scientific notation whenever it's shorter. The stdlib could just switch to that for printing floats.

In the meantime, you could just use ryu in your own programs.
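
A minimal sketch with the ryu crate as a dependency (the exact rendering is ryu's choice of shortest form):

fn main() {
    // ryu produces the shortest round-tripping string and switches to
    // exponential notation when that is shorter.
    let mut buffer = ryu::Buffer::new();
    println!("{}", buffer.format(1.234)); // 1.234
    println!("{}", buffer.format(1e300)); // e.g. 1e300
}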

10 Likes

This is known as "E notation". It is common in calculators with limited character sets and was popularized by the languages C and Fortran.

I don't think E notation is a great default because it is not universally recognized by the standard libraries of all major programming languages. Suppose I write a program foo (in Rust) which prints floats to stdout, and my colleague (unfamiliar with Rust) writes a program bar which parses floats from stdin. If their language does not have built-in support for parsing numbers of the form 1e3, then my colleague needs to bring in a dependency or write their own floating point parser.

Floats in Rust can be written in e notation (let x: f32 = 1e10;), as they can in most other programming languages. That includes not only writing float literals in the language itself but also accepting them when converting from strings.
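
In Rust itself, parsing mirrors the literal syntax:

fn main() {
    // FromStr for f64 accepts e notation, just like float literals do.
    let x: f64 = "1e10".parse().unwrap();
    assert_eq!(x, 1e10);
}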

For example, this works in Python:

print(float("1e10")) # prints 1e+10

This shows that Python both reads and prints floats in scientific notation.

Actually, is there any programming language that can't read floats in scientific notation? As another example, C can read 1e10 just fine with scanf("%f", ..), even though, like Rust, it prints without scientific notation when using printf("%f", ..) (10000000000.000000).

{:e} prints a float in scientific notation: LowerExp in std::fmt - Rust
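
For example:

fn main() {
    println!("{:e}", 1e300);  // 1e300
    println!("{:E}", 1e-300); // 1E-300
}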

11 Likes

The problem is that Rust lacks an equivalent of %g, which uses whichever format makes more sense (for some value of "making sense"). There was an RFC, but it was closed by the author when Debug for floats was changed to behave like %g.
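
That change means {:?} already gives %g-like output today:

fn main() {
    // Debug uses exponential notation for very large or small magnitudes
    // and the plain form otherwise, much like C's %g.
    println!("{:?}", 1e300); // 1e300
    println!("{:?}", 1.5);   // 1.5
}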

And using {:?} is indeed a solution, but I think the e notation is ubiquitous enough that it should be available outside debug output, at least on an opt-in basis. This issue relates to the more general question of what exactly Display wants to be. User-facing output, sure, but "user" can range from completely nontechnical to extremely proficient to even another program or computer.

13 Likes

I'm curious which major programming language does not support parsing scientific notation?

Floating point numbers are mostly used in science, engineering and numerical math, where scientific notation is the standard.

I can't think at the moment of other good uses of floating point numbers. In particular, I would distrust any use of floating point numbers to calculate bank account balances, taxes, pay statements, or election results.

In other words, if someone told me that something supports floating point numbers but not scientific notation, I would say: no, that's not complete support for floating point numbers.

2 Likes

I've never initiated an RFC, but I would like to see one for this. Is it hard?

If the concern is that non-technical users might be presented with scientific notation that they don't understand:

First, I wonder if the use of floating points was a good idea to begin with. Amounts of money, for example, should not be represented by floating points.

But even if we concede that there may be legitimate cases where floating point numbers should be presented to non-technical users, it would be an absolute necessity to consider an appropriate format, for example, to decide how many significant digits should be presented. The default representation should never be presented to a non-technical user.

I think floating point numbers should by default be printed exactly.

Approximate values, e.g. "0.1" or "1e300", are confusing to beginners who do not understand the complicated logic of "shortest decimal value that round-trips correctly when parsed with rounding".

If you want to approximate in exponential notation with exact round-tripping, you can always use "{:.20e}".

There are two ways to print 1e300 exactly:

  • full decimal value, i.e. 1000000000000000052504760255204420248704468581108159154915854115511802457988908195786371375080447864043704443832883878176942523235360430575644792184786706982848387200926575803737830233794788090059368953234970799945081119038967640880074652742780142494579258788820056842838115669472196386865459400540160
  • hexadecimal exponential notation, i.e. 0x1.7e43c8800759cp996
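
For what it's worth, the first form can already be reproduced today, since Display with an explicit precision prints exact, correctly rounded digits:

fn main() {
    // With an explicit precision, Display prints exact decimal digits,
    // so this yields the full 301-digit value of 1e300.
    println!("{:.0}", 1e300);
}
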
1 Like

I don't see that as a desirable default, because it is nearly unreadable and quite surprising to most people. I wouldn't be able to tell off the top of my head whether there even is an exact base-ten representation of every f32 or f64 value.

Either way, this representation suggests a precision that does not exist. Someone seeing the value for 1e300 printed this way would either falsely conclude that we have three hundred digits of accuracy, or they would realize that most of the digits printed have no meaning.

Anyone who uses floating point numbers needs to know (and if they don't, they will probably soon find out) that floating point numbers are merely approximations of the intended values.

In fact, when we see numbers that look like (I'm just making these up):

1.23450000000001 1.23449999999999

we conclude that 1.2345 is probably the intended value, and that the imprecision of the floating point arithmetic is showing.

Which representation is picked is a detail that most people don't need to know, as long as it approximates the intended value with the expected precision.

1 Like

By this logic, wouldn't 1.0000000000000000e300 be better? 1e300 suggests there is only 1 digit of precision.

Usually, but not always.

2 Likes

I don't know if I would prefer it, but it's a reasonable option. One issue would be that the precision does not translate to an exact number of digits in base ten.

There probably are people who do not understand the approximate nature of floating point numbers. But if the goal is to avoid confusing such people, then that's impossible.

That's not what I was trying to say.

Floating point numbers are not always approximations to intended values. For example, 1.5f32 might be the exact intended value, not an approximation. Similarly 2.0f32.powi(100) might be the exact intended value.
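
Both of those values are exactly representable, as a small sanity check shows:

fn main() {
    // 1.5 (= 1 + 2^-1) and powers of two fit exactly in binary floating
    // point, so this arithmetic involves no rounding at all.
    assert_eq!(1.5f32 + 1.5f32, 3.0f32);
    assert_eq!(2.0f32.powi(100) / 2.0f32.powi(99), 2.0f32);
}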

2 Likes

The number of significant digits is specific to the input and the computations done on that input. It cannot be determined from the float itself; it needs context. If I have an instrument that measures with two significant digits versus six, then do some (possibly approximate) computations on the measurements and print them, the appropriate number of significant digits will vary.
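
For what it's worth, the number of displayed digits can already be chosen per call site with a precision on {:e} (the measured value below is made up):

fn main() {
    let measured = 1.0f64 / 3.0;
    // One digit after the point, i.e. two significant digits:
    println!("{:.1e}", measured); // 3.3e-1
    // Five digits after the point, i.e. six significant digits:
    println!("{:.5e}", measured); // 3.33333e-1
}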

That problem sounds like it is best solved by a crate. And no, I have no idea what the default representation should be, as what is best depends on the context.

1 Like

I think this is a good rule. The printout of most constants like 1, 1.5, 10, 1234, 0.1 would not change, and others like 1E6 or 1.4E-5 become easier to read.

My comment was primarily regarding E notation as opposed to scientific notation, e.g. the difference between 2.34e5 and 2.34×10⁵.

I want to say I recall using some languages which didn't support E notation out of the box, either as literals or in standard library parsing routines; maybe it was VB5 or some Pascal variant? A Google search suggests that Mathematica does not recognize E notation.

Consistency is more important here, in my opinion. All numeric types use decimal representation by default, and floats behaving otherwise would be inconsistent.

And floats aren't unique in having long or inconvenient representations. u128::MAX = 340282366920938463463374607431768211455, which isn't very readable (though admittedly not as bad as your examples). Should it also use scientific notation?

2 Likes

Of course not: an integer is exact and might encode information distinct from the size of the value (for example, odd or even); floats cannot.

Floats are also exact: every (finite) float corresponds exactly to one real number. I'm not sure exactly what you mean by "encode information distinct from the size", or why parity is a relevant property.

One thing I realized, though: once RFC 3453 is implemented, some floats will have decimal representations over 4 KB in length, which is... rather excessive.

3 Likes

While a floating point number might represent the exact intended value, it rarely does, and there is no way to tell from the value alone. The only case I can think of where you would know it is the exact intended value is a simple constant in the code, where I know why it is what it is.

Even if I start off with exact values, any simple math operation may generate a value that is no longer exact, let alone application of basic functions like sqrt, sin, cos, exp, etc.

In the fields where floating point numbers are usually used, namely numerical math, science, and engineering, we have constants like pi that cannot be represented exactly, and much data is the result of measurements with limited precision (usually much less precision than an f64 can represent).

So the only way to keep your sanity, in my opinion, would be to assume that any floating point number you see is not exactly the intended value, but at best an approximation.

Therefore, it is only useful to print a limited number of digits. For an f64, we have 53 binary digits, which corresponds to 16-17 decimal digits.
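
The correspondence is just 53 times log10(2):

fn main() {
    // 53 mantissa bits give 53 * log10(2), about 15.95 decimal digits,
    // hence the usual 16-17 significant digits quoted for f64.
    println!("{:.2}", 53.0 * 2.0f64.log10()); // 15.95
}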

1 Like