Format of the exponent in the scientific notation

Ephicks · February 13, 2019, 4:23pm

I tried without great success to post this both on the user forum and on IRC. I post here since I don't think it is big enough for an RFC (maybe it is).

Context: formatting a float (f32 or f64) using the scientific notation for a human-facing output, especially for well aligned columnar data (a common use case, at least in sciences). I would like to mimic something like the F(J) column here or on the screenshot at the end of the post if the link is broken.

Current behaviour:

let val = 1.34e+2_f32;
assert_eq!("+1.34e2", format!("{:+e}", val));

Expected behaviour:

let val = 1.34e+2_f32;
assert_eq!("+1.34e+02", format!("{:+e}", val));

Example: for a minimal table containing a single column and 2 rows (1.34e2 and -1.34e-2):

the current behavior using the format {:+e} (or {:+.2e}, ...) is

+1.34e2  
-1.34e-2

while the expected behavior would be:

+1.34e+02  
-1.34e-02

Remarks:

- to keep the format expression simple, there is no way (to my knowledge) to modify the way the exponent part of the scientific notation is formatted
- the current behavior is to use the smallest possible number of characters
  - it is a valid choice for non-tabular data or for ASCII serializations such as CSV, JSON, ...
  - (for ASCII serializations not intended to be read by a human, an equivalent to the %g would probably be more compact)
- the expected behavior follows the C choice
  - force the sign to be printed, use 3 (4) characters and pad with '0' knowing that the exponent range is [-38, +38] ([-308, +308]) for an f32 (f64).
  - this solution is not the most compact, but it may be the best compromise to keep the format syntax complexity low and to allow well aligned columnar data
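
For reference, the expected output can already be produced today by post-processing what `{:+e}` writes. The following is only a minimal sketch: the helper name `sci_c_style` is hypothetical (it is not part of std), and it assumes the text produced by `{:+.*e}` contains a single `e` followed by a plain integer exponent, which is what the current behaviour above shows.

```rust
/// Hypothetical helper (not part of std): formats `val` in scientific
/// notation with a signed, zero-padded exponent, similar to C's `%+.*e`.
fn sci_c_style(val: f64, precision: usize) -> String {
    // Rust's own output, e.g. "+1.34e2" or "-1.34e-2".
    let s = format!("{:+.*e}", precision, val);
    // Non-finite values contain no 'e' and are returned unchanged.
    match s.split_once('e') {
        Some((mantissa, exp)) => {
            let exp: i32 = exp.parse().expect("exponent should be an integer");
            // `+03`: always print the sign, zero-pad to 3 characters in total.
            // Three-digit exponents (f64) simply take 4 characters.
            format!("{}e{:+03}", mantissa, exp)
        }
        None => s,
    }
}

fn main() {
    for v in [1.34e2, -1.34e-2] {
        println!("{}", sci_c_style(v, 2));
    }
}
```

An f32 can be passed as `f64::from(x)`; with an explicit precision the extra bits should not change the printed digits in typical cases.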

Whether you agree or not (I probably have an incomplete, biased view), I really would like someone to share their thoughts on the matter.

P.S: thank you for the great job you are doing, I really enjoy Rust

scottmcm · February 13, 2019, 6:34pm

I think this is the kind of thing where it would be helpful to see an example of the kind of thing you're trying to do, as part of the motivation section.

(Scientific notation seems like it'd hit the same peeve I have with disk size reporting that shows 9K and 1G in the same column, making comparisons nontrivial.)

Ephicks · February 15, 2019, 10:37am

Thank you for the feedback @scottmcm. I have edited the post, adding

Zarenor · February 15, 2019, 2:17pm

While I understand your qualm with different magnitude values being displayed this way, it’s extraordinarily common in the physics (and presumably other sciences) community - you just have to get used to reading the exponent before making any comparison. The reasoning behind displaying data this way is that the number of significant figures measured is important (In the given example, 3).

With that in mind, I think it may be better to have a crate to do this, rather than extending the language - I don’t know that we need this to be a compiler-standardized numerical format. Unfortunately, while there are several crates that look like they may have some functionality like this, it doesn’t appear that any of them are well-documented and recently updated, or very complete.

ExpHP · February 15, 2019, 2:41pm

Note I deliberately left concerns like this out of the {:g} proposal because the use case of user-facing output is easily serviced by third party libraries. The standard library is for features with high impact, or that can't go anywhere else.

I have many places where I output numbers in scientific notation and I don't even look at the mantissa. For numbers that vary wildly in magnitude, the exponent may be the only useful piece of information.

Ephicks · February 15, 2019, 4:35pm

First of all, thank you @Zarenor and @ExpHP for your answers.
(Hereafter I do not use emphasis to shout, but to ease a quick reading.)

I wonder about the purpose of the format! syntax (used in println! and write!):

is it to be used for (possibly lossy) ASCII serializations?
is it to be used for user-facing outputs?
both?

In the case of lossless ASCII serializations, the precision must not be used (so as not to drop significant digits), and I tend to think that flags, width, ... are useless. The {:g} format would probably be the best (most compact) option.

If the first answer is the right one, I agree with you. But what is the point of using '+', width, '<', '>', ... in non-user-facing outputs?
So, unless I misunderstand, I think that one can reverse the argument:
a particular ASCII serialization is probably best serviced by the third party library implementing it, and the main purpose of println! is to build user-facing outputs. (Note e.g. that a JSON document containing integers or floats with a '+' sign is not valid).

In practice, I have the feeling that there is not always a clear separation between ASCII serializations and user-facing outputs. So the right answer is probably the 3rd one: format! is general purpose.

In my opinion, the first aim of an ASCII serialization is not to be as compact as possible. And the choice of C creators (as a good compromise between syntax complexity and output compactness) to use %+03d (or {:+03}) to print the exponent of a float is not a random choice and is a choice which is still valid today.

So, I would like to know if the current choice implemented in Rust comes from a long process or is just an implementation detail.

P.S.: My personal choice would probably go for a more complex format! syntax allowing to choose the format of the exponent.

dcarosone · February 15, 2019, 10:25pm

It feels to me that format! is very much there for programmer convenience. It’s known not to be necessarily the fastest and most efficient, but it’s easy and there are debug and pretty-print-debug formats and other options. That convenience is high impact. So I think there’s a good case for a format that does what you want as a common expectation.

At the same time, offering every possible stringly knob for tweaking output is a sure path to line noise, less convenience due to more confusion, and bugs. “Engineering notation”, with the exponent always a multiple of 3, is also common enough for someone to want a similar formatting trait or option, and the list goes on.

I don’t know if changing the default format would be considered a breaking change, but I can imagine it might well be, especially if there is no way to get the old format back. So really this means more options, either way.

There was another thread a while ago with a similar discussion (maybe it was even yours?) where another possibility came up: implement the Display trait for a custom type wrapper just how you want. Either use that type throughout the code, or use it in a struct that represents your output table, with from/into the base type.
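
A minimal sketch of that wrapper idea follows. Everything named here is an assumption made for illustration: the type name `Sci`, the default of 2 decimals, and the choice to honour only precision and right-aligned width from the format spec. It is not an existing crate or std API.

```rust
use std::fmt;

/// Hypothetical newtype wrapper; the name `Sci` is illustrative only.
struct Sci(f64);

impl fmt::Display for Sci {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Use the `.N` precision if the caller gave one, else 2 decimals
        // (an arbitrary default chosen for this sketch).
        let prec = f.precision().unwrap_or(2);
        let width = f.width().unwrap_or(0);
        let raw = format!("{:+.*e}", prec, self.0);
        let text = match raw.split_once('e') {
            Some((mantissa, exp)) => {
                let exp: i32 = exp.parse().map_err(|_| fmt::Error)?;
                // Signed exponent, zero-padded to at least 3 characters.
                format!("{}e{:+03}", mantissa, exp)
            }
            // Non-finite values have no exponent part.
            None => raw,
        };
        // Right-align by hand; `Formatter::pad` is avoided because it would
        // also apply the precision as a truncation length to the text.
        write!(f, "{:>width$}", text, width = width)
    }
}

fn main() {
    for v in [1.34e2, -1.34e-2] {
        println!("|{:12.3}|", Sci(v));
    }
}
```

A table struct could then hold `Sci` values (or convert with from/into at the output boundary), so the rest of the code keeps working with plain floats.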

Ephicks · February 19, 2019, 9:20am

Thank you for your feedback @dcarosone.

No, it was not me. I have found several posts about the "Engineering notation" but I am not sure I have found the thread you are talking about.

Having found 4-year-old posts, I do not expect short term changes.
I hope this post (together with previous/next ones) will help in one way or another.

system · May 20, 2019, 9:21am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
