NOTICE: A final version of this RFC was posted here: General floating point formatting in Debug with {:g?} by ExpHP · Pull Request #2729 · rust-lang/rfcs · GitHub
So I've been working on the following RFC, but am starting to run out of steam (and vacation time!) and am not sure if I can finish it. This is a pretty lousy draft, but it's a feature I really want to see in Rust eventually. Hopefully I can get some feedback given what's there, or this can help spark discussion.
- Feature Name: float_gen_fmt
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)
Summary
Add :g and :G formatting specifiers for floating point numbers, backed by two new traits std::fmt::{LowerGen,UpperGen}. These formats dynamically switch between fixed-point formatting and the exponential formats :e and :E based on the magnitude of a value.
The long-term plan is to allow this flag to be combined with :? in a manner like #2226 for recursive formatting of floating point numbers in larger structs. However, this is not included in the current proposal, and is only mentioned for its relevance to the motivation.
Motivation
Rust currently has two ways to format floating point numbers:
- Simple (through Debug and Display)
- Exponential (through LowerExp and UpperExp)
Both of these additionally support a mode of "round-trip precision" when no precision (.prec) is provided in the format specifier. However, neither of these two formats is suitable for human-oriented interfaces in contexts where numbers may be of arbitrary magnitude.
The simple formatting scheme can sometimes force the reader to play a game of "count the zeros":
assert_eq!(
format!("{:?}", std::f64::MAX),
"179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0",
);
This can frequently be an issue with values like 1e-10, which often show up as fuzz factors and tolerances in floating point computations. The only solution offered by the standard library is exponential formatting through :e and :E. However, exponential format can be taxing for humans to read for values on the order of 1.
assert_eq!(format!("{:e}", 22.0), "2.2e1");
assert_eq!(format!("{:e}", 1.0), "1e0");
assert_eq!(format!("{:e}", 0.9), "9e-1");
assert_eq!(format!("{:e}", 0.0), "0e0");
Furthermore, this solution cannot be applied to larger structs, and in particular to any Debug implementors that use the std::fmt::DebugXyz family of helpers. (This includes any type that uses #[derive(Debug)].)
#[derive(Debug)]
pub enum StepKind<F> {
Fixed(F),
Ulps(u64),
}
#[derive(Debug)]
struct FloatRange<F> {
min_inclusive: F,
max_inclusive: F,
step: StepKind<F>,
}
let positive_normal_f32s = FloatRange {
min_inclusive: std::f32::MIN_POSITIVE,
max_inclusive: std::f32::MAX,
step: StepKind::Ulps(1),
};
// There is no way to print this struct with exponential notation.
assert_eq!(
format!("{:?}", positive_normal_f32s),
"FloatRange { min_inclusive: 0.000000000000000000000000000000000000011754944, max_inclusive: 340282350000000000000000000000000000000.0, step: Ulps(1) }",
);
Many other languages and utilities have a "general" or "generic" formatting mode which dynamically switches between simple and exponential format, making it useful for a much wider selection of data.
One of the most important use cases of human-readable floats is debug output. But currently, anything using #[derive(Debug)] has only one way to print its floats, which is poorly optimized for human readability. With a second future RFC to add support for "{:g?}", users will be able to use the existing Debug machinery to easily inspect arbitrary data structures with floating point numbers of heterogeneous magnitude:
// using enum-map = "0.4.1"
#[derive(enum_map::Enum, Debug)]
enum Kind { Default, Simple }
#[derive(Debug)]
struct Settings {
initial: f64,
step: f64,
}
fn main() {
let map = enum_map::enum_map!{
Kind::Default => Settings { initial: 7654.32101234, step: 1e-6 },
Kind::Simple => Settings { initial: 0.0, step: 0.1 },
};
println!("{:g?}", map);
}
Output:
{Default: Settings { initial: 7654.32101234, step: 1e-6 }, Simple: Settings { initial: 0, step: 0.1 }}
Accomplishing such a feat through an external library is nearly impossible without massive buy-in.
Guide-level explanation
Three formatting modes
Rust has three ways to format floating point numbers. These are:
A simple format, which is the default:
assert_eq!(format!("{}", 5.0), "5");
assert_eq!(format!("{}", 5.1), "5.1");
assert_eq!(format!("{}", 1.234e9), "1234000000");
assert_eq!(format!("{}", 1.234e-9), "0.000000001234");
An exponential format, which always uses scientific notation:
assert_eq!(format!("{:e}", 5.0), "5e0");
assert_eq!(format!("{:e}", 5.1), "5.1e0");
assert_eq!(format!("{:e}", 1.234e9), "1.234e9");
assert_eq!(format!("{:e}", 1.234e-9), "1.234e-9");
assert_eq!(format!("{:E}", 1.234e-9), "1.234E-9");
And a general format, which switches between simple and exponential based on magnitude:
assert_eq!(format!("{:g}", 5.0), "5");
assert_eq!(format!("{:g}", 5.1), "5.1");
assert_eq!(format!("{:g}", 1.234e9), "1.234e9");
assert_eq!(format!("{:g}", 1.234e-9), "1.234e-9");
assert_eq!(format!("{:G}", 1.234e-9), "1.234E-9");
Precision
All three of the above formats print to round-trip precision by default. This can be changed by adding a precision to the format specifier (e.g. .4). When a precision is added to either the simple or exponential format, the output will always contain that many digits after the decimal point:
assert_eq!(format!("{:.4}", 5.1), "5.1000");
assert_eq!(format!("{:.4e}", 1.234e9), "1.2340e9");
When a precision is added to the general format, it is used as the maximum number of significant figures to display. It is also used as the maximum number of digits that large numbers are allowed to contain before they are switched to exponential format.
Not only does this give you control over the threshold for exponential format, but it also prevents a number like 1234.0 from being formatted as 1230. The threshold for small numbers remains fixed.
assert_eq!(format!("{:.3g}", 500.0), "500");
assert_eq!(format!("{:.3g}", 5000.0), "5e3");
assert_eq!(format!("{:.3g}", 1.234e9), "1.23e9");
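To make the rules above concrete, here is a rough sketch of a helper that emulates them on top of today's {:e} and {} formatting. The name format_general is made up for illustration, and the cutoffs (an exponent of 7 or more, or of -5 or less, when no precision is given) are the tentative values suggested elsewhere in this draft, not settled behavior. It follows the guide-level description and may not match every tentative table entry.

```rust
/// Hypothetical stand-in for the proposed `{:g}` / `{:.Ng}` specifiers.
/// Not part of std; the thresholds are this draft's tentative suggestions.
fn format_general(x: f64, precision: Option<usize>) -> String {
    // NaN and infinities have no exponent to inspect.
    if !x.is_finite() {
        return format!("{}", x);
    }

    // Render in exponential form first. With a precision `p` this rounds to
    // `p` significant figures; without one it uses round-trip precision.
    let exp_form = match precision {
        Some(p) => format!("{:.prec$e}", x, prec = p.max(1) - 1),
        None => format!("{:e}", x),
    };
    let (mantissa, exponent) = exp_form
        .split_once('e')
        .expect("finite floats always print an exponent");
    let exponent: i32 = exponent.parse().expect("exponent is an integer");

    // Large values switch once they need more than `p` digits before the
    // decimal point (assumed default: 7). The small-value cutoff is fixed.
    let high_cutoff = precision.map_or(7, |p| p.max(1) as i32);
    const LOW_CUTOFF: i32 = -5;

    if exponent >= high_cutoff || exponent <= LOW_CUTOFF {
        // Exponential branch: drop trailing zeros from the mantissa.
        let mantissa = if mantissa.contains('.') {
            mantissa.trim_end_matches('0').trim_end_matches('.')
        } else {
            mantissa
        };
        format!("{}e{}", mantissa, exponent)
    } else {
        // Simple branch: re-parse the rounded value and print it plainly,
        // which never emits trailing zeros.
        let rounded: f64 = exp_form.parse().expect("exponential form parses back");
        format!("{}", rounded)
    }
}
```

Under these assumptions, format_general(5000.0, Some(3)) yields "5e3" while format_general(500.0, Some(3)) yields "500", matching the assertions above.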
Reference-level explanation
Changes to the compiler and standard library
Two traits are added to core::fmt and std::fmt:
/// `g` formatting.
pub trait LowerGen {
fn fmt(&self, f: &mut Formatter) -> Result;
}
/// `G` formatting.
pub trait UpperGen {
fn fmt(&self, f: &mut Formatter) -> Result;
}
// Implementations matching those of LowerExp and UpperExp
// (lifetime parameters must be declared before type parameters)
impl LowerGen for f32 { ... }
impl LowerGen for f64 { ... }
impl<'a, T: ?Sized + LowerGen> LowerGen for &'a T { ... }
impl<'a, T: ?Sized + LowerGen> LowerGen for &'a mut T { ... }
impl UpperGen for f32 { ... }
impl UpperGen for f64 { ... }
impl<'a, T: ?Sized + UpperGen> UpperGen for &'a T { ... }
impl<'a, T: ?Sized + UpperGen> UpperGen for &'a mut T { ... }
The format_args! builtin macro is updated to support :g and :G as format specifiers, which map to these traits. They are usable in all of the same ways that :e and :E are.
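Since the new traits are meant to mirror LowerExp and UpperExp, the existing pattern for supporting :e on a user type gives a reasonable picture of what supporting :g would look like. The sketch below uses only today's LowerExp; the Meters newtype is hypothetical, and a LowerGen impl would presumably have the same shape.

```rust
use std::fmt;

/// Hypothetical newtype, used only for illustration.
struct Meters(f64);

impl fmt::LowerExp for Meters {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Delegate to the inner float and pass the formatter through, so
        // that flags such as precision are honored for the number itself.
        fmt::LowerExp::fmt(&self.0, f)?;
        f.write_str(" m")
    }
}

/// Formats a length with the `:e` specifier and two fractional digits.
fn describe(length: &Meters) -> String {
    format!("{:.2e}", length)
}
```

With this impl in place, {:e} on a Meters value type-checks, while a type without a LowerExp impl is rejected at compile time.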
Some research necessary
This RFC may appear to be sparse on the precise details of the new format it proposes. This is because it will most likely benefit from a substantial review of the literature and implementations in other languages. There are lots of little knobs to turn, and strange and unusual edge cases that are difficult to anticipate.
For instance: Suppose it is decided that format!("{:g}", 1e-4) should output "0.0001" (a 6-character string), and now consider adding a width of 4, so that it becomes format!("{:4g}", 1e-4). Should this change the output, since "1e-4" fits into the width while "0.0001" does not? (The author would posit "no," as it is unclear how to generalize this edge case into something that feels natural.)
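For reference, this is how width behaves for floats with the specifiers that exist today: it is only a minimum, and it never changes which representation is chosen. The helper name width_examples is made up for this sketch.

```rust
/// Shows how today's formatter treats a width on floats.
fn width_examples() -> [String; 3] {
    [
        // Already 6 characters wide: no padding and no truncation.
        format!("{:4}", 1e-4),
        // Exactly 4 characters wide: nothing to pad.
        format!("{:4e}", 1e-4),
        // Shorter than the width: padded on the left with spaces.
        format!("{:8e}", 1e-4),
    ]
}
```

Keeping :g consistent with this would mean that a width never influences the choice between simple and exponential output.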
Specific formatting examples (tentative)
The following is a table of example outputs that showcase a number of the tunable knobs in the format. The columns for {:g} in these tables propose one possible set of decisions. The decisions in this table are tentative and up to bikeshedding. The choices made here were largely informed by reverse engineering the Python implementation, with some adjustments to fit better into Rust's existing formatters.
Some of the knobs visible in this table are:
- Exponential-format thresholds for large and small values, both with and without a specified precision.
- Whether to append a trailing .0 for integers or a leading - for -0.0.
- Whether to always print p significant figures when a precision .p is provided, or to omit trailing zeros.
Without precision flags:
Value | {} | {:?} | {:e} | {:g} | Notes |
---|---|---|---|---|---|
1.0 | 1 | 1.0 | 1e0 | 1 [^a] | No .0 is consistent with {} |
0.0 | 0 | 0.0 | 0e0 | 0 [^a] | |
-0.0 | 0 | -0.0 | 0e0 | 0 [^b] | No leading - is consistent with {} |
1.234 | 1.234 | 1.234 | 1.234e0 | 1.234 | |
100 | 100 | 100.0 | 1e2 | 100 [^a] | |
1000 | 1000 | 1000.0 | 1e3 | 1000 [^a] | Even though 1e3 is shorter |
... | ... | ... | ... | ... | |
1000000 | 1000000 | 1000000.0 | 1e6 | 1000000 [^a] | |
10000000 | 10000000 | 10000000.0 | 1e7 | 1e7 | Suggested default high cutoff |
0.0001 | 0.0001 | 0.0001 | 1e-4 | 0.0001 | |
0.00001 | 0.00001 | 0.00001 | 1e-5 | 1e-5 | Suggested default low cutoff |
(1.0f32 + EPSILON) | 0.10000001 | 0.10000001 | 1.0000001e-1 | 0.10000001 | |
1e-7 * (1.0f32 + EPSILON) | 0.00000010000001 | 0.00000010000001 | 1.0000001e-7 | 1.0000001e-7 | |
With precision flags:
Value | Precision | {:.p$} | {:.p$?} | {:.p$e} | {:.p$g} | Notes |
---|---|---|---|---|---|---|
1.0 | p=3 | 1.000 | 1.000 | 1.000e0 | 1 [^a] | Consistent with {} |
1.234 | p=2 | 1.23 | 1.23 | 1.23e0 | 1.23 | |
1.234 | p=3 | 1.234 | 1.234 | 1.234e0 | 1.234 | |
1.234 | p=4 | 1.2340 | 1.2340 | 1.2340e0 | 1.234 | Truncates trailing zeros |
1e4 | p=5 | 10000.00000 | 10000.00000 | 1.00000e4 | 10000 | |
1e4 | p=4 | 10000.0000 | 10000.0000 | 1.0000e4 | 1e4 | Suggested high cutoff (support p digits before the decimal) |
1e-3 | p=1 | 0.0 | 0.0 | 1.0e-3 | 0.001 | Low cutoff is always 1e-5 (what Python does) |
60000000 | p=1 | 60000000.0 | 60000000.0 | 6.0e7 | 6e7 | Consistent with {:e} |
(1.0f32 + EPSILON) | p=10 | 1.0000001192 | 1.0000001192 | 1.0000001192e0 | 1.0000001192 | Excess digits faithfully represent the binary value |
1e-7 * (1.0f32 + EPSILON) | p=5 | 0.00000 | 0.00000 | 1.00000e-7 | 1e-7 | (as opposed to 0) |
[^a]: If/when {:g?} becomes possible, it should add a trailing .0 to these so that they are valid literals.
[^b]: If/when {:g?} becomes possible, it should render this as -0.0.
Drawbacks
Implementation difficulty
Efficient floating point formatting is not an easy problem, and the author of this RFC has little expertise on the topic.
Suggests a "batteries included" philosophy
Formatting floating point numbers nicely for humans requires making a number of decisions that don't have clear answers. Different domains may disagree, for instance, on what the best thresholds are for switching from standard notation to exponential notation, and no matter what choices are made, not everybody will be pleased.
This is partly why this RFC focuses so much energy on the developer-oriented aspects, and declares future interaction with Debug as an explicit goal. Debug output does not need to be perfect for everyone. It seems unlikely that most people would even notice the difference between an upper threshold of 1e7 versus 1e16 (as anecdotal evidence, prior to writing this RFC, the author did not even realize that these two drastically different thresholds are used by Haskell and Python, respectively). Debug output largely just needs to maximize bang for the buck and please as many people as it can... and the status quo of System { count: 602214085700000000000000 } sets a pretty low bar for improvement!
{:g?} may become preferred over {:?}
This is a disadvantage of the planned future RFC. There is tons of code that (a) already exists, (b) uses {:?}, and (c) ...probably would be better off using {:g?} instead. Such code will likely be fixed very slowly, and much of it won't ever be fixed at all.
This is a natural part of code evolution. Most alternatives share this drawback; the only way to overcome it would be with breaking changes to the standard library formatting impls.
It also raises the question: should {:gx?} be supported?
Rationale and alternatives
Alternative: Change the output of {:?}
A far more direct approach to the motivation: introduce nothing new, and instead change the {:?} representation for f32 and f64 to work more like the {:g} proposed here.
- Pro: No new APIs. No new traits, no changes to format specifiers.
- Pro: Automatic adoption all over. The benefits will be reaped in many more places, such as the assert_eq! macro.
- Con: Massive breaking change! Although ideally there ought to be no code depending on Debug output representations, in reality this is far from the truth, and in practice there are even places that should depend on it (e.g. should_panic patterns under certain conditions). About a year prior to the posting of this RFC, the Debug representation of floats was changed to include a trailing .0 for integer values (FIXME link), and this did not go unnoticed. The changes listed here are of far greater magnitude.
- Pro/Con: Potential for misuse? People may use {:?} in human-oriented output because it "looks nicer."
Alternative: Just add {:e?} and {:E?}.
Without introducing any implementation of general floating-point formatting, just add {:e?} specifiers. This would solve the issue presented in the positive_normal_f32s example. However, the author of this RFC would conjecture that the set of clear-cut good use cases for {:e?} is vanishingly small compared to {:g?}.
Alternative: Make the formatting system extensible
(thanks to @crlf0710 for reminding me to add this)
This RFC proposes adding a new format, but as an alternative, we could make formatting extensible in a way that allows third party libraries to provide a new format. The big question is: ....how, exactly?
While this does dodge some difficult questions and allow the standard library to remain general-purpose, it could be a massive design effort that will require a much greater and far more complicated RFC.
Make this the alternative format {:#?} for floats
(thanks to @ekuber)
Like some other alternatives, this is a breaking change.
Unfortunately, this would force people to use {:#?} on structs as well. {:#?} is a very space-consuming representation that is far from ideal for most use-cases.
Add another flag rather than a trait
(suggested by @hanna-kruppe)
Basically, like the above idea, except introduce a new flag rather than repurposing #. This represents a large class of possible designs. However, the design closest to the current proposal would be:
- {:g} will call Display::fmt with a flag set on the Formatter, rather than using a new trait.
- {:g?} will call Debug::fmt with a flag set on the Formatter, as it would in the upcoming proposal.
The pros and cons of that specific design compared to this RFC are:
- Pro: Unified API (in some ways). {:g} and {:g?} will internally work similarly to each other, in contrast to {:x} and {:x?}.
- Pro: Automatic support by newtype wrappers. Any newtype wrapper that already implements all of the existing std::fmt traits by calling {Trait}::fmt(&self.field, formatter) will automatically gain support for {:g}. (Note that automatic support for the debug counterpart {:g?} is already a goal of this RFC.)
- Con: Poorer type-checking. When you write format!("{:x}", 1.0), you get a type error. When you write format!("{:g}", 1i32), you may get a runtime error, or more likely, no indication of any problem at all.
  Many types implement Display in a manner that performs a series of write!() invocations without any regard for formatting flags. These types will all end up accidentally "supporting" {:g} whether they were designed to support it or not.
- Con: Poorly unified API (in other ways). How do you support {:e}? Implement LowerExp. How do you support {:g}? Be careful in your Display impl.
There is also conceivably a design that adds {:g?} without ever adding {:g}. This removes the drawbacks listed above, but replaces them with other obvious drawbacks.
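The "poorer type-checking" concern can already be observed today with the precision flag, which travels on the Formatter in the same way a g flag would. In the sketch below (both Point types are hypothetical), the first Display impl silently drops the caller's precision because write! starts from fresh format specs, while the second forwards the Formatter to each field.

```rust
use std::fmt;

/// Hypothetical type whose Display impl ignores formatter flags.
struct Point {
    x: f64,
    y: f64,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Each `{}` here uses default specs; flags set on `f` are dropped.
        write!(f, "({}, {})", self.x, self.y)
    }
}

/// Hypothetical variant that forwards the caller's flags to each field.
struct FlagAwarePoint {
    x: f64,
    y: f64,
}

impl fmt::Display for FlagAwarePoint {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("(")?;
        fmt::Display::fmt(&self.x, f)?;
        f.write_str(", ")?;
        fmt::Display::fmt(&self.y, f)?;
        f.write_str(")")
    }
}
```

Both types compile with {:.1}, but only the second one actually rounds its fields. A flag-based {:g} would presumably behave the same way on these two types, with no compile-time signal to tell them apart.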
Possible extension: Add {:f} and std::fmt::Fixed for fixed-point formatting
It is somewhat disappointing that floats use dedicated traits {LowerExp,UpperExp} for exponential formatting and the general-purpose trait Display for fixed-point formatting. Adding a new dedicated Fixed trait with a :f specifier would help create a more complete conceptual basis for talking about floating point formatting and may benefit newtypes that want to differentiate their Fixed and Display impls.
It is worth noting that Debug already recursively applies the precision in {:.5?}, meaning that this extension is not necessary to backwards-compatibly add support for recursive fixed-point formatting (it's already there!).
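A small demonstration of that existing behavior, reusing the shape of the Settings struct from the motivation (the helper name debug_two_places is made up here): the precision given to {:.2?} reaches every float inside a derived Debug impl.

```rust
#[derive(Debug)]
struct Settings {
    initial: f64,
    step: f64,
}

/// Formats the whole struct with two digits after the decimal point.
fn debug_two_places(settings: &Settings) -> String {
    // The precision is carried by the Formatter into each field's Debug impl.
    format!("{:.2?}", settings)
}
```

The same propagation is what a future {:g?} would presumably rely on.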
At first glance, this alternative appears to even open a migration path to changing the meaning of {} or {:?} for floats, by deprecating these impls and generating a fixable migration warning to add missing :f specifiers to format literals. However, this is a false hope, as still nothing can detect or fix the usage of a function defined as fn foo<T: fmt::Display> or fn foo<T: ToString> called on a float.
Prior art
TODO: Needs a lot more review
- Python's "default" formatter switches to exponential notation for any value >= 1e16. If you write {:g} without a precision, it instead behaves identically to {:.6g}.
- Haskell switches to exponential notation for any value >= 1e7.
Unresolved questions
TODO
Future possibilities
FIXME: some of the stuff mentioned in other parts actually belongs here