pre-RFC/draft: {:g}, or "floating points for humans"


#1

So I’ve been working on the following RFC, but am starting to run out of steam (and vacation time!) and am not sure if I can finish it. This is a pretty lousy draft, but it’s a feature I really want to see in Rust eventually. Hopefully I can get some feedback given what’s there, or this can help spark discussion.


  • Feature Name: float_gen_fmt
  • Start Date: (fill me in with today’s date, YYYY-MM-DD)
  • RFC PR: (leave this empty)
  • Rust Issue: (leave this empty)

Summary

Add :g and :G formatting specifiers for floating point numbers, backed by two new traits std::fmt::{LowerGen,UpperGen}. These formats dynamically switch between fixed-point formatting and the exponential formats :e and :E based on the magnitude of a value.

The long-term plan is to allow this flag to be combined with :? in a manner like #2226 for recursive formatting of floating point numbers in larger structs—however, this is not included in the current proposal, and is only mentioned for its relevance to the motivation.

Motivation

Rust currently has two ways to format floating point numbers:

  • Simple (through Debug and Display)
  • Exponential (through LowerExp and UpperExp)

Both of these additionally support a “round-trip precision” mode, used when no precision (.prec) is provided in the format specifier. However, neither of these two formats is suitable for human-oriented interfaces in contexts where numbers may be of arbitrary magnitude.

The simple formatting scheme can sometimes force the reader to play a game of “count the zeros”:

assert_eq!(
    format!("{:?}", std::f64::MAX),
    "179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0",
);

This is frequently an issue for values like 1e-10, which often show up as fuzz factors and tolerances in floating point computations. The only alternative offered by the standard library is exponential formatting through :e and :E; however, exponential format can be taxing for humans to read for values on the order of 1:

assert_eq!(format!("{:e}", 22.0), "2.2e1");
assert_eq!(format!("{:e}", 1.0), "1e0");
assert_eq!(format!("{:e}", 0.9), "9e-1");
assert_eq!(format!("{:e}", 0.0), "0e0");

Furthermore, this solution cannot be applied to larger structs, in particular to any Debug implementors that use the std::fmt::DebugXyz family of helpers (which includes any type that uses #[derive(Debug)]).

#[derive(Debug)]
pub enum StepKind<F> {
    Fixed(F),
    Ulps(u64),
}

#[derive(Debug)]
struct FloatRange<F> {
    min_inclusive: F,
    max_inclusive: F,
    step: StepKind<F>,
}

let positive_normal_f32s = FloatRange {
    min_inclusive: std::f32::MIN_POSITIVE,
    max_inclusive: std::f32::MAX,
    step: StepKind::Ulps(1),
};

// There is no way to print this struct with exponential notation.
assert_eq!(
    format!("{:?}", positive_normal_f32s),
    "FloatRange { min_inclusive: 0.000000000000000000000000000000000000011754944, max_inclusive: 340282350000000000000000000000000000000.0, step: Ulps(1) }",
);

Many other languages and utilities have a “general” or “generic” formatting mode which dynamically switches between simple and exponential format, making it useful for a much wider selection of data.

One of the most important use cases for human-readable floats is debug output. But currently, anything using #[derive(Debug)] has only one way to print its floats, and that way is poorly optimized for human readability. With a second, future RFC to add support for "{:g?}", users will be able to use the existing Debug machinery to easily inspect arbitrary data structures containing floating point numbers of heterogeneous magnitude:

// using enum-map = "0.4.1"
#[derive(enum_map::Enum, Debug)]
enum Kind { Default, Simple }

#[derive(Debug)]
struct Settings {
    initial: f64,
    step: f64,
}

fn main() {
    let map = enum_map::enum_map!{
        Kind::Default => Settings { initial: 7654.32101234, step: 1e-6 },
        Kind::Simple => Settings { initial: 0.0, step: 0.1 },
    };
    println!("{:g?}", map);
}

Output:

{Default: Settings { initial: 7654.32101234, step: 1e-6 }, Simple: Settings { initial: 0.0, step: 0.1 }}

Accomplishing such a feat through an external library is nearly impossible without massive buy-in.
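To make the buy-in problem concrete: without language support, the closest approximation is a newtype whose Debug impl chooses between Display and LowerExp, and every struct has to change its field types to use it. A minimal sketch, assuming a hypothetical Gen wrapper and the tentative 1e-4/1e7 cutoffs from the table later in this RFC:

```rust
use std::fmt;

/// Hypothetical wrapper: an f64 whose Debug output switches between
/// fixed-point and exponential form (tentative cutoffs 1e-4 and 1e7).
struct Gen(f64);

impl fmt::Debug for Gen {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let a = self.0.abs();
        if a != 0.0 && (a < 1e-4 || a >= 1e7) {
            fmt::LowerExp::fmt(&self.0, f) // e.g. 1e-6
        } else {
            fmt::Display::fmt(&self.0, f) // e.g. 7654.32101234
        }
    }
}

// The buy-in: every struct must change its field types to opt in.
#[derive(Debug)]
struct Settings {
    initial: Gen,
    step: Gen,
}

fn main() {
    let s = Settings { initial: Gen(7654.32101234), step: Gen(1e-6) };
    // Prints: Settings { initial: 7654.32101234, step: 1e-6 }
    println!("{:?}", s);
}
```

This only helps for structs you own; a Vec<f64> or a third-party type deriving Debug still prints its floats the old way.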

Guide-level explanation

Three formatting modes

Rust has three ways to format floating point numbers. These are:

A simple format, which is the default:

assert_eq!(format!("{}", 5.0), "5");
assert_eq!(format!("{}", 5.1), "5.1");
assert_eq!(format!("{}", 1.234e9), "1234000000");
assert_eq!(format!("{}", 1.234e-9), "0.000000001234");

An exponential format, which always uses scientific notation:

assert_eq!(format!("{:e}", 5.0), "5e0");
assert_eq!(format!("{:e}", 5.1), "5.1e0");
assert_eq!(format!("{:e}", 1.234e9), "1.234e9");
assert_eq!(format!("{:e}", 1.234e-9), "1.234e-9");
assert_eq!(format!("{:E}", 1.234e-9), "1.234E-9");

And a general format, which switches between simple and exponential based on magnitude:

assert_eq!(format!("{:g}", 5.0), "5");
assert_eq!(format!("{:g}", 5.1), "5.1");
assert_eq!(format!("{:g}", 1.234e9), "1.234e9");
assert_eq!(format!("{:g}", 1.234e-9), "1.234e-9");
assert_eq!(format!("{:G}", 1.234e-9), "1.234E-9");
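For readers who want to experiment, the {:g} behavior without a precision can be approximated today by dispatching to the existing Display and LowerExp impls. This is a sketch, not a proposed implementation: fmt_general is a hypothetical name, and the cutoffs (exponential form for |x| >= 1e7 or 0 < |x| < 1e-4) are the tentative ones suggested in the table in the reference-level section.

```rust
/// Hypothetical sketch of `{:g}` with no precision. Both branches print
/// Rust's shortest round-trip digits; only the notation differs.
fn fmt_general(x: f64) -> String {
    let a = x.abs();
    if a != 0.0 && (a < 1e-4 || a >= 1e7) {
        format!("{:e}", x) // exponential form
    } else {
        format!("{}", x) // fixed-point form (also handles NaN)
    }
}

fn main() {
    for x in [5.0, 5.1, 1.234e9, 1.234e-9, 0.0001, 0.00001] {
        println!("{}", fmt_general(x));
    }
}
```

A {:G} counterpart would be identical with {:E} in place of {:e}.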

Precision

All three of the above formats print to round-trip precision by default. This can be changed by adding a precision to the format specifier (e.g. .4). When a precision is added to either the simple or exponential format, the output will always contain that many digits after the decimal point:

assert_eq!(format!("{:.4}", 5.1), "5.1000");
assert_eq!(format!("{:.4e}", 1.234e9), "1.2340e9");

When a precision is added to the general format, it is used as the maximum number of significant figures to display. It is also used as the maximum number of digits that large numbers are allowed to contain before they are switched to exponential format. Not only does this give you control over the threshold for exponential format, but it also prevents a number like 1234.0 from being formatted as 1230 under .3g (it becomes 1.23e3 instead). The threshold for small numbers remains fixed.

assert_eq!(format!("{:.3g}", 500.0), "500");
assert_eq!(format!("{:.3g}", 5000.0), "5e3");
assert_eq!(format!("{:.3g}", 1.234e9), "1.23e9");
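One way to realize these precision rules is the rule Python's %g uses: round to p significant figures first, then pick fixed-point if the decimal exponent lies in -4..p (so at most p digits before the decimal point), and finally strip trailing zeros. A sketch under that assumption, with the hypothetical helper names fmt_general_prec and trim_zeros:

```rust
/// Hypothetical sketch of `{:.pg}`: at most `p` significant figures,
/// exponential form if the exponent is < -4 or >= p, trailing zeros removed.
fn fmt_general_prec(x: f64, p: usize) -> String {
    let p = p.max(1); // zero significant figures makes no sense; treat as 1
    if !x.is_finite() {
        return format!("{}", x); // "NaN", "inf", "-inf"
    }
    // `{:.*e}` rounds to p significant figures and reports the exponent
    // *after* rounding (e.g. 9.99 at p=2 becomes "1.0e1").
    let sci = format!("{:.*e}", p - 1, x);
    let (mantissa, exp) = sci.split_once('e').expect("LowerExp always writes 'e'");
    let exp: i32 = exp.parse().expect("exponent is an integer");
    if exp < -4 || exp >= p as i32 {
        format!("{}e{}", trim_zeros(mantissa), exp)
    } else {
        // p significant figures, exp + 1 of which sit before the decimal point
        let decimals = (p as i32 - 1 - exp) as usize;
        trim_zeros(&format!("{:.*}", decimals, x)).to_string()
    }
}

/// Remove trailing zeros (and a dangling '.') from a decimal string.
fn trim_zeros(s: &str) -> &str {
    if s.contains('.') {
        s.trim_end_matches('0').trim_end_matches('.')
    } else {
        s // "500" must keep its zeros
    }
}

fn main() {
    println!("{}", fmt_general_prec(500.0, 3));
    println!("{}", fmt_general_prec(5000.0, 3));
    println!("{}", fmt_general_prec(1.234e9, 3));
}
```

Rounding before choosing the notation matters: 9.99 at p=2 rounds up to 10, whose exponent is 1 rather than 0.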

Reference-level explanation

Changes to the compiler and standard library

Two traits are added to core::fmt and std::fmt:

/// `g` formatting.
pub trait LowerGen {
    fn fmt(&self, f: &mut Formatter) -> Result;
}

/// `G` formatting.
pub trait UpperGen {
    fn fmt(&self, f: &mut Formatter) -> Result;
}

// Implementations matching those of LowerExp and UpperExp
impl LowerGen for f32 { ... }
impl LowerGen for f64 { ... }
impl<'a, T: ?Sized + LowerGen> LowerGen for &'a T { ... }
impl<'a, T: ?Sized + LowerGen> LowerGen for &'a mut T { ... }

impl UpperGen for f32 { ... }
impl UpperGen for f64 { ... }
impl<'a, T: ?Sized + UpperGen> UpperGen for &'a T { ... }
impl<'a, T: ?Sized + UpperGen> UpperGen for &'a mut T { ... }

The format_args! builtin macro is updated to support :g and :G as format specifiers, which map to these traits. They are usable in all of the same ways that :e and :E are.
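As with LowerExp today, a newtype wrapper only supports the formatting traits it explicitly forwards, which is the opt-in cost a new LowerGen/UpperGen pair would carry. A sketch with a hypothetical Meters wrapper, using the existing LowerExp trait since LowerGen does not exist yet:

```rust
use std::fmt;

/// Hypothetical newtype. Under this RFC it would also need
/// `impl fmt::LowerGen` / `impl fmt::UpperGen` to support `{:g}` / `{:G}`.
struct Meters(f64);

impl fmt::Display for Meters {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.0, f) // forwards width/precision too
    }
}

impl fmt::LowerExp for Meters {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::LowerExp::fmt(&self.0, f)
    }
}

fn main() {
    let d = Meters(1234.0);
    println!("{} {:.2e}", d, d);
    // `format!("{:E}", d)` would not compile: UpperExp is not implemented.
}
```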

Some research necessary

This RFC may appear to be sparse on the precise details of the new format it proposes. This is because it will most likely benefit from a substantial review of the literature and implementations in other languages. There are lots of little knobs to turn, and strange and unusual edge cases that are difficult to anticipate.

For instance: Suppose it is decided that format!("{:g}", 1e-4) should output "0.0001" (a 6-character string), and now consider adding a width of 4, so that it becomes format!("{:4g}", 1e-4). Should this change the output, since "1e-4" fits into the width while "0.0001" does not? (The author would posit “no,” as it is unclear how to generalize this edge case into something that feels natural.)
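For reference, the existing width flag only ever pads: it never shortens output or changes notation, which is the behavior a “no” here would preserve for {:g}. A quick check against today's Display and LowerExp:

```rust
fn main() {
    // Width is a minimum: output wider than it is printed unchanged.
    assert_eq!(format!("{:4}", 1e-4), "0.0001");
    // Numbers are right-aligned by default when padding is needed.
    assert_eq!(format!("{:8}", 1e-4), "  0.0001");
    assert_eq!(format!("{:8e}", 1e-4), "    1e-4");
}
```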

Specific formatting examples (tentative)

The following is a table of example outputs that showcase a number of the tunable knobs in the format. The columns for {:g} in these tables propose one possible set of decisions. The decisions in this table are tentative and up to bikeshedding. The choices made here were largely informed by reverse engineering the Python implementation, with some adjustments to fit better into Rust’s existing formatters.

Some of the knobs visible in this table are:

  • Exponential-format thresholds for large and small values, both with and without a specified precision.
  • Whether to append a trailing .0 for integers or a leading - for -0.0.
  • Whether to always print p significant figures when a precision .p is provided, or to omit trailing zeros.

Without precision flags:

Value | {} | {:?} | {:e} | {:g} | Notes
1.0 | 1 | 1.0 | 1e0 | 1[^a] | No .0 is consistent with {}
0.0 | 0 | 0.0 | 0e0 | 0[^a] |
-0.0 | 0 | -0.0 | 0e0 | 0[^b] | No leading - is consistent with {}
1.234 | 1.234 | 1.234 | 1.234e0 | 1.234 |
100 | 100 | 100.0 | 1e2 | 100[^a] |
1000 | 1000 | 1000.0 | 1e3 | 1000[^a] | Even though 1e3 is shorter
1000000 | 1000000 | 1000000.0 | 1e6 | 1000000[^a] |
10000000 | 10000000 | 10000000.0 | 1e7 | 1e7 | Suggested default high cutoff
0.0001 | 0.0001 | 0.0001 | 1e-4 | 0.0001 |
0.00001 | 0.00001 | 0.00001 | 1e-5 | 1e-5 | Suggested default low cutoff
(1.0f32 + EPSILON) | 0.10000001 | 0.10000001 | 1.0000001e-1 | 0.10000001 |
1e-7 * (1.0f32 + EPSILON) | 0.00000010000001 | 0.00000010000001 | 1.0000001e-7 | 1.0000001e-7 |

With precision flags:

Value | Precision | {:.p$} | {:.p$?} | {:.p$e} | {:.p$g} | Notes
1.0 | p=3 | 1.000 | 1.000 | 1.000e0 | 1[^a] | Consistent with {}
1.234 | p=2 | 1.23 | 1.23 | 1.23e0 | 1.2 |
1.234 | p=3 | 1.234 | 1.234 | 1.234e0 | 1.23 |
1.234 | p=4 | 1.2340 | 1.2340 | 1.2340e0 | 1.234 | Truncates trailing zeros
1e4 | p=5 | 10000.00000 | 10000.00000 | 1.00000e4 | 10000 |
1e4 | p=4 | 10000.0000 | 10000.0000 | 1.0000e4 | 1e4 | Suggested high cutoff (support p digits before the decimal)
1e-3 | p=1 | 0.0 | 0.0 | 1.0e-3 | 0.001 | Low cutoff is always 1e-5 (what Python does)
60000000 | p=1 | 60000000.0 | 60000000.0 | 6.0e7 | 6e7 | Consistent with {:e}
(1.0f32 + EPSILON) | p=10 | 1.0000001192 | 1.0000001192 | 1.0000001192e0 | 1.000000119 | Excess digits faithfully represent the binary value
1e-7 * (1.0f32 + EPSILON) | p=5 | 0.00000 | 0.00000 | 1.00000e-7 | 1e-7 | (as opposed to 0)

[^a]: If/when {:g?} becomes possible, it should add a trailing .0 to these so that they are valid literals.

[^b]: If/when {:g?} becomes possible, it should render this as -0.0.

Drawbacks

Implementation difficulty

Efficient floating point formatting is not an easy problem, and the author of this RFC has little expertise on the topic.

Suggests a “batteries included” philosophy

Formatting floating point numbers nicely for humans requires making a number of decisions that don’t have clear answers. Different domains may disagree, for instance, on what the best thresholds are for switching from standard notation to exponential notation, and no matter what choices are made, not everybody will be pleased.

This is partly why this RFC focuses so much energy on the developer-oriented aspects, and declares future interaction with Debug as an explicit goal. Debug output does not need to be perfect for everyone. It seems unlikely that most people would even notice the difference between an upper threshold of 1e7 versus 1e16 (as anecdotal evidence, prior to writing this RFC, the author did not even realize that these two drastically different thresholds are used by Haskell and Python, respectively). Debug output largely just needs to maximize bang for the buck and please as many people as it can… and the status quo of System { count: 602214085700000000000000.0 } sets a pretty low bar for improvement!

{:g?} may become preferred over {:?}

This is a disadvantage of the planned future RFC. There is a lot of code that (a) already exists, (b) uses {:?}, and (c) …probably would be better off using {:g?} instead. Such code will likely be fixed very slowly, and much of it won’t ever be fixed at all.

This is a natural part of code evolution. Most alternatives share this drawback; the only way to overcome it would be with breaking changes to the standard library formatting impls.

It also raises the question: should {:gx?} be supported?

Rationale and alternatives

Alternative: Change the output of {:?}

A far more direct approach to the motivation: introduce nothing new, and instead change the {:?} representation for f32 and f64 to work more like the {:g} proposed here.

  • Pro: No new APIs. No new traits, no changes to format specifiers.
  • Pro: Automatic adoption all over. The benefits will be reaped in many more places, such as the assert_eq! macro.
  • Con: Massive breaking change! Although ideally there ought to be no code depending on Debug output representations, in reality this is far from the truth, and in practice there are even places that should depend on it (e.g. should_panic patterns under certain conditions). About a year prior to the posting of this RFC, the Debug representation of floats was changed to include a trailing .0 for integer values (FIXME link), and this did not go unnoticed. The changes listed here are of far greater magnitude.
  • Pro/Con: Potential for misuse? People may use {:?} in human-oriented output because it “looks nicer.”

Alternative: Just add {:e?} and {:E?}.

Without introducing any implementation of general floating-point formatting, just add the {:e?} and {:E?} specifiers. This would solve the issue presented in the positive_normal_f32s example. However, the author of this RFC would conjecture that the set of clear-cut good use cases for {:e?} is vanishingly small compared to {:g?}.

Alternative: Make the formatting system extensible

(thanks to @crlf0710 for reminding me to add this)

This RFC proposes adding a new format, but as an alternative, we could make formatting extensible in a way that allows third party libraries to provide a new format. The big question is: …how, exactly?

While this does dodge some difficult questions and allow the standard library to remain general-purpose, it could be a massive design effort that will require a much greater and far more complicated RFC.

Make this the alternative format {:#?} for floats

(thanks to @ekuber)

Like some other alternatives, this is a breaking change.

Unfortunately, this would force people to use {:#?} on structs as well. {:#?} is a very space-consuming representation that is far from ideal for most use-cases.

Add another flag rather than a trait

(suggested by @rkruppe)

Basically, like the above idea, except introduce a new flag rather than repurposing #. This represents a large class of possible designs. However, the design closest to the current proposal would be:

  • {:g} will call Display::fmt with a flag set on the Formatter, rather than using a new trait.
  • {:g?} will call Debug::fmt with a flag set on the Formatter, as it would in the upcoming proposal.

The pros and cons of that specific design compared to this RFC are:

  • Pro: Unified API (in some ways). {:g} and {:g?} will internally work similarly to each other, in contrast to {:x} and {:x?}.
  • Pro: Automatic support by newtype wrappers. Any newtype wrappers that already implement all of the existing std::fmt traits by calling {Trait}::fmt(&self.field, formatter) will automatically gain support for {:g}. (Note that automatic support for the debug counterpart {:g?} is already a goal of this RFC.)
  • Con: Poorer type-checking. When you write format!("{:x}", 1.0), you get a type error. When you write format!("{:g}", 1i32), you may get a runtime error, or more likely, no indication of any problem at all.
    Many types implement Display in a manner that performs a series of write!() invocations without any regard for formatting flags. These types will all end up accidentally “supporting” {:g} whether they were designed to support it or not.
  • Con: Poorly unified API (in other ways). How do you support {:e}? Implement LowerExp. How do you support {:g}? Be careful in your Display impl.

There is also conceivably a design that adds {:g?} without ever adding {:g}. This removes the drawbacks listed above, but replaces them with other obvious drawbacks.
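The “accidental support” concern can already be seen with precision, which is a Formatter flag today: a hand-written Display impl that just calls write!() silently ignores it. A small illustration with a hypothetical Point type:

```rust
use std::fmt;

/// Hypothetical type whose Display ignores the Formatter's flags, as many
/// hand-written impls do: the inner `{}` placeholders start with fresh flags.
struct Point {
    x: f64,
    y: f64,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    let p = Point { x: 1.5, y: 0.25 };
    // The precision is silently dropped, with no compile-time error; a
    // `{:g}` flag routed through Display would be dropped the same way.
    println!("{:.3}", p); // prints "(1.5, 0.25)"
}
```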

Possible extension: Add {:f} and std::fmt::Fixed for fixed-point formatting

It is somewhat disappointing that floats use dedicated traits {LowerExp,UpperExp} for exponential formatting and the general-purpose trait Display for fixed-point formatting. Adding a new dedicated Fixed trait with a :f specifier would help create a more complete conceptual basis for talking about floating point formatting and may benefit newtypes that want to differentiate their Fixed and Display impls.

It is worth noting that Debug already recursively applies the precision in {:.5?}, meaning that this extension is not necessary to backwards-compatibly add support for recursive fixed-point formatting (it’s already there!).
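For example, precision already flows through derived Debug impls into nested fields and collections (Sample is a made-up type for illustration):

```rust
#[derive(Debug)]
struct Sample {
    mean: f64,
    spread: Vec<f64>,
}

fn main() {
    let s = Sample { mean: 1.0, spread: vec![0.12345, 2.5] };
    // The `.2` reaches every float, including those inside the Vec.
    println!("{:.2?}", s); // Sample { mean: 1.00, spread: [0.12, 2.50] }
}
```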

At first glance, this alternative appears to even open a migration path to changing the meaning of {} or {:?} for floats, by deprecating these impls and generating a fixable migration warning to add missing :f specifiers to format literals. However, this is a false hope, as still nothing can detect or fix the usage of a function defined as fn foo<T: fmt::Display> or fn foo<T: ToString> called on a float.

Prior art

TODO: Needs a lot more review

  • Python’s “default” formatter switches to exponential notation for any value >= 1e16. If you write {:g} without a precision, it instead behaves identically to {:.6g}.
  • Haskell switches to exponential notation for any value >= 1e7.

Unresolved questions

TODO

Future possibilities

FIXME: some of the stuff mentioned in other parts actually belongs here


#2

Another alternative would be to introduce this behavior behind the alternate debug output ({:#?}) for floats.


#3

I definitely support having this available.

As a way to try it out, we’re allowed to change Debug implementations. We could try a “whichever is shorter” output that still round trips.
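A sketch of the “whichever is shorter” idea, assuming Display is the fixed-point candidate and LowerExp the exponential one. Both print Rust's shortest round-tripping digits, so either choice parses back to the same value; shorter is a hypothetical helper name:

```rust
/// Hypothetical helper: pick the shorter of fixed-point and exponential
/// output. Ties go to fixed-point.
fn shorter(x: f64) -> String {
    let fixed = format!("{}", x);
    let sci = format!("{:e}", x);
    if sci.len() < fixed.len() { sci } else { fixed }
}

fn main() {
    for x in [1.5, 1000.0, 0.0001, 6.02214076e23] {
        println!("{}", shorter(x));
    }
}
```

Unlike the tentative table in the RFC, this prints 1000.0 as 1e3, trading some readability around 1 for brevity.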


#4

Instead of creating another output format within std itself, I actually think it makes more sense to add an “extension mechanism” to format_arguments etc., so this can be implemented in a lib?


#5

Is there a need for an extension mechanism? You can already write a function

fn my_display<'a>(s: &'a Something) -> impl Display + 'a { /*...*/ }

and then

println!("{}", my_display(&something));

Or am I missing something?
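A runnable version of that pattern, formatting a float with {:e} through a hypothetical Sci wrapper (any name would do):

```rust
use std::fmt::{self, Display};

/// Hypothetical wrapper returned by `sci`; formats the float with `{:e}`.
struct Sci<'a>(&'a f64);

impl Display for Sci<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Forward the Formatter so precision/width still apply.
        fmt::LowerExp::fmt(self.0, f)
    }
}

/// Same shape as the `my_display` helper above.
fn sci<'a>(x: &'a f64) -> impl Display + 'a {
    Sci(x)
}

fn main() {
    let step = 1e-6;
    println!("step = {}", sci(&step)); // step = 1e-6
}
```

This works for a value formatted directly, but not for a float buried inside a struct with a derived Debug.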


#7

Thanks, I initially intended to add that as an alternative, but I forgot.

That doesn’t address the difficulty of formatting floats located in arbitrary places inside structs. This can only be addressed by either providing more builtin formatting methods, or by making the formatting system pluggable.

One final note: The text in the RFC deliberately avoids using this language because I find it is not true of the implementations of %g that I enjoy using. It also seems more difficult to implement, with more surprising edge cases. This is why I focus on thresholds on absolute value.


#8

Ah, yes. I don’t like the idea of using formatting flags to format internal struct elements. For example I would prefer struct S { mask: u32, /*...*/, count: u32 } to display mask as {:x} and count as {}, and that kind of thing breaks down when you try to use a flag for complex structs instead of single elements. How internal fields are formatted should really be the responsibility of the struct’s internal implementation; if anything the struct can provide a configurable display method like fn display<'a>(&'a self, conf: Configuration) -> impl Display + 'a.
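The per-field approach described here can already be written with a manual Debug impl, for example printing mask in hex and count in decimal via format_args! (a sketch of the struct from this post, with the elided fields omitted):

```rust
use std::fmt;

struct S {
    mask: u32,
    count: u32,
}

impl fmt::Debug for S {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("S")
            .field("mask", &format_args!("{:#x}", self.mask)) // hex
            .field("count", &self.count) // decimal
            .finish()
    }
}

fn main() {
    println!("{:?}", S { mask: 255, count: 3 }); // S { mask: 0xff, count: 3 }
}
```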

But that is an aside on the alternative, not of your main recommendation. I actually agree that {:g} is missing from the current floating-point display formats.


#9

I also wasn’t too keen on {:x?}, but it does make sense for homogeneous data structures like arrays, Vec, and most notably &[u8] (the type of b-string literals). The main trouble is that, in the general case as you look at increasingly-nested data structures, most of them are heterogeneous.

In any case, given that {:x?} does exist (as does recursive fixed-point formatting with {:.4?}), I’d like to capitalize on it. And in contrast to the existing debug-supported flags, {:g?} is a fairly reliable choice for use with heterogeneous structures!


I might be picking nits with your choice of the name display (since it seems to be drawing from the pattern used by std::path::Path), but I would argue that this does not scale for debugging. Debug and Display have very different requirements. Display should be used for user-facing output. I would implement Display for e.g. objects that represent file formats, command line arguments, or error messages. If you need to have a method to give a Display object, that’s totally reasonable because it surely won’t be dead code.

#[derive(Debug)], on the other hand, is implemented recursively for a reason. It should always be available with as little effort as possible, because the reasons for using it may appear and disappear on a whim. A custom debug() -> impl Display method forces people to write manual Debug impls for structs containing that type.

I find it sad that e.g. Python has a default impl for __str__ (our equivalent to Display) that calls __repr__ (our equivalent to Debug). Most people just conflate the two methods, and the end result is that we lose a very useful distinction in semantics.


#10

This suggests to me this should be a flag that goes into the formatting options (like #, precision, padding, etc.) rather than a new formatting trait. It doesn’t have to be # precisely (that does have the disadvantage of affecting too many other things, in particular making struct debug formatting very verbose), but a trait seems questionable even without this extension (each new trait has ripple effects on APIs) and this extension is wholly incompatible with it.


#11

This seems like a strange idea. {:g} and {:g?} feel like the natural extension of what we already have, which includes:

  • :e and :E formatting. No doubt, these formatting modes (and all others except Display and Debug) would need to be incompatible with the new flag.
  • recursive hexadecimal formatting ({:2x?}, {:2X?})

Can you please elaborate on this? To me it seems exceedingly rare for a type to implement traits like std::fmt::{LowerExp,UpperExp} or to depend on them in public APIs. I doubt any “ripple” effect observed should go much farther than, user wants to use {:g} somewhere, but a single crate providing a newtype wrapper must be updated to implement the trait first.

Per what I said at the top of the post, I don’t see what you mean here.


#12

When I say a flag that goes into the formatting options, I am talking precisely about something like recursive hexadecimal formatting. Because that’s how {:x?} is implemented: it calls Debug with a special flag set (fmt::Formatter::debug_{lower,upper}_hex()). While for hexadecimal integers there is a corresponding trait for historical reasons, this is mentioned in the RFC as a drawback forced by backwards compatibility. For new kinds of formatting we could avoid that duplication.
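The observable effect of that flag: {:x?} reaches integers nested inside a derived Debug, while types that do not consult the flag, such as f64, simply ignore it. Packet is a made-up example type:

```rust
#[derive(Debug)]
struct Packet {
    id: u16,
    bytes: Vec<u8>,
}

fn main() {
    let p = Packet { id: 300, bytes: vec![10, 255] };
    println!("{:x?}", p); // Packet { id: 12c, bytes: [a, ff] }
    println!("{:X?}", p); // Packet { id: 12C, bytes: [A, FF] }
    println!("{:x?}", 1.5f64); // floats ignore the hex flag: 1.5
}
```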

I don’t see the issue, x? also doesn’t work with other traits.

Well, all those newtype wrappers amount to quite a bit of code. There’s also generic helper functions that want to accept “anything that can be Debug-formatted”, if this is supposed to be the new good way to print floating point numbers for programmers, they’ll have to consider whether they adopt the new format. (This is admittedly not specific to the trait, any new way to format has this issue.)

Sorry, my mistake, it isn’t incompatible, just redundant.


#13

In response to this I elaborated in the RFC:

Add another flag rather than a trait

Basically, like the above idea, except introduce a new flag rather than repurposing #. This represents a large class of possible designs. However, the design closest to the current proposal would be:

  • {:g} will call Display::fmt with a flag set on the Formatter, rather than using a new trait.
  • {:g?} will call Debug::fmt with a flag set on the Formatter, as it would in the upcoming proposal.

The pros and cons of that specific design compared to this RFC are:

  • Pro: Unified API. {:g} and {:g?} will internally work similarly to each other, in contrast to {:x} and {:x?}.
  • Pro: Automatic support by newtype wrappers. Any newtype wrappers that already implement all of the existing std::fmt traits by calling {Trait}::fmt(&self.field, formatter) will automatically gain support for {:g}. (Note that automatic support for the debug counterpart {:g?} is already a goal of this RFC.)
  • Con: Poorer type-checking. When you write format!("{:x}", 1.0), you get a type error. When you write format!("{:g}", 1i32), you may get a runtime error, or more likely, no indication of any problem at all.
    Many types implement Display in a manner that performs a series of write!() invocations without any regard for formatting flags. These types will all end up accidentally “supporting” {:g} whether they were designed to support it or not.
  • Con: Confusing compared to other format types. How do you support {:e}? Implement LowerExp. How do you support {:g}? Be careful in your Display impl.

There is also conceivably a design that adds {:g?} without ever adding {:g}. This removes the drawbacks listed above, but replaces them with other obvious drawbacks.