Display symmetry with number syntax

Number literals can be written readably as 12_345.678_9 or 0b1010_1010, but there is no corresponding format specifier for printing numbers just as readably.

I suggest {:_n} where n is optional and defaults to 3 for decimal & octal and to 4 for hex & binary as in the above examples:

format_spec := [[fill]align][sign]['#']['0'][width]['.' precision]['_' [grouping]]type
…
grouping := count

Precision is documented as the number of digits printed after the decimal point. That count clearly would not include the added underscores, so they may make this part longer than the precision. That is a feeble reason to put the underscore after the precision in the spec; since probably nobody can remember the order anyway, it could equally be accepted before or after precision.
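
To make the precision point concrete, here is a minimal sketch of what the grouping could produce for floats. The helper name `group_float` is made up for illustration and is not part of std; it assumes a finite value and simply post-processes the string that the existing `{:.*}` formatting returns.

```rust
/// Hypothetical helper (not a std API): format `x` with `precision`
/// fractional digits, then insert `_` every `n` digits, counting
/// outwards from the decimal point. Assumes `x` is finite.
fn group_float(x: f64, precision: usize, n: usize) -> String {
    assert!(n > 0, "group size must be non-zero");
    let plain = format!("{:.*}", precision, x);
    // Split off an optional sign so it never takes part in the grouping.
    let (sign, unsigned) = match plain.strip_prefix('-') {
        Some(rest) => ("-", rest),
        None => ("", plain.as_str()),
    };
    let (int_part, frac_part) = unsigned.split_once('.').unwrap_or((unsigned, ""));
    let mut out = String::from(sign);
    // Integer digits are grouped from the right ...
    for (i, c) in int_part.chars().enumerate() {
        if i > 0 && (int_part.len() - i) % n == 0 {
            out.push('_');
        }
        out.push(c);
    }
    // ... and fractional digits from the left.
    if !frac_part.is_empty() {
        out.push('.');
        for (i, c) in frac_part.chars().enumerate() {
            if i > 0 && i % n == 0 {
                out.push('_');
            }
            out.push(c);
        }
    }
    out
}
```

With a precision of 4 and groups of 3, `group_float(12345.6789, 4, 3)` yields `12_345.678_9`: still four fractional digits, but five characters after the decimal point.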

What about the Indian numbering system where (provided I understand correctly) digits are separated into one group of 3, then successive groups of 2? Would be a bit of a shame to introduce a feature like this that excludes roughly 1.85 billion people.

There might be other digit grouping systems, I'm not an expert and Wikipedia didn't obviously call any others out. I only know about the crore and lakh because of an old Tom Scott video.

Rust’s formatting already has zero l10n support, not even for the decimal separator, so I don’t think that’s an issue.

As @jdahlstrom points out, that's sadly not a topic for Rust. If it ever were, this could be extended as {:_2_3}.
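
For what it's worth, a rough sketch of what a `{:_2_3}`-style grouping might compute, assuming it reads as "groups of 2, except that the rightmost group has 3 digits". The helper `group_mixed` is hypothetical and works on an already formatted, unsigned ASCII digit string.

```rust
/// Hypothetical helper (not a std API): the rightmost group has `last`
/// digits, every group to its left has `rest` digits.
/// `rest = 2, last = 3` gives lakh/crore style grouping.
/// Assumes `digits` is an unsigned ASCII digit string.
fn group_mixed(digits: &str, rest: usize, last: usize) -> String {
    assert!(rest > 0 && last > 0, "group sizes must be non-zero");
    let len = digits.len();
    let mut out = String::with_capacity(len + len / rest + 1);
    for (i, c) in digits.chars().enumerate() {
        // Number of digits remaining, including the current one.
        let from_right = len - i;
        let at_boundary = from_right == last
            || (from_right > last && (from_right - last) % rest == 0);
        if i > 0 && at_boundary {
            out.push('_');
        }
        out.push(c);
    }
    out
}
```

One crore would then print as `1_00_00_000`, and passing 3 for both sizes falls back to the plain thousands grouping.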

Btw, East Asians count in powers of myriads (万, man = 1_0000; 億, oku = 1_0000_0000), yet when writing Arabic numerals they still put the separator at powers of a thousand (10_000; 100_000_000), Western style. It rightly confuses the shit out of them. I've seen several get completely lost over simple millions.

Then again Murricans don't know what a billion really is, being lost on the weirdly mixed short scale (which should be written 1_000_000000.) Though Wall St. has yards of Dollars, meaning milliards. So it's not completely lost, though almost. :grin:

I think what you're discussing is functionality that the num_format crate provides; I don't see a reason for this to be in the language or standard library.

I think there'd be value in supporting a simple version of this, for more readable debug-oriented printing of numbers expected to be large. Full generality of something like this (including the use of a different separator, and localization-specific formatting) doesn't belong in the standard library, but a simple "insert _ every N characters starting on the right" seems worthwhile.
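
The simple version really is small. A minimal sketch, assuming the input is an unsigned number that has already been formatted to ASCII digits (sign handling is left out); `group_from_right` is a made-up name, not an existing function:

```rust
/// Hypothetical helper (not a std API): insert `_` every `n` characters,
/// counting from the right. Assumes `digits` is unsigned ASCII, e.g. the
/// output of `{}`, `{:x}` or `{:b}` without a `0x`/`0b` prefix.
fn group_from_right(digits: &str, n: usize) -> String {
    assert!(n > 0, "group size must be non-zero");
    let len = digits.len();
    let mut out = String::with_capacity(len + len / n);
    for (i, c) in digits.chars().enumerate() {
        // A separator goes before every position whose distance to the
        // end of the string is a multiple of `n`.
        if i > 0 && (len - i) % n == 0 {
            out.push('_');
        }
        out.push(c);
    }
    out
}
```

The same function covers other bases by changing `n`, e.g. `group_from_right(&format!("{:b}", 0b1010_1010u8), 4)` gives `1010_1010`.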

Generalising the Indian idea, we could handle the annoyingly chaotic UUID format natively. Depending on which group is specified to repeat, {uuid:x_8_4_12} or {uuid:x_8_4_4_4_12} would be all it takes (plus .replace('_', "-")).
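
As a sketch of the intended result only: the helper below is hypothetical, takes the group widths as an explicit slice rather than parsing them from a format spec, and assumes the UUID is held as a `u128`.

```rust
/// Hypothetical helper (not a std API): format a 128-bit value as 32
/// lowercase hex digits and split them, left to right, into groups of
/// the given widths, joined by `sep`.
fn hex_groups(value: u128, widths: &[usize], sep: char) -> String {
    let hex = format!("{:032x}", value);
    assert_eq!(
        widths.iter().sum::<usize>(),
        hex.len(),
        "group widths must cover all 32 hex digits"
    );
    let mut out = String::with_capacity(hex.len() + widths.len());
    let mut start = 0;
    for (k, &w) in widths.iter().enumerate() {
        if k > 0 {
            out.push(sep);
        }
        out.push_str(&hex[start..start + w]);
        start += w;
    }
    out
}
```

Calling it with widths `[8, 4, 4, 4, 12]` and `'-'` produces the usual hyphenated form directly, so the extra `.replace` step would not be needed in this variant.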

How about formatting integers with _ when using the #? format, grouping by 3 for simplicity:

let i = 123_456_789;
assert_eq!(format!("{i}"), "123456789"); 
assert_eq!(format!("{i:?}"), "123456789"); 
assert_eq!(format!("{i:#?}"), "123_456_789"); 

Without commenting yet on whether we should use # for that, if we consider such a change, we would want to use 4 for hex or binary.

Or maybe 8

This is the kind of thing that makes me think that, if we add this, we'd need to take the group size as an input. Formatting is such a matter of taste.

I also like binary formatting for floats with grouping for sign/mantissa/exponent bits. Ditto for hex floats. Though those are fairly niche uses, in the end using a crate is fine in practice.
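
Roughly what I mean for f32, as a sketch under the assumption that the fields are split as 1 sign bit, 8 exponent bits and 23 mantissa bits (IEEE 754 binary32); the function name is made up:

```rust
/// Hypothetical helper (not a std API): show the IEEE 754 binary32
/// fields of `x` as sign_exponent_mantissa (1, 8 and 23 bits).
fn f32_fields(x: f32) -> String {
    let bits = x.to_bits();
    let sign = bits >> 31;
    let exponent = (bits >> 23) & 0xff;
    let mantissa = bits & 0x7f_ffff;
    // Zero-pad each field to its full width so the groups line up.
    format!("{:b}_{:08b}_{:023b}", sign, exponent, mantissa)
}
```

For `1.0f32` this gives a sign of `0`, the biased exponent `01111111` and an all-zero mantissa.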

No problem with grouping by 4 (or 8) for power of 2 bases.

For Debug output, I don't think we need to give the user arbitrary control over spacing, though. Typically, anyone who thinks about display format enough to reject the default spacing will also want the underscore to be a space, comma, or period. Any more complicated scheme should use a crate.

Formatting with plain #? works better for containers:

assert_eq!(format!("{:#?}", [1234567890]), "[\n    1_234_567_890,\n]");
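
This behaviour can be tried out today with a newtype. The sketch below is only an emulation: `Grouped` is a hypothetical wrapper, whereas the actual proposal would change the integer Debug impls themselves. It relies on the fact that the `#` flag stays visible through `Formatter::alternate()` when a container pretty-prints its elements.

```rust
use std::fmt::{self, Write};

/// Hypothetical newtype used to emulate the proposed `{:#?}` behaviour.
struct Grouped(u64);

impl fmt::Debug for Grouped {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let digits = self.0.to_string();
        if !f.alternate() {
            // Plain `{:?}` stays ungrouped.
            return f.write_str(&digits);
        }
        // `{:#?}` groups by 3, counting from the right.
        let len = digits.len();
        for (i, c) in digits.chars().enumerate() {
            if i > 0 && (len - i) % 3 == 0 {
                f.write_char('_')?;
            }
            f.write_char(c)?;
        }
        Ok(())
    }
}
```

With this, `format!("{:#?}", [Grouped(1234567890)])` prints the grouped element on its own indented line, while `{:?}` on the same array stays compact and ungrouped.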