[ACP Discussion] f32up and f32down suffixes

Currently, when writing something like 1.1f64, the compiler rounds it automatically, which might not be exactly what users want. For example, with let a = 1.1f64 and let b = 11 as f64, we want b / a == 10f64 and b % a near 0, but even rem_euclid yields b % a ~ 1.1.
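
A minimal demonstration of the effect (a sketch assuming standard IEEE-754 f64 semantics; printed values are approximate):

fn main() {
    let a = 1.1f64; // actually stores 1.100000000000000088...
    let b = 11.0f64;
    println!("{}", b / a);           // 10: the exact quotient rounds back to 10.0
    println!("{}", b % a);           // ~1.0999999999999992, not 0
    println!("{}", b.rem_euclid(a)); // same story, ~1.1
}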

If we want to control the rounding errors, it might be better to introduce suffixes like up/down/u/d/exact/...

up/down round in the same direction as calling next_up()/next_down(), but there is one difference: the suffix rounds the decimal value as written, not the already-rounded float. Since 1.1f32 rounds to a value a little larger than 1.1, we get 1.1f32up == 1.1f32 == 1.1f32u (as an abbreviation) == 1.1u (omitting the f32 type might be acceptable), while (1.1f32).next_down() == 1.1f32down == 1.1d == ...

You could notice that 1.1f32 has the same value as its exact expansion, 1.10000002...f32. There does not exist any function that could treat 1.1f32 and 1.10000002...f32 as different inputs, since they are identical by the time any function sees them. This is why I introduce new suffixes here.
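
A one-line check of that claim (the second literal is the start of 1.1f32's exact expansion):

fn main() {
    // Both literals round to the same f32, so no function can tell them apart.
    assert_eq!(1.1f32.to_bits(), 1.10000002f32.to_bits());
}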

Another interesting case is the exact form. I originally wanted to use exact/exactup/exactdown to let the compiler check whether the user has provided enough digits, but the grammar gets very confusing (consider 1.25e-1e-2f32). Maybe we should not omit the f32/f64 indicator here, or perhaps we could write ud directly to mean exact, with ue/de for up_enough/down_enough, which says that enough digits are provided that even modifying the last digit by +1 or -1 does not change the rounding result.

As a real-world example, we might accept:

// e for exact
const PI:f64 = 3.141592653589793115997963468544185161590576171875f64e;
const PI:f64 = 3.1415926535897932f64ed;
const PI:f64 = 3.141592653589793f64eu;
// e for enough
const PI:f64 = 3.141592653589793115997963468544185161590576171875ud;
const PI:f64 = 3.1415926535897932de;
const PI:f64 = 3.141592653589793u;

Maybe something that returns a RangeInclusive<f32>?

I'd rather have binary / hexadecimal literals which would sort of make this redundant. Why mess with decimal if you care about exact bit precision?


While I believe this is probably sufficiently niche that it's better done as a library, one case where this would be more useful than hex-literals is for comparing against some known value that isn't exactly representable as a float. For example, which of these expressions correctly implement the (mathematical) expression x < 1.3:

x <= 1.3_f32
x <  1.3_f32
x <= 1.3_f64
x <  1.3_f64

(It's the first and last lines, but to figure that out, you need to determine which way the constant is rounded in each type.)
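
One way to check the rounding directions is to print extra digits (a quick sketch; the output assumes IEEE-754 semantics):

fn main() {
    println!("{:.20}", 1.3_f32); // 1.29999995231628417969 (rounds down)
    println!("{:.20}", 1.3_f64); // 1.30000000000000004441 (rounds up)
}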

You could avoid that headache if you could just write

x <= round_down!(1.3_f32);
x <= round_down!(1.3_f64);

That is only useful if your known exact value has a short decimal expansion that isn't exactly representable, which means the denominator is of the form 2^a * 5^b, b > 0 (in your example: 13 / 10). This seems like a very rare case.

As I said, it's quite niche. The point was that in some cases it is useful to have the operation "this decimal number rounded {down/up} to a float", and while hex-literals are more generally useful, they are not a substitute for that.

What would you do here?

// more than enough correct digits
const PI_LB: f32 = round_down!(3.14159265358979);

x <= PI_LB

Well, I'd do this:

// Exactly the right number of digits.
const PI_LB: f32 = 0x3.243F68;

How did you know you had enough digits in decimal? For some numbers 15 decimal digits is not sufficient there.

Oh, that's a very good question. I thought I knew but the reasoning wasn't entirely sound. If the actual value of π happened to be just very slightly above the correct PI_LB, then you could need some 23+ decimal digits to surpass it.

I did however remember that the rounded (to nearest) f32 value rounds up, so as long as the literal as written is less than half an epsilon below π, it truncates to the correct value.
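
That can at least be double-checked numerically with today's next_down (a sketch; f32::consts::PI is the nearest f32 to π and lies above it):

fn main() {
    // The nearest f32 to π is above π, so one step down gives PI_LB.
    let pi_lb = std::f32::consts::PI.next_down();
    println!("{:.9}", pi_lb); // 3.141592503
}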

I had thought of commenting that this is significantly harder to review, but I think the above showed that it's not so clear-cut.

Because I'd like to copy from WA and not have to think about it:

let pi: RangeInclusive<f32> = bikeshed!(3.1415926535897932384626433832795028841971693993751058209749445923);

and get the value that's above and below it.

(By putting way more decimal places than are possibly representable it means the right thing will happen and I don't need to double-check if it was done right or rounded or whatever.)
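
For illustration, a hypothetical expansion of that bikeshed! call; a real macro would derive the rounding direction from the digits, while here it is hard-coded:

use std::ops::RangeInclusive;

fn main() {
    // The macro would know from the digits that the nearest f32 rounds up,
    // so the bracketing pair is (next_down, nearest).
    let nearest = std::f32::consts::PI;
    let pi: RangeInclusive<f32> = nearest.next_down()..=nearest;
    println!("{:?}", pi); // 3.1415925..=3.1415927
}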

What is WA, is it WolframAlpha? It can do hexadecimal.

I'd argue it's easier to think about whether you have enough hex digits than it is to think about whether you have enough decimal digits (and review whether all 65 of them are correct!).

For hex literals, the compiler should warn you if you have too many bits.

To prove that 65 decimal digits is enough precision after rounding, you still have to think about binary representation. It's a non-trivial statement. In base 11 it wouldn't be enough for some numbers.

It's obviously a very niche kind of code. Whoever writes this "PI precisely rounded down" code needs to think about the binary representation anyway.

RangeInclusive might not cover all the cases.

This discussion mainly came out of a misuse of div_euclid/rem_euclid for floating-point types (see the bug I've posted; maybe we need another ACP...).

In short, 11 % 1.1u yields ~1.1 and 11 % 1.1d yields ~0, while 11 / 1.1u and 11 / 1.1d both come out as 10f64 to within an ulp. To control the remainder manually, 1.1d is required; 1.1u directly yields a logical error.
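
The same experiment can be run today with next_down standing in for the proposed d suffix (a sketch; printed values assume IEEE-754 f64):

fn main() {
    let up = 1.1f64;           // nearest f64 to 1.1; it rounds upward
    let down = up.next_down(); // largest f64 strictly below 1.1
    println!("{}", 11.0 % up);   // ~1.1 (the quotient truncates to 9)
    println!("{}", 11.0 % down); // ~0   (the quotient truncates to 10)
    println!("{}", 11.0 / up);   // 10
    println!("{}", 11.0 / down); // ~10, one ulp above
}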

In some cases, if we want to control rounding errors, manually specifying the rounding direction is necessary, and 11 % RangeInclusive<_> is not implemented today.

Binary literals are another exact representation, but they are for machines, not for humans.

We all know that PI = 3.141592653589..., but who knows that PI = 0x3.243F6A8885A3...? The up/down suffixes show which way the error goes; hexadecimal literals show nothing more than the stored value.

I supposed a proc macro could not do such things, but it turns out I was wrong.

I'll write a crate soon, thanks for your suggestion.


I'd still have to round/truncate it at the correct place.

I've found it almost impossible to create such a library.

Normal functions cannot recognize whether the input was written as 1.1f32 or 1.100000001f32, since both have the same binary representation. Thus the only feasible solution is a proc-macro crate.

Here, we can obtain a literal from the macro input, but we can do almost nothing with it: an API for reading the literal's digits simply does not exist. I could read the exact value by converting the literal into a str and then parsing that str, but that is unstable and might change in the future.
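
For illustration, the string round-trip looks like this (a sketch of a hypothetical proc-macro crate; literal_text is an illustrative name, not a proposed API):

// lib.rs of a hypothetical proc-macro crate
use proc_macro::TokenStream;

// The token's textual form is the only way to see the digits as written;
// a real round_down! would then need exact decimal arithmetic to decide
// which way the nearest float rounded.
#[proc_macro]
pub fn literal_text(input: TokenStream) -> TokenStream {
    let text = input.to_string(); // e.g. "1.1", digits preserved
    format!("{:?}", text).parse().unwrap() // expand to a string literal
}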

IMHO, neither a proc-macro crate nor a plain function can perform this rounding technique perfectly.

Machines use binary, hex is for programmers.

Very few people memorize PI to that many digits in either base. I guess memorizing the digits of PI is a relatively common hobby in some circles, but even those who do memorize it don't memorize the digits of other irrational numbers such as 2*PI or sqrt(3). But we can all look them up online or calculate them on WolframAlpha -- in either base.

It's not like you need to write down the digits of PI in code from scratch every day. Do you have real cases of code that would use this feature?

To me it seems like such a specialized feature that when it's needed, it's not really a problem to actually put in the 7 / 14 hex digits (perhaps with a comment that says what it is approximately in decimal).

One problem with round_down!(3.14159...) is that it requires way more digits than the precision we are actually going for. You need 7 digits in hex, which corresponds to the actual f32 precision, while you need ~24 digits in decimal to be sure, even though the actual precision is only ~8 digits. And more than 24 digits are required for numbers in a different range!

The big benefit of hex is that you can write down the exact value as represented. It's supported by C and C++ for that reason. It's easy to pass values in this format between different programs through text serialization formats.

A workaround is to print enough decimal digits of a rounded number so that they parse back as the same number. But there are several problems with that. One of the problems is that if you print an f32 with sufficient decimal digits and then parse it as f64, it will be a different number. Hex solves this problem.
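
A quick check of that problem (Rust's default formatting prints the shortest digits that round-trip at the value's own width):

fn main() {
    let x: f32 = 1.3;                // stores 1.2999999523162842...
    let s = x.to_string();           // "1.3": round-trips as f32
    let y: f64 = s.parse().unwrap(); // but parses to 1.3000000000000000444... as f64
    assert_ne!(x as f64, y);         // a different number
}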

It's not that hard for people who know what they are doing, who are the only people that need to do anything like this. The compiler will check that it can be exactly represented, and you can write a test such as:

const PI_LB: f32 = 0x3.243F68;
const PI_UB: f32 = 0x3.243F6C;
const _: () = assert!(PI_UB == PI_LB.next_up());

It's a similar level of difficulty to specifying u32 / u64 constants in hex with the correct number of digits.

Maybe it's difficult for beginners, but beginners don't need to do this sort of thing, they just need std::f64::consts::PI.


Actually it has some advantages: we can mark how big the error is:

const PI:f64 = 3.141592653589793u;
const PI:f64 = 3.1415926535897932d;

We can easily see that the stored PI value is larger than 3.141592653589793, and smaller than the true value 3.1415926535897932.... With a hex representation we have no such convenience.

Moreover, most data comes in base 10 rather than base 16. Using base 16 for float types will introduce additional inconsistency.

And the final question: what about a float larger than 10? Or some very large or very small constants?

// Catastrophe:
pub const LIGHT_SPEED: f64 = 0x11de784a_f64; // how do we write the f64 suffix? f64 is a valid suffix, but its characters are also valid hex digits.
pub const G: f64 = 0x0.0000000049566185B767C5_f64; // 6.67e-11f64

Introducing hex float digits would perhaps bring nothing but chaos and catastrophe.

I don't understand this example. You can't define the same constant twice.

Easy:

const ELEVEN_AND_A_HALF: f32 = 0xB.8;

Good point that _f64 doesn't work in these cases unfortunately. But you can solve both problems at the same time by including the p power-of-2 exponent. This notation is part of the IEEE-754 floating point standard:

pub const G: f64 = 0x4.9566185B767Cp-36_f64;

Personally I don't like that it uses decimal for the exponent, but oh well, it's the standard.

I showed two ways to define the same PI. You just write the true value with enough digits, plus a suffix showing whether the stored value is larger or smaller than what is written.

This is what I'm worried about. Hex representations are rarely used for real numbers. Everyone is familiar with LIGHT_SPEED = 299_792_458;, but not many people could confirm that LIGHT_SPEED: f64 = 0x11de784a is correct. Using such representations will yield piles of magic numbers, which might not be a good idea.

The example isn't worrying because the decimal notation would still be supported, so you can still write this as 299_792_458.0.

The problem is that, for almost all constants, you can only get the normal decimal form, not the IEEE-754-specific form, which makes hex notation useless. If we want to check the rounding, we can just take the exact value reported in papers, add a u/d suffix, and the compiler will check whether the rounding direction is correct. With hex notation we cannot do anything that easy.

For example, you can get something like const PI: f64 = 3.141592653589793115997963468544185161590576171875 and confirm it with a single line:

println!("{:.50}", 3.1415926535897932)

But converting constants to hex notation is not straightforward.
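
Indeed it takes a small program. A sketch of such a conversion from the bit pattern (valid for normal positive values only; std has no built-in hex-float formatting):

fn main() {
    // Pull apart the bit pattern by hand to print a hex float.
    let bits = std::f64::consts::PI.to_bits();
    let exp = ((bits >> 52) & 0x7FF) as i64 - 1023;
    let frac = bits & ((1u64 << 52) - 1);
    println!("0x1.{:013X}p{}", frac, exp); // 0x1.921FB54442D18p1
}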

In your case, you suggest a non-obvious notation that can only be produced with some other program. That method is perhaps less attractive than just rounding by hand and hard-coding the rounded results:

// Yours:
const PI_LB: f32 = /* feed core::f64::consts::PI through some other program to get */ 0x3.243F68;
// Doing it manually instead:
const PI_LB: f32 = /* if core::f64::consts::PI > core::f32::consts::PI as f64 { core::f32::consts::PI } else { core::f32::consts::PI.next_down() };
// both the comparison and next_down() can be computed by hand, resulting in */ 3.141592502593994140625;

I can't tell which method is more complex, but since your approach needs lots of conversions, it might not be that fruitful.

I don't understand this proposal. In the original proposal, as far as I understand, the compiler doesn't "check" anything; it just rounds whatever is written up or down.

If you want to write it like this in decimal with full precision, you already can, so I don't see how the example is relevant. I did not propose removing decimal notation.