`TryFrom` for `f64`

That would work

We could also use `Result<f64, f64>` and return `Ok(value)` or `Err(nearest_value)`, depending on whether the value is expressible exactly.
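
A minimal sketch of those semantics for `u64 -> f64` (the function name and the round-trip check are mine, not a proposed API):

```rust
/// Hypothetical sketch: Ok(exact) when the integer is exactly
/// representable as f64, Err(nearest) otherwise.
fn u64_to_f64(x: u64) -> Result<f64, f64> {
    let approx = x as f64; // `as` rounds to nearest
    // The conversion was exact iff rounding back recovers x.
    // `f64 as u64` saturates, so guard the 2^64 boundary, where
    // saturation would make the round-trip check pass spuriously.
    if approx < 18446744073709551616.0 && approx as u64 == x {
        Ok(approx)
    } else {
        Err(approx)
    }
}

fn main() {
    assert_eq!(u64_to_f64(1), Ok(1.0));
    // 2^53 + 1 is the first integer f64 cannot represent exactly.
    assert_eq!(u64_to_f64(9007199254740993), Err(9007199254740992.0));
    assert!(u64_to_f64(u64::MAX).is_err());
}
```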

4 Likes

I'm a fan of that definition of the semantics.

For float->int it might be good to temporarily have an extra check until https://github.com/rust-lang/rust/issues/10184 gets completely fixed...

1 Like

This whole topic (and your post in particular) got me thinking about the floating point equivalent of real analysis, or, I should say, the lack of it. So far as I know there is no equivalent to real analysis (or complex analysis, or any other formal logic) over floating point numbers. Is there some equivalent? If there is, then the compiler could be upgraded with the rules for that formal analysis, which might allow it to do better code generation than it currently does (as well as find more bugs than it currently does).

I think you’re looking for numerical analysis.

Specifically, a useful – albeit not well-known – approach to treating quantized/discretized numbers rigorously is interval arithmetic.
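
As a toy illustration of the idea (my own sketch; real interval libraries additionally control the rounding direction so the bounds stay conservative):

```rust
// A toy closed interval [lo, hi]. Real implementations round `lo`
// toward -inf and `hi` toward +inf on every operation; this sketch
// ignores that detail.
#[derive(Debug, Clone, Copy)]
struct Interval {
    lo: f64,
    hi: f64,
}

impl Interval {
    fn add(self, other: Interval) -> Interval {
        Interval { lo: self.lo + other.lo, hi: self.hi + other.hi }
    }
    fn contains(self, x: f64) -> bool {
        self.lo <= x && x <= self.hi
    }
}

fn main() {
    // 0.1 has no exact f64 representation, so bracket it.
    let tenth = Interval { lo: 0.099, hi: 0.101 };
    let mut sum = Interval { lo: 0.0, hi: 0.0 };
    for _ in 0..10 {
        sum = sum.add(tenth);
    }
    // Whatever rounding happened, the true value 1.0 is inside.
    assert!(sum.contains(1.0));
}
```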

1 Like

I can’t tell, but does numerical analysis have the concept of NaN? The closest I ever came to figuring out how to handle it was to use rational numbers and treat 0/0 as NaN. Most of the rest of the stuff seemed to come out OK that way, but I never did a formal analysis, so I’m not sure if my logic is sound.

Thank you for that link! It could be useful as the basis for a formal logic of floating point numbers. I think that if we can merge them with the idea I mentioned earlier (NaN as 0/0), then we might be able to reason over floating point numbers.

I’ve been thinking that the failure case could be something like

enum FloatConversionError<Target> {
    NotRepresentable,
    Rounded(Target),
    Overflow,
    Underflow,
}

That should cover every case and also include the rounded value when possible.

5 Likes

Underflow would be rounded to 0, right? So the error gives the nearest integer, except when that integer is not in the range of the type. Maybe it would also be good to differentiate between NaN and infinity.

What about -0.0? E.g.:

fn main() {
    let a: f64 = 0.0;
    let b: f64 = -0.0;
    let c: isize = 0;
    let d: isize = -0;
    println!("a = {:?}, b = {:?}, c = {:?}, d = {:?}", a, b, c, d);
    println!("a == b {:?}", a == b);
}

yields

a = 0.0, b = -0.0, c = 0, d = 0
a == b true

which suggests that there is no lossless conversion from -0.0f64 to 0i64.
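
Even though `==` treats the two zeros as equal, the sign of `-0.0` is still observable:

```rust
fn main() {
    let b: f64 = -0.0;
    assert!(b == 0.0); // IEEE comparison says equal...
    assert!(b.is_sign_negative()); // ...but the sign bit survives
    // Division exposes the sign as well:
    assert_eq!(1.0 / b, f64::NEG_INFINITY);
    assert_eq!(1.0 / 0.0, f64::INFINITY);
}
```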

Clearly it cannot be lossless, as the distinction between +0.0f64 and -0.0f64 has been lost.

3 Likes

So, is this considered to be an error or not?

It depends on the desired semantics of your computation. Interval arithmetic was invented to facilitate establishing bounds on the error in a computation. The IEEE Floating Point rounding modes were invented in part to facilitate interval arithmetic. In some computations, -0.0 might be used to represent the interval (-𝜖 .. 0.0] in contrast to +0.0 representing the interval [0.0 .. +𝜖) for minimally-representable 𝜖.

1 Like

Which means that we need to decide if this conversion should result in an error or not.

I personally feel that if any conversion is lossy in any way at all, then the Err variant should be returned, probably wrapping something like @ekuber’s FloatConversionError. We might store the ‘best’ value possible in the variant, or we might have several choices and let the end user decide what to do. E.g.:

#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct RoundingChoices<Target> {
    upper: Target,
    lower: Target
}

#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum FloatConversionError<Target> {
    NotRepresentable, // For NaNs
    NegativeZero,     // -0.0f64
    Rounding(RoundingChoices<Target>),
    Overflow,         // Greater than what the target type can represent
    Underflow         // Less than what the target type can represent
}
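
For illustration, a hypothetical conversion to i64 built on those definitions (the types are repeated so the sketch compiles on its own; the range checks are the subtle part):

```rust
#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct RoundingChoices<Target> {
    upper: Target,
    lower: Target,
}

#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum FloatConversionError<Target> {
    NotRepresentable,
    NegativeZero,
    Rounding(RoundingChoices<Target>),
    Overflow,
    Underflow,
}

fn try_f64_to_i64(x: f64) -> Result<i64, FloatConversionError<i64>> {
    use FloatConversionError::*;
    if x.is_nan() {
        return Err(NotRepresentable);
    }
    if x == 0.0 && x.is_sign_negative() {
        return Err(NegativeZero);
    }
    // i64::MAX as f64 rounds *up* to 2^63, which is out of range,
    // hence >= rather than >. i64::MIN as f64 is exactly -2^63.
    if x >= i64::MAX as f64 {
        return Err(Overflow); // also catches +inf
    }
    if x < i64::MIN as f64 {
        return Err(Underflow); // also catches -inf
    }
    if x.fract() == 0.0 {
        Ok(x as i64)
    } else {
        Err(Rounding(RoundingChoices {
            upper: x.ceil() as i64,
            lower: x.floor() as i64,
        }))
    }
}

fn main() {
    assert_eq!(try_f64_to_i64(3.0), Ok(3));
    assert_eq!(
        try_f64_to_i64(2.5),
        Err(FloatConversionError::Rounding(RoundingChoices { upper: 3, lower: 2 }))
    );
    assert_eq!(try_f64_to_i64(-0.0), Err(FloatConversionError::NegativeZero));
    assert_eq!(try_f64_to_i64(1e300), Err(FloatConversionError::Overflow));
}
```
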

1 Like

The information represented by binary floating-point numbers is almost always approximate. (Decimal floating-point numbers are sometimes used to represent currency, in which case those values are often exact.)

Aside from NaNs, floating point numbers can specify the point zero, a finite set of point values within non-zero positive and negative representable spans, and ±∞. They thus serve as a quasi-logarithmic representation of the field of real numbers, which intrinsically has both infinite span and infinite point density within any interval. Each floating-point value represents an infinite number of real values that are numerically closer (in some sense) to the specified floating-point value than to either the next-higher or next-lower values in the selected floating-point representation.

This viewpoint calls into question the conceptual simplification that an integer value can be the functional equivalent of a floating-point value. Stated differently, both integers and floating-point values represent small intervals of the infinite real line, where the integer intervals are equally spaced around zero and the floating-point intervals are quasi-exponentially spaced around zero. Conversion from floating-point to integer is a mapping of one type of interval to the other. The question then becomes, what mapping is best (i.e., most accurate). It is obvious that this depends on the measure of “best”, and one approach is unlikely to satisfy all requirements.
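
The standard library already offers several such mappings, each "best" by a different measure:

```rust
fn main() {
    let x = 2.5_f64;
    assert_eq!(x.floor(), 2.0); // toward -inf
    assert_eq!(x.ceil(), 3.0);  // toward +inf
    assert_eq!(x.trunc(), 2.0); // toward zero
    assert_eq!(x.round(), 3.0); // ties away from zero
    // IEEE 754's default, round-half-to-even, would give 2.0 here
    // (available as f64::round_ties_even since Rust 1.77).
}
```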

7 Likes

Since floating-point values are all intervals, really, there is no lossless way to convert any u64 to f64. Even

1u64 != 1.0f64

You might as well make a best-effort rounding. The rounding error grows as the magnitude increases; pick a limit and fail beyond that.
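
Concretely (assuming the best-effort rounding is the `as` cast):

```rust
fn main() {
    // Every integer up to 2^53 survives a round trip through f64.
    let exact: u64 = 1 << 53;
    assert_eq!(exact as f64 as u64, exact);
    // Just past 2^53 the gap between adjacent f64 values exceeds 1,
    // so odd integers land on a neighbour.
    let inexact: u64 = (1 << 53) + 1;
    assert_eq!(inexact as f64 as u64, inexact - 1);
    // The gap doubles at each power of two; near 2^63 adjacent
    // f64 values are already 1024 apart.
}
```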

2 Likes

That said, 0.0 == -0.0, so it depends on what one considers lossy.

Precisely. That is the underlying problem with any finite-precision floating-point representation: what it means depends on how you want to use it. See my earlier post in this thread for an example where there is a semantic difference between -0.0 and +0.0.

1 Like

There certainly are, but note that even `Vec::clone` can produce something equal that's still distinguishable: the `capacity()` can differ, and the `as_ptr()` usually does.
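
For example (the clone's exact capacity is an implementation detail, so it is printed rather than asserted):

```rust
fn main() {
    let mut v: Vec<i32> = Vec::with_capacity(100);
    v.push(1);
    let w = v.clone();
    // The two vectors compare equal...
    assert_eq!(v, w);
    // ...yet they are distinguishable: distinct heap allocations,
    // and `clone` does not copy the spare capacity.
    assert_ne!(v.as_ptr(), w.as_ptr());
    assert!(v.capacity() >= 100);
    println!("cloned capacity: {}", w.capacity()); // typically 1
}
```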

2 Likes