Scientific Computing - NaN as a 'None' Option or an 'Err' Result

Hi all.

I was communicating with a fellow on the IEEE-754 list. The discussion went a bit off topic so we communicated privately, and he encouraged me to discuss this with language developers.

This is mostly about handling errors in numerical computations. As we know, some computations have no meaningful results in studied maths, while we can define +1/+0 as +inf so that sane results may come back, stuff like 0/0 have no result that's acceptable in every situation. While sin(x)/x = 1 when x=0 may seem natural enough, it's not applicable to other places, not even the immediately obvious sin(x)/2x.

Rust pioneered in the field of safe programming, and introduced Option and Result as container types for what may be results that may also be errors. In my opinion, when NaN was introduced in IEEE-754-1985, NaN is exactly the None and Err branch of Option and Result, and I think it's very natural numerical results have similar properties, so that existing error handling facilities can be naturally extended to scientific computing scenarios.

Truth to be told, I'm developing a little programming language of my own. In doing so, I consulted the IEEE-754 list on their take on status quo in numeric handling in existing languages. I asked and they expressed that the 2008 edition of the standard is meant to be a "prescriptive" "meta" standard for languages. They have achieved some success in the numerical aspect - mostly with reproducibility, but so far language developers haven't taken up much on error handling.

I don't intend to have a fierce competition with Rust in every aspect, and very likely my language can't, and in case my effort did prove to be a failure, I hope my idea can get carried on some way somewhere.

Here's an except from the specification of the language I'm developing:

-- BEGIN EXCERPT --

A bit of background first.

The IEEE-754 standard for floating point arithmetic specifies handling of exceptional conditions for computations. These conditions can be handled in the default way (default exception handling) or in some alternative ways (alternative exception handling).

The 1985 edition of the standard described exceptions and their default handling in section 7, and handling using traps in section 8. These were revised as "exceptions and default exception handling" in section 7 as well as "altenate exception handling attributes" in section 8 in the 2008 edition of the standard - these "attributes" are associated with "blocks" which (as most would expect) are group(s) of statements. Alternate exception handling are used in many advanced numerical programs to improve robustness.

As a prescriptive standard, it was intended to have language standards to describe constructs for handling floating point errors in a generic way that abstracts away the underlying detail of system and hardware implementations. In doing so, the standard itself becomes non-generic, and described features specific to some languages that were not present in others.

The CXING language employs null coalescing operators as general-purpose error-handling syntax, and make it cover NaNs by making them nullish. As an unsolicited half-improvement, I (dannyniu) propose the following alternative description for "alternate exception handling":

Language ought to specify ways for program to transfer the control of execution, or to evaluate certain expressions when a subset (some or all) of exceptions occur.

-- END EXCERPT --

Note IEEE-754-2008 devoted an entire chapter to what's known as the "alternate exception handling".

The draft spec for my language is posted at https://dannyniu.github.io/cxing-specs/cxing-spec-draft.pdf

The excerpt is on page 8.

Now ends the chit-chat. My questions for this thread are: would Rust be interested in specifying floating point types as a first-class Number-Enum dual-identity type that can be used for handling numerical errors in the match expressions? Or have Rust already introduced similar mechanism?


My intention is that, Rust can apply its error handling wisdom to IEEE numerics. A chapter in a prescriptive standard (I'm talking about Chap-8 in IEEE-754) cannot be proven right if nobody puts it into test.

1 Like

I have the impression that a big part of why IEEE 754's error handling spec hasn't ever had much traction, is that the non-finite values (Inf and NaN) are only part of how it works, and the other part is out-of-band exception bits and hardware traps. CPU designers don't like out-of-band exception bits and hardware traps because they interfere with out-of-order execution, and programming language designers don't like them because the OS-level interfaces for dealing with hardware traps are really annoying to work with. I have some notes somewhere on an alternative design in which all results are in-band; unfortunately it requires changing the format of floating-point values.

All that said, Rust does already have something like what you're looking for:

fn process(v: f64) {
    use core::num::FpCategory::*;
    match v.classify() {
        Zero | Subnormal | Normal => process_finite(v),
        Infinite => process_infinite(v),
        Nan => process_nan(v),
    }
}

For x86 this is implemented by bit bashing on the integer equivalent of the representation (demo) which is, AFAICT, the best you can do without specific ISA extensions (the vfpclass* instructions are part of the "AVX512DQ" feature).

How close does this come to what you had in mind? I would be hesitant to make match apply directly to bare floating point values, because then people would probably want to be able to match against specific values and ranges of values, not just categories, which is a fraught thing to do for FP. I'd also be hesitant to make it easy to extract NaN payloads without also putting in stronger guarantees about NaN propagation (the current rules are here; they amount to 'you can't count on NaN payloads surviving arithmetic operations').

3 Likes

Thanks, that's exactly what I have on my mind. Added that we can put flow-control keywords in match arms, and/or make them have values, I think this is exactly the idiom Chapter 8 envisioned for.


CPU designers don't like out-of-band exception bits and hardware traps because they interfere with out-of-order execution, and programming language designers don't like them because the OS-level interfaces for dealing with hardware traps are really annoying to work with.

That's an interesting remark for engineer students. I'll digest those.

Match arms can certainly have values; in the example above, all you have to do is add -> f64 to the function head of process and it will start returning whatever f64 value process_* return. (In the playground link, don't forget to add -> f64 to the declarations of process_* as well, or it won't compile.)

Match arms performing control flow may or may not already be possible depending on what you mean by that. This works...

   let mut v = initial_estimate();
   loop {
       v = refine_estimate(v);
       match v.classify() {
           FpCategory::Nan => { return Err(...); }
           _ => {}
       }
       // ...
   }

(although it'd probably be clearer as if v.is_nan() { ... }) but more complicated things (particularly involving closures) may not work as you would like.

If there were a stable way to opt-in to niche optimization, then we could have a FiniteF32 such that Option<FiniteF32> and Result<FiniteF32, FpError> are both four bytes in size and mapping between them and f32 would be very cheap if not free.

However, there are complications. In practice making overflow (infinity) an error means that almost every operation must be fallible, which might be too annoying. On the other hand, if infinities are okay but NaNs are not, then for example multiplication is still fallible because 0 * inf is NaN. Defining all operations as saturating to the min/max finite number would be fine in many use cases, but is no longer zero-overhead.

4 Likes

If we're using Result, then I think it matches the "deferred" handling as prescribed IEEE-754-2008 Chapter 8 for language specifications. The programmer can then decide when and what to do with a failing computation.

Chapter 8 is one of the least-tried part of the IEEE-754 standard, mostly because the lack of talent in the cross section of both programming language users and numerical specialists.


There're deferred and immediate transfer of control prescribed by Chapter 8, among other ways to handle numerical exceptions, such as recording the error to a log, or substituting the value of an expression.

I recall a remark from "Programming with POSIX Threads", that basically says "asynchronous cancellation is never one really want or need". I think the same can be said of numerical exceptions, deferred handling is almost certainly always better than aborting the program midway forcibly.

See also the experience with C++ exceptions and (to a lesser extent) Rust panics.

Could you elaborate on this? To me it sounds like it's the opposite. If operations return a Result then you have to handle that immediately before continuing to use the value within. I believe the idea was instead to continue execution even if some operation returned NaN or overflowed/underflowed, and then check that at the end of the current "block".

To me it sounds like the opposite.

If I recall correctly, the occurrence of a Result doesn't interrupt the current execution flow like a C++ exception or a POSIX signal.

If operations return a Result then you have to handle that immediately before continuing to use the value within.

Yes, but we don't have to use it right away. We can decide how to use it after we gather some other information.

I believe the idea was instead to continue execution ...

That's also what I interpret "deferred" mean.

From a certain point of view, the IEEE rules for NaN propagation are rules for implicit application of Result::map to the various operations defined on fXX values that are numbers.

2 Likes

But this is the normal IEEE 754 inf/qNaN semantics, surely, if hardware traps are absent/ignored? Everything looks like it works, non-finites just get propagated. One could even say that floats are implicitly monadic, with slightly more complex semantics than your regular Result/Either – NaN only ever gets you more NaNs, but from inf you can sometimes get back to finite (eg. 1 / inf = 0).

In principle one could make this an explicit language feature that a float behaves both as a primitive and as an enum and does all the bit-bashing for you, extracting mantissa, payload, nan bits or whatever you need but also does basic arithmetic operations.

That might make sense for a language focusing on heavy float use. Combine with let-else you could then write

let val @ f64::Finite(_) = input else { bail!("oh noes!") }; or whatever you need from that.

Combine further with variant-types and you now have a value the compiler knows is finite.

Thinking about it, an enum might be the wrong way to look at it? Perhaps having specialized patterns for floats would be better, where patterns can overlap or be refinements of broader patterns. Maybe one wants to match all the finites, or only positive numbers or anything outside the 0-1 range etc, but sometimes also directly match on exponent or mantissa. A negative value might be an error condition for some algorithms. Then NaN-matching would be just another pattern, not an enum variant.

because then people would probably want to be able to match against specific values and ranges of values, not just categories, which is a fraught thing to do for FP.

matching a value +- error could be a first-class feature if it's important enough for a language.

Sure! That's not what I meant to say though (see the example later on).

But if you want to do any other operation on the value in the result then you do have to check it immediately. You cannot delay this check until you have performed all other operations (including those on the result of this operation itself).

Imagine for example a situation where you want to compute (a + b) + c. If a + b returned a Result you would have to stop and check that before computing the addition between its value and c. What I interpret as delayed instead is to execute the whole (a + b) + c, even if the a + b results in an error/nan, and only check at the end whether the whole expression ("block" I believe it's called in the standard) caused any error. This is basically how NaN propagation works, though with extra hardware flags you could also detect whether any overflow/underflow happened as well.

1 Like

If a + b returned a Result you would have to stop and check that before ...

That makes sense. And this is why my initial post in this thread asked if there's interest in having floating-points as a first-class number-enum dual-identity type with NaNs one of the enumerations.

Alternatives with Results are possible, and as you've pointed out, it's clumsy to program with.

I've thought about this off and on for a while, and I have yet to come to API design that works well.

Conceptually, with extreme control over niche optimization (that I'm pretty sure rustc doesn't have at the moment) it is possible to have a Result<FiniteF32, FpError> that is identical in representation to f32 and furthermore can implement Add<Result<FiniteF32, FpError>> in such a way that it compiles down to exactly the same as f32 addition (as infinities and NaNs in IEEE 754 are generally sticky). But that's a lot of machinery to end up with a structure that is maybe just as efficient as using f32 directly, and the end result is still kind of clunky, especially since you can match on classify instead for something that already works today and probably has the performance characteristics you want (if not the force-you-to-handle-errors-correctly characteristics). FiniteF32 has its uses for niche optimization, but because it's not closed under most basic arithmetic operations due to pretty rare overflow issues, it's hard to come up with a good API that lets you do arithmetic on it.

IEEE 754 exception support however does provide some interesting ideas for API design that's not focused on individual operations. One idea that might be worth exploring is an API that can catch exceptions not for an individual operation but for a block of code, à la std::panic::catch_unwind:

fn catch_fp_exceptions(except: FpErrorMask, f: impl FnOnce() -> f64) -> Result<f64, FpErrorMask> {
  feclearexcept(FE_ALL_EXCEPT);
  let val = f();
  match fetestexcept(except) {
    0 => Ok(val),
    err = Err(err)
  }
}

Strictly speaking, this doesn't work under the LLVM floating-point model, since we'd have to compile f in strictfp mode. But it's possible to make this work in practice simply by relying on noinline to prevent any possibility of code motion across the FP exception checking boundary and being somewhat fastidious over when code is evaluated at compile-time versus run-time. There are some nice advantages to using hardware traps (such as being able to pinpoint which operation caused everything to go haywire), but from what I could find searching for code that uses FP exceptions, in practice the method I wrote seems to suffice for literally every check for an FP exception.

2 Likes

That's not always true. For example:

    let a = 1.0f64;
    let b = 1e-20;
    let c = 0.0 / 0.0;
    let d = (a+b).powf(c);
    println!("{c} {d}");

c is NaN but d is 1.

1 Like

You may want to read http://tom7.org/nand/nand.pdf (or watch the associated video) which implements logic gates when representing 0 as NaN and 1 as Inf through standard IEEE float operations.

6 Likes

The origin of this thread stem from a discussion over exception handling in math errors and language facilities that supports them. My peer suggested me to seek feedback from language developers, because even though IEEE-754 is a prescriptive standard for floating point arithmetic and programming languages that use them, my peer is no longer familiar with the state of art in languages.

IEEE-754 has grown beyond a hardware standard since its 2008 revision, so would you mind telling us, how does your narrow-width FP hardware help with languages with regard to mathematical exception handling?

With "you" I was referring to jdahlstrom and tczajka who were talking about whether or not NaN can turn back to another value through float operations.

(emphasis mine)

While I can't say how much extra work it requires for the hardware design, the IEEE exception bits don't prevent out-of-order execution. A floating point arithmetic operation may only ever choose to set an exception bit, but doesn't get to read its previous value, nor unset it. This means that, assuming no traps, the compiler and CPU may reorder floating point operations almost freely, just not past the specific operations that read or clear the exception bits.