Also, as far as the out-of-order engine is concerned, updates to the exception bits are read-modify-write operations on a single register. If you don't specifically add logic to the CPU that says "most (not all!) of the instructions that can touch this register have a commutative effect on it, so RMW ops on this register usually (not always!) don't inhibit reordering", RMW ops on that register will inhibit reordering.
The extra logic has a cost, at least in die area and design validation search space.
That's only if opaque function calls are allowed to change the current rounding mode or observe and clear exceptions in the caller's context, which I'm not proposing.
With regard to the floating point environment (fenv), we can roughly distinguish a hierarchy of three kinds of functions, each a subset of the previous kind:
(unannotated) may assume the default state and that exceptions are not observed
extern "Rust-fenv" interact with the fenv like a floating point operation
extern "Rust-nofenv" do not depend on the fenv state and preserve it
The implementation of each may only call functions of the same or subsequent kind. Using the ABI string for this is not my preferred solution, but it does minimize new additions for the purposes of this explanation.
Specifically, extern "Rust-fenv" functions use the current rounding mode as an input, and may raise exception status bits as an output. Because they are not allowed to use the exception status bits as an input, they can be reordered with other operations of the same kind (if otherwise allowed).
Crucially, the above classification does not include operations that change the current rounding mode or observe or clear exceptions in the caller's context. Those are only needed to unsafely implement properly scoped forms, which the standard library could provide. A simplified API might look like:
/// Evaluate a floating point computation with the given rounding mode,
/// and return any exceptions it raises.
extern "Rust-nofenv" fn with_float_rounding<T>(
rounding_mode: Round,
f: extern "Rust-fenv" fn(&mut T),
args: &mut T,
) -> FloatExceptions { ... }
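To illustrate usage (the Round and FloatExceptions names are placeholders here, just as in the sketch above):

extern "Rust-fenv" fn dot(v: &mut ([f64; 4], [f64; 4], f64)) {
    // Runs under whatever rounding mode the caller established; may raise
    // exception status bits, but never reads or clears them.
    v.2 = v.0.iter().zip(&v.1).map(|(a, b)| a * b).sum();
}

fn example(a: [f64; 4], b: [f64; 4]) -> (f64, FloatExceptions) {
    let mut args = (a, b, 0.0);
    let raised = with_float_rounding(Round::TowardZero, dot, &mut args);
    (args.2, raised)
}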
Are you saying that the standard requires something that fundamentally prevents local reasoning? What specifically?
It does require that users have access to operations like clearFlags and testFlags (which need not be called by those names, nor be actual functions). However, it also gives significant freedom for the language to define how the flags are scoped. Among others, the language standard should specify defaults for:
Whether flags raised in invoked functions raise flags in invoking functions.
Whether flags raised in invoking functions raise flags in invoked functions.
So it seems conformant to say that a function that uses the language-defined ways of accessing the status flags has its own independent state, and that the compiler inserts the clearing and restoring of the hardware register only for those functions. And if we did that, it would be helpful to require a more direct opt-in.
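To sketch what that could look like (read_flags, clear_flags, and restore_flags are hypothetical intrinsics, purely for illustration):

// Pseudo-code for a function that opts in to observing the status flags.
// The compiler conceptually wraps the body so that flag accesses stay local
// and the caller's accumulated flags are left untouched.
fn checked_sum(xs: &[f64]) -> (f64, bool) {
    let saved = read_flags();            // snapshot the caller's flags
    clear_flags();                       // start from a clean per-function state
    let sum: f64 = xs.iter().sum();
    let inexact = read_flags().inexact;  // observe only this function's flags
    restore_flags(saved);                // put the caller's flags back
    (sum, inexact)
}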
A simple implementation could make the operations that may access those bits in a non-accumulating way serialize the pipeline. Then it's just a case of OR-ing all the flag outputs from the out-of-order arithmetic units together into the accumulating status register. That should be relatively minimal, while only pessimizing code by O("how often is the status register explicitly cleared or tested").
Clearly there's a cost, but I'm wondering how that cost compares to other ways that the hardware could provide the same information.
I'm not a CPU architect, but my understanding is that the problem is having to have special logic for the flag-bits register in the first place, not the details of what that logic should be.
Many of the conditions that, in IEEE 754, cause exceptions to be raised, also produce a special value in the result register (Inf or NaN); if we could figure out a way to do that for the "underflow" and "inexact" conditions as well, we wouldn't need the exception bits. I have actually penciled out a way to do that; long story short, how do you feel about giving up one bit of mantissa?
Given that this is a hypothetical hardware implementation of floats, you don't need to give up a bit of the mantissa – instead you just add an extra bit to the floating-point registers (meaning that floats are wider in the CPU than they are in memory). The compiler and operating system would need to know about it to be able to spill it (the compiler spills if it needs the register for something else, the operating system kernel spills it when doing a context switch), but programs written for existing IEEE floats would be able to run unmodified. (Note that it may make sense to do the same thing with integers, too – IIRC some processors do that so that they don't need to implement core-global flag bits and all the logic that comes along with them.)
I don't know what the standard says, since it is behind a paywall so clearly its authors don't actually want people to read and comment on it. (I know I can find copies online, no idea how legal that is. But I am disinclined to bother with this for non-free standards.)
It seems you are saying the standard prescribes so little about error flags that both global state and per-function state for the error flags would be allowed? Interesting, I didn't know that. That'd make it kind of impossible to write portable code that works against all implementations, and anyway no implementation actually picked it up so it's kind of a moot point.
(Per-function state also has its problems of course, e.g. for inlining/outlining transformations.)
You can't just make f32 be 33 bits in memory; there's no space for the extra bit when the float is in a struct/array/anything.
No, you make f32 32 bits in memory (the extra bit is only stored to memory as a consequence of a register spill, and registers are untyped so a floating point register is not an f32). The software would need to handle the exception bit before storing the value into memory if it cared about what exceptions had occurred. It's quite common for the best representation of a value for performing calculations to be different from the best representation for storing it in memory.
This sort of hardware-driven optimisation is easy to take advantage of in assembly code, but hard to express in Rust or to handle in compiler backends like LLVM. For example, u64::midpoint can be expressed in two hardware instructions on x86-64 (add the two numbers, then halve the resulting 65-bit integer), but Rust+LLVM currently compile it as four instructions even when compiling for size – I suspect the reason is that LLVM simply doesn't consider the possibility of using a 65-bit intermediate result. (It is unclear which approach is better when compiling for speed – the two-instruction version is 2× faster on some processors and 3× slower on others – so I don't want to call the current codegen wrong except when size is the primary optimisation goal.)
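For reference, here is one overflow-free formulation of the midpoint (a sketch, not necessarily the exact code in std):

// Shared bits plus half of the differing bits equals (a + b) / 2 without
// needing a wider intermediate. The ideal x86-64 lowering would instead be
// `add` followed by `rcr` by 1, using the carry flag as the 65th bit.
fn midpoint(a: u64, b: u64) -> u64 {
    (a & b) + ((a ^ b) >> 1)
}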
I suspect the correct approach for a language like Rust to take, if it wanted to be able to express / interface with this sort of hardware feature, would be to add the "register format" as a separate type, in addition to the "memory format" – that way, users who needed things like precise float exception tracking would have a way to express it. C effectively does this when compiling for x87: C compilers targeting x87 generally make double the 64-bit memory format and long double the 80-bit register format. But I don't think this sort of hardware feature is implemented commonly enough for a change like that to currently be beneficial at the software level (and it would likely be inherently architecture-specific even if it were).
Registers are an implementation detail. As far as the Rust spec/AM is concerned, the in-memory representation is the only representation of any type. I know C plays some funky games where a double may use 80-bit precision but then gets truncated to 64-bit at an unspecified moment in time; Rust does none of this since it is awfully underdefined.
The aim with this sort of hardware feature is normally to write the code in such a way that it's a) defined in the abstract machine, but b) "obviously" maps to the hardware, in such a way that the optimizer is able to easily recreate the code you had in mind.
For example, the carry flag is a register-like detail of x86-64 that doesn't exist in the Rust abstract machine. But (using unstable features) you can write code that is obviously intended to use it:
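A minimal sketch, assuming the unstable carrying_add helper from the bigint_helper_methods feature:

// Requires #![feature(bigint_helper_methods)] on nightly.
// Adds two 128-bit values held as (low, high) pairs of u64 limbs; the carry
// out of the low limb feeds the high limb, which maps onto add + adc on x86-64.
fn add128(a: (u64, u64), b: (u64, u64)) -> (u64, u64) {
    let (lo, carry) = a.0.carrying_add(b.0, false);
    let (hi, _) = a.1.carrying_add(b.1, carry);
    (lo, hi)
}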
From the Rust abstract machine point of view, carry here is just a bool (that conceptually has a stack slot allocated to it), and thus if you compile this for a processor that doesn't have a carry flag it will still give the correct result. But if you compile this for x86-64, using a nonzero optimisation level, LLVM will notice that this is exactly the situation the processor-level "add with carry" instruction is designed for and will actually store carry in the processor's carry flag – it isn't ever stored in memory, and isn't ever stored in a general-purpose register (even though a bool would normally be stored in a general-purpose register).
Most of the time, when you're trying to support a hardware feature in a language, this is the optimal way to do it: you specify a software API, in terms of things that are observable to the abstract machine, that just happens to do the same thing that the hardware implements natively. Then you ensure that if someone uses that API and you're compiling for a processor that supports the instruction natively, you compile it into code that makes use of the relevant hardware feature.
The situation with C and x87 is weird because a) it allows you to do this sort of thing, but b) in practice programmers usually don't. If you want to get optimal performance from the x87 floating point unit whilst ensuring that the results mathematically match those from the abstract machine, you have to convert your floating-point values from double to register long double whenever you load them from memory to do calculations on them, then from register long double back to double for storage. In practice, most C programmers don't actually write code like that, so C compilers have been known to do it for them (on occasion, even without being asked), leading to results that don't exactly match the results you would get on the abstract machine and which appear to convert the numbers between 64-bit and 80-bit format arbitrarily. (I know RalfJung knows this, but for other thread participants who haven't seen it: even Rust has not always been immune to x87's extra precision causing deviations from the abstract machine and memory unsafety as a consequence. It is theoretically possible to write correct float code using x87, but it is very slow compared to the incorrect version and, apparently, easy for a compiler to get wrong.)
My understanding is that IEEE 754 is not mandating how the programming language should be structured. It really just wants the operations it defines to be available, and for the language to define how that language maps to those operations.
Yeah, it's not ideal. Note that you may have access through that paywall via your university. I'd also check the university library.
If you do, I'd recommend reading at least chapters 1 and 2 to get a good overview. Then, chapter 7.1 is the relevant part for the status flags, and you might find 10.4 interesting on the "literal meaning" of source code as it relates to allowed optimizations.
FYI, this is incorrect. Except for the x87, which was a bad design for a whole lot of reasons, floating point registers do in fact track what size of value was most recently deposited into them. I don't know off the top of my head what any current-gen CPU actually does if you follow a single-precision load with double-precision arithmetic, or vice versa, without a conversion instruction in between; but I'd expect at least a pipeline bubble, and I wouldn't be surprised by a trap.
That is not an option I, personally, am interested in pursuing. I have been doing system programming long enough to remember how much of a trainwreck the x87 "registers are extended precision at all times" semantics were...
.... as you go on to describe yourself, in fact! For readers who aren't familiar with the history here, "writing code like that" wasn't even possible in the era when the x86 family's only option for floating point was the x87, because many C compilers didn't even have long double, and most of those that did have long double also had fancy register allocators that would ignore explicit register qualifiers when optimizing.
Perhaps you are thinking that the situation won't be as bad for one extra bit in registers, particularly if it's an "inexact result" flag rather than extra actual precision. However, inexactness needs to be a sticky condition. Rather like NaN, once you've got an inexact value, everything computed from that value needs to also be marked as inexact. If the "this is inexact" bit is discarded when a value is written to memory, we'll have the same problem as we did with the x87. Results won't be reliably recorded as inexact. The defensive programming required to catch all inexact results before they escape registers is likely to be difficult to write, and also likely to hurt performance (I'm thinking particularly about operations on large arrays, where no value stays in registers for longer than absolutely necessary because it needs to get out of the way of new inputs). Compilers are likely to have bugs in this area, so all the defensive programming in the world might not even help!
And on top of all that, a "this result is inexact" marker is just more useful if it's part of the actual value rather than a separate flag. In scientific applications I think people would want it to make it all the way to final output.
For these reasons, the only kind of in-band signal that I'm interested in exploring, as an alternative to IEEE 754 exception bits, is a signal that's fully embedded in the value. The register representation must be the same as the RAM representation and even the external data interchange representation. So I ask again: How do you feel about giving up one bit of mantissa?
At least on x86-64, it definitely doesn't trap – you can even mix floating-point and integer instructions on a vector register (and both SSE and AVX do scalar floating-point arithmetic by storing the float in the least significant entry of a vector register and operating on it there – there aren't floating-point registers distinct from the vector registers, unless you count the x87 registers which only remain for backwards compatibility and that SSE/AVX can't use). I think some x86-64 processors do have extra latency when you mix instructions that treat the same register as different data types – but I also think that most modern processors hardly care (and I thought that, on the processors where it matters, the penalty was caused by int/float mixing rather than by f32/f64 mixing).
A plan which requires everyone else in the world to change their floating-point format is never going to succeed. This seems clearly inferior to either of the alternatives (exception flags that apply to the CPU core as a whole, or exception flags that apply to individual registers). Sure, if you want to remember inexactness long-term you have to add extra instructions to read the flag and store it alongside the data – but that's much easier than trying to get everyone you interoperate with to change to a different floating-point format. It's not so much the bit of mantissa that matters, but the loss of compatibility.
Since the topic of floating-point models has come up, I'd recommend the paper I wrote for C++ (P3715) as having a good deal of useful background reading. Rust is obviously not C++, but the paper has a lengthy explanation of how floating-point models work in practice in compiler implementations and across different programming languages, and a summary of some of the salient aspects of IEEE 754.
When I first got access to IEEE 754, I was actually quite surprised by how little it says. Honestly, you're not really missing that much by not having access to it; reading Annex F of the C specification and the TS 18661 extensions for IEEE 754 conformance (both of which have publicly available drafts) gives you a good idea of what it contains. As a practical matter, C's definitions generally overrule IEEE 754 anyways (e.g., IEEE 754 says functions like sin need to be correctly rounded; C explicitly says otherwise).
The main thing about IEEE 754 is that it primarily dictates hardware semantics rather than programming language semantics. There are some recommendations for programming languages, but usually on the level of should rather than shall.
IEEE 754 explicitly states that
Language standards should specify defaults in the absence of any explicit user specification, governing:
[...]
Whether flags raised in invoked functions raise flags in invoking functions.
Whether flags raised in invoking functions raise flags in invoked functions.
*breath*
I covered this in more detail in my C++ paper, but to give the short summary here: x87 only supports long double as a computation type. C has FLT_EVAL_METHOD which allows intermediate computation to be done in this type instead of the declared type, but it requires that the value be converted back into the declared type at certain points. Most compilers don't implement these rules correctly; of the ones I've tested, only icc does. Instead, most compilers just pretend that the x87 instructions implement float and double arithmetic and will, for example, spill a register mid-expression as a double instead of a long double. This behavior doesn't match FLT_EVAL_METHOD == 2, which is why those compilers also say FLT_EVAL_METHOD == -1 instead. It's also considered a bug, but one that nobody really cares to fix because x87 just sucks in general.