Scientific Computing - NaN as a 'None' Option or an 'Err' Result

The aim with this sort of hardware feature is normally to write code that a) is defined in the abstract machine, but b) "obviously" maps to the hardware, so that the optimizer can easily recreate the code you had in mind.

For example, the carry flag is a register-like detail of x86-64 that doesn't exist in the Rust abstract machine. But (using unstable features) you can write code that is obviously intended to use it:

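#![feature(bigint_helper_methods)]
// Each 128-bit value is a (low, high) pair of u64 halves.
pub fn add_128bit(a: (u64, u64), b: (u64, u64)) -> (u64, u64) {
    // Add the low halves; `carry` records whether that addition overflowed.
    let (answer_low, carry) = a.0.carrying_add(b.0, false);
    // Add the high halves plus the carry; the final carry-out is discarded.
    let (answer_high, _) = a.1.carrying_add(b.1, carry);
    (answer_low, answer_high)
}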

From the Rust abstract machine point of view, carry here is just a bool (that conceptually has a stack slot allocated to it), and thus if you compile this for a processor that doesn't have a carry flag it will still give the correct result. But if you compile this for x86-64, using a nonzero optimisation level, LLVM will notice that this is exactly the situation the processor-level "add with carry" instruction is designed for and will actually store carry in the processor's carry flag – it isn't ever stored in memory, and isn't ever stored in a general-purpose register (even though a bool would normally be stored in a general-purpose register).
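For anyone who wants to try this without unstable features, here is a rough sketch of the same addition using only stable integer methods. The name add_128bit_stable is made up for illustration, and I'm not claiming LLVM will always pick the add-with-carry instruction for it; it's just the same abstract-machine computation, with the carry again an ordinary bool.

```rust
/// Adds two 128-bit values, each given as a (low, high) pair of u64 halves.
/// The carry out of the high half is discarded, as in the version above.
pub fn add_128bit_stable(a: (u64, u64), b: (u64, u64)) -> (u64, u64) {
    // `overflowing_add` returns the wrapped sum and whether it overflowed;
    // that overflow bit is the carry into the high half.
    let (answer_low, carry) = a.0.overflowing_add(b.0);
    // Wrapping adds keep the result modulo 2^64, matching a discarded carry-out.
    let answer_high = a.1.wrapping_add(b.1).wrapping_add(carry as u64);
    (answer_low, answer_high)
}
```

Calling add_128bit_stable((u64::MAX, 0), (1, 0)) carries out of the low half and gives (0, 1).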

Most of the time, when you're trying to support a hardware feature in a language, this is the optimal way to do it: you specify a software API, in terms of things that are observable to the abstract machine, that just happens to do the same thing that the hardware implements natively. Then you ensure that if someone uses that API and you're compiling for a processor that supports the instruction natively, you compile it into code that makes use of the relevant hardware feature.
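Another small sketch of the same pattern, with widening multiplication instead of carries. The function name widening_mul is my own choice for illustration. The code is defined purely in terms of u128 arithmetic in the abstract machine, but on x86-64 an optimizing build will typically turn it into the single multiply instruction that already produces a 128-bit result in two registers.

```rust
/// Multiplies two u64 values and returns the full 128-bit product
/// as a (low, high) pair of u64 halves.
pub fn widening_mul(a: u64, b: u64) -> (u64, u64) {
    // Widen both operands first, so the multiplication itself cannot overflow.
    let product = (a as u128) * (b as u128);
    // Truncating cast keeps the low 64 bits; the shift extracts the high 64 bits.
    (product as u64, (product >> 64) as u64)
}
```

On a target without a native widening multiply, the same code still gives the right answer; it just compiles to a longer instruction sequence.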

The situation with C and x87 is weird because a) it allows you to do this sort of thing, but b) in practice programmers usually don't. If you want to get optimal performance from the x87 floating point unit whilst ensuring that the results mathematically match those from the abstract machine, you have to convert your floating-point values from double to register long double whenever you load them from memory to do calculations on them, then from register long double back to double for storage. In practice, most C programmers don't actually write code like that, so C compilers have been known to do it for them (on occasion, even without being asked), leading to results that don't exactly match the results you would get on the abstract machine and which appear to convert the numbers between 64-bit and 80-bit format arbitrarily. (I know RalfJung knows this, but for other thread participants who haven't seen it: even Rust has not always been immune to x87's extra precision causing deviations from the abstract machine and memory unsafety as a consequence. It is theoretically possible to write correct float code using x87, but it is very slow compared to the incorrect version and, apparently, easy for a compiler to get wrong.)
