I think NaN is perfectly analogous to nullptr. It's the same bad design: the absence of a value, baked into the actual type. `f64` and `f32` are most of the time used as a number, and not as maybe-a-number. I don't have statistical data, but I'd guess that about 95% of code assumes its floating point data is not NaN, just like probably 85% of code assumes that the objects it uses aren't nullptr. As we all know, the difference between what we think the code does and what the code actually does is bugs.
How is nullptr avoided in Rust? With the `Option` type. It is returned where a computation may produce nothing instead of an actual result, and used where a variable may hold either a value or nothing.
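A minimal sketch of the pattern (the `find_user` function is made up for illustration):

```rust
// A lookup that may find nothing returns Option instead of a null pointer.
fn find_user(id: u32) -> Option<String> {
    if id == 42 {
        Some(String::from("Alice"))
    } else {
        None // the absence of a value is visible in the type
    }
}

fn main() {
    // The compiler forces the caller to handle the None case.
    match find_user(7) {
        Some(name) => println!("found {name}"),
        None => println!("no such user"),
    }
}
```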
I think the same pattern should be applied to floating point types.
I think there are two main use cases concerning NaN:
- Control flow. The validity of a computation must be checked at runtime. Example: a division may divide by zero, so the programmer has to check the divisor before dividing. This use case will always require a runtime check, which can happen either before the division (by checking the divisor) or after it (by checking the result for NaN).
- Asserting a number. A computation is assumed not to return NaN; it should return a number. Example: computing the average age of two persons. As mentioned before, a simple addition may return NaN when one operand is infinity and the other is negative infinity. In this group of use cases, however, the programmer assumes that each operand is finite anyway. I estimate that roughly 80% of code assumes all of its variables are finite numbers (finite, which also excludes NaN). Both use cases are sketched in code right after this list.
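Here is that sketch, using today's `f64`; `safe_div` and `average_age` are made-up helpers:

```rust
// Use case 1: control flow -- the check is part of the logic anyway.
fn safe_div(a: f64, b: f64) -> Option<f64> {
    if b == 0.0 {
        None // would produce inf or NaN; report "no result" instead
    } else {
        Some(a / b)
    }
}

// Use case 2: asserting a number -- the programmer assumes the inputs
// are finite, but nothing in the type system enforces it.
fn average_age(a: f64, b: f64) -> f64 {
    (a + b) / 2.0 // silently returns NaN for inf + (-inf)
}

fn main() {
    assert_eq!(safe_div(1.0, 0.0), None);
    assert!(average_age(f64::INFINITY, f64::NEG_INFINITY).is_nan());
}
```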
Accessing an element in an array makes the same kind of assumption. But Rust was designed to be ergonomic and safe, so array indexing is checked every single time. Code that uses `numbers[i]` assumes that the index is in range, but Rust checks it anyway. Every time. Even in release builds, even though the programmer probably feels really confident that the index will never fall out of bounds. There is, however, a performance escape hatch: the programmer can opt out per call site with the unsafe `get_unchecked`. This corresponds to the second use case.
For cases where the programmer is uncertain about the index bounds, they use `get`: `numbers.get(i).unwrap_or(&1)`. This corresponds to the first use case.
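The two styles side by side with today's slice API:

```rust
fn main() {
    let numbers = [10, 20, 30];

    // First use case: the caller is unsure about the bounds, so the
    // check is part of the logic. No panic possible.
    let fallback = *numbers.get(5).unwrap_or(&1);
    assert_eq!(fallback, 1);

    // Second use case: the caller asserts the index is valid. Rust
    // still checks at runtime and panics if the assertion is wrong.
    assert_eq!(numbers[2], 30);
}
```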
The same could be done for floats, for example with division. There could be two ways to divide a number:
- `a.div(b).unwrap_or(1.0)`, which returns an `Option`. This method could also be called `try_div` or something similar. It serves the first use case, where the computation would require a check anyway.
- `a / b`, which just assumes the computation will return a number. The result would have to be checked internally, and this could panic if the operands are not valid, just like direct indexing with `numbers[i]`. These checks would always be performed unless the programmer actively requests omitting them, for the 1% of code where a performance bottleneck has been measured. A sketch of such an API follows this list.
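Here is the sketch; the `TryDiv` trait is purely hypothetical and not part of any existing API:

```rust
// Hypothetical extension trait for checked float division.
trait TryDiv {
    fn try_div(self, rhs: f64) -> Option<f64>;
}

impl TryDiv for f64 {
    fn try_div(self, rhs: f64) -> Option<f64> {
        let result = self / rhs;
        if result.is_nan() || result.is_infinite() {
            None // first use case: the caller handles the failure
        } else {
            Some(result)
        }
    }
}

fn main() {
    let a = 6.0_f64;
    // Explicit handling with a fallback value.
    assert_eq!(a.try_div(0.0).unwrap_or(1.0), 1.0);
    assert_eq!(a.try_div(2.0), Some(3.0));
    // Under the proposal, plain `a / b` would instead check internally
    // and panic on an invalid result, like `numbers[i]` does for bounds.
}
```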
I think the perfect solution would be to have all default float types (`f64`, `f32`, ...) guaranteed to never contain NaN, proven at compile time. I know this is a big request, but just adding a separate non-NaN type won't make the average programmer produce safer code. This way, `Option<f64>` could be represented in memory just like the old `f64`, with a NaN bit pattern representing `None`, just like the OP suggested.
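Rust already performs exactly this kind of niche optimization for the NonZero integer types, where the forbidden value 0 encodes `None`; a non-NaN float could plausibly get the same treatment:

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

fn main() {
    // The forbidden value 0 is reused to encode None, so the Option
    // wrapper costs no extra memory.
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());

    // A hypothetical non-NaN f64 could use a NaN bit pattern the same
    // way, making Option<f64> the same size as today's f64.
}
```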