So in another thread, I was talking about how it's useful to have functions/methods whose semantics match hardware features, so that code using them compiles down to something that runs efficiently on the hardware, whilst still being well-specified on processors where those hardware features don't exist.
And then I remembered a quirk of most modern floating-point hardware. It turns out that some IEEE floating-point values (the "subnormal numbers", formerly known as "denormal numbers") are very difficult for hardware to deal with. Some floating-point units (including those typically used on x86-64) are extremely slow at processing subnormal numbers (sometimes more than 400× slower), to the extent that it even becomes a security vulnerability (Wikipedia pointed me to this paper (PDF), which demonstrates how a web page can read information across a security boundary by tricking the browser into doing floating-point calculations on the information it wants to read and timing how long they take). Some floating-point units (e.g. the vector unit on 32-bit ARM) can't handle subnormals at all – they process them as though they were alternative encodings of 0 and never output them. This means that a Rust program written to use the full range of `f32` and `f64` is, on many modern processors, going to risk performance issues: either it might encounter a subnormal that causes a huge slowdown at runtime, or it has to use a slower floating-point unit for fear that it might encounter a subnormal that would cause the faster floating-point unit to produce incorrect results.
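To make the problem concrete, here's a small sketch showing what a subnormal is and how one arises from perfectly ordinary arithmetic on normal inputs (using the standard library's `f32::is_subnormal`):

```rust
fn main() {
    // The smallest positive *normal* f32 is 2^-126; any nonzero value
    // below that is subnormal (exponent bits all zero, nonzero mantissa).
    let smallest_normal = f32::MIN_POSITIVE; // 2^-126
    let subnormal = smallest_normal / 2.0;   // 2^-127, a subnormal

    assert!(!smallest_normal.is_subnormal());
    assert!(subnormal.is_subnormal());

    // Subnormals appear naturally from underflow: here both operands
    // are normal, but their product (1.0e-40) is not.
    let a = 1.0e-30_f32;
    let b = 1.0e-10_f32;
    assert!((a * b).is_subnormal());
}
```

So a program can hit the slow path (or the wrong-results path) without ever writing a subnormal literal itself.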
Instead, the fastest option nowadays is normally to use an alternative floating-point representation in which the subnormals are considered out of range – subnormals are not allowed as inputs (with hardware typically interpreting bit patterns that would normally represent a subnormal number as though zero had been provided instead), and if the output of an operation would be a subnormal number, the floating-point unit produces zero instead. Many modern FPUs provide support for this behaviour, usually as an option, but sometimes as the only behaviour they support. This is a different behaviour from Rust's existing `f32` and `f64` – but it's an internally consistent and well-specified behaviour, so it's possible to imagine `NormalF32` and `NormalF64` types that implement it. For many applications, these would be more useful than `f32` and `f64` would be – they vectorise better on some processors, and avoid the "worst-case timing" that can happen on others, whilst generally being accurate enough for most floating-point calculations in practice (all the subnormal numbers have extremely small absolute values, much smaller than any nonzero number that would be used by any typical calculation).
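The semantics can be modelled in software (real hardware enables this via FPU control bits, e.g. the FTZ/DAZ flags in the x86 MXCSR register; the functions below are just an illustrative sketch of the behaviour, not how you'd actually enable it):

```rust
// Flush a subnormal to a zero of the same sign; leave everything else alone.
fn flush(x: f32) -> f32 {
    if x.is_subnormal() { 0.0_f32.copysign(x) } else { x }
}

// Multiplication under the alternative semantics: subnormal *inputs*
// are read as zero, and a subnormal *result* is replaced by zero.
fn ftz_mul(a: f32, b: f32) -> f32 {
    flush(flush(a) * flush(b))
}

fn main() {
    // A subnormal input is treated as zero:
    let tiny = f32::MIN_POSITIVE / 4.0;
    assert!(tiny.is_subnormal());
    assert_eq!(ftz_mul(tiny, 1.0), 0.0);

    // A result that would underflow to a subnormal becomes zero instead:
    let (a, b) = (1.0e-30_f32, 1.0e-10_f32);
    assert!((a * b).is_subnormal()); // IEEE result is subnormal...
    assert_eq!(ftz_mul(a, b), 0.0);  // ...but is flushed here
}
```

Note that zero, the normal numbers, infinities, and NaNs all behave exactly as in IEEE arithmetic – only the subnormals are affected, which is what makes the behaviour easy to specify.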
It seems like it would be useful to support these types in Rust – and that would have the useful side effect of providing niched floating-point types (with all the subnormals serving as niches). It wouldn't necessarily produce the same results as `f32` and `f64` would, but the results would nonetheless be well-specified and likely adequate for most programs (indeed, more useful than `f32` and `f64` due to having better and more consistent performance). Many C compilers support the infamous `-ffast-math` option, which gives speed at the cost of inconsistent results and an unclear specification (and sometimes even sacrificing soundness); even though options like that are abhorrent to most Rust programmers, they are quite popular and exist because they serve an actual need. `NormalF32`/`NormalF64` seem like a good start to satisfying the requirements of people who would consider using `-ffast-math` (which might potentially be important for competing with C), but with a defined specification and without the unsoundness.
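To make the proposal more concrete, here's a minimal sketch of what such a type could look like – note that the name `NormalF32` and this whole API are the post's hypothetical, not anything that exists in Rust today, and a real implementation would set the FPU's flush-to-zero mode rather than checking in software as done here:

```rust
/// A hypothetical f32 wrapper whose operations never see or
/// produce subnormals (they are flushed to signed zero).
#[derive(Clone, Copy, Debug, PartialEq)]
struct NormalF32(f32);

impl NormalF32 {
    fn new(x: f32) -> Self {
        // Subnormal inputs are read as (signed) zero.
        NormalF32(if x.is_subnormal() { 0.0_f32.copysign(x) } else { x })
    }
    fn get(self) -> f32 {
        self.0
    }
}

impl std::ops::Mul for NormalF32 {
    type Output = NormalF32;
    fn mul(self, rhs: NormalF32) -> NormalF32 {
        // Routing the result through `new` flushes subnormal outputs,
        // so the invariant "never subnormal" is maintained.
        NormalF32::new(self.0 * rhs.0)
    }
}

fn main() {
    let a = NormalF32::new(1.0e-30);
    let b = NormalF32::new(1.0e-10);
    // Plain f32 would give a subnormal here; NormalF32 gives zero.
    assert_eq!((a * b).get(), 0.0);
}
```

Because the invariant is maintained on every operation, the subnormal bit patterns genuinely never occur in a valid `NormalF32`, which is exactly what makes them available as niches.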
The niche is also in a really nice spot (bit patterns corresponding to nonzero positive integers below 2^23 / 2^52). In particular, if the allocator avoids using addresses at or above 2^52 (and most allocators do in fact do that, although of course it isn't a stable guarantee), every non-null pointer value fits into the niche of the 64-bit version, and that fact surely has to be useful for some purpose.
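The pointer observation can be checked directly: a nonzero bit pattern below 2^52 has all its exponent bits clear and a nonzero mantissa, which is precisely the encoding of a positive subnormal `f64`. A quick sketch (assuming the usual situation where heap addresses sit well below 2^52):

```rust
fn main() {
    // Allocate something and look at its address as a raw integer.
    let boxed = Box::new(42u8);
    let ptr = Box::into_raw(boxed);
    let addr = ptr as u64;

    // On typical 64-bit platforms, user-space addresses are far below 2^52,
    // so the address reinterpreted as an f64 bit pattern is subnormal.
    if addr < (1u64 << 52) {
        assert!(f64::from_bits(addr).is_subnormal());
    }

    // Re-box so the allocation is freed.
    unsafe { drop(Box::from_raw(ptr)) };
}
```

In other words, a `NormalF64`'s unused bit patterns are big enough to hold an entire non-null pointer, not just a one-bit flag.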