Add float types with niche

Given that stable, flexible custom niche optimizations are still a long way off, should Rust's standard library add floating-point types with niches, such as FiniteF* or NonNaNF*? This would allow NaN-boxing techniques to be implemented directly and concisely with Rust's enum syntax. For example:

enum NaNBox {
    Float(NonNaNF64),
    NaN,
    Bool(bool),
    Int(i32),
    Char(char),
}

fn main() {
    assert_eq!(size_of::<NaNBox>(), size_of::<f64>());
}

Alternatively, more detailed custom floating-point types could be defined:

enum MyFloat32 {
    NonNaN(FiniteF32),
    Inf,
    QNaN,
}

fn main() {
    assert_eq!(size_of::<MyFloat32>(), size_of::<f32>());
}
2 Likes

A problem with this is that NaNs aren't represented by a single bit flag, but by a specific bit pattern in the exponent part. That means an Option<NonNaNF64> will be the same 16-byte size as Option<f64>, as there's no usable niche.

Niches aren’t bit flags but values. And the canonical qNaN is a single unique value (well, modulo the sign bit, anyway…). Supporting NaN payloads would be trickier without compiler magic, but they’re super obscure. And if compiler magic is needed, that’s an argument for inclusion in std!
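A quick illustration of the point: a NaN is any f64 bit pattern with an all-ones exponent and a nonzero mantissa, and the canonical quiet NaN is one specific such pattern. (Rust does not guarantee the exact bits of f64::NAN, so the printed value is illustrative only.)

```rust
fn main() {
    // On common platforms this prints 0x7ff8000000000000, the canonical
    // quiet NaN; Rust doesn't guarantee the exact pattern, though.
    println!("{:#018x}", f64::NAN.to_bits());

    // Any all-ones-exponent, nonzero-mantissa pattern is a NaN, including
    // the obscure payload-carrying ones:
    assert!(f64::from_bits(0x7ff8_0000_0000_0001).is_nan());
    assert!(f64::from_bits(0xfff8_0000_0000_0000).is_nan());
    // All-ones exponent with a zero mantissa is infinity, not NaN:
    assert!(f64::from_bits(0x7ff0_0000_0000_0000).is_infinite());
}
```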

1 Like

I recently wrote the opinion that std should probably not gain a NotNaN/NonNaN wrapper type, because it’s unclear what the behavior of most operations under that restriction should be. I now think that was too strong a statement; a type which simply does not have any operations other than those which naturally do not produce NaN, and which can provide niches, Eq, and Ord, does seem worthwhile.

5 Likes

At the library level, NonZero and NonNull communicate their valid range to the compiler via attributes, rustc_layout_scalar_valid_range_start/end, and the compiler uses the remainder as the niche. You can also see this in the internal representation for a type like char -- playground:

error: layout_of(char) = Layout {
           size: Size(4 bytes),
           align: AbiAlign {
               abi: Align(4 bytes),
           },
           backend_repr: Scalar(
               Initialized {
                   value: Int(
                       I32,
                       false,
                   ),
                   valid_range: 0..=1114111,
               },
           ),
           fields: Primitive,
           largest_niche: Some(
               Niche {
                   offset: Size(0 bytes),
                   value: Int(
                       I32,
                       false,
                   ),
                   valid_range: 0..=1114111,
               },
           ),
           uninhabited: false,
           variants: Single {
               index: 0,
           },
           max_repr_align: None,
           unadjusted_abi_align: Align(4 bytes),
           randomization_seed: 18446462607322775572,
       }

Note that this does not do anything special for the surrogate range, 0xD800..=0xDFFF, but in theory we could have two valid ranges and skip the surrogates as niches too.
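As a concrete illustration of the library-level mechanism, the stable NonZero types show the same effect without touching the unstable attributes directly:

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

fn main() {
    // NonZeroU32 declares 1..=u32::MAX as its valid range, so 0 is left
    // over as a niche and Option<NonZeroU32> needs no separate tag:
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());
    // Plain u32 has no niche, so its Option needs an out-of-band tag:
    assert_eq!(size_of::<Option<u32>>(), 2 * size_of::<u32>());
}
```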

Similarly, floating-point NaN occupies two contiguous ranges of bit patterns (one for each sign), and a NonNaN could theoretically mark two valid ranges that exclude all NaNs. But as with char, we don't have to be perfect for niche purposes, so for f32 we could use the valid range 0x0..=0xff800000 to leave all negative NaNs as niches. At the library level, we would still enforce that positive NaN isn't allowed either.
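A quick sanity check of that proposed range (the constants here are the ones from the paragraph above, treating the f32 bit pattern as a u32):

```rust
fn main() {
    // 0xff80_0000 is negative infinity, the last value inside the proposed
    // valid range 0x0..=0xff80_0000:
    assert!(f32::from_bits(0xff80_0000).is_infinite());
    // Everything above it is a negative NaN, i.e. usable niche space:
    assert!(f32::from_bits(0xff80_0001).is_nan());
    assert!(f32::from_bits(0xffff_ffff).is_nan());
    // Positive NaNs fall inside the "valid" range, so the library would
    // still have to reject them in software:
    assert!(f32::from_bits(0x7fc0_0000).is_nan());
}
```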

3 Likes

Yes, and conveniently things like infinities are not a problem for Ord:

Really, for computation, the right path will probably basically always be to turn them back into f32s, do lots of computation, and only convert back at the end. Checking for NaN in Rust at every step would probably destroy performance.

4 Likes

Yes, but the matter might be slightly more complex. Of course, one could simply treat f* as u* to construct a niche, but the NaN-boxing technique in the example might not be implemented very effectively. Specifically, the current Rust compiler doesn't seem particularly smart about leveraging niches. For instance, if one simply constructs:

struct NicheFloat64(u64 but only valid in ...)

Then one issue (if there are any mistakes, please point them out; my understanding of some low-level behaviors isn't very thorough) is that if the enum includes another non-zero-sized variant, such as:

enum Foo {
    A(NicheFloat64),
    B(u8)
}

The size of Foo does not appear to remain the same as NicheFloat64 (my guess is that the Rust compiler currently makes a strict distinction between the enum's tag and its payload), even though it theoretically could. To work around this behavior, I think a possible trick is:

struct Head(u16); // Covers the exponent bits, marking the niche

pub struct NicheFloat {
    head: Head,
    mid: u16,
    tail: u32
}

This way, it seems possible to use NaN-boxing more or less normally. Currently, I believe that if the memory layout is carefully designed, there should be almost no performance overhead?

The problem here, I believe, is that padding bytes are uninitialized and thus cannot be used to store discriminants. If you have a Foo and don't know whether it's A or B, you can't read any of the upper seven bytes, because you don't know whether they're initialized.

Foo doesn’t have to have padding in any of its bytes. In particular, it can have the following layout.

     0  1  2  3  4  5  6  7
A =  f6 f6 f6 f6 f6 f6 f6 f6
B =  u8 pp pp pp pp pp dd dd
  • f6 = bytes of A's NicheFloat64 field
  • u8 = bytes of B's u8 field
  • dd = discriminant value encoded into NicheFloat64's niche
  • pp = bytes that can be left as padding (or zeroed or made part of the discriminant area; whichever is convenient for the compiler)

The compiler doesn’t currently support this kind of layout, but it could.

2 Likes

The compiler also doesn't use a niche for Either<char, u8>, but it could in theory.
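That limitation is easy to observe today (Either here is a local stand-in enum, not the either crate's type):

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Either {
    A(char),
    B(u8),
}

fn main() {
    // Option<char> does use char's niche (0x110000..=0xffffffff):
    assert_eq!(size_of::<Option<char>>(), size_of::<char>());
    // But once another variant carries a payload, the compiler currently
    // falls back to a separate tag, growing the enum:
    assert!(size_of::<Either>() > size_of::<char>());
}
```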

1 Like

Yes, true, of course. I sort of confused it with the case where the padding is internal to the value inside B.

I believe that the correct way to do a niched float type is to ban most NaN values, whilst leaving a few available. For example, you could ban all NaN values other than "indefinite" and "canonical" (I can't remember whether these are the same as each other or not, but it probably doesn't matter).

That way, operations that produce NaN will still have a value to return – you just change the NaN value to one of the legal values (probably "canonical"). On some hardware, this might need an explicit check for non-canonical NaNs to canonicalise them. However, many hardware floating-point implementations won't generate NaN values that they didn't see in the input (other than indefinite, canonical, and quiet versions of signalling NaNs from the input), meaning that canonicalisations would be zero-cost on those platforms (the only runtime performance cost would happen at the time you converted, e.g., f64 to F64WhereNaNsAreCanonical – operations within the niched float type would have no runtime overhead).

I haven't found a use for this yet, because I don't work with floats much, but it seems like the sort of type that probably would be useful for a lot of users.

EDIT: thinking about it, the simplest version to name, understand, and implement would be QuietF32/QuietF64 – a float type that could hold any value other than a signalling NaN. "If this float is a signalling NaN, convert it to its quiet version" is an operation that was specifically designed to be easy to implement in hardware (and which most modern hardware does implicitly on every float operation), and can be done easily in software too; float operations (other than bitwise float operations AND/OR/XOR) can't produce a signalling NaN unless there was a signalling NaN in the input, so very few changes would need to be made to the generated code; and the range of signalling NaN bit patterns is contiguous (except for the sign bit) so it could be understood by the compiler's existing support for niches.

4 Likes

The main use case for NaN boxing seems to be interpreter runtimes for dynamic languages that use floats heavily, e.g. JavaScript. But maybe there are other scripting languages like that.

EDIT: I seem to remember other interpreted languages having their numbers tagged in other ways. I have a memory of reading about a Lisp with 62 bit integers, using the top few bits to tag pointers in cons-cells and similar things.

I understand your perspective, but I'd like to point out that sNaN is a valid f* value in Rust. The proposed QuietF* type would still require validation during initialization, which might involve a software check rather than an optimized hardware instruction. This could impact performance in cases where distinguishing between NaN kinds isn't necessary.

Adding a niche is obviously going to require validation if you're initialising the values using the primitive float values – but that's true regardless of where the niche is. The QuietF* approach seems to avoid all the other possible downsides, though (and if you operate entirely on quiet floats rather than regular floats, you don't have to pay the performance penalty of validation anywhere).

I imagine that the usual way to create a QuietF* would not be TryFrom, but rather a new that quiets signalling NaNs rather than rejecting them. On most processors, this is a single instruction (adding negative zero to the float in question, which converts signalling NaN to quiet NaN and is otherwise a no-op with usual FPU configurations) – and I imagine that it would usually be possible to optimise it out (you could optimise it out if the float were a constant or if it was produced by float arithmetic instructions, and there aren't many other places that people commonly obtain floats from; the most likely source of a signalling NaN would be when deserialising the float from a binary format, in which case the code in question is likely I/O-bound rather than CPU-bound and the extra instruction doesn't cost anything).
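A minimal sketch of that constructor, assuming a hypothetical QuietF64 newtype (not a real std type). The quieting trick is the "add negative zero" described above; since Rust does not guarantee the resulting NaN bits, the check below only asserts NaN-ness:

```rust
// Hypothetical illustration; QuietF64 and its API are not real std items.
#[derive(Clone, Copy)]
pub struct QuietF64(f64);

impl QuietF64 {
    pub fn new(x: f64) -> Self {
        // Adding -0.0 is the identity for every value except a signalling
        // NaN, which most FPUs convert to its quiet form.
        QuietF64(x + -0.0)
    }
    pub fn get(self) -> f64 {
        self.0
    }
}

fn main() {
    // A signalling NaN for f64: all-ones exponent, quiet bit clear,
    // nonzero mantissa.
    let snan = f64::from_bits(0x7ff0_0000_0000_0001);
    let q = QuietF64::new(snan);
    // Still a NaN after construction; on typical hardware the quiet bit
    // is now set, but Rust doesn't guarantee that, so we don't assert it.
    assert!(q.get().is_nan());
}
```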

3 Likes

So in another thread, I was talking about how it's useful to have functions/methods that have the same semantics as hardware features, so that you can write code that uses those functions/methods and they get compiled into the code that runs efficiently on the hardware, whilst still being well-specified on processors where those hardware features don't exist.

And then I remembered a quirk of most modern floating-point hardware. It turns out that some IEEE floating-point values (the "subnormal numbers", formerly known as "denormal numbers") are very difficult for hardware to deal with. Some floating-point units (including those typically used on x86-64) are extremely slow at processing subnormal numbers (sometimes more than 400× slower), to the extent that it even becomes a security vulnerability (Wikipedia pointed me to this paper (PDF), which demonstrates how a web page can read information across a security boundary by tricking the browser into doing floating-point calculations on the information it wants to read and timing how long it takes). Some floating-point units (e.g. the vector unit on 32-bit ARM) can't handle subnormals at all – they process them as though they were alternative encodings of 0 and never output them. This means that a Rust program written to use the full range of f32 and f64 is, on many modern processors, going to risk performance issues (either because it might encounter a subnormal that causes a huge slowdown at runtime, or because it has to use a slower floating-point unit for fear that it might encounter a subnormal that would cause the faster floating-point unit to produce incorrect results).

Instead, the fastest option nowadays is normally to use an alternative floating-point representation in which the subnormals are considered out of range – subnormals are not allowed as inputs (with hardware typically interpreting bit patterns that would normally represent a subnormal number as though zero had been provided instead), and if the output of the operation would be a subnormal number, the floating-point unit produces zero instead. Many modern FPUs provide support for this behaviour, usually as an option, but sometimes as the only behaviour they support. This is a different behaviour from Rust's existing f32 and f64 – but it's an internally consistent and well-specified behaviour, so it's possible to imagine NormalF32 and NormalF64 types that implement it. For many applications, these would be more useful than f32 and f64 would be – they vectorise better on some processors, and avoid the "worst-case timing" that can happen on others, whilst generally being accurate enough for most floating-point calculations in practice (all the subnormal numbers have extremely small absolute values, much smaller than any nonzero number that would be used by any typical calculation).
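For reference, the subnormal range that such a hypothetical NormalF32 would exclude (and could expose as a niche) is easy to see with the stable is_subnormal predicate:

```rust
fn main() {
    // The smallest positive normal f32 is 2^-126, bit pattern 0x0080_0000:
    assert_eq!(f32::MIN_POSITIVE.to_bits(), 0x0080_0000);
    // Everything strictly between 0 and it, i.e. patterns 0x1..=0x007f_ffff
    // (plus the sign-bit mirror), is subnormal:
    assert!(f32::from_bits(0x0000_0001).is_subnormal());
    assert!(f32::from_bits(0x007f_ffff).is_subnormal());
    // The boundary itself is normal:
    assert!(!f32::from_bits(0x0080_0000).is_subnormal());
}
```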

It seems like it would be useful to support these types in Rust – and that would have the useful side effect of providing niched floating-point types (with all the subnormals serving as niches). It wouldn't necessarily produce the same results as f32 and f64 would, but the results would nonetheless be well-specified and likely adequate for most programs (indeed, more useful than f32 and f64 due to having better and more consistent performance). Many C compilers support the infamous -ffast-math option, which buys speed at the cost of inconsistent results and an unclear specification (and sometimes even sacrifices soundness); even though options like that are abhorrent to most Rust programmers, they are quite popular and exist because they serve an actual need. NormalF32/NormalF64 seem like a good start to satisfying the requirements of people who would consider using -ffast-math (which might potentially be important for competing with C), but with a defined specification and without the unsafety.

The niche is also in a really nice spot (bit patterns corresponding to nonzero positive integers below 2^23 / 2^53). In particular, if the allocator avoids using addresses above 2^53 (and most allocators do in fact do that, although of course it isn't a stable guarantee), every non-null pointer value fits into the niche of the 64-bit version, and that fact surely has to be useful for some purpose.

6 Likes

This sounds crazy until you do the mental math and realize that 0..2^53 is just 1/2048th of an eight-byte type's available bit patterns. So even though the absolute number of niches is hilariously large, it's still almost nothing relative to the full range. Exponents are fun.

2 Likes