Pre-RFC Introduction of Half and Quadruple Precision Floats (f16 and f128)


That would make sense, so that would be


  • f16, IEEE and ARM:
    • aarch64-*
    • arm-*
    • armv5-*
    • armv7-*
  • f128, IEEE and Power:
    • powerpc-* ideally, but it would need to be gated behind a cpuid-like flag, so that it would only be available on Power9 or later hardware.

Optimizations available, but not supported natively AFAIK:

  • f16
    • x86-*
    • x86_64-*
  • f128
    • x86_64-*


but it would need to be gated behind a cpuid-like flag, so that it would only be available on Power9 or later hardware.

We could use a compile-time feature flag, e.g. via target-feature: #[cfg(target_feature = "f128")]. But since users cannot easily re-compile std, we might need to ship new targets with the feature enabled (e.g. powerpc9_64-...). We have done things like this in the past (e.g. with ARM with and without NEON), so we might be able to do it again here.
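As a sketch of what that gating could look like (the "f128" target-feature name is an assumption for illustration; rustc does not define such a feature today):

```rust
// Hypothetical: gate native support on a "f128" target feature.
// The feature name is an assumption; no current target defines it,
// so the fallback branch is what compiles on today's toolchains.
#[cfg(target_feature = "f128")]
fn has_native_f128() -> bool {
    true
}

#[cfg(not(target_feature = "f128"))]
fn has_native_f128() -> bool {
    false // would fall back to a software implementation
}

fn main() {
    println!("native f128: {}", has_native_f128());
}
```

Shipping a dedicated target with the feature pre-enabled avoids users having to rebuild std with different target features themselves.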

This plan sounds good to me. We want to offer the f16 type in the platforms that natively support it anyways. Once we have an implementation that works there we can try it in other platforms and just see what happens and how good LLVM support is.

This will need an RFC before stabilization, but if someone is willing to put in the work, I don’t think an RFC is needed to land the implementation backbone in rustc, as long as the type is only exposed via stdsimd. We don’t even need to expose it in nightly at first (e.g. we can use an off-by-default feature flag in stdsimd to only use it in our CI at first), so PRs welcome I guess :slight_smile:

cc @alexcrichton, or do you see any reason why we shouldn’t give this a try?


My only concern would be the lack of literal support. As far as I can tell, only primitive types (u*, i*, f*, arrays, string literals) have literal support, so perhaps an RFC for something that would allow literals of any type? Because although gating it behind std::simd means that a lot of newer users would not be using it, it would still be confusing that this

let x: f64 = 10.0;

works and this

let x: f16 = 10.0;

does not.

Don’t we already have u/i128 types that lack native support on some platforms, but are supported by the language everywhere because they can always fall back on a software implementation? Why would we want f16/f128’s existence to depend on the architecture when u/i128’s doesn’t?
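For comparison, u128 already works this way on every target, because rustc lowers oversized integer arithmetic to software routines (in compiler-builtins) where the hardware has no 128-bit support. A minimal demonstration:

```rust
// u128 has no native support on most targets, yet works everywhere
// because arithmetic falls back on compiler-builtins software routines.
fn main() {
    let x: u128 = u64::MAX as u128 + 1; // 2^64, unrepresentable in u64
    assert_eq!(x >> 64, 1);
    println!("{x}");
}
```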


What I had in mind is that std::arch would just:

pub type f16 = __rustc_f16_literal_type;

and literal support for it would just be in rustc as usual.

This means that

use std::arch::f16;
let x: f16 = 1.0;  // OK
let y = 1_f16; // ERROR
let z = 1___rustc_f16_literal_type; 
// ^^^ERROR: requires feature rustc_f16_literal_type

For all we know we might want to expose f16 everywhere, so adding any sort of special casing for it at this point might not be worth it. For experimentation purposes, the above might just be enough.

User-defined literals are a different, orthogonal problem from f16, so while one can use it as motivation for the other, I would prefer to avoid mixing the two proposals.


That just raises another question for me :slight_smile:

How is something like NonZeroU32 even expressible for resolution at type-check time without something like a dependent type? Would it not need to become something like let x: Result<NonZeroU32, Error> = 1;, to account for the fact that the literal might actually be 0?


With f16, the main use is in graphics and storage, and in that case SIMD support is more important than primitive support. The software fallback (and a lot of hardware implementations) just converts to f32, so there isn’t much need for it to be a primitive that is present on all platforms.
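To illustrate that widening fallback, here is a bit-level f16-to-f32 conversion sketch (the function name and the u16 bit-pattern representation are assumptions for illustration; this is not an existing std API):

```rust
/// Widen an IEEE 754 binary16 bit pattern to f32.
/// Hypothetical helper for illustration; not an existing std API.
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = (h as u32 >> 15) << 31;
    let exp = ((h >> 10) & 0x1f) as u32; // 5-bit exponent, bias 15
    let frac = (h & 0x3ff) as u32; // 10-bit fraction
    let bits = match exp {
        0 if frac == 0 => sign, // signed zero
        0 => {
            // Subnormal f16: normalize the fraction into a normal f32.
            let mut e: u32 = 113; // 127 - 15 + 1
            let mut f = frac;
            while f & 0x400 == 0 {
                f <<= 1;
                e -= 1;
            }
            sign | (e << 23) | ((f & 0x3ff) << 13)
        }
        0x1f => sign | (0xff << 23) | (frac << 13), // infinity / NaN
        _ => sign | ((exp + 112) << 23) | (frac << 13), // rebias: 127 - 15
    };
    f32::from_bits(bits)
}

fn main() {
    assert_eq!(f16_bits_to_f32(0x3c00), 1.0);
    assert_eq!(f16_bits_to_f32(0xc000), -2.0);
    assert_eq!(f16_bits_to_f32(0x0001), (2.0f32).powi(-24)); // smallest subnormal
    assert!(f16_bits_to_f32(0x7c01).is_nan());
}
```

Since the conversion is exact (every f16 value is representable in f32), only the narrowing direction ever has to round.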


From the RFC for u/i128, it’s mentioned that Rust used to have f128. Can anyone who was around back then elaborate? Especially on the choice to add it, and why it was removed? @thestinger said in the comments that the reason for removal was unclear.


This comment suggests the lack of libm support for f128 as the main reason for its removal. From the PR that removed f128, and the minutes of the meeting where the removal of f128 was decided, it appears that the main reason was that its support wasn’t even close to finished before the Rust 1.0 release, and there never was an RFC for adding it.

Looking at libm’s documentation:

It is also provided for _Float128 and _Float64x on powerpc64le (PowerPC 64-bits little-endian), x86_64, x86, ia64, aarch64, alpha, mips64, riscv, s390 and sparc.

So while _Float128 is supported by libm on a lot of platforms, it is not supported on all of them. This shouldn’t be a blocker though: libm, for example, does not support _Float16 anywhere, yet LLVM does. So it might be worth investigating whether LLVM supports _Float128 on some of the platforms that libm does not, like wasm or ARM.

In the PR @thestinger mentioned:

there’s no harm in having it available for use with the libgcc_s (libquadmath) f128 support. […] The support [in] compiler-rt is maturing, and it now supports addition, subtraction, multiplication and has an initial (but buggy) comparison implementation. It will soon have full support for the required operations on 64-bit without needing libgcc_s.

So there seems to be a libquadmath/libgcc_s library that supports these and could be used in place of libm, and that back then compiler-rt was starting to get support for these.


Well it’s currently proposed that const fns should be allowed to panic!, with the message becoming a compilation error if that path is hit, so one option would be to use that.
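That approach can be sketched with a const fn that panics on invalid input (the function name is hypothetical; the mechanism itself is what the proposal describes — a panic reached during const evaluation becomes a compilation error):

```rust
// Sketch of the const-panic approach for checked "literals":
// if the panic path is reachable during const evaluation,
// the program fails to compile with the panic message.
const fn require_non_zero(n: u32) -> u32 {
    if n == 0 {
        panic!("value must be non-zero");
    }
    n
}

fn main() {
    const X: u32 = require_non_zero(1); // OK: checked at compile time
    // const Y: u32 = require_non_zero(0); // would be a compile error
    assert_eq!(X, 1);
}
```

This sidesteps the need for dependent types: the check runs in the const evaluator rather than in the type system, so no `Result` appears in the type of the constant.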


I think that there is a good case for supporting f16 directly:

  • LLVM will automatically emulate it if it isn’t supported in hardware by converting to f32, performing the operation on f32 and converting the result back down to f16. This actually produces the exact same results as native f16 support since all intermediate values are rounded to f16 and back.

  • For the same reason, we don’t actually need libm support for f16: we can just use the f32 methods.

  • We shouldn’t bother supporting the ARM alternative half-precision format. Both IEEE and AHP are supported in hardware, but AHP is considered deprecated and is disabled by default (you need to mess with the FP control register to enable it).

The case for f128 is much weaker due to the lack of proper hardware and libm support, so I think we should just leave that one to a library.


Let’s clear up some things first:

  • RISC-V does not have f16 support, much less mandatory. Consulting the list of standard extensions, you’ll find f32 (F), f64 (D) and f128 (Q) but no f16. (The future vector extension will include f16, but it’s not there yet and using the vector unit and vector register field for individual scalar f16s is kind of wasteful.)
  • Doing f16 arithmetic in f32 and rounding back to f16 afterwards is not equivalent to proper f16 support because it rounds twice. Thus it can, for example, have 1 ulp of error for basic operations like addition or multiplication (where the error should be <= 0.5 ulp). Larger rounding error is especially worrying with f16, which only has 11 bits of precision to begin with.
  • There is apparently some compiler-rt support for fp128 in LLVM, though I expect it’s probably at least as buggy and inconsistent as i128 support was.

With that in mind and some other points raised in this thread, my two cents are:

  1. I second the recommendation to split the f16 and f128 proposals.
  2. For f16, I am somewhat concerned by platform-specific types, but it probably beats having to use a soft-float library to get correct rounding etc. (which in turn is IMO better than going through f32 and getting double rounding and possible non-determinism)
  3. For f128, I am also concerned about the lack of concrete use cases, but given that there’s probably no big expectation of performance there, having a portable f128 type seems fine if someone’s willing to put in the effort to actually make it work across all our targets (which is a big if!).


Are you sure about this? For 32-bit floats, it’s okay to perform basic operations (*, /, -, +, sqrt) with 64-bit floats and then round back to 32-bit – asm.js uses this to encode 32-bit float operations via explicit Math.fround calls ([1], [2]). If I’m parsing this article correctly the same might hold for 16-bit floats (since the mantissa of 16-bit floats is less than half that of 32-bit ones)? I haven’t tested this in practice, however.


Oh, you’re right, thanks for the correction! Double rounding is generally a problem but I sloppily claimed it applies here without checking. I agree with your reading that double rounding does not make a difference in the cases of f16 add/sub/mul/div/sqrt implemented via the equivalent f32 operation (for sqrt it’s really close but still holds).
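For reference, the general criterion here (from Figueroa’s “When is double rounding innocuous?”, stated from memory, so treat this as a sketch rather than a citation) is that computing in a wider format of precision $p_2$ and rounding the result once to precision $p_1$ is correctly rounded for the basic operations when

```latex
p_2 \ge 2 p_1 + 2
```

For f16 inside f32: $p_1 = 11$, $p_2 = 24$, and $2 \cdot 11 + 2 = 24 \le 24$, so the condition holds exactly at the boundary (matching the “really close but still holds” observation for sqrt). For f32 inside f64 it holds with room to spare: $2 \cdot 24 + 2 = 50 \le 53$.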


Regarding splitting the proposal, the previous consensus was to implement the types in std::arch, which AFAIK shouldn’t need an RFC. This was because one of the major use cases for f16 is SIMD, and since neither type has proper hardware support on many platforms, it would make sense as a std feature rather than a language primitive.

Given all of the above, it seemed like an RFC wasn’t necessary, so there are no plans AFAIK to create an RFC for either type.


Some additions to “the standard library” (especially when they are large, or have a large design space, or are potentially controversial) benefit from RFCs as well. And even if it was “just” a PR with FCP, the two types are sufficiently independent that I’d prefer two separate PRs.