## Summary

Introduction of 16-bit floats (`f16`, half precision) and 128-bit floats (`f128`, quadruple precision).

## Motivation

16-bit floats are used for storage in some data formats and in computer graphics, for example in OpenGL, OpenEXR, and JPEG XR. Half-precision floats provide increased dynamic range over 8- and 16-bit integers, while requiring only half the space of single precision for storage, memory, and bandwidth.

128-bit floats are used in scientific and other computation where the extra bits provide increased accuracy, up to 33-36 significant decimal digits. They can also be used to compute double-precision results with reduced rounding error and less risk of intermediate overflow.

## Guide-level explanation

It will be the same as `f32` and `f64`, except the types will be named `f16` and `f128`.
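As a sketch, the proposed types would be declared and used exactly like today's float types. The `f16`/`f128` lines below are hypothetical syntax, since the types do not yet exist:

```rust
fn main() {
    // Existing float types:
    let x: f32 = 1.5;
    let y: f64 = 1.0 / 3.0;

    // Under this RFC, the new types would work the same way
    // (hypothetical; does not compile today):
    // let h: f16 = 1.5;
    // let q: f128 = 1.0 / 3.0;

    println!("{} {}", x, y);
}
```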

## Reference-level explanation

LLVM has first-class support for half- and quad-precision types. This should allow an implementation similar to how `f32` and `f64` are currently implemented in the language.

## Drawbacks

`f16`: It has few arithmetic applications and is mainly used for storage and interchange. It may be more appropriate to provide it in a library instead.
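To illustrate the "library instead" option: half-precision storage can be implemented in ordinary Rust by converting to and from binary16 bits. This is a simplified sketch only, as it truncates rather than rounding to nearest even and flushes subnormals to zero:

```rust
/// Convert an f32 to IEEE 754 binary16 bits.
/// Simplified sketch: round-to-zero, subnormals flushed, overflow saturates to infinity.
fn f32_to_f16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;
    let exp = ((bits >> 23) & 0xff) as i32;
    let frac = bits & 0x007f_ffff;

    if exp == 0xff {
        // Inf/NaN: keep a nonzero mantissa for NaN so it stays a NaN.
        return sign | 0x7c00 | ((frac >> 13) as u16) | ((frac != 0) as u16);
    }
    let e = exp - 127 + 15; // rebias from f32 (bias 127) to f16 (bias 15)
    if e >= 0x1f {
        return sign | 0x7c00; // overflow -> infinity
    }
    if e <= 0 {
        return sign; // underflow -> signed zero (subnormals dropped in this sketch)
    }
    sign | ((e as u16) << 10) | ((frac >> 13) as u16)
}

/// Convert binary16 bits back to f32 (exact for normal values).
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h as u32) & 0x8000) << 16;
    let exp = ((h >> 10) & 0x1f) as u32;
    let frac = (h & 0x03ff) as u32;
    let bits = if exp == 0x1f {
        sign | 0x7f80_0000 | (frac << 13) // Inf/NaN
    } else if exp == 0 {
        sign // zero (subnormals flushed in this sketch)
    } else {
        sign | ((exp + 127 - 15) << 23) | (frac << 13)
    };
    f32::from_bits(bits)
}

fn main() {
    let h = f32_to_f16_bits(1.5);
    println!("1.5 as binary16 bits: 0x{:04x}", h); // 0x3e00
    assert_eq!(f16_bits_to_f32(h), 1.5);
}
```

This round-trips normal values exactly and shows that the storage use case does not require compiler support, though a built-in type would still be needed for hardware-accelerated arithmetic.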

`f128`: Few processors have native support for 128-bit float operations (none of Rust's Tier 1 architectures), so it has drawbacks similar to `f64` on 32-bit systems and `f32` on 16-bit systems.

## Unresolved Questions

Given the drawbacks of `f128`, should `f80` be implemented instead? This type is used by `gcc` as `long double` and is also available in LLVM.

There are two different 128-bit float types in LLVM: one based on IEEE 754-2008's binary128 format, and `ppc_fp128`, which is a pair of 64-bit doubles.
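The `ppc_fp128` approach can be sketched in ordinary Rust: a "double-double" stores a value as an unevaluated sum of two `f64`s, combined with Knuth's error-free two-sum. This is a minimal illustrative sketch, not LLVM's actual implementation:

```rust
/// Knuth's error-free two-sum: returns (s, e) with s + e == a + b exactly,
/// where s is the rounded f64 sum and e the rounding error.
fn two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    let bb = s - a;
    let e = (a - (s - bb)) + (b - bb);
    (s, e)
}

/// A "double-double" value hi + lo, the idea behind `ppc_fp128`.
#[derive(Clone, Copy, Debug)]
struct DoubleDouble { hi: f64, lo: f64 }

/// Add two double-doubles (simplified; real implementations renormalize more carefully).
fn dd_add(x: DoubleDouble, y: DoubleDouble) -> DoubleDouble {
    let (s, e) = two_sum(x.hi, y.hi);
    let e = e + x.lo + y.lo;
    let (hi, lo) = two_sum(s, e);
    DoubleDouble { hi, lo }
}

fn main() {
    // 1.0 + 1e-30 is absorbed entirely in a single f64...
    assert_eq!(1.0f64 + 1e-30, 1.0);
    // ...but a double-double keeps the low part.
    let x = DoubleDouble { hi: 1.0, lo: 0.0 };
    let y = DoubleDouble { hi: 1e-30, lo: 0.0 };
    let z = dd_add(x, y);
    assert_eq!(z.hi, 1.0);
    assert!(z.lo != 0.0);
    println!("{:?}", z);
}
```

Note that a double-double roughly doubles the significand precision but does not extend the exponent range, unlike IEEE binary128.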

ARM has two different half-precision float formats: the IEEE one and an alternative format. How will support for both be provided?

Should these types be implemented in `std`, or should only the `core::intrinsics` be added?