### Motivation

The performance of some portable SIMD vector operations can, in some cases, be significantly improved by making assumptions about the kind of arithmetic that is allowed.

For example, some of the horizontal reductions like `min`

/`max`

benefit from assuming math to be
finite, that is, absence of `NaN`

s, while other horizontal reductions like `sum`

and `product`

greatly benefit from assuming math to be associative since that allows tree-reductions.

The same applies to other operations in `std`

, like `Iterator<Output: Float>::sum()`

where in the absence of fast-math the elements must be summed linearly, but if one could assume associative arithmetic one could perform a tree-reduction using SIMD vector intrinsics.

Also, plain floating-point arithmetic, like, `x = x*x*x*x*x*x*x*x;`

, which today must be written using sequences of `fmul_fast(x, fmul_fast(x, fmul_fast(x, ...)..)`

.

#### Summary

This RFC allows users to specify and detect exactly which kinds of assumptions about floating-point math is the program allowed to make.

#### Guide-level explanation

The `#[fp_math]`

attribute allows specifying at crate, module, function, or block scope, which IEEE754 restrictions the compiler is allowed to lift in which contexts. For example:

```
fn foo(x: f32x4, y: f32x4) -> f32x4 {
let (w, z) =
#[fp_math(assume = "associativity")] {
// All fp math is associative, reductions can be unordered:
let w = x.sum();
let z = y.sum();
(w, z)
};
let m = (w + z) * (x + y);
#[fp_math(assume = "finite")] {
// All fp math is assumed finite, reduction can assume NaNs
// aren't present:
m.max()
}
}
```

The attribute can be used to detect the assumptions in a given context at compile-time:

```
impl f32x4 {
fn sum(self) -> f32 {
if #[fp_math(is = "associative")] {
// tree reduction
} else {
// linear reduction
}
}
}
```

and it also controls the optimizations allowed for code that operates on floating-point values and vectors in general:

```
let x: f32 = ...;
// Must perform 7 ordered multiplies
x = x*x*x*x*x*x*x*x;
#[fp_math(assume = "associative)] {
x = x*x*x*x*x*x*x*x;
// can be reduced to 3 ordered multiplies:
// x *= x;
// x *= x;
// x *= x;
}
```

Finally, we could add `rustc`

flags that allow setting these externally for each crate. This would allow recompiling the whole crate hierarchy with certain `fast-math`

flags enabled. For this to be effective we would probably need to easily allow users to recompile std themselves, since shipping pre-compiled versions of `std`

for each possible combinations of floating-point math flags would add significant overhead to the release process.

#### Implementation

The `#[fp_math(assume = "..")]`

attributes just set LLVM fast-math flags. The hard part is making the `#[fp_math(is = "...")]`

detection work.

For assumptions set at crate scope, we could use the same mechanism as for `cfg!`

. We could expand this to making detection work inside a function for the assumptions set at function scope.

However, propagating assumptions across call-sites in such a way that is detectable, and for example, useful for the `sum`

example above, seems like a very hard problem to solve in practice. If there are LLVM intrinsics to detect fast-math flags, maybe we could emit those, but I donât know if these exist. If they donât, `#[fp_math(is = "...")]`

can be limited to detect assumptions set at crate / function scope.

### Alternatives

We currently just add functions to `core::intrinsics`

to perform these operations with different guarantees (e.g. `fmul_fast`

). For the portable vector types and for Iterator we could do the same, for example, add `.sum_unordered()`

variant of `sum`

that assumes associativity.

EDIT: an alternative mentioned by @scottmcm would be to use type wrappers instead to propagate these assumptions, for example, `Finite<Associative<f32>>`

, see here: Pre-Pre-RFC floating-point math assumptions (fast-math) .

### Prior Art

The C++ parallel STL, which solves part of this problem using coarse-grained execution policies. Most C/C++ compilers have `-ffast-math`

options that can be used to enable assumptions about floating-point arithmetic at the translation-unit level.