Motivation
The performance of some portable SIMD vector operations can be significantly improved by making assumptions about the kind of arithmetic that is allowed. For example, horizontal reductions like min/max benefit from assuming math to be finite, that is, from the absence of NaNs, while horizontal reductions like sum and product greatly benefit from assuming math to be associative, since that allows tree reductions. The same applies to other operations in std, like Iterator<Output: Float>::sum(), where in the absence of fast-math the elements must be summed linearly, but under associative arithmetic one could perform a tree reduction using SIMD vector intrinsics. Plain floating-point arithmetic benefits as well: an expression like x = x*x*x*x*x*x*x*x; today must be written as a nested sequence of fmul_fast(x, fmul_fast(x, fmul_fast(x, ...))) calls to get these optimizations.
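For illustration, here is a rough sketch of what that intrinsics-based workaround looks like today on nightly with the unstable core_intrinsics feature (the pow8 helper is just illustrative):

#![feature(core_intrinsics)]
use core::intrinsics::fmul_fast;

// x*x*x*x*x*x*x*x spelled out through the unsafe fmul_fast intrinsic;
// the fast-math flags it sets allow LLVM to re-associate the chain.
fn pow8(x: f32) -> f32 {
    unsafe {
        fmul_fast(x, fmul_fast(x, fmul_fast(x, fmul_fast(x,
            fmul_fast(x, fmul_fast(x, fmul_fast(x, x)))))))
    }
}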
Summary
This RFC allows users to specify, and detect, exactly which kinds of assumptions about floating-point math the program is allowed to make.
Guide-level explanation
The #[fp_math] attribute allows specifying, at crate, module, function, or block scope, which IEEE 754 restrictions the compiler is allowed to lift in which contexts. For example:
fn foo(x: f32x4, y: f32x4) -> f32 {
    let (w, z) = #[fp_math(assume = "associative")] {
        // All fp math is associative, reductions can be unordered:
        let w = x.sum();
        let z = y.sum();
        (w, z)
    };
    let m = f32x4::splat(w + z) * (x + y);
    #[fp_math(assume = "finite")] {
        // All fp math is assumed finite, the reduction can assume NaNs
        // aren't present:
        m.max()
    }
}
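To make the effect of the "finite" assumption concrete, here is a hedged sketch in plain Rust over a slice (not the f32x4 API above) contrasting a NaN-aware maximum with the simpler reduction that becomes valid once NaNs are assumed absent:

// Without the assumption, the reduction has to define behaviour for NaN,
// which adds a check to every step:
fn max_with_nans(xs: &[f32]) -> f32 {
    xs.iter().copied().fold(f32::NEG_INFINITY, |acc, x| {
        if x.is_nan() { x } else if x > acc { x } else { acc }
    })
}

// Under #[fp_math(assume = "finite")] NaNs cannot occur, so a bare
// comparison suffices and maps directly onto hardware max instructions:
fn max_finite(xs: &[f32]) -> f32 {
    xs.iter().copied().fold(f32::NEG_INFINITY, |acc, x| if x > acc { x } else { acc })
}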
The attribute can be used to detect the assumptions in a given context at compile-time:
impl f32x4 {
    fn sum(self) -> f32 {
        if #[fp_math(is = "associative")] {
            // tree reduction
        } else {
            // linear reduction
        }
    }
}
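As a hedged sketch of what the two branches could expand to (written here over a slice in plain Rust rather than over the four lanes of f32x4):

// Linear reduction: a single dependency chain, as strict IEEE 754
// evaluation order requires.
fn sum_linear(xs: &[f32]) -> f32 {
    xs.iter().fold(0.0, |acc, &x| acc + x)
}

// Tree reduction: halves are summed independently and then combined,
// which is only equivalent if additions may be re-associated.
fn sum_tree(xs: &[f32]) -> f32 {
    match xs.len() {
        0 => 0.0,
        1 => xs[0],
        n => sum_tree(&xs[..n / 2]) + sum_tree(&xs[n / 2..]),
    }
}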
The attribute also controls the optimizations allowed for code that operates on floating-point values and vectors in general:
let mut x: f32 = ...;
// Must perform 7 ordered multiplies:
x = x*x*x*x*x*x*x*x;
#[fp_math(assume = "associative")] {
    x = x*x*x*x*x*x*x*x;
    // can be reduced to 3 multiplies:
    // x *= x;
    // x *= x;
    // x *= x;
}
Finally, we could add rustc flags that allow setting these externally for each crate. This would allow recompiling the whole crate hierarchy with certain fast-math flags enabled. For this to be effective we would probably need to make it easy for users to recompile std themselves, since shipping pre-compiled versions of std for each possible combination of floating-point math flags would add significant overhead to the release process.
Implementation
The #[fp_math(assume = "..")] attributes just set LLVM fast-math flags. The hard part is making the #[fp_math(is = "...")] detection work. For assumptions set at crate scope, we could use the same mechanism as for cfg!. We could expand this to make detection work inside a function for assumptions set at function scope.
However, propagating assumptions across call sites in a way that is detectable, and, for example, useful for the sum example above, seems like a very hard problem to solve in practice. If there are LLVM intrinsics to detect fast-math flags, maybe we could emit those, but I don't know if these exist. If they don't, #[fp_math(is = "...")] can be limited to detecting assumptions set at crate / function scope.
Alternatives
We currently just add functions to core::intrinsics to perform these operations with different guarantees (e.g. fmul_fast). For the portable vector types and for Iterator we could do the same, for example, add a .sum_unordered() variant of sum that assumes associativity.
EDIT: an alternative mentioned by @scottmcm would be to use type wrappers instead to propagate these assumptions, for example, Finite<Associative<f32>>; see here: Pre-Pre-RFC floating-point math assumptions (fast-math).
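As a hedged, minimal sketch of what such a wrapper could look like (building it on top of the existing fadd_fast intrinsic is my assumption here, not part of the linked proposal):

#![feature(core_intrinsics)]
use core::intrinsics::fadd_fast;
use core::ops::Add;

// Hypothetical wrapper type: values of this type opt into re-associable
// addition, so sums over Associative<f32> may be tree-reduced.
#[derive(Copy, Clone, Debug)]
struct Associative<T>(T);

impl Add for Associative<f32> {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        // fadd_fast sets LLVM's fast-math flags on this single addition.
        Associative(unsafe { fadd_fast(self.0, rhs.0) })
    }
}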
Prior Art
The C++ parallel STL solves part of this problem using coarse-grained execution policies. Most C and C++ compilers also have -ffast-math options that can be used to enable assumptions about floating-point arithmetic at the translation-unit level.