Pre-Pre-RFC floating-point math assumptions (fast-math)


#1

Motivation

The performance of some portable SIMD vector operations can, in some cases, be significantly improved by making assumptions about the kind of arithmetic that is allowed.

For example, some of the horizontal reductions like min/max benefit from assuming math to be finite, that is, absence of NaNs, while other horizontal reductions like sum and product greatly benefit from assuming math to be associative since that allows tree-reductions.

The same applies to other operations in std, like Iterator<Output: Float>::sum() where in the absence of fast-math the elements must be summed linearly, but if one could assume associative arithmetic one could perform a tree-reduction using SIMD vector intrinsics.

Also, plain floating-point arithmetic, like, x = x*x*x*x*x*x*x*x;, which today must be written using sequences of fmul_fast(x, fmul_fast(x, fmul_fast(x, ...)..).

Summary

This RFC allows users to specify and detect exactly which kinds of assumptions about floating-point math is the program allowed to make.

Guide-level explanation

The #[fp_math] attribute allows specifying at crate, module, function, or block scope, which IEEE754 restrictions the compiler is allowed to lift in which contexts. For example:

fn foo(x: f32x4, y: f32x4) -> f32x4 {
  let (w, z) = 
  #[fp_math(assume = "associativity")] {
      // All fp math is associative, reductions can be unordered:
      let w = x.sum();
      let z = y.sum();
      (w, z)
  };
  
  let m = (w + z) * (x + y);
  
  #[fp_math(assume = "finite")] {
      // All fp math is assumed finite, reduction can assume NaNs 
      // aren't present:
      m.max()
  }
}

The attribute can be used to detect the assumptions in a given context at compile-time:

impl f32x4 {
    fn sum(self) -> f32 {
        if #[fp_math(is = "associative")] {
            // tree reduction
        } else {
            // linear reduction
        }
    }
}

and it also controls the optimizations allowed for code that operates on floating-point values and vectors in general:

let x: f32 = ...;

// Must perform 7 ordered multiplies
x = x*x*x*x*x*x*x*x;

#[fp_math(assume = "associative)] {
  x = x*x*x*x*x*x*x*x;
  // can be reduced to 3 ordered multiplies:
  // x *= x;
  // x *= x;
  // x *= x;
}

Finally, we could add rustc flags that allow setting these externally for each crate. This would allow recompiling the whole crate hierarchy with certain fast-math flags enabled. For this to be effective we would probably need to easily allow users to recompile std themselves, since shipping pre-compiled versions of std for each possible combinations of floating-point math flags would add significant overhead to the release process.

Implementation

The #[fp_math(assume = "..")] attributes just set LLVM fast-math flags. The hard part is making the #[fp_math(is = "...")] detection work.

For assumptions set at crate scope, we could use the same mechanism as for cfg!. We could expand this to making detection work inside a function for the assumptions set at function scope.

However, propagating assumptions across call-sites in such a way that is detectable, and for example, useful for the sum example above, seems like a very hard problem to solve in practice. If there are LLVM intrinsics to detect fast-math flags, maybe we could emit those, but I don’t know if these exist. If they don’t, #[fp_math(is = "...")] can be limited to detect assumptions set at crate / function scope.

Alternatives

We currently just add functions to core::intrinsics to perform these operations with different guarantees (e.g. fmul_fast). For the portable vector types and for Iterator we could do the same, for example, add .sum_unordered() variant of sum that assumes associativity.

EDIT: an alternative mentioned by @scottmcm would be to use type wrappers instead to propagate these assumptions, for example, Finite<Associative<f32>>, see here: Pre-Pre-RFC floating-point math assumptions (fast-math) .

Prior Art

The C++ parallel STL, which solves part of this problem using coarse-grained execution policies. Most C/C++ compilers have -ffast-math options that can be used to enable assumptions about floating-point arithmetic at the translation-unit level.


#2

IMO adding compiler flags like -ffast-math is a bad idea as it allows you to silently break code written by other people. Having the attributes for your own code is a great idea though.


#3

Maybe one could limit these flags to only act on code that is inside #[fp_math(is = "...")]. That would allow recompiling the whole dependency tree with fast math flags, but they would only have an effect on those dependencies that explicitly allow it.

I think there is a difference between “fast-math” hints to the optimizer, and “fast-math” hints to Rust code. Allowing Rust code to implement and use different high-level algorithms depending on “fast-math” looks like a goal that might be worth exploring. But maybe people should just offer two different algorithms with two different names and be done with it (maybe using cargo features to select different algorithms if desired).


#4

Yes that would be safe, I’m not sure I see a usecase where that would be useful though. If a dependency can support fast math wouldn’t they just add #[fp_math("…")] themselves?


#5

Attributes for this kinda scare me because they lead to the obvious question of “why doesn’t Iterator::sum() use fast math in my fast math block?”, which would be really hard. (Just getting overflow checking right is already hard in core.)

What about new types instead? Like was talked about for nnan in


#6

Something like Finite<Associative<T>> would do the trick here. We’d need a way to parametrize code by Floating point types, but that’s a different problem.


#7

I think allowing finer-grained (compared to what is offered by C/C++) control over fp-math assumptions and optimsations is something great. It goes along well with the explicitness design of Rust, giving the programmer explicit control over everything and allowing to make the best of the compiler. This should definitely be explored. Having this would be a great advantage over C++ for a lot of fp-heavy software, particularly game engines and other things to do with graphics/rendering.

However, we really need to explore the best design/syntax for this feature. I find the attribute syntax proposed by the OP a little ugly, but I don’t have better ideas.


#8

FWIW I think it would be better to just have floating point wrapper types like Finite<T>, Associative<T>, etc. that control the arithmetic behavior of the type just like we currently have Wrapping<T> (see this comment by @scottmcm: Pre-Pre-RFC floating-point math assumptions (fast-math)) .

That would mean that you can’t really influence externally the assumptions that other libraries do. For that, these libraries would need to be generic on the floating point type, and support taking these type wrappers. But this is an orthogonal problem to solve. There are pre-RFCs about adding a Float trait so that one can abstract over f32 and f64, so I can imagine that we would be able to implement this trait for Finite<f32> for example as well.


#9

I like @gnzlbg’s approach, because it encapsulates in the type system whether such simplifying approximations to full IEEE FP behavior are appropriate (or not), rather than trying to project that knowledge into various algorithms. Such orthogonality supports type-generic algorithms and algorithm reuse.


#10

The attribute-ish approach presented in the OP is not workable in my opinion. It would introduce a second, completely ad-hoc, way of introducing monomorphization, and magically reach across abstraction barriers. Pivoting to using types to carry this information from one lexical scope to another seems much better to me for these reasons.

But the type approach also have many limitations. You’d have to convert between the regular float types and these wrappers in many places, which is already very annoying if you juggle individual scalars, and becomes even worse when collections or other abstractions are involved (e.g., have fun mixing Associative<F> with cgmath::Vec2<f32>).

And even if those problems turn out to be manageable, we may still want a crate-/module-/block-level attributes for convenience. It just shouldn’t affect code outside of the lexical scope it’s applied to.


#11

There are, conceptually, three different ways to integrate different arithmetical semantics in the same program. Use:

  • different scalar types
  • different operations: mul() is not mul_assoc()
  • different rules in different times / spans of code

Of these, the third one (pertaining to #[attributes]) is the absolute worst.

Rust has a great type system. Use it! Or at least name the operations differently. Don’t pretend that the types and the operations are just the same when clearly different semantics apply in different times or in different parts of the world. There’s no such thing as code geography. The same truths apply everywhere!

Values and calculations migrate. The soundness of the code requires that some essential invariants are preserved. How else can you refactor your code fearlessly?


#12

This is the status quo and it doesn’t work well either.

How else can you refactor your code fearlessly?

The problem is that whether fast-math assumptions affect correctness or not does not only depend on the code executing the math, but it also depends on which values the inputs are allowed to take, and which accuracy does the user expect / need the outputs to have.

While these are often know by the user of math code, these are rarely known by the writers of the floating-point algorithms, and there are at least 5 assumptions that users might want to specify:

  • Trapless<T>: whether floating-point arithmetic can be assumed not to trap (e.g. on signaling NaNs)
  • Round{Nearest,0,+∞,-∞}<T> : whether the rounding mode can be assumed
  • Associative<T>: whether floating-point arithmetic can be assumed to be associative
  • Finite<T>: whether floating-point arithmetic can be assumed to produce numbers
  • Normal<T>: whether floating-point arithmetic can be assumed to produce normal numbers (as opposed to denormals/subnormals)

These assumptions can be combined at will. If you do the math, we wouldn’t be introducing a couple of types or extra functions, but O(100) floating point types or O(1000) functions.


@rkruppe

But the type approach also have many limitations. You’d have to convert between the regular float types and these wrappers in many places, which is already very annoying if you juggle individual scalars, and becomes even worse when collections or other abstractions are involved (e.g., have fun mixing Associative with cgmath::Vec2).

An implication of the type-based approach is that it turns all library code using concrete floating-point types on their APIs into bad code because it hardcodes assumptions about the inputs that the users cannot improve. This could mean that a lot of code would need to become generic over the floating-point type throughout the ecosystem.

It is a bit unclear to me at this point how making all the ecosystem generic on floating-point types would look like, and whether that would be worth it (it also introduces a cost). Also I would expect that a lot of people would want to start doing negative reasoning with this, to express that an API only takes floats that can’t be NaN or subnormal.

IMO the type-wrapper approach looks to be the most promising one to me, but it is going to have to come with a good Float trait to be minimally useful, and we might want to think about a couple of other traits that people might want to use to constrain their implementations, and also, whether Associative<Finite<T>> and Finite<Associative<T>> should unify and things like that.


#13

A type-wrapper based approach could allow this via parametrized crates and modules if we ever get those, so the attributes for these cases could be seen as a stop-gap solution.


#14

There is no such thing as a trap on signaling NaNs – in the “default” floating point environment, at least, but current Rust does not expose any way to alter the fp environment, nor would it be sensible to assume an altered fp environment by default. What IEEE recommends, and what all performance-oriented languages should do (and indeed what C and C++ and LLVM, for example, do), is to allow programmers to opt into alternate exception handling on a per-block basis or similarly fine-grained basis. If that is done, almost all optimizations are severely restricted anyway (even compared to the normal, not -ffast-math state).

Likewise, the default floating point environment which any optimizing compiler needs to assume already (and which LLVM indeed assumes) implies ties-to-even. Newtypes that override the rounding mode to be different from this default can make sense, but it makes no sense to frame this as an optimization hint or as an “can be assumed”. It simply always operates with different rounding mode, period.

Probably (though mostly for NaNs, I expect), but it seems dangerous to intermingle this use case with optimization hints – it would presumably not be unsafe to produce a NaN from a computation, the result of the calculation would just be unspecified and maybe optimized accordingly. So it seems inappropriate to use it for any semantic invariants.

Such a mechanism seems very far away so I don’t think it should block this feature, though if there are concrete proposals for parametrized crates and modules we may want to look at how forward compatible the fast-math stuff is with those proposals.

Besides, the proposals I have seen would not by themselves allow changing what literals mean or reduce the amount of clutter from having to use the wrapper types (or generics to abstract over them), just deduplicate the generic parameters and bounds.


#15

GCC has an optimization flag that “assumes that no use-visible trap will happen”. I am not certain which optimizations this enables. Maybe it can just elide checks for errno or similar when calling onto libm? EDIT: the docs say “This is useful, for example, when running a program using IEEE “non-stop” floating-point arithmetic.”

Likewise, the default floating point environment which any optimizing compiler needs to assume already (and which LLVM indeed assumes) implies ties-to-even. […] current Rust does not expose any way to alter the fp environment

So what happens if one calls into C and C alters the FP environment? Is that undefined behavior?

I mentioned these under the assumption that changing the FP environment is a valid thing that a user might want to do, in which case an optimization backend cannot assume anything about the rounding mode. The purpose of these type-wrappers in this context would be to tell the optimizer that it is allowed to assume a particular rounding mode, and which rounding mode that is.

but it seems dangerous to intermingle this use case with optimization hints

Yes, I didn’t meant to convey that we should turn the type-wrappers into traits that enable optimizations when used as bounds, but rather, that we might want to add some traits alongside to Float that can be used to constrain the types of Float that generic code might want to accept.

I’ll expect the wrappers to have safe checked constructors and checked arithmetic by default when -C debug-assertions is enabled as well as unsafe unchecked constructions. That might allow us to provide implementations of these extra semantics traits over Float for the wrappers.

Besides, the proposals I have seen would not by themselves allow changing what literals mean or reduce the amount of clutter from having to use the wrapper types (or generics to abstract over them), just deduplicate the generic parameters and bounds.

Yes I would expect this to be the case as well.

FWIW you mentioned

The attribute-ish approach presented in the OP is not workable in my opinion. It would introduce a second, completely ad-hoc, way of introducing monomorphization, and magically reach across abstraction barriers.

I am really unsure on how useful the module level attributes will be if there are no ways for users to specify these externally, and if they only affect the code within the module and not the code the module calls to. The type-wrapper approach solves these issues, but at the cost of having to make library code generic, and user code verbose :confused:


#16

Very curious that it’s enabled by default, by C standard rules the program itself can’t install such trap handlers without opting into fp env access (and then this is restricted to the lexical scope in which it’s activated, can’t affects other compilation units) and I’m not aware of any operating systems or the like installing trap handlers for floating point exceptions by default – that would likely break more programs than it would fix!

I can also find no details about its effects on GCC’s optimizers, maybe it’s as incompletely implemented as -fwrapv and doesn’t actually prevent that many optimizations?

For the record, Clang has this flag too, but from looking through the driver it apparently only disables the -ffast-math optimizations if they are also specified, it does not affect the normal optimizations. Indeed, assuming any fp exception can trap means you can’t do very many optimizations (can’t even remove dead operations!), so LLVM likely has zero interest in supporting such a mode except as a byproduct of programs opting into fp env access, and certainly not by default. rustc should follow this lead (as it already does in practice emitting fadd etc.)

Sure. Well, more specifically, functions that change the fp env should save it when called and restore it before returning. I am not sure whether this is automatic or manual in C code that uses fp env access, but if it’s manual I am rather sure it’s UB to change anything about it and return without restoring it to the default. This is practically an (admittedly obscure) ABI issue. Nothing else can make sense if you want to have separate compilation and optimize floating point operations without doing whole program analysis (blegh).

Indeed any optimizer will be so restricted it’s not even funny by a program manipulating the fp env, which is why the ability to do so should be opt-in and limited to code that needs it. And such code will likely have little use for giving back freedom to the optimizer – if it didn’t care, it wouldn’t be accessing the fp env. (And it can always move some operations to a different scope that doesn’t opt into fp env access and can thus be optimized normally.)

Unsafe constructors and optional checks in safe constructors don’t fit together. Either it’s UB to have a NaN in the “not-NaN” type, in which case it must always be checked in safe code, or it’s not UB and merely undesirable, in which case unsafe is misplaced. But regardless, the design of such semantic newtypes is a very different problem.