Suggestion: FFastMath intrinsics in stable

I'd love to see some experiments about this on nightly. Maybe make a proposal as a compiler MCP, like the safe transmute stuff did?

Re: fns…

Wouldn't it usually make sense to provide only a small number of implementations for a given mathematical fn anyway? For example, LLVM provides just two versions of its sqrt built-in: "precise" and "approximate".

This would create a group of related fns. If scopes like this were introduced, each fn could be annotated with the approximation context it is tailored for:

#[float_fn_group(sum, reassoc)]
fn sum_reassoc()...

#[float_fn_group(sum)]
fn sum()...

The compiler could then yield a compilation error if sum_reassoc were used in a context where #![approximate(reassoc)] is not "active".

Conversely, the compiler could issue a warning suggesting sum_reassoc if sum were used inside a #![approximate(reassoc)] context.

The idea here is that sum and sum_reassoc would be coded separately and thus could use different intrinsics, etc. Sharing code between them is envisioned as a future extension of this proposal; it is not clear to me now how that could look syntactically.
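Setting the hypothetical #[float_fn_group] annotation aside, a rough sketch of what the two separately-coded versions could look like today, assuming the nightly-only fadd_fast intrinsic (behind #![feature(core_intrinsics)]) as the relaxed building block:

#![allow(internal_features)]
#![feature(core_intrinsics)]

// Strict version: plain IEEE additions, performed in order.
fn sum(xs: &[f32]) -> f32 {
    xs.iter().copied().fold(0.0, |acc, x| acc + x)
}

// Relaxed version: additions the optimizer may reassociate (and thus vectorize).
// Caution: fadd_fast is UB if any operand or result is NaN or infinite.
fn sum_reassoc(xs: &[f32]) -> f32 {
    xs.iter()
        .copied()
        .fold(0.0, |acc, x| unsafe { core::intrinsics::fadd_fast(acc, x) })
}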

I have given this some thought, and while the conversation drifted away from it, I would like to reiterate for posterity's benefit: I do not believe this should be done at the level of types like the LooseFloat or FastF32 that have been proposed, including in this thread. In general, the floating point environment and operations on it are more subtle than operations on individual types, and things like the Wrapping type are a fairly painful solution to interact with. My recent efforts designing a SIMD API have only strengthened my opinion on this matter: it increasingly fell apart at the edges.

Nor do I believe it should be only a compiler-level flag. While it is somewhat vague, I believe the only ergonomic solution is to make it possible to specify the behavior of floating point computations at the block scope level.

Importantly, this kind of mechanism would allow the behavior to be statically encapsulated and nested within each other, which would permit one to potentially nest both "precise" and "loose" floating point algebras, while also being able to e.g. include another person's floating point math function and potentially control the precision and performance of it if it does not impose a specific floating point constraint.
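Purely as an illustration of the nesting I have in mind, reusing the #![approximate(...)] strawman syntax from earlier in the thread (none of this exists, and hot_inner_loop/kahan_correction are made-up helper names):

fn render(points: &[f32]) -> f32 {
    // Hypothetical: relaxed algebra for the bulk of the work in this scope...
    #![approximate(reassoc)]

    let bulk = hot_inner_loop(points);

    // ...with a nested scope that restores strict IEEE semantics for a
    // numerically sensitive correction step.
    let correction = {
        #![approximate(none)]
        kahan_correction(points)
    };

    bulk + correction
}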

And I believe the most appropriate approach would be one that also enables Rust to account for "broken" floating point environments which do not adhere exactly to the standard. Indeed, I believe they are essentially the same thing, to some extent.

I have other thoughts about this matter, and experiments I would like to consider given infinite time, but may not get around to, so I am being very general here.

I believe one thing that may prove worth investigating is examining the relationship between types and traits in the first place. Realistically, anything like this that was also sound would be quite an extension of the compiler's abilities in the first place, and it may be most useful to dig into the language fairly directly.


This sounds more similar to floating point semantics in C, and I agree that it's reasonably ergonomic (in my case, to relax semantics at coarse granularity and make them strict in blocks, but it could go the other way). I'm curious how you envision the nesting working, and what syntax would be used for the likes of

x.iter().map(sensitive_closure).sum();

where you want strict semantics in the mapped function (which often comes with negligible perf penalty due to easy vectorization) but associative math in the sum? Note that vectorization of the mapped function may only happen if associative math is enabled in the sum.

With the FastF32 approach, one can write it this way.

let a: f32 = x.iter().map(sensitive_closure).sum::<FastF32>().into();

Admittedly, the ergonomics become much less friendly in other settings.
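For concreteness, a minimal sketch of what such a FastF32 could look like, assuming the nightly fadd_fast intrinsic underneath (the type itself is hypothetical; only the Sum and From impls needed for the line above are shown):

#![allow(internal_features)]
#![feature(core_intrinsics)]

#[derive(Clone, Copy, Default)]
struct FastF32(f32);

impl std::iter::Sum<f32> for FastF32 {
    fn sum<I: Iterator<Item = f32>>(iter: I) -> Self {
        // Accumulate with an addition the optimizer is allowed to reassociate.
        // Caution: fadd_fast is UB on NaN/infinite operands or results.
        FastF32(iter.fold(0.0, |acc, x| unsafe { core::intrinsics::fadd_fast(acc, x) }))
    }
}

impl From<FastF32> for f32 {
    fn from(v: FastF32) -> f32 {
        v.0
    }
}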


What will happen if, inside a block with fast math enabled, you call a non-generic function from a different crate? Should the compiler re-compile it with the new settings, or use its strict version? If the former, how should it handle a function that affects several crates, i.e. should the compiler re-compile the affected sub-tree of dependencies? What about trait methods: would traits compiled with fast-math settings be compatible with their strict versions?

I think that the intrinsics approach with a generic settings mask should be preferred. While somewhat unergonomic, it is the simplest, gives full control to programmers, and is open to building abstractions on top of it.
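To sketch what I mean by a generic settings mask (entirely hypothetical names and flags; the real intrinsic would be compiler-provided and lower to the corresponding LLVM fast-math flags):

// Hypothetical relaxation flags, combined as a const-generic bitmask.
const REASSOC: u32 = 1 << 0;
const NO_NANS: u32 = 1 << 1;
const NO_INFS: u32 = 1 << 2;

// unsafe because some flags (e.g. NO_NANS) place proof obligations on the caller.
// A plain add stands in for the compiler-provided lowering so the sketch type-checks.
unsafe fn fadd_relaxed<const FLAGS: u32>(a: f32, b: f32) -> f32 {
    a + b
}

// An abstraction built on top: a dot product that only opts into reassociation.
fn dot_reassoc(xs: &[f32], ys: &[f32]) -> f32 {
    xs.iter()
        .zip(ys)
        .fold(0.0, |acc, (&x, &y)| unsafe { fadd_relaxed::<REASSOC>(acc, x * y) })
}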


So, it is my belief that in truth, no function is non-generic: as long as it uses floating point computation, vectorization features, or other things which must bend to the nuances of the machine, it is generic over floating point registers, it is generic over the floating point environment, and it is generic over ISA instructions and feature levels. This applies to any larger segment of code as well.

In other words, the type parameter is there, but it is already there and it is up to us to choose how to acknowledge it. So I believe trying to define a special "wonky float type" to try to "contain" this issue is problematic not because we shouldn't view this as an issue at the level of a type system, but because it is my experience that FastFloat32 and the like are no more ergonomic than the intrinsics proposal, and because functionally, f32 and f64 are already parameterized by this. That is, the currently-hidden parameter alters existing computations in practical situations without being visible to the programmer except by rigorous testing.

And we have addressed similar issues with type inference and monomorphization in the past. While it may appear different, this is functionally the same issue, and soluble the same way. I believe you cannot avoid the type variable by using special compiler-intrinsic functions, as it will leave existing arguments to the floating point environment invisible and affecting programs in subtle and hidden ways, and the programmer control is not much different in the end. I will acknowledge there are merits to it... it has a certain simplicity from the compiler view... but I believe that our current success with handling these type parameters is mostly because of lack of ambition, not trying to do anything too interesting, and pretending the current occasional miscompilation when optimizations are enabled is not happening.

So any solution to "fast math" must also somehow address existing """optimizations""" that are performed on ordinary operations.

And this is why I addressed the block scope level as my main object of attention: the first function to receive this type argument in a Rust program is, unavoidably,

fn main<F, I>()
where
    F: FpEnv,
    I: InstSetArch,
{
    // ...the rest of the program
}
But I don't think anyone should have to write it out like that for a basic program. Thus: a certain amount of type inference seems appropriate, or alternate annotations, or otherwise being sneaky about it.


It is not. LLVM considers it UB to use any floating point environment other than the default LLVM expects. LLVM will happily fold floating point operations under the assumption that a specific floating point environment is used.

The only case where this actually affects the output of floating point operations is when using x87 floating point instructions, as the registers use 80-bit floats in that case, but spilled floats use 32 or 64 bits. The x87 instructions aren't used on the x86_64 and i686 targets; only the i586 target uses them. You should avoid them precisely because optimizations affect the result of floating point operations.


It seems like Clang supports STDC FENV_ACCESS, so it should be possible, provided the frontend (namely, rustc) emits the right attributes. I ran into this problem (of no fenv_access in Rust) while working on a CPU emulator (I ended up just using my own FP stack because it was otherwise annoying).

Yep; I made a thread about it a few months ago:

You can either tell LLVM you're using a specific rounding mode other than the standard one, or ask LLVM to produce code that works regardless of the rounding mode (at the cost of killing all floating point optimizations). The latter is what FENV_ACCESS does in C.


Just closed 3 days ago :sob:. And yeah, my issue is the same. I need to perform floating-point calculations with a particular floating-point environment (which can be chosen at runtime by the user interacting with the emulated CPU), and it's very annoying that I either have to emulate floating-point operations (with the time cost penalty that entails) or drop to C (or assembly).

There's benefit to both. The latter is the most generally useful, since it would be necessary if you had runtime selection of rounding mode (unless you want too many functions).

I'll note that there is a similar thread for Clang on cfe-dev about how to handle isnan under -ffast-math. Many of the optimization considerations came up there as well.


From the top of that thread:

this option is described as: "Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs."

If the type double cannot have NaN value, it means that double and double under -ffinite-math-only are different types

So, uh, a refinement type of double? Then why not make them different types? Curiously, most of the operations on the new type would be unsafe, though, as the caller guarantees that they do not overflow. But everything would work out with the typing of safe checked_* functions.

Using C as precedent has its limits, because a significant portion of their discussion is focused on how to make it work with existing code, a problem that Rust doesn't necessarily need to share, as f32 was always clearly defined as an IEEE float.


I definitely have an interest in seeing this, given that I'm writing a vector algebra library where operations often have most of their components set to zero or one, but in a way that's easily inlined. The problem comes when multiplying every combination of terms: the compiler knows so many of them are zero, but still does every multiplication and addition anyway. I've written a wrapper type that calls the fast math operations, has guards to prevent NaNs and infinities from entering, and implements various traits so that it can be swapped out for safe floats if need be. It's actually been working pretty well, as long as the assertions aren't debug_asserts. (I had to pay for my hubris when those guards were getting skipped in release mode.)

Seeing a monster 32x32 multiplication drop down to only a few instructions where it can get inlined was very satisfying. My biggest concern at this point is that it's nightly only, so stable users will only be able to use a small handful of the optimizations that should algebraically be possible, and it may break depending on the compiler version.
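For reference, a stripped-down sketch of the kind of wrapper described above, assuming the nightly fadd_fast/fmul_fast intrinsics (the name GuardedFast and the exact guard placement are mine, not the library's):

#![allow(internal_features)]
#![feature(core_intrinsics)]

use core::intrinsics::{fadd_fast, fmul_fast};
use core::ops::{Add, Mul};

#[derive(Clone, Copy, Debug, PartialEq)]
struct GuardedFast(f32);

impl GuardedFast {
    fn new(x: f32) -> Self {
        // Guard at the boundary: fast-math arithmetic on NaN/±∞ is UB, so refuse
        // to let them in. A plain assert!, not debug_assert!, so the check is not
        // compiled out in release builds.
        assert!(x.is_finite(), "non-finite value entering fast-math wrapper");
        GuardedFast(x)
    }
}

impl Add for GuardedFast {
    type Output = GuardedFast;
    fn add(self, rhs: Self) -> Self {
        GuardedFast::new(unsafe { fadd_fast(self.0, rhs.0) })
    }
}

impl Mul for GuardedFast {
    type Output = GuardedFast;
    fn mul(self, rhs: Self) -> Self {
        GuardedFast::new(unsafe { fmul_fast(self.0, rhs.0) })
    }
}

Strictly speaking, if a fast-math operation itself overflows to infinity, the UB has already happened before the guard fires, so checks like these are best-effort rather than a soundness proof.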


It is not. LLVM considers it UB to use any floating point environment other than the default LLVM expects. LLVM will happily fold floating point operations under the assumption that a specific floating point environment is used.

Shrug. By floating point environment I mean the sense in which it is used in IEEE 754 ("programming environment"), including the total implementation, not "settings that alter FP execution details on a moment-to-moment basis" per se, so I am just repeating myself. However, LLVM can also impose results-altering transformations in ways that are not necessarily expected and not necessarily conformant.

The only case where this actually affects the output of floating point operations is when using x87 floating point instructions, as the registers use 80-bit floats in that case, but spilled floats use 32 or 64 bits. The x87 instructions aren't used on the x86_64 and i686 targets; only the i586 target uses them. You should avoid them precisely because optimizations affect the result of floating point operations.

"only"? I don't think that is correct. Last I checked, ARMv7 Neon registers used to handle floats will flush denormals, even when used in Rust code that emits only LLVM IR (vectors of) floats. And this is what I mean by being generic over the environment and ISA features.

and as far as

You should avoid them

goes? I do not think we should talk about what people "should" or "should not" do. If we know something is risky and may lead to incorrect code, we can

  • make the compiler stop emitting such incorrect code at all
  • make an actual path to using the relevant risky code correctly
  • at the very least actually emit a warning or caveat

Currently we do none of those, even in the face of users directly inducing rustc to miscompile code in ways we know and can predict, so if this is such a strongly-known "should", why isn't that knowledge embedded in the compiler?

I don't think it is very well known, because the only official targets to which it applies are the i586 ones. The i686 ones already use SSE instead of x87 floats. i586 basically only exists for compatibility with very old systems. I don't think anyone ships programs for i586, especially not for scientific or gaming purposes, where "well-behaved" floating point operations are much more important than in the average program.

People actually hit this miscompilation surprisingly often when they try to disable SSE2 on an x86_64 or i686 target and just FUBAR their build. It tends to happen most with the "hobby OS" types (who simply don't want floats, but may not disable them correctly, and may not realize they also have to disable the x87 FPU), but others hit it when, e.g., they are doing some kind of program verification setup and believe that vectorization may complicate their model, so they try to disable it, but then run into floating point issues.

While I realize this is a few niche sets of programmers, they add up! Especially because, well, some people are drawn to Rust precisely for it seeming more amenable to developing an OS or doing program verification.

I should note that I am totally okay, personally, with just dropping support for i586, but that would still leave us needing to solve for users trying to use other x86 targets and misconfiguring their build. I realize that all compiler settings are technically unsafe by Rust standards, but I believe that the nature of that is not really understood by users.

Hmm, FiniteF32 as a sibling to NonZeroU32 is probably the most practical suggestion that's come up in this whole thread. (f32, not NaN, not infinity)

I would propose NonNanF32 (see threads like [Pre-RFC?] NonNaN type), since ±∞ really don't cause the same problems that NaNs do. (Including the infinities is still Eq+Ord, for example.) And things like log(0) => -∞ and exp(-∞) => 0 are quite handy.
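A minimal sketch of what such a type could look like, mirroring the NonZero pattern (illustrative only; the real design questions live in the linked pre-RFC):

use core::cmp::Ordering;

#[derive(Clone, Copy, Debug, PartialEq)]
struct NonNanF32(f32);

impl NonNanF32 {
    fn new(x: f32) -> Option<Self> {
        // Only NaN is excluded; ±∞ are allowed, so log(0) => -∞ and exp(-∞) => 0
        // remain representable.
        if x.is_nan() { None } else { Some(NonNanF32(x)) }
    }

    fn get(self) -> f32 {
        self.0
    }
}

// Sound because NaN, the only value that breaks totality of f32's ordering,
// is excluded by construction.
impl Eq for NonNanF32 {}

impl PartialOrd for NonNanF32 {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for NonNanF32 {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.partial_cmp(&other.0).expect("NaN is excluded by construction")
    }
}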

