Equivalent of `#pragma STDC FENV_ACCESS ON`

I recently wrote some code where I needed to perform floating point calculations in a non-default floating point environment. I wanted to write it in Rust, but Rust doesn't support this, so I ended up having to use C instead.

Background

The two standard parts of the floating point environment are (1) the rounding mode and (2) exception settings (i.e. whether to raise a signal in various cases like overflow or divide by zero). But there can also be nonstandard, platform-specific parts. In my case, I wanted to set an option that disables support for denormal numbers (also known as subnormal numbers). Though disabling denormals isn't standardized in either the IEEE float standard or the C spec, it's a fairly common option across multiple processor architectures. I'm currently on x86, where this exists in the form of the DAZ (denormals are zero) and FTZ (flush to zero) flags in the MXCSR register.

Disabling denormals is typically done for performance reasons. Traditionally, if a floating point arithmetic instruction (say, multiply two floats) happens to have a denormal number as either an input or a result, it can trigger a microcode assist, making the instruction run literally 10-100x slower than usual. More modern processors sometimes avoid this penalty but not always. If you disable denormals, the processor skips the microcode assist and just treats the numbers as zero, sacrificing precision for performance. (In my case, I wasn't actually interested in performance, but rather trying to exactly match the behavior of another system which I knew did disable denormals.)
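To make "denormal" concrete, here is a minimal sketch that classifies a few `f32` values with the standard library's `classify`. It assumes the default floating point environment, and `describe` is a hypothetical helper written for this illustration.

```rust
use std::num::FpCategory;

/// Hypothetical helper: names the IEEE 754 category of an `f32`.
fn describe(x: f32) -> &'static str {
    match x.classify() {
        FpCategory::Zero => "zero",
        FpCategory::Subnormal => "subnormal",
        FpCategory::Normal => "normal",
        FpCategory::Infinite => "infinite",
        FpCategory::Nan => "nan",
    }
}

fn main() {
    // Smallest positive normal f32, i.e. 2^-126.
    let smallest_normal = f32::MIN_POSITIVE;
    // Halving it leaves the normal range and lands on a denormal (2^-127).
    let half = smallest_normal / 2.0;

    println!("smallest normal: {}", describe(smallest_normal));
    println!("half of that:    {}", describe(half));
    println!("literal zero:    {}", describe(0.0));
}
```

In the default environment the middle value is reported as subnormal. With DAZ and FTZ set, the hardware would instead treat such a value as zero, which is the behavior the rest of this post is concerned with.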

In C, there is a standardized API in <fenv.h> to change the rounding and exception modes. As I said, disabling denormals isn't standardized in C, but at least on macOS, you can do it with an extension to that API, by calling fesetenv(FE_DFL_DISABLE_SSE_DENORMS_ENV).

How it affects Rust

Rust doesn't provide a way to change the floating point environment: not in the standard library, and not even in the libc crate, where the <fenv.h> functions are not available.

Of course, you can define your own bindings to those functions. But doing so would be unsound.

From a hardware perspective, the floating point environment is a global setting that affects all computation done in the program. From a Rust perspective, this is unfortunate. We prefer to reason in terms of an abstract machine, stating that a * b has a single correct output independent of architecture or settings.

In an ideal world, to support alternate floating point modes, we'd probably start by adding new, explicit methods on f32 and f64, like add_with_denormals_disabled or add_with_rounding_mode or whatever. From there, to achieve better ergonomics, we would make wrapper types whose impls for Add, Mul, etc. forward to these special methods. Alternately, if it was absolutely necessary to modify the behavior of regular floating point types, perhaps the compiler could add a new mechanism based on lexical scope. Anything but global dynamic state!
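A rough sketch of what that "ideal world" API could look like, emulated purely in software. Everything here is an assumption made for illustration: the free functions stand in for the hypothetical methods named above, `NoDenormals` is an invented wrapper type, and `flush` only approximates what hardware flags like DAZ and FTZ do, so it may not match any particular processor bit for bit.

```rust
use std::ops::{Add, Mul};

/// Hypothetical helper: replaces a subnormal value with a zero of the same sign.
fn flush(x: f32) -> f32 {
    if x.is_subnormal() { 0.0f32.copysign(x) } else { x }
}

/// Hypothetical explicit operation: flush inputs, add, then flush the result.
fn add_with_denormals_disabled(a: f32, b: f32) -> f32 {
    flush(flush(a) + flush(b))
}

/// Hypothetical explicit operation: flush inputs, multiply, then flush the result.
fn mul_with_denormals_disabled(a: f32, b: f32) -> f32 {
    flush(flush(a) * flush(b))
}

/// Hypothetical wrapper type whose operators forward to the explicit operations.
#[derive(Clone, Copy, Debug, PartialEq)]
struct NoDenormals(f32);

impl Add for NoDenormals {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        NoDenormals(add_with_denormals_disabled(self.0, rhs.0))
    }
}

impl Mul for NoDenormals {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        NoDenormals(mul_with_denormals_disabled(self.0, rhs.0))
    }
}

fn main() {
    let tiny = NoDenormals(f32::MIN_POSITIVE);
    let half = NoDenormals(0.5);
    // The wrapper flushes the subnormal product; the plain f32 product keeps it.
    println!("wrapped: {:?}", tiny * half);
    println!("plain:   {:e}", f32::MIN_POSITIVE * 0.5);
}
```

The appeal of this shape is that the mode is visible in the types rather than hidden in global state; the cost, discussed next, is that doing the same thing with the real hardware mode would be slow.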

Unfortunately, this approach is impossible to implement performantly on current hardware. You could implement add_with_denormals_disabled as a three-part operation: change the mode, perform the addition, then restore the old mode. Technically, the floating point environment is thread-local, not literally global, so this approach would be correct. But it would also be extremely slow. On x86, for example, changing the environment often requires a full pipeline flush.

So if we want to support non-default floating point environments at all, it seems like the only way to do it is to make the source language's standard floating point types reflect that dynamic state.

It's not that Rust needs to do anything special to opt in to this. The floating point environment is (typically) a hardware feature, which affects the behavior of all floating-point instructions. It already works.

The problem has to do with compiler optimizations.

For example:

pub fn are_denormals_disabled() -> bool {
    let smallest_normal: f32 = 2f32.powf(-126f32);
    smallest_normal / 2f32 == 0f32
}

pub fn is_rounding_mode_set_to_upward() -> bool {
    16777216f32 + 0.5f32 == 16777218f32
}

Normally, rustc can and does optimize both of these functions to return a constant false. But that would be invalid if they might be executed with a non-default floating point environment.
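For anyone wondering why the second function detects the rounding mode, the following sketch spells out the arithmetic. It assumes the default round-to-nearest environment, and `next_above` is a hypothetical helper that is only valid for positive, finite inputs.

```rust
/// Hypothetical helper: the next representable `f32` above a positive, finite `x`.
/// For such values, incrementing the bit pattern yields the next larger float.
fn next_above(x: f32) -> f32 {
    f32::from_bits(x.to_bits() + 1)
}

fn main() {
    let base: f32 = 16_777_216.0; // 2^24, where adjacent f32 values are 2 apart
    let upper = next_above(base);

    // The exact sum fits in an f64 but falls between two adjacent f32 values.
    let exact: f64 = f64::from(base) + 0.5;

    println!("f32 neighbours: {} and {}", base, upper);
    println!("exact sum:      {}", exact);
    // Round-to-nearest picks the lower neighbour; round-upward would pick the upper one.
    println!("f32 sum here:   {}", base + 0.5);
}
```

So the comparison against 16777218 can only succeed if the addition was rounded upward, which is exactly the fact the optimizer bakes in when it folds the function to `false`.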

The same problem affects C compilers. C attempts to solve it with a compiler directive:

#pragma STDC FENV_ACCESS ON

which must be present in the source code of every compilation unit whose code may be executed in a non-default environment. Compilers are supposed to take this as a signal to produce environment-agnostic code; in practice that means disabling most optimizations involving floating point numbers.

The good news

Clang already supports FENV_ACCESS.

It used to not; the relevant functionality was only added around 2018. (Meanwhile, GCC still doesn't support it; supposedly there's an -frounding-math option that does the equivalent, but when I tested it with C versions of the test cases above, it didn't actually stop them from being miscompiled.)

On the LLVM IR side, there are two different parts that make this work.

First, there is an attribute:

strictfp

This attribute indicates that the function was called from a scope that requires strict floating-point semantics. LLVM will not attempt any optimizations that require assumptions about the floating-point rounding mode or that might alter the state of floating-point status flags that might otherwise be set or cleared by calling this function. LLVM will not introduce any new floating-point instructions that may trap.

This attribute can be applied to both function definitions and function calls, and Clang applies it to every definition and every call when compiling a translation unit with FENV_ACCESS.

Second, there are the constrained floating-point intrinsics, which Clang emits instead of normal floating-point instructions in such translation units. For example, dividing by two might normally produce this IR instruction:

  %4 = fdiv float %3, 2.000000e+00

but with FENV_ACCESS on, it instead produces:

  %4 = call float @llvm.experimental.constrained.fdiv.f32(float %3, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #2

I'm not really sure why LLVM needs both a function attribute and special instructions, but it does.

(Note: The constrained intrinsics also allow making more fine-grained assumptions – e.g. you can assume a specific non-default rounding mode, or say that the rounding mode is unknown but exceptions are definitely disabled. That likely has use cases as well.)

The ugly bits

I'd like to propose exposing this LLVM functionality in Rust. But this raises some conceptual questions.

  1. What would our equivalent of FENV_ACCESS look like?

    Perhaps make it a function attribute? Maybe name it #[strictfp] to match LLVM?

    Or perhaps it would be saner to just ensure that all functions in a binary are compiled as strictfp. This would probably imply making separate targets, e.g. x86_64-apple-darwin-strictfp or similar. Adding targets is a big hammer, but we do have build-std now, and there is some precedent for having target variants based on ABI differences. There could also be variants that assume a specific non-default environment rather than being agnostic.

  2. What about const fn?

    Right now, floating point is not allowed in const fn due to semi-related issues. But we probably will allow it eventually, so what then? One option is to make floating point non-const if you're doing it from strictfp code. But that might not play well with eventual trait support for const fn. Another is to just say: well, strictfp allows running the same code in different rounding modes, and compile time just happens to be an environment where the rounding mode is set to default. That would mean the same expression can produce different results depending on whether it's surrounded in const { }, but this result may be unavoidable anyway (see the previous link).

  3. How to deal with strictfp functions calling non-strictfp functions?

    Or in other words, how do we deal with the possibility that some code will be run with a non-default floating point environment despite not being compiled with that in mind?

    Most likely we should just call this undefined behavior and have the compiler prohibit it. However, it would obviously be extremely limiting if strictfp functions couldn't call anything. There could be some API that temporarily switches the rounding mode to default and then executes any given non-strictfp function, but that would still be slow…

    Making strictfp a target variant would avoid this problem.

    Alternately, it might be possible to declare that running a non-strictfp function in the wrong environment is, in fact, not UB. It might be possible to say: in this situation, floating point calculations may nondeterministically produce the wrong result, but everything else works.

    But that is probably not viable. I'd be totally unsurprised if some compiler optimization can be coaxed into corrupting memory if floating point calculations behave wrongly. Or even if the compiler is resilient to that attack, humans might not be: people may write unsafe Rust code that is only sound if floating point behaves correctly.

    That said, even without changing the floating point environment, Rust floating point is already not consistent or predictable, thanks to a combination of architecture differences and compiler optimizations. See:

In theory, we could do constant evaluation in a non-default floating-point mode, assuming we can accurately reflect the desired behavior.

In any case, I'd like to see this functionality exposed in some way. This would also, potentially, allow for optimizations like FMA and other divergences from standard floating point.

I've been working on something related to this, but from the direction of wanting to ease the floating point rules as well (e.g. a principled way of supporting "fast math" operations — or at least the safe subset).

In a week or so I was planning on finishing up and posting a pre-rfc, but basically I think it would look like a #[float_model(...)] function attribute (similar to what you mention for #[strictfp]). I think the ideal way for this would probably function somewhat like target_features, but I don't think it can reuse the existing mechanisms here, which leads to some complexity that IDK if there's stomach for.

Yeah, this would work with an "I know the floating point environment is some specific non-standard one" setting, though not with an "I don't know the floating point environment" setting.

In C, the standard FENV_ACCESS pragma is "I don't know the environment", but it seems there are some nonstandard command-line arguments for setting a specific environment: for example, GCC has -mfp-rounding-mode for rounding (but Clang doesn't), and Clang has -fdenormal-fp-math for denormal handling (but GCC doesn't).

In any case, it looks like LLVM IR supports both setting a specific rounding mode and exception behavior (using arguments to the constrained FP math intrinsics), and setting specific denormal behavior (using a function attribute). Or even if this was lacking in some way, we could always have rustc assume a specific environment, but tell LLVM to be agnostic. This would be suboptimal but safe.

Setting a specific floating point environment does have major advantages: in addition to avoiding issues with const as you mentioned, it also avoids nerfing the optimizer. I'm not sure if LLVM supports all the same optimizations with constrained floating point intrinsics as it does with regular floating point instructions, but at least in theory it could, as long as the rounding mode and denormal handling are known, and exceptions are off. (Having exceptions on, or potentially on, inherently hinders optimizations since it makes floating point operations side-effecting.)

On the other hand, being environment agnostic has its own advantages:

  • Avoids forcing the compiler to model every floating point environment someone might want to run code in. Right now, we don't even perfectly simulate the default FPU behavior across supported architectures, though this can be fixed. When it comes to obscure nonstandard environment settings (there are things more obscure than disabling denormals 🙂), it may be easier to punt.

  • Easier to write code that dynamically changes the environment. For example, in my actual use case (that I ended up writing in C), I'm just a shared library living in the same process as arbitrary other code and I want to be a good citizen. So my function changes the floating point environment, performs a batch of calculations (large enough that the penalty of changing isn't a big deal), then changes back before returning.

    Since I'm telling C to be environment-agnostic, I can treat the environment change as just a function call. But if I were doing it in Rust and Rust only supported known environments, then at minimum I'd need functions with different environment settings in the same crate – an entry point function set to the default environment, and the bulk of the code set to denormals-disabled – and there would need to be some magic "call with changed environment" builtin to let an environment-A function call an environment-B function by changing the mode, calling the function, and changing back.

    On the other hand, "call with changed environment" would also be useful with an agnostic setting, and critical if it were implemented as a per-function setting rather than a separate target.
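The shape of a "call with changed environment" helper might look something like the sketch below. To keep it runnable and sound, it does not touch the real floating point environment at all: `MOCK_ENV` is a thread-local stand-in for the per-thread control register, and `DENORMALS_DISABLED`, `EnvGuard`, `call_with_env`, and `current_env` are all hypothetical names invented here.

```rust
use std::cell::Cell;

thread_local! {
    /// Stand-in for the per-thread control register (e.g. MXCSR). NOT real hardware state.
    static MOCK_ENV: Cell<u32> = Cell::new(0);
}

/// Hypothetical flag value meaning "denormals disabled" in the mock register.
const DENORMALS_DISABLED: u32 = 1;

/// Remembers the previous environment and restores it when dropped,
/// so the old value comes back even if the callee panics.
struct EnvGuard {
    saved: u32,
}

impl EnvGuard {
    fn enter(new_env: u32) -> Self {
        // Install the new value and keep the old one for later.
        let saved = MOCK_ENV.with(|e| e.replace(new_env));
        EnvGuard { saved }
    }
}

impl Drop for EnvGuard {
    fn drop(&mut self) {
        MOCK_ENV.with(|e| e.set(self.saved));
    }
}

/// Hypothetical "call with changed environment": change the mode, call, change back.
fn call_with_env<R>(new_env: u32, f: impl FnOnce() -> R) -> R {
    let _guard = EnvGuard::enter(new_env);
    f()
}

/// Reads the mock register for the current thread.
fn current_env() -> u32 {
    MOCK_ENV.with(|e| e.get())
}

fn main() {
    let seen_inside = call_with_env(DENORMALS_DISABLED, current_env);
    println!("inside: {}, after: {}", seen_inside, current_env());
}
```

A real version would read and write the hardware register instead of a `Cell`, and it would only be correct if the compiler were prevented from moving or folding floating point operations across the switch, which is the missing piece this thread is about.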

Interesting. This would set a precedent for "standard floating point types can behave differently depending on settings" (though that is already true to some extent due to architecture differences). In the pre-RFC you should explain why you decided on that approach, rather than using a wrapper type or special intrinsics for fast math operations.

I could imagine using a single float_model attribute for both fast-math and environment settings, but if so, it would have to behave very differently between the two. Fast-math code is safe to call and be called by non-fast-math code; the same is not true for code assuming different environments. For that reason, I'm thinking it may be best to expose these two features in entirely different ways. But if so, it may be good for the fast-math attribute to use a more specific term than float_model.