Help Us Benchmark Saturating Float Casts!


TL;DR: please profile/benchmark your code on nightly-2017-11-12 or later with and without -Zsaturating-float-casts so we can make an informed decision! We are hoping to enable this flag by default in future versions to plug a long-standing soundness hole!


Since long before Rust 1.0, float->int casts have had a soundness hole: if the input value is out of range for the target type (or is NaN), the cast is Undefined Behaviour. This is because we just lower these casts to LLVM intrinsics, and that’s what those intrinsics say.

For instance, -1.0 as u8, 300.0 as i8, NaN as u32, and f32::INFINITY as u128 are all currently UB in Rust.

Proposed Solution

The reason this took so long to fix is that, well, we didn’t know what to do in this case! After some long discussion (which you can find in the float->int issue) saturating appeared to be the most reasonable solution.

Specifically, these behaviours would be guaranteed at runtime (note: small means “very negative”):

  • TOO_LARGE_FLOAT as int == int::MAX
  • TOO_SMALL_FLOAT as int == int::MIN
  • NaN as int == 0
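Concretely, these assertions hold when the flag is enabled (a small self-contained check; on the nightlies discussed here it needs -Zsaturating-float-casts in RUSTFLAGS):

```rust
// Small self-check of the proposed saturating semantics.
// (Requires -Zsaturating-float-casts on the nightlies discussed here.)
fn main() {
    assert_eq!(300.0f32 as i8, i8::max_value());   // too large  -> MAX
    assert_eq!(-1.0f32 as u8, u8::min_value());    // too small  -> MIN (0 for unsigned)
    assert_eq!(std::f32::NAN as u32, 0);           // NaN        -> 0
    assert_eq!(std::f32::INFINITY as u128, u128::max_value());
    println!("saturating casts behaved as proposed");
}
```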

At compile-time (const fn), these casts are currently errors. This is the most conservative thing to do while the runtime semantics are being decided; eventually, constant evaluation should match the runtime behaviour. Constant evaluation is not affected by the -Zsaturating-float-casts flag because it is not performance-critical.

You can see these two tests for a detailed enumeration of interesting cases.

The arguments in favour of this behaviour are that:

  • It matches the spirit of floats, which favour saturating to ±Infinity.
  • It creates the closest possible value, and the least “random” one.
  • Matches the behaviour of ARM’s vcvt instruction, making this have no overhead there.
  • Should optimize reasonably well on other platforms.
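As a rough mental model (illustrative only — the real implementation emits compare/select sequences in LLVM IR rather than branches like these), the guaranteed behaviour for an f64 -> i32 cast is equivalent to:

```rust
// Illustrative model of the saturating cast; not the actual implementation.
fn saturating_f64_to_i32(x: f64) -> i32 {
    if x.is_nan() {
        0
    } else if x >= i32::max_value() as f64 {
        i32::max_value()
    } else if x <= i32::min_value() as f64 {
        i32::min_value()
    } else {
        x as i32 // in range: the plain (truncating) cast is well-defined
    }
}
```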

Panicking at runtime for these cases was considered unacceptable for two reasons:

  • Early testing found it to be very slow.
  • All other casts currently always succeed (int -> int wraps, while float -> float and int -> float saturate to ±Infinity).

Wrapping at runtime was considered to be a bit too random (int -> int wrapping is traditional, has a simple/useful bitwise interpretation, and is very uniformly supported).
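For comparison, here is the bitwise story that int -> int wrapping has and that floats lack (plain stable Rust, no flag needed):

```rust
// int -> int casts wrap by truncating to the low bits of the value;
// there is no comparably natural bit-level rule for float -> int.
fn main() {
    assert_eq!(300i32 as u8, 44);    // 300 mod 256 == 44
    assert_eq!(-1i32 as u8, 255);    // low byte of two's-complement -1
}
```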

Testing It Out

Behaviourally, saturating is technically backwards-compatible, because any application relying on the current behaviour was relying on Undefined Behaviour. However, we take performance seriously, and we want to understand the performance impact of this change before moving forward and putting it through the proper RFC process. There were some preliminary measurements of this change, but we found the performance impact to be very workload-specific. As such, we’d be interested in receiving community feedback!

The proposed solution has been merged under the unstable, nightly-only -Zsaturating-float-casts flag. You will need the nightly from 2017-11-12 or newer, as earlier nightlies either don’t have the flag, or have a preliminary version that does more work than necessary (which could affect measurements). If your applications or libraries have any benchmarks, we’d be interested in seeing how the results compare when built with and without -Zsaturating-float-casts, e.g.:

> cargo +nightly bench

> RUSTFLAGS="-Zsaturating-float-casts" cargo +nightly bench

… and then post the results here.

Elsewhere In Undefined Casts

The sibling of this issue is the u128 -> f32 cast having Undefined Behaviour because u128 can exceed the maximum finite value of f32. This was recently fixed by saturating (at runtime as well as at compile time). Saturating matches how out of bounds float literals behave, and is compliant with IEEE-754. For further details, see issue #41799 and the PR that fixed it.


On x86_64 my float-heavy code is about as fast or actually slightly faster with -Zsaturating-float-casts. I’m already clamping values when casting (mainly to convert f32 to saturated u8), so I guess this optimizes well.
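For reference, the clamp-before-cast pattern I mean looks roughly like this (hypothetical helper name):

```rust
// Clamp an f32 into [0, 255] before casting, so the cast is always in range.
// NaN.max(0.0) returns 0.0 in Rust, so NaN also ends up at 0 here.
fn f32_to_u8_clamped(x: f32) -> u8 {
    x.max(0.0).min(255.0) as u8
}
```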


Just a note for anyone who hasn’t read the issue report:

The current implementation does not include specialized architecture-specific sequences, and while LLVM can optimize the generic version to some extent, it doesn’t end up taking full advantage of hardware guarantees. In particular:

  • As @rkruppe noted, in theory there should be “no overhead” on ARM because the ARM instruction for float-to-integer-casts, vcvt, already has exactly the proposed behavior. But the current implementation produces a lot of extra assembly anyway (tested on 2017-11-13 nightly).
  • On x86-64, the potential for improvement is lower because the hardware acts differently out of the box, so the proposed behavior will always require a sequence of instructions. But it can still be made shorter.

On the other hand, using architecture-specific intrinsics would probably inhibit autovectorization, so the whole thing’s kind of a mess, short of improving LLVM itself…

Anyway, this is all probably not a big deal, but I thought it was worth noting that this isn’t the final word on performance.


Is there an easy way to grep the MIR or LLVM IR to see how heavily I and my dependencies are actually using float casts, or if they get optimized away etc?


I don’t think there’s an easy way, but I am skeptical of how useful that would be anyway. It would tell you whether there are any such casts in the program, but not whether they are actually executed and how frequently. If in doubt, just benchmark anyway :slight_smile: Moreover, if there are casts in the source program that get optimized away and therefore don’t impact the performance, that would be useful data as well (since it means saturating float casts can still be optimized away).
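One rough approach, assuming a standard Cargo project layout (and with the caveat above that a static count says nothing about how often the casts actually execute):

```shell
# Emit LLVM IR alongside the normal release build artifacts.
cargo +nightly rustc --release -- --emit=llvm-ir

# Count float->int cast instructions (fptosi/fptoui) in the emitted IR.
grep -c -E 'fpto[su]i' target/release/deps/*.ll
```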


Slightly related, but how about just deprecating “T as U” where U is a smaller numeric type?

Introducing such a simple syntax for something so dangerous seems to have been a serious mistake, and the recommended API should be something like “x.truncate()” or “x.try_into()” returning Option, instead of “x as U” (as well as unsafe “unsafe_truncate” for the current behavior).

Even having “as” at all for numeric types might be a mistake, since it makes sign vs zero extension implicit, and thus forcing people to write “x.zext()” or “x.sext()” might be better.

If the syntax is deprecated, maybe performance is not as critical?
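To make that concrete, the kind of checked conversion meant here might look like this (illustrative only; the name and signature are hypothetical, not a real std API):

```rust
// Sketch of a checked float -> int conversion returning Option, as suggested
// above. Rejects NaN and anything outside the representable i32 range.
fn checked_f64_to_i32(x: f64) -> Option<i32> {
    if x.is_nan() || x < i32::min_value() as f64 || x > i32::max_value() as f64 {
        None
    } else {
        Some(x as i32) // in range, so the cast is well-defined
    }
}
```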


That’s a separate concern, because even if the syntax were deprecated it would still exist for years at minimum, and UB outside of unsafe code can’t be tolerated even in deprecated features. We’d also still have to provide replacements, and the performance of those replacements is still critical even if there’s no dedicated syntax for them, so the real-world benchmarking in this thread remains necessary.

(It’s also worth mentioning that float to int casts don’t need to be dangerous if one’s hardware or backend handle it properly. This isn’t something that’s fundamentally broken.)

There might be some valid arguments for moving towards removing as from the language entirely someday, but that’s going to be a long and contentious RFC and needn’t sidetrack us from making progress here.


No change for my doom renderer which actually does a fair bit of casting to and from floats (since doom used integers, but I use floats). Before, my benchmark (of processing all levels in DOOM1 in memory) had an average 1.291s (over 5 runs), afterwards it had 1.292s (over 5 runs). Well within error margin.

not saturating
RUST_LOG=0 target/release/rs_doom --check --iwad ~/data/wad/DOOM.WAD  0.63s user 0.25s system 68% cpu 1.288 total
RUST_LOG=0 target/release/rs_doom --check --iwad ~/data/wad/DOOM.WAD  0.63s user 0.24s system 67% cpu 1.300 total
RUST_LOG=0 target/release/rs_doom --check --iwad ~/data/wad/DOOM.WAD  0.64s user 0.22s system 67% cpu 1.271 total
RUST_LOG=0 target/release/rs_doom --check --iwad ~/data/wad/DOOM.WAD  0.62s user 0.24s system 66% cpu 1.286 total
RUST_LOG=0 target/release/rs_doom --check --iwad ~/data/wad/DOOM.WAD  0.67s user 0.24s system 69% cpu 1.311 total

saturating

RUST_LOG=0 target/release/rs_doom --check --iwad ~/data/wad/DOOM.WAD  0.63s user 0.26s system 69% cpu 1.283 total
RUST_LOG=0 target/release/rs_doom --check --iwad ~/data/wad/DOOM.WAD  0.72s user 0.20s system 71% cpu 1.291 total
RUST_LOG=0 target/release/rs_doom --check --iwad ~/data/wad/DOOM.WAD  0.65s user 0.21s system 66% cpu 1.282 total
RUST_LOG=0 target/release/rs_doom --check --iwad ~/data/wad/DOOM.WAD  0.73s user 0.18s system 69% cpu 1.316 total
RUST_LOG=0 target/release/rs_doom --check --iwad ~/data/wad/DOOM.WAD  0.69s user 0.20s system 68% cpu 1.288 total

Well done, everyone! Really happy this soundness hole is going away!


My medium-heavy-FP code is about 1% slower with -Zsaturating-float-casts in optimized builds.


My 3d renderer, which is very FP heavy but doesn’t do a ton of casting between floats and ints, had no change between the previous behavior and saturating floats.


Interesting, thanks for reporting! Is this code public? I’d like to try to reproduce and investigate.


The code isn’t public, sorry.


encode_jpeg benchmarks from the image crate show a bit of slowdown:

 name        base ns/iter  saturate ns/iter  diff ns/iter  diff %  speedup 
 bench_gray  11,739,248    12,590,876             851,628   7.25%   x 0.93 
 bench_rgb   34,507,766    42,325,700           7,817,934  22.66%   x 0.82 

rustc 1.23.0-nightly (fa26421f5 2017-11-15)
image 796b65b84


I have a program that reads data from several sensors in real time and writes them out to a file. Later it reads them back and converts to PNG. Finally it loads the PNGs, draws stuff on top, and writes them back.

I’m measuring the FPS of reading the sensors (I don’t think there is much FP here, except for thread scheduling to try to meet the desired FPS), plus the timing of the latter steps (where there is float casting going on). OTOH I have a lot of deps so there could well be more casting than I know about.

The code is available, but you obviously can’t reproduce the first part since you don’t have the hardware. However I could make the raw data files available for reproducing process and render.

                 (ideal)   unsound   saturating
compile (sec)              698.76    748.45 (+7%)
sensor 1 (FPS)   15        14.981    14.971 (no effect)
sensor 2 (FPS)   30        29.619    29.618 (no effect)
sensor 3 (FPS)   1000      999.40    999.30 (no effect)
sensor 4 (FPS)   3000      2894.7    2948.8 (+2%, could be noise)
process (sec)              613.83    614.60 (no effect)
render (sec)               332.79    331.82 (no effect)


Benchmarks of my audio manipulation program, which uses some casts here and there, but not that many:

without saturating floats: 110,845,560 ns/iter (+/- 755,520)
with    saturating floats: 111,232,224 ns/iter (+/- 880,856)

Edit: On a different machine (32bits hardware):

without saturating floats: 310,772,782 ns/iter (+/- 1,085,169)
with    saturating floats: 309,757,906 ns/iter (+/- 1,163,095)

On the same 32bits machine, but when I leave out the actual file reading and writing from the benchmark:

without saturating floats: 306,382,149 ns/iter (+/- 1,036,747)
with    saturating floats: 305,011,520 ns/iter (+/- 1,118,252)


@vickenty’s example is a good illustration of where this change is going to hit users hardest. Maybe it deserves a deeper look to check for low-hanging fruit.


@arthurprs I agree. I’ve previously done some investigation, but failed to find much optimization potential short of making the casts themselves faster.


Earlier today I was trying my hand at getting the run-pass test suite passing on wasm32-unknown-unknown, but alas I hit a failure in the test! I believe that in wasm32 we’ll actually want to unconditionally turn on the saturating cast behavior because an out-of-bounds cast is actually a trap!

So the problem I’m seeing is that when executing the test it’ll fail with integer result unrepresentable which I think means an out of bounds cast. I minimized the test to:

use std::f64;

fn bar<T>(t: T) -> T { t }

pub fn foo() -> f64 {
    if bar(f64::INFINITY) as i8 != i8::max_value() {
        extern { fn exit(); }
        unsafe { exit(); }
    }
    0.0 // return value chosen arbitrarily to make the snippet compile
}

I couldn’t quite follow the wasm IR, but I figured I could take a stab at the LLVM IR. So instead of the wasm target, the IR here is generated for x86_64-unknown-linux-gnu with optimizations disabled:

define double @foo() unnamed_addr #2 {
start:
  %_1 = alloca {}, align 1
; call wut::bar
  %0 = call double @_ZN3wut3bar17h434e9ad8e9a22fa7E(double 0x7FF0000000000000)
  br label %bb1

bb1:                                              ; preds = %start
  %1 = fptosi double %0 to i8

I think that this IR may not be correct? @rkruppe correct me if I’m wrong though! According to LLVM’s documentation, the fptosi instruction is undefined behavior if the source value doesn’t fit in the destination type, and in this case it’s unconditionally applying fptosi to f64::INFINITY, which certainly doesn’t fit in an i8. I think this is what’s causing the trap on wasm, as that instruction generates a trap there instead of doing anything else.

@rkruppe is this expected behavior? Or maybe I’ve misdiagnosed? I can certainly open an issue with more details if that would help!


@alexcrichton Huh. The IR containing an unconditional fptosi is intentional and ought to be harmless. LLVM says that fptosi etc. result in undef rather than triggering immediate undefined behavior (note the difference between “behavior is undefined” and “results are undefined”). My implementation exploits that. If wasm traps instead of giving a nonsense result, we’ll have to deal with that somehow, even if it’s technically not compatible with LLVM IR (but then again, LLVM largely ignores the possibility of float operations trapping). I’ve opened an issue to track this and allow further discussion.