Help Us Benchmark Saturating Float Casts!

comex · November 14, 2017, 2:33am

Just a note for anyone who hasn’t read the issue report:

The current implementation does not include specialized architecture-specific sequences, and while LLVM can optimize the generic version to some extent, it doesn’t end up taking full advantage of hardware guarantees. In particular:

As @hanna-kruppe noted, in theory there should be “no overhead” on ARM, because the ARM instruction for float-to-integer casts, vcvt, already has exactly the proposed behavior. But the current implementation produces a lot of extra assembly anyway (tested on the 2017-11-13 nightly).
On x86-64, the potential for improvement is lower because the hardware acts differently out of the box, so the proposed behavior will always require a sequence of instructions. But it can still be made shorter.
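
For anyone who hasn't read the proposal, here is a rough hand-written sketch of what a saturating f32-to-i32 cast means. The function name `saturating_f32_to_i32` is made up for illustration, and this is not the sequence the compiler actually emits; it just assumes the semantics are "clamp out-of-range values to the integer bounds, map NaN to 0":

```rust
/// Hypothetical helper: a hand-written saturating f32 -> i32 cast.
/// Assumed semantics: NaN becomes 0, out-of-range values clamp to the bounds.
pub fn saturating_f32_to_i32(x: f32) -> i32 {
    // 2^31 is exactly representable as f32; i32::MAX (2^31 - 1) is not,
    // so the upper bound is checked against 2^31 with `>=`.
    const TWO_POW_31: f32 = 2147483648.0;

    if x.is_nan() {
        0
    } else if x >= TWO_POW_31 {
        i32::MAX
    } else if x <= -TWO_POW_31 {
        i32::MIN
    } else {
        // x is strictly inside (-2^31, 2^31), so truncation toward zero
        // always yields a representable i32.
        x as i32
    }
}
```

The three comparisons are the "generic version" in spirit: on a target whose conversion instruction already clamps, all of them are redundant.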

On the other hand, using architecture-specific intrinsics would probably inhibit autovectorization, so the whole thing’s kind of a mess, short of improving LLVM itself…
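
To make the autovectorization concern concrete, this is the kind of loop where it matters. The sketch uses a hypothetical helper `cast_slice` and assumes a toolchain where `as` already has the saturating behavior (Rust 1.45 and later). Because the loop body is a plain `as` cast, the compiler is free to vectorize it; whether it actually does depends on the target and the LLVM version, and a scalar architecture-specific intrinsic in the loop body would likely take that option away:

```rust
/// Hypothetical helper: convert a slice of f32 to i32 element by element.
/// Assumes `as` is a saturating cast (true on Rust 1.45+).
pub fn cast_slice(src: &[f32], dst: &mut [i32]) {
    // zip stops at the shorter of the two slices, so no bounds checks
    // are needed inside the loop body.
    for (d, s) in dst.iter_mut().zip(src.iter()) {
        *d = *s as i32;
    }
}
```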

Anyway, this is all probably not a big deal, but I thought it was worth noting that this isn’t the final word on performance.
