Help Us Benchmark Saturating Float Casts!

Just a note for anyone who hasn’t read the issue report:

The current implementation does not use specialized, architecture-specific instruction sequences. LLVM can optimize the generic version to some extent, but it doesn't end up taking full advantage of hardware guarantees. In particular:

  • As @hanna-kruppe noted, in theory there should be “no overhead” on ARM, because the ARM instruction for float-to-integer casts, vcvt, already has exactly the proposed behavior. But the current implementation produces a lot of extra assembly anyway (tested on the 2017-11-13 nightly).
  • On x86-64, the potential for improvement is lower because the hardware's native conversion behaves differently out of the box, so the proposed behavior will always require a sequence of instructions. But that sequence can still be made shorter.
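For reference, here is a rough sketch of the proposed saturating f32-to-i32 behavior written out by hand in plain Rust. The helper name saturating_f32_to_i32 is hypothetical, and this illustrates the semantics only; it is not the compiler's actual implementation or codegen. On ARM, vcvt yields these results directly. On x86-64, cvttss2si returns i32::MIN for NaN and for any out-of-range input, so at least the NaN and too-large cases need extra instructions to fix up.

```rust
/// Hypothetical helper: saturating f32 -> i32 conversion spelled out by hand.
/// NaN -> 0, too small -> i32::MIN, too large -> i32::MAX,
/// otherwise truncate toward zero.
fn saturating_f32_to_i32(x: f32) -> i32 {
    // i32::MIN (-2^31) is exactly representable as an f32.
    const LO: f32 = -2147483648.0;
    // 2^31: the smallest f32 that is too large for an i32
    // (i32::MAX itself is not representable as an f32).
    const HI: f32 = 2147483648.0;

    if x.is_nan() {
        0
    } else if x <= LO {
        i32::MIN
    } else if x >= HI {
        i32::MAX
    } else {
        // Here LO < x < HI, so truncation toward zero always fits in an i32,
        // and the result of `as` does not depend on whether it saturates.
        x as i32
    }
}

fn main() {
    for &x in &[f32::NAN, 1e10, -1e10, f32::INFINITY, -3.7, 3.99] {
        println!("{:>12} -> {}", x, saturating_f32_to_i32(x));
    }
}
```

Since Rust 1.45, a plain `x as i32` has exactly these semantics, so the helper should agree with `as` for every input.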

On the other hand, using architecture-specific intrinsics would probably inhibit autovectorization, so the whole thing’s kind of a mess, short of improving LLVM itself…

Anyway, this is all probably not a big deal, but I thought it was worth noting that this isn’t the final word on performance.
