Tackling Undefined Behaviour Casts


#1

Currently, the results of certain floating-point casts are undefined (that is, they can cause Undefined Behaviour):

  • https://github.com/rust-lang/rust/issues/15536: f64 as f32 will produce an undefined result if the input cannot be represented by the output type. From discussion on the #llvm IRC channel, my understanding is that this generally means the input is finite but exceeds the minimum or maximum finite value of the output type, e.g. 1e300f64 as f32.

  • https://github.com/rust-lang/rust/issues/10184: f* as i/u* will produce an undefined result if the input, once rounded towards zero to an integer, cannot be represented by the (signed or unsigned) output type, e.g. 1e10f32 as u8. Note that e.g. -10.0f32 as u8 is defined as 0.
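As a rough illustration of where the two bullets draw the line, the predicates below flag the problematic inputs described above. The function names are hypothetical, and the u8 check takes the conservative reading that any value whose truncation towards zero falls outside 0..=255 (or that is NaN or infinite) is out of range.

```rust
/// Hypothetical helper: true when `x` is finite but outside the finite
/// range of f32, which is the `f64 as f32` case described above.
fn f64_to_f32_overflows(x: f64) -> bool {
    // Widening f32::MAX to f64 is exact, so this comparison loses nothing.
    x.is_finite() && x.abs() > f32::MAX as f64
}

/// Hypothetical helper: true when `x`, truncated towards zero, does not fit
/// in a u8. NaN and infinities are treated as out of range as well.
fn f32_to_u8_out_of_range(x: f32) -> bool {
    if !x.is_finite() {
        return true;
    }
    let t = x.trunc();
    t < 0.0 || t > u8::MAX as f32
}

fn main() {
    // Examples from the two bullets.
    assert!(f64_to_f32_overflows(1e300));
    assert!(f32_to_u8_out_of_range(1e10));

    // In-range inputs are not flagged.
    assert!(!f64_to_f32_overflows(1e30));
    assert!(!f32_to_u8_out_of_range(255.9)); // truncates to 255
    assert!(!f32_to_u8_out_of_range(-0.5)); // truncates to -0.0, which equals 0.0
}
```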

This is an annoying wart in Rust’s current implementation, and we should fix it. Note that, at least on x86_64 Linux, the example f64 as f32 cast just produces inf (which is pretty reasonable IMHO), while the f32 to u8 example seems to produce completely random results (I’m not sure whether actual undefs are being produced, but that seems believable).

I’m happy for these “nonsense” casts to have unspecified behaviour, so that we can e.g. inherit whatever the platform does, as long as it doesn’t violate memory safety the way the current design can. A solution that adds no overhead seems ideal to me; having to specify that e.g. 1000.0 as u8 == u8::MAX may be too cumbersome. Note, though, that inheriting platform behaviour interacts in complex ways with cross-compilation and const-evaluation.
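For comparison, specifying saturation would amount to something like the function below, where 1000.0 becomes u8::MAX. This is a minimal sketch: the name is hypothetical, and mapping NaN to 0 is an arbitrary choice made here. It makes the cost visible, since every conversion needs extra comparisons before the actual truncation.

```rust
/// Hypothetical saturating float-to-u8 conversion: truncate towards zero,
/// clamping out-of-range values to the nearest bound. NaN maps to 0 here
/// (an arbitrary choice for this sketch).
fn saturating_f32_to_u8(x: f32) -> u8 {
    if x.is_nan() || x <= 0.0 {
        0
    } else if x >= u8::MAX as f32 {
        u8::MAX
    } else {
        // Here 0 < x < 255, so the truncated value always fits in a u8.
        x as u8
    }
}

fn main() {
    assert_eq!(saturating_f32_to_u8(1000.0), u8::MAX);
    assert_eq!(saturating_f32_to_u8(-10.0), 0);
    assert_eq!(saturating_f32_to_u8(42.7), 42);
    assert_eq!(saturating_f32_to_u8(f32::NAN), 0);
}
```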

I lack the requisite familiarity with LLVM to know the best way forward, though. I’d also be interested to hear whether there are use cases for these casts having specified behaviour.


#2

This makes undefs.


#3

Just to be clear, are you referring to the following?


#4

Yes


#5

My understanding of the Rust literature is that creating undefs is undefined behavior (it’s in the list), but that only some LLVM uses of undef actually lead to undefined behavior (e.g. integer division by an undef). Is it possible to articulate which instances of undef actually lead to UB, for example “just the ones that lead LLVM to UB”, or is it more complicated than that?


#6

My first thought is to make such casts panic, as is already the case with integer overflow. I’m not a language designer, so I’m not sure I have a voice here, but as a language user this seems like reasonable and consistent behavior.


#7

I agree that it would make sense for them to panic in debug builds, but it is still necessary to figure out what should happen for builds without overflow checks.


#8

Undefs, huh? Undefs are fun. They tend to propagate. After a few minutes of wrangling…

#[inline(never)]
pub fn f(ary: &[u8; 5]) -> &[u8] {
    let idx = 1e100f64 as usize;
    &ary[idx..]
}

fn main() {
    println!("{}", f(&[1; 5])[0xdeadbeef]);
}

segfaults on my system (latest nightly) with -O.


#9

You can access platform-specific behavior through LLVM intrinsics; on x86, for example, you can use @llvm.x86.sse.cvttss2si and friends. A bit annoying, but workable.
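On the Rust side, the same instruction is reachable through std::arch rather than raw LLVM intrinsics. The sketch below (x86_64 only; the wrapper name is hypothetical) wraps CVTTSS2SI, which truncates towards zero and, per Intel’s documentation, returns the “integer indefinite” value 0x8000_0000 (i32::MIN) when the result does not fit or the input is NaN.

```rust
// x86_64 only: SSE is part of the x86_64 baseline, so no runtime detection is needed.
#[cfg(target_arch = "x86_64")]
fn platform_f32_to_i32(x: f32) -> i32 {
    use std::arch::x86_64::{_mm_cvttss_si32, _mm_set_ss};
    // Put `x` in the low lane of an SSE register, then convert with
    // truncation (CVTTSS2SI). Out-of-range inputs and NaN yield i32::MIN.
    unsafe { _mm_cvttss_si32(_mm_set_ss(x)) }
}

#[cfg(target_arch = "x86_64")]
fn main() {
    assert_eq!(platform_f32_to_i32(3.7), 3);
    assert_eq!(platform_f32_to_i32(-3.7), -3);
    assert_eq!(platform_f32_to_i32(1e10), i32::MIN);
    assert_eq!(platform_f32_to_i32(f32::NAN), i32::MIN);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```

This is cheap, but the out-of-range result is whatever the hardware chooses, so other targets can return something different for the same input.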

There are essentially three behaviors Rust can provide: saturate, fail (either via Option or a panic), and platform-specific. Whatever as ends up doing, it’s probably a good idea to make all of these available as standard library functions. I would guess the right default for as is to panic in debug builds and use platform-specific behavior in release builds. This parallels integer overflow: the performance cost of always checking the conversion is probably too high.
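A minimal sketch of two of these behaviors, assuming f32 to u8 as the conversion of interest: a checked function returning Option, and an as-like wrapper that panics when debug assertions are enabled. Both names are hypothetical. Plain Rust has no portable “platform-specific” conversion, so the release branch here uses the saturating as cast that Rust adopted in 1.45, purely as a stand-in.

```rust
/// Hypothetical checked conversion: None unless `x` truncates into 0..=255.
fn checked_f32_to_u8(x: f32) -> Option<u8> {
    // Exactly the values in (-1, 256) truncate into 0..=255.
    // NaN fails both comparisons, so it is rejected as well.
    if x > -1.0 && x < 256.0 {
        Some(x as u8)
    } else {
        None
    }
}

/// Hypothetical `as`-like conversion in the spirit of the proposal:
/// panic on out-of-range input when debug assertions are on, otherwise
/// fall back to a fixed, non-panicking result.
fn debug_checked_f32_to_u8(x: f32) -> u8 {
    match checked_f32_to_u8(x) {
        Some(v) => v,
        None if cfg!(debug_assertions) => panic!("{} does not fit in a u8", x),
        // Stand-in for "platform-specific": since Rust 1.45, float-to-int
        // `as` saturates (and maps NaN to 0).
        None => x as u8,
    }
}

fn main() {
    assert_eq!(checked_f32_to_u8(255.9), Some(255));
    assert_eq!(checked_f32_to_u8(-0.5), Some(0));
    assert_eq!(checked_f32_to_u8(256.0), None);
    assert_eq!(checked_f32_to_u8(f32::NAN), None);
    assert_eq!(debug_checked_f32_to_u8(42.0), 42);
}
```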


#10

We’ve previously established that as is an unchecked operation regardless of build mode (1000u32 as u8 just truncates), so doing anything special in debug builds is almost certainly not going to happen. We do, however, have plans for “checked cast” variants somewhere in the standard library.
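For reference, here is the integer truncation mentioned above next to a checked alternative. In today’s standard library, checked integer-to-integer conversion is available through the TryFrom trait; narrow_checked is a hypothetical wrapper name used only for this sketch. This covers integer casts only, not the float conversions discussed in this thread.

```rust
use std::convert::TryFrom;

/// Hypothetical wrapper: checked narrowing from u32 to u8.
fn narrow_checked(x: u32) -> Option<u8> {
    // TryFrom reports overflow as an Err instead of silently truncating.
    u8::try_from(x).ok()
}

fn main() {
    // `as` between integer types is unchecked: it keeps only the low 8 bits.
    assert_eq!(1000u32 as u8, 232); // 1000 = 3 * 256 + 232

    assert_eq!(narrow_checked(1000), None);
    assert_eq!(narrow_checked(200), Some(200));
}
```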