Pre-RFC: Explicit Opt-in OIBIT for truly POD data and safe transmutes

Can you elaborate on this? Does this happen when you transmute invalid representations? Does this happen with plain integers as well?

There are also issues with reading from padding that this doesn’t address.

1 Like

I’d also like an example of that. It seems wrong. I know that casts can cause UB, but transmute just copies the bits verbatim. All bit strings of length 32 are valid binary32 floats (though about 2^23 are NaNs) and all bit strings of length 64 are valid binary64 values (though about 2^52 are NaNs). There are no invalid representations that a transmute could produce, unless some layer between Rust and the silicon is actively breaking NaN with weird bit patterns (I say “actively” because I can’t imagine anything to be gained from such a restriction).

That’s a great point. I always took the read of padding bits to be functionally equivalent to a read of uninitialized memory: UB only in the sense of non-deterministic jumps based on it, etc. But even if not memory unsafe, converting padding bits to non-padding is probably not what you want.

So how about changing the ReprC bound to ReprPacked. You can still have fully aligned access, by dealing with your own padding fields—which is what you should be doing in the situations I’ve mentioned anyway.

I’m sure you’re right but I’m baffled by this a bit—all float bit patterns are valid (I was in fact tempted to call the trait BitSafe because of this definition). Is there any reference I could read which supports this?

In Rust, it’s considered unsafe to read an uninitialized value, which includes padding. If we were to go forward with this RFC, presumably we’d need to weaken it to “reading uninitialized values of types that do not implement Mappable is unsafe”.

Makes perfect sense. I changed the pre-RFC to require ReprPacked, which doesn’t have any padding and is strictly more general since you can include your padding if you want to—does it seem OK to you?

I had considered it, but not included why I thought it didn’t work. I edited the post to include it, let me know what you think—maybe I’m missing something (see last alternative :P).

Yep.

Alternately, it would be great to understand why it is considered unsafe to read uninitialized values. LLVM doesn’t make reads of undefined values into undefined behavior unless you use them in certain ways (e.g. dividing by an undefined float). Is the Rust position on uninitialized values overly conservative? Who wrote that text in the first place?

For most data, reading an undef is a memory safety hazard. Many types contain pointers or other data that unsafe code relies on to be well-formed with respect to some chunk of memory (accurate bounds etc). In this particular case (the proposed Mappable trait’s guarantees), reading an undef is “fine”.

Rust guarantees that all values have been initialized. This guarantee has always stood. This is a good guarantee to have, I think, especially in terms of “do what I want”.

Another risk is accidentally exposing sensitive information. If the uninitalized spot used to hold a private key, this could be easily exploited.

Sorry, a think-o in the previous post may have asked the wrong question. Lemme rephrase:

It would be great to understand whether and why it is considered undefined behavior (rather than “unsafe”) to read uninitialized u8 values (specifically).

Safe Rust guarantees that all values have been initialized. That’s great, should always stand, etc. I’m happily using unsafe, where many of Rust’s guarantees don’t hold, but I want to avoid undefined behavior. I don’t see why it should be undefined behavior to read an undef u8, but I’d be happy to be corrected.

1 Like

That’s a reasonable question, to which I don’t know the answer. However, I think for the purposes of this proposal, requiring #[repr(packed)] may be the right call regardless.

Suppose you use the u8 to index into an array afterwards. This will run code similar to

let index: u8 = unsafe { mem::uninitialized(); }
let index = index as usize;
if index >= array.len() {
    panic!("array access out of bounds");
}
return &array[index];

Now LLVM says about undef values that they don't even need to have a consistent value. That means that it can first have a value smaller than array.len() in the condition, and a bigger one afterwards. This leads to ineffectice bounds checking and thus to memory unsafety (undefined behavior), as you now obtained a reference for a member outside the array's bounds.

2 Likes

Ha, never thought of that. LLVM is perfectly within its right to use padding space to put other things there that need to be on stack.

1 Like

I’m totally ok with “using undef values poorly is undefined behavior”, in the same spirit as Rust’s guidelines on aliasing (“don’t do anything that breaks LLVM’s aliasing model”).

Maybe the use case I have is sufficiently narrow that asking for clarification in language is not going to be helpful. Roughly, and I think in the intent of this RFC too: memcpy on undef data shouldn’t be undefined behavior unless it really is undefined behavior.

To be honest, all I’d really like to know is where Rust makes its Faustian bargains with LLVM, and whether/why it promises anything that would cause putting an undef byte in a file or socket (or back into a padding byte) to be undefined behavior. It makes a very big difference in serialization performance for general types.

If you’re worried about performance isn’t the better solution to use your own padding with #[repr(packed)]? Writing to a padding byte is definitely bad given tbu’s example—if that space is used to hold something else on stack, then you’d be clobbering that.

1 Like

I don’t get to write the types. Someone shows up with a Vec<(u8, u64)> and wants it serialized.

If Rust made a deal with LLVM that it points out padding bytes and promises never to use them, that would be a great example of one of these bargains.

If Rust didn’t make that deal, LLVM should keep its register spill out of locations my code writes to. :smile:

Edit: as far as I can see, writing to a padding byte isn’t undefined behavior (ideally because LLVM only uses locations it can see as unused). If that isn’t the case, someone should add it to the list of behavior considered undefined.

Edit 2: in case it clears up where I’m coming from / what I’d like to be “defined behavior” https://github.com/frankmcsherry/abomonation

The “reads of uninitialized memory” rule doesn’t mean a read in the sense of the CPU reading memory, but rather reading a value from memory into a Rust temporary or variable (expressions like *ptr). Using std::ptr::copy_nonoverlapping or libc::write on a pointer to uninitialized memory is fine.