pre-RFC FromBits/IntoBits

gnzlbg · May 22, 2018, 4:47pm

@cramertj I think I don't follow the reasoning here, could you go step by step?

The unsafe impl<T> FromBits<T> for [u8] {} allows you to go from a T to a sized slice of u8s: [u8]. One can then take a reference and get a &[u8], and then one can call an API that gives you a &U from the &[u8] somehow. If T has padding bytes, as long as U has padding bytes in the same locations, there is no way to read the padding bytes of T via the &U.

So how can that be unsafe? What am I missing?

joshlf · May 22, 2018, 4:53pm

What if U doesn't have padding bytes in the same locations? It would still be the case that any valid T is a valid [u8], and that, separately, any valid (initialized) [u8] is a valid U, but it's not the case that every partially-initialized [u8] is a valid U. And that's the crux of the problem - the current FromBits definition implies transitivity of references (if &T -> &U and &U -> &V then &T -> &V), but it seems that such transitivity is actually unsound given that [u8] can safely have uninitialized memory while other types can't.

Note that this wouldn't be a problem if we removed the exception for [u8], because &T -> &U would only be safe if none of U's data fields overlapped with T's padding, and so &U -> &V really would safely imply &T -> &V.

cramertj · May 22, 2018, 4:55pm

Sure! If we have unsafe impl<T> FromBits<T> for [u8] {}, then we get an impl FromBits<MaybeInit<u16>> for [u8] {}. Presumably we also have an impl FromBits<[u8]> for u16 {}. Then we can create &MaybeInit::empty() and pass it to coerce_ref_size_checked to get a &[u8]. Then we can pass the &[u8] to coerce_ref_size_align_checked which sees that the size and alignment match and gives an &u16 out, which now points to the original uninitialized memory.
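The chain above can be illustrated without touching real uninitialized memory. The following is only a toy model, under the assumption that each byte of memory is either initialized or not; the names Byte, valid_as_u16, size_check_passes and chain_is_unsound are made up for this sketch and are not part of the proposed API.

```rust
/// Toy model of one byte of abstract memory: either initialized or not.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Byte {
    Uninit,
    Init(u8),
}

/// Under the blanket impl, any memory (even uninitialized) counts as a valid `[u8]`.
fn valid_as_bytes_with_blanket_impl(_mem: &[Byte]) -> bool {
    true
}

/// A `u16` needs exactly two initialized bytes.
fn valid_as_u16(mem: &[Byte]) -> bool {
    mem.len() == 2 && mem.iter().all(|b| matches!(b, Byte::Init(_)))
}

/// Models the size (and alignment) check: it only looks at the length,
/// never at whether the bytes are initialized.
fn size_check_passes(mem: &[Byte], target_size: usize) -> bool {
    mem.len() == target_size
}

/// Returns true when both coercion steps are accepted although the
/// memory is not a valid `u16`, i.e. when the chain is unsound.
fn chain_is_unsound(mem: &[Byte]) -> bool {
    let step1 = valid_as_bytes_with_blanket_impl(mem); // &MaybeInit<u16> -> &[u8]
    let step2 = size_check_passes(mem, 2); // &[u8] -> &u16
    step1 && step2 && !valid_as_u16(mem)
}
```

In this model, two uninitialized bytes pass both steps yet are not a valid u16, while two initialized bytes are fine.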

gnzlbg · May 22, 2018, 4:57pm

That is the crux of the problem indeed. It now makes sense to me that the blanket impl is unsafe.

joshlf · May 22, 2018, 5:12pm

Having thought about it some more, I’m not sure that there’s a solution other than to remove the blanket impl and pretend that reading uninitialized memory with a [u8] is unsafe. Concretely, my idea for having U: FromBits<T> not imply &U: FromBits<&T> runs into the exact same problems. Consider this straw man proposal…

U: FromBits<T> doesn’t imply &U: FromBits<&T>
If &U: FromBits<&T>, and either U or T is a DST, then do something reasonable. I’m not sure what that would be, but it’s not relevant to this strawman.
There’s a blanket impl, unsafe impl<T> FromBits<&T> for &[u8] {}.
It is never safe to have unsafe impl FromBits<&[u8]> for MyT {} because of the possibility of reading uninitialized memory as MyT, which is UB.

This isn’t very useful because interpreting a random byte slice as another type is one of the most important use cases of this stuff. Thus, consider another straw man in which we allow for reinterpreting byte slices:

Unlike before, there is no blanket impl for &[u8], and we require that unsafe impl FromBits<&MyT> for &[u8] {} is only safe if MyT doesn’t have any padding/isn’t an enum/etc.
Now it’s safe to have unsafe impl FromBits<&[u8]> for &MyT {} for some values of MyT.

However, this latter proposal is essentially exactly what we have already, only less ergonomic because U: FromBits<T> doesn’t imply &U: FromBits<&T>.

Thus, I propose the following:

U: FromBits<T> implies &U: FromBits<&T> as it does now.
[u8]: FromBits<MyT> is only valid if every valid value of MyT contains no uninitialized memory.
As a result, unsafe impl FromBits<[u8]> for MyT {} is sometimes safe.
For convenience, we add a function to coerce T into [u8] by copying bytes, which results in the [u8] being completely initialized, but we do not provide a function to coerce &T into &[u8].
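A minimal sketch of what the copying function in the last point might look like, assuming a hypothetical unsafe marker trait NoUninit that stands in for "[u8]: FromBits<MyT>" (neither NoUninit, Pair, nor to_bytes appear in the proposal itself):

```rust
use std::mem::size_of;

/// Hypothetical marker trait: implementors promise that every valid value
/// has no padding and no otherwise uninitialized bytes.
unsafe trait NoUninit: Sized {}

unsafe impl NoUninit for u8 {}
unsafe impl NoUninit for u16 {}
unsafe impl NoUninit for u32 {}

/// Two `u16` fields under `repr(C)`: size 4, alignment 2, no padding.
#[repr(C)]
struct Pair {
    a: u16,
    b: u16,
}
unsafe impl NoUninit for Pair {}

/// Copies the bytes of `t` into a fresh, fully initialized buffer.
fn to_bytes<T: NoUninit>(t: &T) -> Vec<u8> {
    let mut out = vec![0u8; size_of::<T>()];
    // SAFETY: `t` is valid for reads of `size_of::<T>()` bytes, `out` has
    // exactly that many bytes, and the two cannot overlap. `NoUninit`
    // promises that every copied byte is initialized.
    unsafe {
        std::ptr::copy_nonoverlapping(
            t as *const T as *const u8,
            out.as_mut_ptr(),
            size_of::<T>(),
        );
    }
    out
}
```

Because the result is an owned buffer, no reference to the original T's memory escapes, which matches the decision not to offer &T -> &[u8].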

gnzlbg · May 22, 2018, 5:24pm

Note that this isn't all that bad with transitivity since if FromBits<T> for [u8] can be safely implemented manually for a type T, that's the only impl that a user must manually provide (all other impls follow from that one due to transitivity).

joshlf · May 22, 2018, 5:25pm

I'm not sure I'd say all other impls, but definitely a lot. That's a good point.

hanna-kruppe · May 22, 2018, 5:38pm

Ah, right, thanks.

These functions are not special in this sense, any loads and stores should have this property. ptr::{read,write} are at most special in that they are the primitive way to express "write without dropping old contents first" (in contrast to *p = x; which drops) and "read without concern for move semantics" (in contrast to let x = *p; which only works for Copy types). But this isn't really true either, at least of ptr::read (check out its implementation, you could write that in stable Rust today).
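As a rough illustration of those two properties, here is a small sketch. It uses std::mem::MaybeUninit from today's standard library (an assumption relative to this thread, which talks about a MaybeInit type), and the helper names init_slot and take_out are invented for the example.

```rust
use std::mem::{self, MaybeUninit};
use std::ptr;

/// Initializes a slot with `ptr::write`, which does not drop whatever
/// was in the slot before (here: nothing valid at all).
fn init_slot(value: String) -> String {
    let mut slot = MaybeUninit::<String>::uninit();
    unsafe {
        // `*slot.as_mut_ptr() = value;` would first drop the "old" String,
        // which is uninitialized here, so plain assignment would be UB.
        ptr::write(slot.as_mut_ptr(), value);
        slot.assume_init()
    }
}

/// Reads a non-`Copy` value out from behind a reference with `ptr::read`.
fn take_out(source: String) -> String {
    // SAFETY: `&source` is a valid, aligned pointer to an initialized String.
    let duplicate = unsafe { ptr::read(&source) };
    // Two owners of the same heap buffer now exist; forget one of them so
    // the String is dropped only once.
    mem::forget(source);
    duplicate
}
```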

Aside from that, the IRC discussion @gnzlbg alludes to was about raw pointer loads and stores. When you add references in the mix, it gets a lot more complicated. While it's probably fine to cast a *const T to a *const [u8; size_of::<T>()] and copy around any of those bytes even if they're padding or uninitialized, a &[u8] to the same memory is a rather different story. References have quite strong invariants both about their address (+ metadata, if any) and about the contents of the memory they point at – the main open questions are about when exactly these invariants are asserted (e.g., "all the time", "at function call boundaries", "when you load or store through the reference", etc.).

To say that a &[u8] to padding or uninitialized memory is fine amounts to saying there is no such thing as uninitialized or padding memory: every byte of memory is always one of 256 possible values and it's safe to do anything with any of these bytes. I don't want to rehash that debate here but this option is very radical and, I believe, unacceptable for Rust because of how much it constrains the optimizer.

Thus, I don't think a &T -> &[u8] conversion could be safe for all T even if there was then no way to reinterpret the &[u8] as any other type (e.g. imagine it was a trait object for trait Blob { fn get_byte(idx: usize) -> u8; }). Or, put differently, there needs to be some way to correctly handle padding and uninitialized memory in unsafe code, but I don't see any (desirable) way to give safe code such capabilities.

joshlf · May 22, 2018, 5:44pm

Would it be safe to have a function like this?

fn move_into<T>(t: T, dst: &mut [u8]) {
    // verify that dst is large enough
    // write t's bytes into *dst and forget t
}

hanna-kruppe · May 22, 2018, 5:48pm

If we take the quoted belief for granted, no, it wouldn't be safe. It's basically the same thing, after all.

joshlf · May 22, 2018, 5:50pm

My understanding from the previous discussion is that it’s safe to copy uninitialized bytes into a [u8], at which point the [u8] is considered initialized? Maybe I misread that.

hanna-kruppe · May 22, 2018, 5:53pm

I don’t know where you read that, but I would argue against it. It still overly constrains the optimizer (you either have to stop doing many memcpy and load-store optimizations, or you get all the same restrictions that “there’s no such thing as uninitialized memory” implies).

joshlf · May 22, 2018, 5:55pm

OK, so it sounds like a) Rust differs from C/LLVM in that reading uninitialized memory into “character types” is UB and, b) we should just go full bore on disallowing going from T -> U if U has data where T has padding or other possibly uninitialized bytes?

gnzlbg · May 22, 2018, 5:56pm

But can’t you do that via a pointer instead of a reference?

fn move_into<T>(t: T, dst: *mut [u8]) {
    // verify that dst is not null
    // verify that *dst is large enough
    // write t's bytes into *dst and forget t
}

hanna-kruppe · May 22, 2018, 6:01pm

No! As I said before, unsafe code needs to be able to do some things with uninitialized memory and especially with padding. Simply loading from it and storing the loaded value elsewhere should definitely be among the things possible. The rationale for that is that it needs to be possible to implement e.g. memcpy in Rust. It's just that doing many other things with padding/uninitialized bytes should probably be UB, or otherwise dangerous enough that we can't allow safe code to do it arbitrarily.

This function isn't safe because dst can be non-null and dangling, but supposing it's okay to write size_of::<T>() bytes through dst, then yes, my gut feeling is that this function would be fine. Turning the *mut [u8] into a reference to [u8] or some other type might be unsound, though.
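For concreteness, a raw-pointer version might be sketched as follows. This is an illustration only: it takes a *mut u8 plus an explicit length instead of a *mut [u8] (a simplification chosen for this sketch), returns a bool to report the size check, and is marked unsafe because the caller has to guarantee that dst is not dangling.

```rust
use std::{mem, ptr};

/// Moves the bytes of `t` to `dst` and forgets `t`.
/// Returns false (dropping `t` normally) if `dst` is null or too small.
///
/// # Safety
/// If non-null, `dst` must be valid for writes of `dst_len` bytes.
unsafe fn move_into<T>(t: T, dst: *mut u8, dst_len: usize) -> bool {
    let size = mem::size_of::<T>();
    if dst.is_null() || dst_len < size {
        return false;
    }
    unsafe {
        // Untyped byte copy: padding bytes of `t`, if any, are copied along
        // without ever being inspected as `u8` values.
        ptr::copy_nonoverlapping(&t as *const T as *const u8, dst, size);
    }
    // The bytes at `dst` now own whatever `t` owned; skip `t`'s destructor.
    mem::forget(t);
    true
}
```

If T has padding, the destination bytes at those offsets should still be treated as uninitialized afterwards, which is exactly why exposing them through a &[u8] remains questionable.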

joshlf · May 22, 2018, 6:03pm

In that case, couldn't you just implement my proposed function by first converting the &mut [u8] into a *mut [u8]?

hanna-kruppe · May 22, 2018, 6:12pm

Having the &mut reference probably implies a lot more than having a raw pointer to the same memory. See @RalfJung's Types as Contracts work for one specific proposal for what (and when) this means, though it doesn't talk a lot about initialized-ness specifically. I don't think those semantics would actually have a problem with this specific example, but might have a problem with e.g. a caller that does move_into(t, &mut buffer); and then passes buffer to another function.

To be clear, there will necessarily be some loopholes for taking a reference to some invalid memory and quickly turning it into a raw pointer (since in current Rust you can't create a raw pointer without first going through references), but having a reference for an extended period of time, across (e.g.) multiple crates, as the safe APIs discussed in this thread will allow, will certainly require the reference to point to "valid" memory. What that entails is up for discussion but as I said I really hope & expect it will exclude treating uninitialized or padding bytes as u8 values.

joshlf · May 22, 2018, 6:13pm

Yeah, I’m definitely leaning more and more towards just banning it outright and requiring manual (or derived) impls for the subset of types for which it’s safe.

hanna-kruppe · May 22, 2018, 6:50pm

If I had rediscovered this link earlier I would have brought it up somewhere appropriate, but it’s still relevant so I’ll just drop it here: https://github.com/nikomatsakis/rust-memory-model/issues/42

joshlf · May 23, 2018, 5:36am

Related question: if we’re going to say that the uninitialized bytes of a given type must not correspond to the data bytes of another type, what exactly does that mean? For example, the following is probably fine:

#[repr(C)] struct T { a: u8, /* padding byte */ b: u16 }
#[repr(C)] struct U { a: u8, /* padding byte */ b: u16 }
unsafe impl FromBits<T> for U {}

because the padding byte is in the same position in both types. However, is it valid to convert an enum or union to a struct so long as the possibly-uninitialized bytes of the former correspond to padding bytes of the latter? In other words, are all possibly-uninitialized bytes alike, and converting between them is fine, or are different “types” of possibly-uninitialized bytes distinct, and so converting between them is not necessarily safe in the general case?
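To make "the same position" concrete, the padding offset of such a struct can be computed. This is only a sketch, assuming a target where u16 has alignment 2 (true on common platforms); the helpers offset_of_b and padding_offsets are invented for the illustration.

```rust
use std::mem::size_of;

#[repr(C)]
struct T {
    a: u8,
    // one padding byte is inserted here so that `b` is 2-byte aligned
    b: u16,
}

/// Byte offset of field `b` within `T`, computed from addresses.
fn offset_of_b() -> usize {
    let t = T { a: 0, b: 0 };
    let base = &t as *const T as usize;
    let field = &t.b as *const u16 as usize;
    field - base
}

/// Byte offsets inside `T` that belong to neither `a` nor `b`, i.e. padding.
fn padding_offsets() -> Vec<usize> {
    let b = offset_of_b();
    (0..size_of::<T>())
        .filter(|&i| i >= size_of::<u8>() && !(b..b + size_of::<u16>()).contains(&i))
        .collect()
}
```

Two types whose padding_offsets agree, like T and U above, are the easy case; the open question is what to do when the possibly-uninitialized bytes come from an enum or union instead.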
