pre-RFC FromBits/IntoBits

joshlf · May 15, 2018, 11:31pm

So my thinking was that ArbitraryBytesSafe would be somewhat related to but also somewhat orthogonal to FromBits/IntoBits. In particular, you could imagine the following holding:

If something is ArbitraryBytesSafe, then it is FromBits<T> for arbitrary T
If something is ArbitraryBytesSafe, then immutable references to it are FromBits<&T> for arbitrary T
If two things, T and U, are both ArbitraryBytesSafe, then mutable references to T are FromBits<&mut U> and vice versa
If something is FromBits<[u8; size_of::<Self>()]>, then it is ArbitraryBytesSafe.

And I could also imagine other blanket/default (I'm bad with terminology) impls holding for other combinations. But the example you mentioned would be naturally handled by only writing blanket/default impls that are definitely sound. For example, it sounds like m32x4 shouldn't be ArbitraryBytesSafe; if it were, then it would be safe to transmute m16x8 into m32x4.
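A minimal sketch of the first rule in the list above, by value: a type that accepts arbitrary bytes can be built from any byte buffer of the right size. The trait name follows the thread; the `from_bytes` helper and the particular impls are assumptions made for illustration, not part of any proposal.

```rust
use std::mem::size_of;
use std::ptr;

/// Marker from the thread: every bit pattern of `size_of::<Self>()` bytes
/// is a valid `Self`. Implementing it is a promise the compiler cannot check.
pub unsafe trait ArbitraryBytesSafe: Sized {}

// Plain integers and byte arrays accept any bit pattern.
unsafe impl ArbitraryBytesSafe for u8 {}
unsafe impl ArbitraryBytesSafe for u32 {}
unsafe impl ArbitraryBytesSafe for [u8; 4] {}
// Deliberately no impl for `bool`: only the bit patterns 0 and 1 are valid.

/// Hypothetical helper: builds a `T` from exactly `size_of::<T>()` bytes.
pub fn from_bytes<T: ArbitraryBytesSafe>(bytes: &[u8]) -> Option<T> {
    if bytes.len() != size_of::<T>() {
        return None;
    }
    // SAFETY: the length was checked above and `T` accepts any bit pattern.
    // `read_unaligned` is used because a byte slice is only 1-aligned.
    Some(unsafe { ptr::read_unaligned(bytes.as_ptr() as *const T) })
}
```

Because this works by value, it sidesteps the reference alignment concerns mentioned in the next paragraph.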

Note that there are also concerns around alignment that are trickier when you're trying to transmute references than when you're only transmuting values. My proposal focuses on references because the goal is to enable safe zero-copy deserializing, but I don't think you have to deal with those concerns here because you're doing everything by value. There are also concerns, when consuming by value, around whether you drop or forget the input.

gnzlbg · May 16, 2018, 7:34am

Ah! I thought that ArbitraryBytesSafe should solve the whole problem!

I don't know how I feel about having 2 mechanisms to achieve almost the same thing, but not quite.

In particular, @jmst proposed a Compatible trait (read from here downwards: pre-RFC FromBits/IntoBits - #23 by gnzlbg) that appears to solve all problems better than FromBits/IntoBits and ArbitraryBytesSafe.

So I wonder why can't ArbitraryBytesSafe just derive Compatible<[u8; mem::size_of::<T>()]> or similar instead?

joshlf · May 16, 2018, 8:00am

I don't think it can because it's strictly less powerful than FromBits/IntoBits. In particular, the latter can express the idea that "any valid instance of this type has a bit pattern which is also valid for that other type" even if the other type is not ArbitraryBytesSafe. For my particular use cases, I don't need anything that powerful, but it sounds like you do.

Have you thought about having a custom derive to cover the gaps? If I've written a type that I'd like to be Convert<T>, but it has private fields, then I'm forced to unsafe impl Convert<T> for MyType {}. The thing is, it's not actually memory unsafety that's the issue here, but invariants on the values of my fields, so having to invoke unsafe here feels wrong and dangerous since it might let you not only break your contract, but actually introduce memory unsafety. A custom derive would give us something like "dear custom derive, I know that I'm OK with converting from T for the purposes of my invariants, so could you verify for me that it would be memory safe?"

Also, it looks like there hasn't been any discussion about converting references. I'm interested in zero-copy deserializing, so being able to convert references would be huge. In particular, I can imagine the following:

pub fn safe_transmute_ref<T, U>(x: &T) -> &U where U: Compatible<T> { ... }
pub fn safe_transmute_mut<T, U>(x: &mut T) -> &mut U where U: Compatible<T>, T: Compatible<U> { ... }

As discussed in Pre-RFC: Trait for deserializing untrusted input, there are still issues with verifying alignment, but it'd be a very powerful feature to have in general.

gnzlbg · May 16, 2018, 8:26am

Not really, but why can't that be done with a custom derive? If it can, then it can just be done on a crate, without having to write any kind of RFC for it. The proposal ensures transitivity, so this derive doesn't really need any kind of compiler support, it is purely syntactic.

Also, it looks like there hasn’t been any discussion about converting references.

One might be able to solve this with some blanket impls for references and raw pointers:

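// Both references share one lifetime so that a conversion cannot extend a borrow.
impl<'a, T, U> Compatible<&'a T> for &'a U where U: Compatible<T> {}
impl<'a, T, U> Compatible<&'a T> for &'a mut U where U: Compatible<T> {}
impl<'a, T, U> Compatible<&'a mut T> for &'a mut U where U: Compatible<T> {}
// ... and for *T ...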

there are still issues with verifying alignment

Which issues? For practical purposes transmute is just a memcpy, so if the source type is properly aligned and the destination type is properly aligned, which they must be, then there aren't any alignment issues AFAICT. The same applies to "endianness": since transmute is just a memcpy, the bytes will just be copied from the source to the destination. If you don't take endianness into account when reinterpreting the bytes on the destination, you will get different results on big endian and little endian systems, but that is just how memcpy works, so that's "working as intended".
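To make the endianness point concrete, here is a small sketch (the function names are made up for illustration). Reinterpreting bytes the way a memcpy would uses the platform's native byte order, while an explicit decode gives the same answer everywhere.

```rust
/// Reinterprets the bytes exactly as a memcpy or transmute would,
/// i.e. in the native byte order of the target.
pub fn reinterpret(bytes: [u8; 4]) -> u32 {
    u32::from_ne_bytes(bytes)
}

/// Decodes the bytes as little-endian, independent of the target.
pub fn decode_le(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}
```

On a little-endian target both functions agree; on a big-endian target `reinterpret([1, 0, 0, 0])` yields `0x0100_0000` while `decode_le` still yields `1`.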

joshlf · May 16, 2018, 7:11pm

Yeah, and I know that there was a proposal to have v0 of this just require manual impls. If you can have a custom derive, then v0 could use the custom derive (so that folks don't have to unsafe impl manually, which introduces a risk of unsafety), and have v1 be the move from custom derive to compiler-supported auto trait.

I'm referring to alignment of references. impl<T, U> Compatible<&T> for &U where U: Compatible<T> {} isn't safe in general because U may have higher alignment requirements than T, so it's not actually guaranteed that any valid &T is also a valid &U. Pre-RFC: Trait for deserializing untrusted input discusses some options here, but if you don't have compiler support, then your options are either somewhat unergonomic (use a macro that uses static_assert! under the hood) or unsafe (since the caller needs to verify alignment manually).
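For illustration, a third option is to check alignment at run time and return an Option. This is only a sketch: the `Compatible` marker below is a stand-in with the direction used by `safe_transmute_ref` above, and `try_transmute_ref` is a hypothetical name.

```rust
use std::mem::{align_of, size_of};

/// Stand-in marker: `U: Compatible<T>` means that any valid `T` has bits
/// that are also a valid `U`. Alignment is deliberately not part of the promise.
pub unsafe trait Compatible<T> {}

unsafe impl Compatible<[u8; 4]> for u32 {}
unsafe impl Compatible<u32> for [u8; 4] {}

/// Hypothetical helper: converts `&T` to `&U` only if this particular
/// reference happens to satisfy `U`'s size and alignment requirements.
pub fn try_transmute_ref<T, U: Compatible<T>>(x: &T) -> Option<&U> {
    let addr = x as *const T as usize;
    if size_of::<U>() > size_of::<T>() || addr % align_of::<U>() != 0 {
        return None;
    }
    // SAFETY: `U: Compatible<T>` promises the bits are valid, and the size
    // and alignment of this specific reference were checked above.
    Some(unsafe { &*(x as *const T as *const U) })
}
```

Going from `&u32` to `&[u8; 4]` always succeeds, since the destination has alignment 1. Going from `&[u8; 4]` to `&u32` succeeds only when the array happens to sit at a suitably aligned address.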

gnzlbg · May 16, 2018, 7:40pm

This makes sense.

At this point doing this in the compiler looks like the most appealing solution to me, and is something that Compatible<T> could do for references, for example: if T is compatible with U and some alignment conditions hold, then &T is compatible with &U.

Maybe one day we will be able to do where mem::align_of::<T>() >= mem::align_of::<U>() in the language, but this won't be the case in the near future since we need more than just const generics for this.

joshlf · May 16, 2018, 8:27pm

Yeah, my stopgap idea was to use macros to do something like:

pub unsafe fn transmute_ref<T, U>(x: &T) -> &U where U: Compatible<T> { ... }

// `static_assert!` is not part of std; it stands for some compile-time assertion macro.
macro_rules! transmute_ref {
    ($x:expr, $T:ty, $U:ty) => {{
        static_assert!(::std::mem::align_of::<$T>() >= ::std::mem::align_of::<$U>());
        unsafe { transmute_ref::<$T, $U>($x) }
    }};
}

joshlf · May 17, 2018, 2:40pm

Another question: What about DSTs? In particular:

If T is a DST and U: Sized, then what does U: Compatible<T> mean?
If T and U are both DSTs, then what does U: Compatible<T> mean?

Some ideas:

If T is a DST and U: Sized, then
- size_of_val(t) == size_of::<U>() implies that t's bits correspond to a valid U.
- size_of_val(t) > size_of::<U>() might imply that t's bits correspond to a valid U? Is this always safe?
If T and U are both DSTs, then
- If you have an existing u: U, then any t: T of size size_of_val(u) is a valid U (in other words, the existence of t is proof that size_of_val(t) is a valid size for T)
- Maybe it’s the case that any t: T whose size corresponds to a valid size for U is a valid U? What does "valid size for U" even mean? Can we query it at compile or run time?

There’s a caveat here, which is that since we’re defining what Compatible means, the questions of “is this safe?” are somewhat up to how we define Compatible. I have a vague intuition for how this would work with [T] and composite types ending in [T]. I have essentially no idea how this would work for trait objects. I’d love to hear some thoughts on all of this.
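As a rough sketch of the first two ideas for the simplest DST, `[u8]`: the exact-size rule and the "larger source" rule could look like the following. `FromAnyBytes`, `view_exact`, and `view_prefix` are hypothetical names, and the prefix rule is just one possible answer to the "is this always safe?" question, not a settled one.

```rust
use std::mem::{align_of, size_of, size_of_val};

/// Hypothetical marker: any initialized bytes form a valid `Self`.
pub unsafe trait FromAnyBytes: Sized {}

unsafe impl FromAnyBytes for [u8; 4] {}
unsafe impl FromAnyBytes for u16 {}

/// Exact-size rule: `size_of_val(t) == size_of::<U>()`.
pub fn view_exact<U: FromAnyBytes>(t: &[u8]) -> Option<&U> {
    if size_of_val(t) != size_of::<U>() {
        return None;
    }
    view_prefix(t)
}

/// Larger-source rule: `size_of_val(t) >= size_of::<U>()`,
/// viewing only the leading bytes of `t`.
pub fn view_prefix<U: FromAnyBytes>(t: &[u8]) -> Option<&U> {
    let ptr = t.as_ptr();
    if size_of_val(t) < size_of::<U>() || (ptr as usize) % align_of::<U>() != 0 {
        return None;
    }
    // SAFETY: size and alignment were checked, and `U` accepts any bytes.
    Some(unsafe { &*(ptr as *const U) })
}
```

Nothing here says anything about trait objects or about the case where the destination is itself a DST.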

gnzlbg · May 17, 2018, 3:08pm

Is it possible to mem::transmute a DST into a Sized type and vice versa? Is it possible to mem::transmute two different DSTs to each other?

joshlf · May 17, 2018, 3:14pm

No, because mem::transmute operates on values, not references. So both of its type parameters must be Sized.

Huge credit to @comex for figuring out a way to do this today! Here's the idea:

trait AlignCheck {
    const BAD: u8;
}

// only compiles if align_of::<T>() >= align_of::<U>(),
// which is what converting &T to &U requires
impl<T, U> AlignCheck for (T, U) {
    // This is a division by 0 if align_of::<T>() < align_of::<U>(),
    // producing a constant evaluation error
    const BAD: u8 = 1u8 / ((std::mem::align_of::<T>() >= std::mem::align_of::<U>()) as u8);
}

pub unsafe fn unsafe_transmute_ref<T, U>(x: &T) -> &U
{
    let _ = <(T, U) as AlignCheck>::BAD;
    &*(x as *const T as *const U)
}

And it actually works!
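On compilers that allow `assert!` inside constants, the same trick can be written without the division, which also gives a readable error message. This is a sketch with made-up names (`AlignOk`, `transmute_ref_checked`), not the code from the draft.

```rust
use std::marker::PhantomData;
use std::mem::align_of;

/// Hypothetical helper type that carries the compile-time check.
pub struct AlignOk<T, U>(PhantomData<(T, U)>);

impl<T, U> AlignOk<T, U> {
    /// Evaluating this constant fails to compile when a `&T` is not
    /// guaranteed to be sufficiently aligned for `U`.
    pub const CHECK: () = assert!(
        align_of::<T>() >= align_of::<U>(),
        "source type is less aligned than destination type"
    );
}

/// # Safety
/// The caller must guarantee that the bits of `*x` are a valid `U` and that
/// `U` is no larger than `T`. Only alignment is checked here.
pub unsafe fn transmute_ref_checked<T, U>(x: &T) -> &U {
    // Referencing the constant forces the check for this (T, U) pair.
    let () = AlignOk::<T, U>::CHECK;
    unsafe { &*(x as *const T as *const U) }
}
```

Calling it as `transmute_ref_checked::<u32, [u8; 4]>` compiles, while swapping the two type arguments is rejected at compile time.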

Also, they pointed out that:

So that might be an approach we could take as well.

joshlf · May 22, 2018, 12:23am

OK, here goes a first draft. A few things to note:

I went with FromBits<T> instead of Compatible<T> because I think it’s more descriptive. However, it behaves roughly as Compatible<T> has been proposed here, and there’s no IntoBits<T>.
I covered the case in which T is a DST and Self: Sized, but I haven’t yet figured out what to do when Self is a DST.
The file is pretty long, so here’s a summary of what’s offered if you just want to skim:
- FromBits<T> - as described
- FitsIn<T> - guarantees that T is no smaller than Self
- AlignedTo<T> - guarantees that Self is as aligned as T
- transmute - like mem::transmute, but T can be larger than U
- coerce - like transmute, but safe
- coerce_{ref,mut}_xxx - coercions from one reference type to another, including variations with both compile- and run-time-verified size and alignment.
- LayoutVerified - An object whose existence proves that certain size and alignment checking has been performed, allowing for size and alignment checking to be elided in the future when doing coercions.

I’d love any feedback you have! I’d also be interested to know whether you can think of any use cases for transmute. The only difference between it and mem::transmute is that T can be larger than U, and @cramertj feels that its presence is unjustified. If we can’t think of any use cases, then I agree.

joshlf · May 22, 2018, 11:05am

Interesting question from @cramertj: Is it safe to have unsafe impl<T> FromBits<T> for [u8]? You might expect that the answer is obviously yes since any random set of bytes is a valid byte slice, however…

In other languages, it can be UB to read an uninitialized value. Some notable quotes:
- “Reading an uninitialized CPU register on Itanium is the best example of a hardware-induced crash covered by this rule.”
- “Reading uninitialized memory by an lvalue of type unsigned char does not trigger undefined behavior. The unsigned char type is defined to not have a trap representation, which allows for moving bytes without knowing if they are initialized.”
- “However, on some architectures, such as the Intel Itanium, registers have a bit to indicate whether or not they have been initialized. The C Standard, 6.3.2.1, paragraph 2, allows such implementations to cause a trap for an object that never had its address taken and is stored in a register if such an object is referred to in any way.”
In C, it is always safe to read uninitialized memory as unsigned char *, but not as anything else. So maybe this is safe precisely because we’re implementing it for [u8]? I suspect (though can’t find a reference) that LLVM has a notion of character type, and so this question comes down to whether u8 is considered a character type by LLVM.

hanna-kruppe · May 22, 2018, 11:36am

The criterion for deciding this question is Rust's semantics. What LLVM, other languages, and CPUs do is only relevant in two respects:

What LLVM and CPUs do might prevent us from implementing some particular semantics efficiently. (NB: LLVM does not have a notion of "character type".)
The reasons why other languages are aggressive about uninitialized memory (e.g., optimizations enabled by it) might also be relevant for Rust.

Aside: why does this question lead to considering uninitialized memory? It seems to me the problem is padding -- which at the end of the day is probably physical memory that isn't written to. However, when talking about language semantics, it's perfectly possible and perhaps even advisable to distinguish padding bytes from non-padding bytes.

Regardless, the meaning (or lack thereof) of reads from uninitialized memory is a broader question whose answer is part of the unsafe code guidelines. Unfortunately it appears this particular question hasn't been addressed yet. There are multiple threads touching on the subject here in this forum, but as far as I remember it's never been in focus for the working group.

gnzlbg · May 22, 2018, 12:42pm

Well that depends.

I think that you can always implement this safely for types without padding bytes (e.g. repr(packed), or maybe even repr(C)?), and also for all types if the implementation does not touch the padding bytes, in which case the size of the [u8] might be smaller than the size of T.

If you are talking about implementing FromBits<T> for [u8] by memcpying all the bytes for types with Rust layout including padding bytes then, as @hanna-kruppe says, that will depend on whether reading a padding byte is a read from uninitialized memory or not (cc @strega-nil @RalfJung).

In any case, given that the layout of types with Rust layout is unspecified (e.g. the compiler can reorder fields at will), you will run into issues when serializing/deserializing with different Rust versions on top of the endianness issues that you get when serializing/deserializing on different architectures.

At that point you might as well restrict that blanket impl to repr(C) types and call it a day.

joshlf · May 22, 2018, 3:19pm

The question isn't whether it's safe for some T, but rather whether a blanket impl for all T - unsafe impl<T> FromBits<T> for [u8] {} - is safe.

I agree, although it sounds like LLVM might at the very least give us safety for "character types." It's obviously another question whether Rust formally guarantees that u8 corresponds to an LLVM "character type." It doesn't sound like such a guarantee is made right now, but it's not clear to me whether this is one of those "Rust's memory model is undefined anyway, and this is a pretty safe bet" things or one of those "you really shouldn't be relying on it, as there's a meaningful chance that might not be part of a Rust memory model that is formalized in the future" things.

That's true, but the question is only about whether it will cause UB.

I'm not sure that'd be sufficient, as @dtolnay suggests in Does repr(C) define a trait I can use to check structs were declared with #repr(C)? that repr(C) types can contain non-repr(C) types. Plus, as that thread describes, there's no trait for repr(C) types anyway.

cramertj · May 22, 2018, 4:26pm

The blanket impl still isn't safe because it allows you to go from uninitialized -> &[u8] -> &T where T: FromBits<[u8]>.

cc @RalfJung -- halp!

cramertj · May 22, 2018, 4:28pm

You can use union MaybeInit<T> { uninit: (), x: T } to get uninitialized memory and then get a &[u8] from that if FromBits<MaybeInit<Foo>> is implemented for [u8].
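A compilable sketch of that union, to show that obtaining uninitialized memory this way needs no unsafe code at all. The `T: Copy` bound and the helper methods are assumptions added here so the example builds on stable Rust; the post's version has neither.

```rust
/// The union from the post, with an added `T: Copy` bound.
pub union MaybeInit<T: Copy> {
    uninit: (),
    x: T,
}

impl<T: Copy> MaybeInit<T> {
    /// Entirely safe: only the zero-sized field is written, so the bytes
    /// that `x` would occupy stay uninitialized.
    pub fn uninit() -> Self {
        MaybeInit { uninit: () }
    }

    /// Creates a fully initialized value.
    pub fn new(x: T) -> Self {
        MaybeInit { x }
    }

    /// # Safety
    /// The value must have been created with `new`.
    pub unsafe fn get(&self) -> T {
        unsafe { self.x }
    }
}
```

If `[u8]: FromBits<MaybeInit<Foo>>` held, a safe reference coercion could then hand out a `&[u8]` over the bytes of `MaybeInit::<Foo>::uninit()`. Later versions of std provide `std::mem::MaybeUninit` for working with such values.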

gnzlbg · May 22, 2018, 4:33pm

Just summarizing the recent discussion with @hanna-kruppe and @mbrubeck on IRC as I understood it.

whether a blanket impl for all T - unsafe impl<T> FromBits<T> for [u8] {} - is safe.

Without a memory model, it's impossible to say anything completely accurate here.

But chances are that this is safe for all types if you implement it properly, preferably by just using ptr::{read,write}. These functions might be special in the memory model if they can read uninitialized memory and copy it into a destination without making the memory in the destination initialized.

That is, "the impl is safe", what would introduce undefined behavior is reading the padding bytes from the resulting [u8] without going through ptr::{read,write} again. So if you coerce from T to U using that impl, as long as when accessing U fields you don't read any padding bytes from T, then everything is ok.
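A by-value conversion built only on the ptr functions might look like the sketch below. The name `coerce_value` is made up, and whether padding bytes may be copied like this without UB is exactly the open question above, so treat it as an illustration of the shape rather than a verdict.

```rust
use std::mem::{self, size_of};
use std::ptr;

/// # Safety
/// The caller must guarantee that the leading `size_of::<U>()` bytes of `t`
/// form a valid `U`.
pub unsafe fn coerce_value<T, U>(t: T) -> U {
    assert!(size_of::<U>() <= size_of::<T>());
    // `read_unaligned` because `U` may require more alignment than the
    // storage of `t` provides.
    let u = unsafe { ptr::read_unaligned(&t as *const T as *const U) };
    // Forget the input so its destructor does not run on bytes that `u`
    // now logically owns.
    mem::forget(t);
    u
}
```

Since no reference to the destination type is ever formed over the source bytes, the reference alignment problem does not arise here.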

@cramertj

The blanket impl still isn’t safe because it allows you to go from uninitialized -> &[u8] -> &T where T: FromBits<[u8]>.

For that you would need an impl<T> FromBits<&[u8]> for &T where T: FromBits<[u8]> , but adding this impl would be incorrect since not all &[u8] bit patterns are valid &T patterns (e.g. null) right?

cramertj · May 22, 2018, 4:36pm

This conversion doesn't require that impl because of the coerce_ref_size_checked function. This function allows going from &T to &U where U: FromBits<T> and T and U have the same alignment (coerce_ref_size_align_checked will dynamically check both size and alignment).

joshlf · May 22, 2018, 4:38pm

Is there perhaps room for discussion of conversions which are safe on values (e.g., using ptr::{read,write}) but not on references? In the current draft, I just assume that U: FromBits<T> implies that you can convert references safely (with some size and alignment conditions), but maybe we don't want to make that assumption? I feel like the assumption is still valid if we rule out uninitialized memory, but uninitialized memory itself seems to be what's tripping up that model.
