[Pre-RFC v2] Safe Transmute

josh · December 5, 2019, 8:25am

This is an updated version of the proposal originally discussed at [Pre-RFC]: Safe Transmute, incorporating extensive feedback from that thread, and further design work from both Ryan Levick and myself. Thanks to everyone who contributed in that thread!

Safe(r) Transmute

Authors: Ryan Levick, Josh Triplett

Transmuting one type to another type and vice versa in Rust is extremely dangerous---so much so that the docs for std::mem::transmute are essentially a long list of how to avoid doing so. However, transmuting is sometimes necessary. For instance, in extremely performance-sensitive use cases, it may be necessary to transmute from bytes instead of explicitly deserializing and copying bytes from a buffer into a struct.

Causes of Unsafety and Undefined Behavior (UB)

At the core of understanding the safety properties of transmutation is understanding Rust's layout properties (i.e., how Rust represents types in memory). The best resource I've found for understanding this is Alexis Beingessner's blog post on the matter.

The following are the reasons that transmutation from a buffer of bytes is generally unsafe:

Illegal Representations: Safe transmutation of a slice of bytes to a type T is only possible if every possible value of those bytes corresponds to a valid value of type T. For example, this property doesn't hold for bool or for most enum types. While size_of::<bool>() == 1, a bool can only legally be either 0b1 or 0b0 - transmuting 0b10 to bool is UB.
Wrong Size: A buffer of bytes might not contain the correct number of bytes to encode a given type. Referring to uninitialized fields of a struct is UB. Of course, this assumes that the size of a given type is known ahead of time which is not always the case.
Alignment: Types must be "well-aligned" meaning that where they are in memory falls on a certain memory address interval (usually some power of 2). For example the alignment of u32 is 4 meaning that a valid u32 must always start at a memory address evenly divisible by 4. Transmuting a slice of bytes to a type T that does not have proper alignment for type T is UB.
Non-Deterministic Layout: Certain types might not have a deterministic layout in memory. The Rust compiler is allowed to rearrange the layout of any type that does not have a well defined layout associated with it. Explicitly setting the layout of a type is done through #[repr(..)]. To be deterministic, both the order of fields of a complex type as well as the exact value of their offsets from the beginning of the type must be well known. This is generally only possible by marking a complex type #[repr(C)] and recursively ensuring that all fields of the struct are composed of types with deterministic layout.
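As a rough illustration of the first three hazards, here is a minimal sketch of the checks a caller has to make by hand today. The helper names (bool_from_byte, size_and_align_ok, words_as_bytes) are hypothetical and not part of this proposal; the alignment test uses plain pointer arithmetic rather than any proposed API.

```rust
use std::mem::{align_of, size_of};

/// Maps a byte to `bool` only when it holds one of the two legal bit patterns.
fn bool_from_byte(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None, // e.g. 0b10 is not a valid `bool`
    }
}

/// Reports whether `bytes` is long enough and suitably aligned to hold a `T`.
/// This covers size and alignment only; it says nothing about validity.
fn size_and_align_ok<T>(bytes: &[u8]) -> bool {
    bytes.len() >= size_of::<T>() && (bytes.as_ptr() as usize) % align_of::<T>() == 0
}

/// Hypothetical helper: views `u32` words as bytes, which is sound because
/// `u32` has no padding and `u8` has alignment 1.
fn words_as_bytes(words: &[u32]) -> &[u8] {
    unsafe {
        std::slice::from_raw_parts(words.as_ptr() as *const u8, words.len() * size_of::<u32>())
    }
}
```

A slice that starts one byte past a u32 boundary, or that holds fewer than four bytes, fails size_and_align_ok::<u32>, while the validity of the bit pattern is a separate question answered per type.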

Transmuting from a type T to a slice of bytes can also be unsafe or cause UB:

Padding: Since padding bytes (i.e., bytes internally inserted to ensure all elements of a complex type have proper alignment) are not initialized, viewing them is UB. For instance, (u8, u32) has 3 bytes of padding to align the u32. Note that a type may have padding at the end, not just in the middle, to ensure that its size is a multiple of its alignment: (u32, u8) has 3 bytes of padding at the end to make its size 8, a multiple of the 4-byte alignment required for u32.
Non-Deterministic Layout: The same issues that arise when transmuting from bytes to a type T apply when going the other direction.
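The padding arithmetic above can be checked with size_of. This is a sketch only: it uses #[repr(C)] tuple structs rather than plain tuples so that the field order is fixed, the struct names are made up for illustration, and the numbers assume a target where u32 has 4-byte alignment.

```rust
use std::mem::size_of;

// Interior padding: 3 bytes between the u8 and the u32.
#[repr(C)]
struct Leading(u8, u32);

// Trailing padding: 3 bytes after the u8 so the size is a multiple of 4.
#[repr(C)]
struct Trailing(u32, u8);

// No padding: every field already sits at a suitably aligned offset.
#[repr(C)]
struct NoPadding(u32, u16, u16);

/// Number of padding bytes in `T`, given the summed size of its fields.
fn padding_bytes<T>(sum_of_field_sizes: usize) -> usize {
    size_of::<T>() - sum_of_field_sizes
}
```

Under those assumptions, both Leading and Trailing occupy 8 bytes for 5 bytes of data, and only NoPadding would be a candidate for viewing as bytes.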

Proposed Improvements

Introduce traits for types that can be safely transformed to/from bytes

We first introduce the traits FromAnyBytes and ToBytes (names subject to bikeshedding - see below).

FromAnyBytes represents any type where all properly aligned and sized byte patterns are legal (from here on referred to as "byte complete" types), such that any byte slice of the same size can be transmuted into the type in-place without further checking.
ToBytes represents any type that can be transmuted into bytes in-place, which in turn requires that the type must not have any padding.

All core types that are byte-complete implement both FromAnyBytes and ToBytes; a full list appears below. Core types like bool that need further validation before being safely transmuted from bytes only implement ToBytes. Both traits can be safely opted into either using #[derive(...)] or impl blocks as long as:

They are only recursively composed of FromAnyBytes or ToBytes types respectively
They have a deterministic layout (such as types using repr(C) or repr(transparent))
For ToBytes, they contain no padding bytes.

The compiler will return an error when the type does not fit all of the necessary conditions.

FromAnyBytes contains no methods and serves as a marker trait; the next section defines a FromBytes trait with an automatic implementation for types implementing FromAnyBytes, which allows manual implementation for non-byte-complete types. ToBytes contains methods (defined in the next section) and implementations of those methods, with the expectation that those implementations will work for all types deriving the trait; those methods should not be manually implemented.

Notes on types implementing FromAnyBytes and ToBytes:

The user must opt into a complex type implementing FromAnyBytes and ToBytes, because this has implications on the public API of the type. For instance, changing normally private details of a complex type such as ordering of private fields may become a breaking change.
A struct that requires internal padding can become a struct that can derive ToBytes by explicitly defining padding fields.
The following core types will be marked as FromAnyBytes and ToBytes:
- u8, u16, u32, u64, u128, usize
- i8, i16, i32, i64, i128, isize
- f32, f64
- ()
- all SIMD types that are byte-complete
- Option applied to any NonZeroU or NonZeroI type
- Wrapping<T> for any T implementing the corresponding trait
- [T; N] for any T implementing the corresponding trait.
  - Note that all types guarantee their size is a multiple of their alignment, so an array [T; N] can never contain padding that the type T doesn't itself contain.
The following additional core types will be marked as ToBytes only, and will have manual implementations of FromBytes (defined in the next section):
- bool
- any NonZeroU or NonZeroI type
- char
  - Note that this will produce and consume UCS-4 characters, and would require committing to the internal UCS-4 representation of char. We could, alternatively, omit the trait implementations for char.
All tuples composed of FromAnyBytes types will themselves implement FromAnyBytes.
All tuples composed of ToBytes types without padding can implement ToBytes. (Providing such implementations in the standard library may require compiler assistance.)
C-style enum types (with no fields in any variant) marked with #[repr(C)] or #[repr($INT)] may derive ToBytes.
Note that some structs may have "surprise" padding at the end and as such should not implement ToBytes. For example: struct MyType(u32, u8).
While it is theoretically possible to derive ToBytes and/or FromAnyBytes for generic structs which are generic over types that are ToBytes and/or FromAnyBytes, this is left to future work.
Transmute deals with in-memory data in-place, and thus does not have any provisions to perform translations between native endianness and non-native endianness.
There is no way to unsafe impl either FromAnyBytes or ToBytes for a type that doesn't meet the requirements.
Raw pointers could potentially implement both ToBytes and FromAnyBytes, and references or Option of references could potentially implement ToBytes. There may be uses for such implementations, but they also seem potentially error-prone. We propose to evaluate them further and consider such implementations in the future, but to not provide such implementations in the initial version.
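To make the note about explicit padding fields concrete, here is a minimal sketch of a struct whose padding is spelled out as a field, so that every byte of the value is initialized. The type Explicit and the function explicit_to_bytes are hypothetical; the unsafe body stands in for what a derived ToBytes might do, and the layout assumes u32 has 4-byte alignment.

```rust
use std::mem::size_of;

// The 3 bytes that would otherwise be implicit padding are a real field.
#[repr(C)]
struct Explicit {
    tag: u8,
    _pad: [u8; 3],
    value: u32,
}

// Compile-time check that no implicit padding remains:
// the size equals the sum of the field sizes.
const _: () = assert!(size_of::<Explicit>() == 1 + 3 + 4);

/// Views the struct as bytes. Sound only because the check above shows
/// that every byte belongs to an initialized field.
fn explicit_to_bytes(value: &Explicit) -> &[u8] {
    unsafe {
        std::slice::from_raw_parts(value as *const Explicit as *const u8, size_of::<Explicit>())
    }
}
```

The cost of this approach is that the padding field becomes part of the type's definition and has to be given a value at every construction site.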

Naming

The names for these traits are still subject to bikeshedding. There were several criteria used to select each trait name. First, the names should make their usages recognizable out of context although not necessarily sufficiently clear without prior exposure. It should be clear through the names how the two marker traits contrast with each other as well as the two further traits examined below. The FromAnyBytes trait should convey that any combination of bytes the same length as size_of::<T>() is a valid representation of type T in memory. The ToBytes trait should convey that it is a well-defined operation to view the raw memory representation of the marked type.

Note that the working assumption is that these types will exist in the std::mem namespace.

Other names that were considered include:

FromValidBytes / AsValidBytes
FromValidBytes / ToValidBytes
SafeFromBytes / SafeToBytes
FromBytes / AsBytes
SafeTransmuteFrom / SafeTransmuteTo
FromAnyBytes / ToBytesInPlace

Introduce traits for safely transmutable types.

Next, we introduce a trait FromBytes, and the methods for the ToBytes trait.

FromBytes represents a type that may be transmuted from a byte array; the type need not be byte-complete (and implement FromAnyBytes), and the safe transmutation may fail with FromBytesError (defined in the following section).

trait FromBytes {
   fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError>;
}

impl<T: FromAnyBytes> FromBytes for T {
    // Inline to allow optimizing away the length and alignment checks.
    #[inline]
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError> {
        if bytes.len() < size_of::<Self>() {
            return Err(FromBytesError::InsufficientBytes);
        }
        if bytes.as_ptr().align_offset(align_of::<Self>()) != 0 {
            return Err(FromBytesError::InsufficientAlignment);
        }
        Ok(unsafe { std::mem::transmute::<*const u8, &Self>(bytes.as_ptr()) })
    }
}

trait ToBytes {
    #[inline]
    fn to_bytes(&self) -> &[u8] {
        let pointer = self as *const Self as *const u8;
        unsafe {
            std::slice::from_raw_parts(pointer, size_of::<Self>())
        }
    }

    /// Safely cast this type in-place to another type, returning a reference
    /// to the same memory.
    fn cast<T: FromBytes>(&self) -> Result<&T, FromBytesError> { /*...*/ }

    /// Safely cast this type in-place to another type, returning a mutable
    /// reference to the same memory. This requires `Self` to satisfy
    /// `FromAnyBytes`, because writes through the returned mutable reference
    /// will mutate `Self` without validation.
    fn cast_mut<T: FromBytes>(&mut self) -> Result<&mut T, FromBytesError>
      where Self: FromAnyBytes { /*...*/ }
}

Users can also manually implement FromBytes for a non-byte-complete type. For instance, the standard library will implement FromBytes for bool as follows:

impl FromBytes for bool {
  fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError> {
    // `get(0)` yields `Option<&u8>`; the `&b` pattern copies the byte out.
    match bytes.get(0) {
      Some(&b) if b == 1 || b == 0 => Ok(unsafe { std::mem::transmute::<*const u8, &Self>(bytes.as_ptr()) }),
      Some(_) => Err(FromBytesError::InvalidValue),
      None => Err(FromBytesError::InsufficientBytes),
    }
  }
}

Notes on manually implementing FromBytes:

In the case where the slice passed to from_bytes contains more than the number of bytes required to represent the type, the extra bytes should be ignored. This allows converting a slice without first manually re-slicing it to the length of the type.
from_bytes should process exactly size_of::<T>() bytes, and return Err(FromBytesError::InsufficientBytes) if supplied with fewer.
These APIs should uphold the invariant that ValueType::from_bytes(value.to_bytes()) == Ok(&value).
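The notes above can be exercised with a small self-contained sketch. It uses free functions specialized to u32 (hypothetical names from_bytes and to_bytes) instead of the proposed traits, and a local copy of the error enum, purely so that it compiles on its own; alignment is checked with pointer arithmetic.

```rust
use std::mem::{align_of, size_of};

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum FromBytesError {
    InsufficientAlignment,
    InsufficientBytes,
}

/// Views the leading bytes of `bytes` as a `u32`, ignoring any extra bytes.
fn from_bytes(bytes: &[u8]) -> Result<&u32, FromBytesError> {
    if bytes.len() < size_of::<u32>() {
        return Err(FromBytesError::InsufficientBytes);
    }
    if (bytes.as_ptr() as usize) % align_of::<u32>() != 0 {
        return Err(FromBytesError::InsufficientAlignment);
    }
    // Sound: length and alignment were checked, and every bit pattern is a valid u32.
    Ok(unsafe { &*(bytes.as_ptr() as *const u32) })
}

/// Views a slice of `u32` as bytes; `u32` has no padding.
fn to_bytes(words: &[u32]) -> &[u8] {
    unsafe {
        std::slice::from_raw_parts(words.as_ptr() as *const u8, words.len() * size_of::<u32>())
    }
}
```

With these, from_bytes(to_bytes(&[v])) yields a reference equal to &v, a longer slice yields its first word, and a slice offset by one byte or truncated to three bytes is rejected.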

Introduce a type representing errors when safely transmuting from bytes

The FromBytesError type used above (name subject to bike-shedding) represents the types of errors that can occur when transmuting from bytes to a concrete type:

#[non_exhaustive]
#[derive(Debug, PartialEq, Eq, Copy, Clone)]
enum FromBytesError {
    InsufficientAlignment,
    InsufficientBytes,
    InvalidValue
}

impl Display for FromBytesError { /*...*/ }
impl Error for FromBytesError {}

Note that FromBytesError intentionally does not contain specific information on the errors, such as the invalid value or the number of bytes required.
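For illustration, one possible shape for the elided Display implementation is sketched below. The message strings are placeholders chosen for this example, not proposed wording.

```rust
use std::error::Error;
use std::fmt;

#[non_exhaustive]
#[derive(Debug, PartialEq, Eq, Copy, Clone)]
enum FromBytesError {
    InsufficientAlignment,
    InsufficientBytes,
    InvalidValue,
}

impl fmt::Display for FromBytesError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Placeholder messages; the variants carry no further detail by design.
        let message = match self {
            FromBytesError::InsufficientAlignment => "byte slice is not sufficiently aligned",
            FromBytesError::InsufficientBytes => "byte slice is too short",
            FromBytesError::InvalidValue => "bytes are not a valid value of the target type",
        };
        f.write_str(message)
    }
}

impl Error for FromBytesError {}
```

Because the enum is Copy and implements Error, it can be returned by value or boxed as a Box<dyn Error> alongside other error types.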

Alternatives

FromBytesError could omit the InsufficientAlignment and InsufficientBytes variants, in favor of asserts, if we consider those developer errors. This may be preferable if we expect most such errors to get optimized away, and expect most developers to use .unwrap() or similar rather than handling these errors. In this case, we could make from_bytes never error on a type implementing FromAnyBytes, either by providing separate functions for FromAnyBytes and FromBytes (the latter returning Option), or by giving FromBytes an associated error type and using type ! as the error for types implementing FromAnyBytes. This would substantially improve ergonomics for the common case.
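A rough sketch of the associated-error-type alternative follows. Since ! is not available as a type on stable Rust, std::convert::Infallible stands in for it here; the InvalidValue struct is a hypothetical single-purpose error, and the length check is an assert as the alternative suggests. Only one-byte types are shown, which sidesteps alignment.

```rust
use std::convert::Infallible;

#[derive(Debug, PartialEq, Eq)]
struct InvalidValue;

trait FromBytes {
    type Error;
    fn from_bytes(bytes: &[u8]) -> Result<&Self, Self::Error>;
}

// Byte-complete type: once the length assert passes, conversion cannot fail.
impl FromBytes for u8 {
    type Error = Infallible;
    fn from_bytes(bytes: &[u8]) -> Result<&Self, Self::Error> {
        assert!(!bytes.is_empty(), "too few bytes for u8");
        Ok(&bytes[0])
    }
}

// Non-byte-complete type: the only remaining runtime error is an invalid value.
impl FromBytes for bool {
    type Error = InvalidValue;
    fn from_bytes(bytes: &[u8]) -> Result<&Self, Self::Error> {
        assert!(!bytes.is_empty(), "too few bytes for bool");
        match bytes[0] {
            // Sound: the byte is 0 or 1, and bool has size and alignment 1.
            0 | 1 => Ok(unsafe { &*(bytes.as_ptr() as *const bool) }),
            _ => Err(InvalidValue),
        }
    }
}
```

A bound such as T: FromBytes<Error = Infallible> would then let generic code state that a conversion cannot fail.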

Safe Unions

Unions whose fields all implement both FromAnyBytes and ToBytes can potentially allow reads of their fields without requiring unsafe, since writing to one field and reading from another acts as a transmute operation, and these traits make transmutes safe.

However, when a union's fields have differing lengths (referred to here as "unbalanced unions"), initializing a shorter field does not necessarily zero out the remainder of the union. This means initializing a union with a shorter field and then reading a longer field leads to reading from uninitialized memory. To make this well defined, we propose adding a new repr, #[repr(zero_init)], which initializes the remainder of the union to zero when initializing any field. Thus, safe Rust can allow reading fields of unbalanced unions if and only if the union type implements ToBytes and FromAnyBytes and is #[repr(zero_init)].

Depending on complexity and consensus during the pre-RFC process, we may propose repr(zero_init) as part of this RFC, or as a separate follow-on RFC. In the latter case, safe reads of union fields would not be part of the initial RFC.
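The effect that repr(zero_init) is meant to provide can be emulated by hand today. The following sketch, with a hypothetical union Unbalanced, first initializes the longest field with zero and then writes the shorter one, so that a later read of the longer field sees only initialized bytes. The read itself still needs unsafe in current Rust.

```rust
#[repr(C)]
union Unbalanced {
    short: u8,
    long: u32,
}

/// Builds the union from its short field with the remaining bytes zeroed,
/// which is what the proposed repr would do automatically.
fn with_short_zeroed(value: u8) -> Unbalanced {
    let mut u = Unbalanced { long: 0 }; // initializes all four bytes
    u.short = value; // overwrites only the first byte
    u
}

/// Reads the long field; every byte was initialized by `with_short_zeroed`.
fn read_long(u: &Unbalanced) -> u32 {
    unsafe { u.long }
}
```

The value read back depends on native endianness: it equals u32::from_ne_bytes([value, 0, 0, 0]).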

Alternatives

While reading uninitialized memory from an unbalanced union whose fields implement ToBytes and FromAnyBytes is rarely the correct thing to do, it could be argued that it is not unsafe, and thus could be allowed in safe Rust. Therefore, an alternative is possible where we simply allow reads from such unions in safe Rust. This would provide a definition for previously undefined behavior in Rust.

Possible future extension: safe copying casts to support types with padding

We could potentially provide a ToBytesCopy trait or similar, with methods that support copying into a separate byte slice, or copying into another type. Such a trait could have an automatic implementation for any type implementing ToBytes, but could additionally support manual implementations for types that have padding. Such manual implementations could then copy the fields and zero the padding.

The initial version of this proposal does not define such a trait.
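As a sketch of what a manual copying implementation might look like for a type with padding, the hypothetical function below copies each field of a #[repr(C)] struct into a zeroed buffer. It is a free function rather than a trait method because no such trait is defined here, and it relies on the repr(C) rule that the u32 field following a u8 sits at offset align_of::<u32>().

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
struct Record {
    tag: u8,
    value: u32,
}

const RECORD_SIZE: usize = size_of::<Record>();

/// Copies the fields into a fresh buffer; padding positions stay zero.
fn record_to_bytes_copy(record: &Record) -> [u8; RECORD_SIZE] {
    let mut out = [0u8; RECORD_SIZE];
    // In repr(C), `value` follows `tag` at the next multiple of its alignment.
    let value_offset = align_of::<u32>();
    out[0] = record.tag;
    out[value_offset..value_offset + size_of::<u32>()]
        .copy_from_slice(&record.value.to_ne_bytes());
    out
}
```

Unlike an in-place view, this never reads the struct's padding bytes, at the cost of a copy.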

Acknowledgments

Shout out to the following crates for paving the way with many good ideas:

zerocopy
safe_transmute

josh · December 5, 2019, 8:53am

Update: I discovered one issue with the formulation of cast_mut above, which we're working on fixing. As written, its type signature establishes all the requirements to make it safe, but it isn't implementable with that type signature because it has no way to produce and return a &mut T since FromBytes::from_bytes only produces a &T. We're evaluating a couple of solutions to that and will update the proposal with one of those solutions (and list the others as alternatives).

Update 2: Thanks to @hanna-kruppe for helping with the solution. We're thinking about two possibilities:

Call from_bytes to validate, return the error if that fails, otherwise transmute directly from &mut Self to &mut T.
Factor out a separate function to perform the "validate bytes" step (e.g. "is the byte 0 or 1" for bool), let types implement that function themselves, then provide the implementation of FromBytes in the standard library. That would avoid the need to duplicate logic for size and alignment checking into every manual implementation (and possibly get it wrong), and would let us have both from_bytes and from_bytes_mut in FromBytes without duplicating the validation logic.

The latter solution would look something like this (very preliminary):

trait ValidateBytes {
    fn validate_bytes(bytes: &[u8]) -> bool;
}

impl ValidateBytes for bool {
    #[inline(always)]
    fn validate_bytes(bytes: &[u8]) -> bool { bytes[0] <= 1 }
}

impl<T: FromAnyBytes> ValidateBytes for T {
    #[inline(always)]
    fn validate_bytes(_: &[u8]) -> bool { true }
}

impl<T: ValidateBytes> FromBytes for T {
    #[inline]
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError> {
        if bytes.len() < size_of::<Self>() {
            return Err(FromBytesError::InsufficientBytes);
        }
        if bytes.as_ptr().align_offset(align_of::<Self>()) != 0 {
            return Err(FromBytesError::InsufficientAlignment);
        }
        if !<Self as ValidateBytes>::validate_bytes(bytes) {
            return Err(FromBytesError::InvalidValue);
        }
        Ok(unsafe { std::mem::transmute::<*const u8, &Self>(bytes.as_ptr()) })
    }

    #[inline]
    fn from_bytes_mut(bytes: &mut [u8]) -> Result<&mut Self, FromBytesError> {
        // ... same as above but with mut added in the transmute ...
    }
}

gnzlbg · December 5, 2019, 10:03am

The proposal should include prior art, since there have been many attempts, particularly for Rust, e.g., Joshua's RFC implemented in Fuchsia and based on FromBits/IntoBits pre-RFC implemented in the simd crates, similar traits in the TTF library, the interprocess crates, safecast, etc. This all amounts to many years of experience trying to solve this problem in Rust in practice. For example, the Compatible<T>-trait proposals look like a much better overall direction for safe transmutes than this proposal to me along all axes I can think of, so it would be interesting for this proposal to include rationale about why this should be done instead. I'm probably missing some obvious tradeoffs, but it would be good to have them in writing.

Centril · December 5, 2019, 11:04am

To add more relevant cases:

There might be a safety invariant because the type is some form of proof token

Sounds like auto traits, but presumably that's not it; would be good to reword this.

Should, or cannot? (Compiler checked?)

Not defined anywhere?

It would be good to segregate exactly what the language and the library component of this proposal is so that it becomes easier to see what requires compiler support and what does not.

No need to mention this; we can just benchmark.

Why not:

match bytes {
    [0, ..] => Ok(&false),
    [1, ..] => Ok(&true),
    [_, ..] => Err(FromBytesError::InvalidValue),
    [] => Err(FromBytesError::InsufficientBytes),
}

(Does it have to point into the byte slice? If so why? And is that a guarantee?)

Using ! seems like a good idea to gain some more type safety with infallible unwraps.

Would prefer separating the proposals as safe unions may be substantially more invasive in typeck/borrowck/operational-semantics and so that we don't need to block safe transmutes on this.

This doesn't seem sound to me. It seems to be relying on having a correct implementation of from_bytes to legitimize a transmute on Ok(...). However, the trait is not unsafe.

Seems like it's making the same assumption as above. Give a bad impl ValidateBytes for bool and trigger UB in from_bytes now.

Must read re. validation:

https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

josh · December 5, 2019, 11:54am

If I understand what you mean by that, that sounds like a specific instance of "you shouldn't derive FromAnyBytes or implement FromBytes because you don't want to let people construct the type at all except in specific ways". That's one of many reasons we expect types to explicitly declare those traits rather than implicitly having the compiler provide them for all types that could qualify.

Sure. What term would you use in prose to describe things like impl<T: SomeTrait> OtherTrait for T?

"cannot" if there's a supported way to have a "derive-only, no manual implementations" trait. "should not" otherwise.

Is there a way to make a trait that you can derive but you cannot manually implement?

True. I think we can just replace every instance of it with either "aggregate type" or just "type".

The only language/compiler support components are 1) enforcing the requirements to derive the traits, and 2) defining ToBytes only for tuples that don't have padding, if we do so.

How does that make documentation unnecessary or undesirable?

The goal there is to make sure, for instance, that casting u32 to [u16; 2] requires no runtime checks.
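As an illustration of that goal, the size and alignment requirements for this particular pair of types can be established entirely at compile time, leaving nothing to check at runtime. The function name below is hypothetical, and the unsafe cast stands in for what a checked-away cast would reduce to.

```rust
use std::mem::{align_of, size_of};

/// Views a `u32` as two `u16` halves in native byte order.
fn u32_as_u16_pair(value: &u32) -> &[u16; 2] {
    // Both requirements are constants, so they are verified during compilation.
    const _: () = assert!(size_of::<u32>() == size_of::<[u16; 2]>());
    const _: () = assert!(align_of::<u32>() >= align_of::<[u16; 2]>());
    // Sound: same size, sufficient alignment, and every bit pattern is a valid u16.
    unsafe { &*(value as *const u32 as *const [u16; 2]) }
}
```

Which half comes first depends on the target's endianness, as with any in-place view.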

For the mutable version, it has to. For the non-mutable version, technically you don't have to, but it seems wasteful not to.

I think at this point I'm leaning slightly more towards having from_bytes and try_from_bytes, the former infallible (returning &T) and the latter fallible (returning Result).

I don't object to separating the proposals, but can you elaborate on why you think unions would be substantially more invasive? Does the #[repr(zero_init)] proposal together with fields that are FromAnyBytes and ToBytes not suffice to make union field reads safe?

A few answers to this:

First, I do absolutely favor encoding requirements in the type system whenever reasonably possible. If const generics and const functions become sufficiently powerful to do so, I'd love to express the size requirement in the type system (can only convert &[u8; size_of::<T>()] to &T, for instance). And if you can find a way to encode "sufficiently aligned byte slice" safely in the type system, I'm all for it. But in the absence of that, we have to check for size and alignment and then transmute. Given that, it feels less error-prone to me to only implement the size and alignment checks in one place, rather than expecting any manual implementation of FromBytes to manually reimplement the size and alignment checks correctly themselves.

Second and separately, without necessarily encoding this in the type system, can you give a concrete example of how you'd suggest "parsing" bytes into bool or NonZeroU32 and providing a function from bytes to that type that duplicates as little as possible between such functions? How would you suggest eliminating the error-prone boilerplate for "is it long enough" and "is it aligned enough", as well as the actual transmute, from user-written code like trait impls?

rylev · December 5, 2019, 12:26pm

The proposal has definitely taken prior art into consideration, especially from the two crates listed at the bottom: zerocopy (which is authored by Joshua) and safe_transmute. We can definitely add a section making this more explicit.

As for the Compatible<T> proposal - it seems to my eyes that that proposal is compatible (no pun intended) with our proposal. The difference is that we split the idea into types that are representable as bytes in a well defined way and types where you can take any appropriately aligned and sized slice of bytes and view it as that type. Ultimately this is a good idea to better capture types that are only one or the other (e.g., bool, which can be represented as a byte, but not every byte is a valid bool; structs with padding are the opposite).

This allows one to have more fine grained control over what type of guarantees transmuting needs. For instance, you can transmute from a type like bool to another type. If we didn't split them and required the type be both formable from any byte pattern AND viewable as bytes, then types like bool or structs with padding would be restricted since they only meet one requirement.

We also want to allow transmuting references (both mutable and immutable) of a given type to a reference of another type. These have different restrictions on them than transmuting owned types. If we didn't have the more fine grained view of the guarantees a type has, we wouldn't be able to distinguish types that support different types of casting.

If you have specifics on where the proposal falls short or differs from the Compatible proposal in a specific way, we'd love to see them so we can address them directly.

Centril · December 5, 2019, 12:28pm

Yep, that's right. To take an extreme variant of this, if you can make a bad Id then you can brick the type system.

Right; would be good to add to your list.

Well so there are two aspects here (it's like Copy):

Syntactic: You can #[derive(...)]
Semantic: There are restrictions (including when manually written out)

So perhaps: "derivable implementation with restrictions"?

You can have the compiler use #[allow_internal_unstable(...)] in the expansion to refer to a perma-unstable trait that you depend on to make sure only the expansion can implement it (see StructuralEq in std::marker, and cc @pnkfelix & @petrochenkov).

Right, but spelling out how the requirements are enforced (perhaps with an experimental prototype implementation in the coherence checking code -- see link in previous thread) would help to clarify if new lang items need to be added and exactly what the algorithm is for the requirements.

(I'm just saying it is unnecessary in the RFC text because it's not a very interesting detail for users, and it could also change depending on how LLVM develops.)

Alright, so if there's no guarantee then we can iterate based on perf and benchmarks, but it would be good to state explicitly that this is not a requirement (and, conversely, may not be depended upon).

In my view, TryFrom is good precedent here for using !. Moreover, it seems useful to state, as a bound, that the conversion cannot fail.

Invasive in terms of compiler implementation just to check those things and to then give operational semantics to repr(zero_init). It's less clear to me how and where exactly the tweaks to the compiler need to be done.

Not sure const generics is mature enough to experiment with this yet (it really makes things easier when prototyping).

The way you encoded from_bytes in impl<T: ValidateBytes> FromBytes for T seems like it has reusable components so those could be extracted to unsafe functions + given elaborate safety docs. That should reduce boilerplate and make it less error prone.

atagunov · December 5, 2019, 2:18pm

Hi, "alternatives" could mention

trait ToBytes {
    // Allows transmuting in presence of padding
    // Would only be useful if there were library methods taking
    // &[MaybeUninit<u8>]: write, memcpy ...
    fn to_bytes(&self) -> &[MaybeUninit<u8>] ...
}

gnzlbg · December 5, 2019, 3:28pm

No doubt.

If you have specifics on where the proposal falls short or differs from the Compatible proposal in a specific way, we'd love to see them so we can address them directly.

For example (one of probably many), with this proposal one can't perform a zero-cost safe transmute from bool to (bool,), #[repr(transparent)] struct B(bool);, #[repr(C)] struct B(bool);, etc. One can go from bool to [u8], but going from [u8] to, e.g., (bool,) would require run-time checks. This "goal" or "constraint" is mentioned in the previous RFC, but not covered by this feature. The "future possibilities" section does not show how to extend this feature in a backward compatible way to support that, and I don't think it would be possible.

I could share the notes I made of the proposal if you want, but fixing the nitpicks won't make the design direction satisfy that particular constraint.

It might well be that this constraint is not worth satisfying, but if that's the intent the proposal should argue why it isn't worth satisfying.

CAD97 · December 5, 2019, 3:49pm

Minor notes:

UCS-4 is a deprecated term at best; I'm not sure if it even was ever defined. The correct term would be UTF-32 or just mentioning that char represents a codepoint and not some encoding of the codepoint.
IIRC, tuples do not have a guaranteed defined order yet. And I don't think we want to guarantee that tuples are laid out in source order, either, because we want to be able to size optimize them just like named types. (This is one of the problems with trying to add tuple arity abstractions; a cons-list encoding breaks this permission.) So, unfortunately, tuples aren't able to be FromAnyBytes/ToBytes in any case.

SimonSapin · December 5, 2019, 6:27pm

Some reactions to this proposal. Sorry if the point-by-point style is hard to read! There’s some overlap with other comments.

Do you mean the derive(...) macro will emit errors? I assume there is no such checking when the trait is implemented manually. Or should the trait be "magical" through special treatment in the language and compiler?

So it needs to be an unsafe trait. I didn’t find this specified in the rest of the proposal.

I assume this means methods with a default impl in the trait. So also an unsafe trait, since those defaults use unsafe {} and assume some properties of Self.

… for sizes N up to 32, until const generics are stabilized, like other standard library impls for arrays.

Sounds fine to me. (Though I would call it "code point value as u32" rather than UCS-4.) char is already documented to represent a Unicode scalar value and be four bytes in size, and implements Into<u32> and TryFrom<u32>.

This does however expose native endianness, which is perhaps less obvious than it is for integer types?

… for arity up to 12, like other standard library impls for tuples.

From the point of view of the standard library this would definitely require a magic HasNoPadding unsafe trait that’s automatically implemented by relevant types (including tuples). This seems a bit esoteric, I don’t know if the language team could be convinced to add this.

I think that at least the documentation for these new APIs needs to call out very prominently that they return different results for a given input based on the target CPU’s native endianness.

Existing APIs like u32::to_ne_bytes do so in the method’s name. Maybe this would be appropriate here as well?

What does that mean? What happens if a crate writes such an impl?

If it’s already an unsafe impl, I don’t think further enforcement to prevent non-derived impls is useful. And it seems rather tricky to implement.

This seems to assume a FromBytes::from_bytes_mut method, which is missing in the proposed trait definition of FromBytes.

This seems like a significant departure from current language rules, that probably would need its own RFC. It may also be difficult to achieve with LLVM.

josh · December 5, 2019, 7:17pm

Will do; we can add a note that types shouldn't implement FromAnyBytes if they only want to allow construction via specific interfaces.

That's not what I was referring to. I said "automatic implementation" in the RFC to refer to cases where we implement a trait for every type that implements another trait. We can use a different term, but what term would you suggest for that, specifically?

Sounds reasonable to me. That would prevent manual implementation of ToBytes, and if we switch to the ValidateBytes approach then we'll also want to prevent manual implementation of FromBytes.

It's important to users that these conversions have minimal overhead; that has come up in discussions of this multiple times. We'll want to accompany the RFC with demonstrations that the compiler can completely optimize away the checks.

If we switch to the ValidateBytes approach, it'll no longer be possible for from_bytes to return a different reference (e.g. to static values).

If we don't, then how about something like "It is technically possible for a manual from_bytes implementation to return a reference to a static value rather than to the slice; doing so will not break any guarantees, but seems unlikely to provide any benefit."?

Returning types like Result<&T, !> seems likely to just lead to extensive use of into_ok. If we can statically make a conversion infallible, let's just return &T directly.

I'd still be somewhat interested in proposals that would allow statically encoding the concept of "sufficiently aligned slice of bytes", even if they're not feasible with our current type system.

We're seriously considering just making ValidateBytes an unsafe trait instead, which would mean you'd need to write an unsafe implementation for any non-byte-complete types you want to convert. (I don't expect that to come up nearly as often as the derive case.) Then the error-prone boilerplate becomes an internal detail of a sealed FromBytes trait.

But if we don't do that, then yes, we should factor out helper functions.

Really don't want to deal with MaybeUninit in any way, nor push a new set of writing primitives that would accept it. But we could certainly mention that alternatives include solutions that would define currently undefined behavior to allow reading uninitialized memory or padding bytes.

(Given CAD97's observation below, tuples wouldn't actually have defined layout, but I'll address the other points.)

We believe that going from bool to (for instance) #[repr(transparent)] struct B(bool), or [bool; 1], should work with the compiler optimizing away all the runtime checks. We're working on confirming that, and the RFC will definitely include demonstrations of that.

We're absolutely open to considering alternative formulations, especially formulations that allow the type system to help distinguish fallible and infallible conversions. I'd like to be able to convert [u8; 4] to u32 and get back a u32 rather than a Result<u32, FromBytesError>. But how can we encode the necessary alignment requirement on the [u8; 4] to allow that?
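As a rough illustration of where the alignment question comes from: a by-value conversion can already be infallible today via u32::from_ne_bytes, because the bytes are copied out. Only the by-reference conversion has to check alignment. The helper names below (u32_from_array, u32_ref_from_bytes) are hypothetical and use Option rather than the proposed FromBytesError.

```rust
use std::mem::{align_of, size_of};

// By value: the bytes are copied, so alignment of the source does not matter.
fn u32_from_array(bytes: [u8; 4]) -> u32 {
    u32::from_ne_bytes(bytes)
}

// By reference: the returned &u32 points into the buffer, so the buffer
// itself must have the right length and alignment.
fn u32_ref_from_bytes(bytes: &[u8]) -> Option<&u32> {
    if bytes.len() != size_of::<u32>() {
        return None;
    }
    if (bytes.as_ptr() as usize) % align_of::<u32>() != 0 {
        return None;
    }
    // SAFETY: length and alignment were checked above, and every bit
    // pattern is a valid u32.
    Some(unsafe { &*(bytes.as_ptr() as *const u32) })
}
```

A type that statically guaranteed "four bytes, aligned to four" would let the second function drop both checks, which is the open design question here.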

As for going from [u8] to bool, the point of this proposal is to define a safe transmute operation. Going from [u8] to bool without checking the values would be an unsafe transmute operation; we already have that, as std::mem::transmute. Unsafe helpers for more ergonomic unsafe transmutes would be an entirely different proposal.
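For comparison, a checked conversion from a byte to bool might look like the sketch below. The function name bool_from_byte is hypothetical; the point is only that the validity check happens before the reinterpretation, which is what separates this from a bare std::mem::transmute.

```rust
// Hypothetical checked conversion: only 0 and 1 are valid bit patterns
// for bool, so anything else is rejected instead of transmuted.
fn bool_from_byte(byte: &u8) -> Option<&bool> {
    match *byte {
        // SAFETY: bool has size 1 and alignment 1, and the value was just
        // checked to be 0 or 1.
        0 | 1 => Some(unsafe { &*(byte as *const u8 as *const bool) }),
        _ => None,
    }
}
```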

No objection to using UTF-32 in place of UCS-4 here.

I wasn't aware of that! In that case, we should drop implementations for tuples (though we can still allow deriving them for tuple structs that use repr(C)).

That also means:

(2) no longer applies here, which removes a substantial amount of complexity.

Yes, much like the error you get if you try to derive(Debug) for a type whose fields don't implement Debug.

Per other discussion in this thread, I'd like to prevent the possibility of implementing those traits manually, rather than deriving them.

I don't think they need to be unsafe traits, if the compiler enforces that you can only derive them for types that meet the requirements. (Of course, if they can only be derived and never manually implemented, it doesn't really matter if they're unsafe traits or not since the only impls will get generated by the derive.)

Sure. Or unless we come up with a way to use const generics in the standard library to define such traits before they're stable. But either way, this would use the same mechanisms as other traits for [T; N], yes.

That alone wouldn't preclude implementing it as a fixed 4-byte UTF-8 buffer, which would have advantages and disadvantages.

Per CAD97, we need to just drop the impls for tuples, so we don't need to worry about either this or the magic no-padding requirement.

I wouldn't phrase it as "different results for a given input", but it certainly seems fine to explicitly point out that they return references to bytes in memory and thus to the in-memory representation in native endianness.
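A small sketch of what "native endianness" means in practice, using the existing u32::to_ne_bytes API. The wrapper name native_bytes is hypothetical.

```rust
// Returns the in-memory representation of `x` on the current target.
fn native_bytes(x: u32) -> [u8; 4] {
    x.to_ne_bytes()
}

// On a little-endian target the result matches to_le_bytes, on a
// big-endian target it matches to_be_bytes.
fn matches_target_order(x: u32) -> bool {
    if cfg!(target_endian = "little") {
        native_bytes(x) == x.to_le_bytes()
    } else {
        native_bytes(x) == x.to_be_bytes()
    }
}
```

A reference into the bytes of a value exposes this same ordering, so the byte sequence seen through such a reference depends on the target.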

See the second post in this thread.

We mentioned it as an alternative for completeness. We don't plan to go that route; it just seemed important to document as a potential alternative.

gnzlbg · December 5, 2019, 7:43pm

We believe that going from bool to (for instance) #[repr(transparent)] struct B(bool), or [bool; 1], should work with the compiler optimizing away all the runtime checks. We're working on confirming that, and the RFC will definitely include demonstrations of that.

Even if this happens to work, it would need to be a guaranteed compiler optimization, because this is a performance-oriented feature (if you don't need this optimization, you can just copy all fields manually, which is already safe).

As for going from [u8] to bool, the point of this proposal is to define a safe transmute operation. Going from [u8] to bool without checking the values would be an unsafe transmute operation; we already have that, as std::mem::transmute. Unsafe helpers for more ergonomic unsafe transmutes would be an entirely different proposal.

According to your proposal, going from bool -> Bool would still require going from bool to [u8] and then from [u8] to Bool because the FromBytes and ToBytes traits go through u8s. This is not only unergonomic (requiring two API calls) but also unsound, since it exhibits undefined behavior.

But how can we encode the necessary alignment requirement on the [u8; 4] to allow that?

The Compatible<T> proposal solves this problem by using ! as the error type in this conversion.

SimonSapin · December 5, 2019, 8:19pm

When I read the signature of cast_mut it seemed obvious to me that FromBytes would have an additional method:

trait FromBytes {
   fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError>;
   fn from_bytes_mut(bytes: &mut [u8]) -> Result<&mut Self, FromBytesError>;
}

I assumed this was the intent, and that its absence from FromBytes was an oversight.
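For illustration, a manual implementation of a trait with both methods could look roughly like the sketch below. The Rgb type and the unit-struct FromBytesError are assumptions made for this example; the trait shape follows the two signatures above.

```rust
use std::mem::size_of;

// Placeholder error type; the proposal's real FromBytesError may differ.
#[derive(Debug)]
struct FromBytesError;

trait FromBytes {
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError>;
    fn from_bytes_mut(bytes: &mut [u8]) -> Result<&mut Self, FromBytesError>;
}

// Hypothetical byte-complete type: three u8 fields, no padding, align 1.
#[repr(C)]
struct Rgb {
    r: u8,
    g: u8,
    b: u8,
}

impl FromBytes for Rgb {
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError> {
        if bytes.len() != size_of::<Rgb>() {
            return Err(FromBytesError);
        }
        // SAFETY: the length matches, Rgb has alignment 1, and every bit
        // pattern is valid for three u8 fields.
        Ok(unsafe { &*(bytes.as_ptr() as *const Rgb) })
    }

    fn from_bytes_mut(bytes: &mut [u8]) -> Result<&mut Self, FromBytesError> {
        if bytes.len() != size_of::<Rgb>() {
            return Err(FromBytesError);
        }
        // SAFETY: same reasoning as above; the exclusive borrow of the
        // slice is carried over to the returned reference.
        Ok(unsafe { &mut *(bytes.as_mut_ptr() as *mut Rgb) })
    }
}
```

Writes through the reference returned by from_bytes_mut land in the original buffer, which is why a type with invalid bit patterns would need more care in the mutable case.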

But from_bytes_mut is neither of the two solutions you’re considering. Did you reject that option? Why?

josh · December 5, 2019, 8:21pm

See "Update 2" in that second post. The sample code there includes from_bytes_mut.

SimonSapin · December 5, 2019, 8:23pm

Doesn’t that make separate validation unnecessary? Or is it only to help manual impls avoid duplicating some logic? If so, adding a trait seems like overkill; those impls can factor the common logic into a private free function by themselves.

gbutler · December 5, 2019, 8:27pm

Isn't that called a "Blanket Implementation" currently? I'm nearly positive I've seen that terminology WRTT.

josh · December 5, 2019, 8:28pm

I wasn't sure whether "blanket impl" meant impl<T: OtherTrait> SomeTrait for T or impl<T> SomeTrait for T.
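For reference, the bounded form looks like this in practice. As far as I know, the Rust book uses "blanket implementation" for exactly this pattern (implementing a trait for any type that satisfies a bound). The Widget type and the method names are made up for the example.

```rust
trait OtherTrait {
    fn id(&self) -> u32;
}

trait SomeTrait {
    fn describe(&self) -> String;
}

// Bounded blanket impl: every type implementing OtherTrait gets SomeTrait.
impl<T: OtherTrait> SomeTrait for T {
    fn describe(&self) -> String {
        format!("item #{}", self.id())
    }
}

// Hypothetical type that only implements OtherTrait by hand.
struct Widget;

impl OtherTrait for Widget {
    fn id(&self) -> u32 {
        7
    }
}
```

Widget never mentions SomeTrait, yet Widget.describe() is available through the blanket impl.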

josh · December 5, 2019, 10:16pm

I don't think I understand what you mean by this. from_bytes and from_bytes_mut must validate before transmuting, for a non-byte-complete type.

It's not just "refactoring the common logic", it's also trying to make this solution safer, such as by preventing incorrect implementations of FromBytes entirely. At the moment, the solution we're leaning towards would allow you to implement ValidateBytes for non-byte-complete types, and then would not allow you to implement FromBytes manually at all; you either get the implementation for types that implement ValidateBytes or the implementation for types that implement FromAnyBytes.
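A minimal sketch of that split, under several assumptions: ValidateBytes is shown as an unsafe trait with a single validate_bytes method, FromBytes returns Option instead of a Result with FromBytesError, and the sealing mechanism and the second implementation path for FromAnyBytes types are left out (two overlapping blanket impls would not pass coherence as written here).

```rust
use std::mem::{align_of, size_of};

// Hypothetical: implementers promise that `true` means the bytes are a
// valid value of Self. A wrong answer leads to UB, hence `unsafe trait`.
unsafe trait ValidateBytes: Sized {
    fn validate_bytes(bytes: &[u8]) -> bool;
}

// Stand-in for the sealed FromBytes; users would not implement it by hand.
trait FromBytes: Sized {
    fn from_bytes(bytes: &[u8]) -> Option<&Self>;
}

// The error-prone part (size check, alignment check, pointer cast) is
// written once here rather than in every manual impl.
impl<T: ValidateBytes> FromBytes for T {
    fn from_bytes(bytes: &[u8]) -> Option<&T> {
        let ok = bytes.len() == size_of::<T>()
            && (bytes.as_ptr() as usize) % align_of::<T>() == 0
            && T::validate_bytes(bytes);
        if ok {
            // SAFETY: size and alignment were checked above, and the
            // ValidateBytes contract says the contents are valid for T.
            Some(unsafe { &*(bytes.as_ptr() as *const T) })
        } else {
            None
        }
    }
}

// Example for a non-byte-complete type: only 0 and 1 are valid for bool.
unsafe impl ValidateBytes for bool {
    fn validate_bytes(bytes: &[u8]) -> bool {
        bytes.iter().all(|b| *b <= 1)
    }
}
```

With this shape, a manual impl only states which byte patterns are valid and never performs the transmute itself.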

SimonSapin · December 5, 2019, 10:28pm

Separate was the key word. By that I mean having a ValidateBytes trait at all (or even just a validate_bytes method in the FromBytes trait), rather than having from_bytes and from_bytes_mut methods in impls do validation.

But I see now. Having those manual impls not need to repeat the transmutes is interesting.

Declaring it unidiomatic to manually implement some traits sounds fine, but I don’t see the point of adding language and compiler special cases to actually enforce that rule. Making them unsafe traits seems enough to signal that implementers who do it anyway take on that responsibility.
