This is an updated version of the proposal originally discussed at [Pre-RFC]: Safe Transmute, incorporating extensive feedback from that thread, and further design work from both Ryan Levick and myself. Thanks to everyone who contributed in that thread!
Safe(r) Transmute
- Authors: Ryan Levick, Josh Triplett
Transmuting one type to another type and vice versa in Rust is extremely dangerous---so much so that the docs for std::mem::transmute are essentially a long list of how to avoid doing so. However, transmuting is sometimes necessary. For instance, in extremely performance-sensitive use cases, it may be necessary to transmute from bytes instead of explicitly deserializing and copying bytes from a buffer into a struct.
Causes of Unsafety and Undefined Behavior (UB)
At the core of understanding the safety properties of transmutation is understanding Rust's layout properties (i.e., how Rust represents types in memory). The best resource I've found for understanding this is Alexis Beingessner's blog post on the matter.
The following are the reasons that transmutation from a buffer of bytes is generally unsafe:
- Illegal Representations: Safe transmutation of a slice of bytes to a type T is only possible if every possible value of those bytes corresponds to a valid value of type T. For example, this property doesn't hold for bool or for most enum types. While size_of::<bool>() == 1, a bool can only legally be either 0b1 or 0b0; transmuting 0b10 to bool is UB.
- Wrong Size: A buffer of bytes might not contain the correct number of bytes to encode a given type. Referring to uninitialized fields of a struct is UB. Of course, this assumes that the size of a given type is known ahead of time, which is not always the case.
- Alignment: Types must be "well-aligned", meaning that where they are in memory falls on a certain memory address interval (usually some power of 2). For example, the alignment of u32 is 4, meaning that a valid u32 must always start at a memory address evenly divisible by 4. Transmuting a slice of bytes to a type T that does not have proper alignment for type T is UB.
- Non-Deterministic Layout: Certain types might not have a deterministic layout in memory. The Rust compiler is allowed to rearrange the layout of any type that does not have a well defined layout associated with it. Explicitly setting the layout of a type is done through #[repr(..)]. To be deterministic, both the order of fields of a complex type as well as the exact value of their offsets from the beginning of the type must be well known. This is generally only possible by marking a complex type #[repr(C)] and recursively ensuring that all fields of the struct are composed of types with deterministic layout.
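The first three hazards above can be checked by hand with std::mem. The following is a minimal sketch; the helper names byte_to_bool, is_long_enough_for and is_aligned_for are hypothetical and not part of this proposal.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper: a byte is a valid `bool` only if it is 0 or 1.
fn byte_to_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        // e.g. 0b10: transmuting this byte to `bool` would be UB
        _ => None,
    }
}

/// Hypothetical helper: does `bytes` hold enough bytes for one `T`?
fn is_long_enough_for<T>(bytes: &[u8]) -> bool {
    bytes.len() >= size_of::<T>()
}

/// Hypothetical helper: does `bytes` start at an address aligned for `T`?
fn is_aligned_for<T>(bytes: &[u8]) -> bool {
    (bytes.as_ptr() as usize) % align_of::<T>() == 0
}
```

A checked conversion would have to pass all three tests (plus have a deterministic layout) before reinterpreting the bytes.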
Transmuting from a type T to a slice of bytes can also be unsafe or cause UB:
- Padding: Since padding bytes (i.e., bytes internally inserted to ensure all elements of a complex type have proper alignment) are not initialized, viewing them is UB. For instance, (u8, u32) has 3 bytes of padding to align the u32. Note that a type may have padding at the end, not just in the middle, to ensure that its size is a multiple of its alignment: (u32, u8) has 3 bytes of padding at the end to make its size 8, a multiple of the 4-byte alignment required for u32.
- Non-Deterministic Layout: The same issues that arise when transmuting from bytes to type T apply when going the other direction.
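The padding arithmetic can be observed with size_of. This sketch uses #[repr(C)] structs rather than tuples, because tuple layout is not guaranteed, and it assumes a target where u32 has 4-byte alignment. The type names and the padding_bytes helper are hypothetical.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(C)]
struct ByteThenWord {
    a: u8,
    // 3 padding bytes here so that `b` starts at offset 4
    b: u32,
}

#[allow(dead_code)]
#[repr(C)]
struct WordThenByte {
    a: u32,
    b: u8,
    // 3 trailing padding bytes so that the size is a multiple of 4
}

/// Number of padding bytes: total size minus the sum of the field sizes.
fn padding_bytes<T>(sum_of_field_sizes: usize) -> usize {
    size_of::<T>() - sum_of_field_sizes
}
```

Both structs carry 5 bytes of data in an 8-byte layout, so viewing either one as a byte slice would expose 3 uninitialized bytes.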
Proposed Improvements
Introduce traits for types that can be safely transformed to/from bytes
We first introduce the traits FromAnyBytes and ToBytes (names subject to bikeshedding; see below).
- FromAnyBytes represents any type where all properly aligned and sized byte patterns are legal (from here on referred to as "byte-complete" types), such that any byte slice of the same size can be transmuted into the type in-place without further checking.
- ToBytes represents any type that can be transmuted into bytes in-place, which in turn requires that the type must not have any padding.
All core types that are byte-complete implement both FromAnyBytes and ToBytes; a full list appears below. Core types like bool that need further validation before being safely transmuted from bytes only implement ToBytes. Both traits can be safely opted into either using #[derive(...)] or impl blocks as long as:
- They are only recursively composed of FromAnyBytes or ToBytes types respectively
- They have a deterministic layout (such as types using repr(C) or repr(transparent))
- For ToBytes, they contain no padding bytes.
The compiler will return an error when the type does not fit all of the necessary conditions.
FromAnyBytes contains no methods and serves as a marker trait; the next section defines a FromBytes trait with an automatic implementation for types implementing FromAnyBytes, which allows manual implementation for non-byte-complete types. ToBytes contains methods (defined in the next section) and implementations of those methods, with the expectation that those implementations will work for all types deriving the trait; those methods should not be manually implemented.
Notes on types implementing FromAnyBytes and ToBytes:
- The user must opt into a complex type implementing FromAnyBytes and ToBytes, because this has implications on the public API of the type. For instance, changing normally private details of a complex type such as ordering of private fields may become a breaking change.
- A struct that requires internal padding can become a struct that can derive ToBytes by explicitly defining padding fields.
- The following core types will be marked as FromAnyBytes and ToBytes:
  - u8, u16, u32, u64, u128, usize
  - i8, i16, i32, i64, i128, isize
  - f32, f64
  - ()
  - all SIMD types that are byte-complete
  - Option applied to any NonZeroU or NonZeroI type
  - Wrapping<T> for any T implementing the corresponding trait
  - [T; N] for any T implementing the corresponding trait.
    - Note that all types guarantee their size is a multiple of their alignment, so an array [T; N] can never contain padding that the type T doesn't itself contain.
- The following additional core types will be marked as ToBytes only, and will have manual implementations of FromBytes (defined in the next section):
  - bool
  - any NonZeroU or NonZeroI type
  - char
    - Note that this will produce and consume UCS-4 characters, and would require committing to the internal UCS-4 representation of char. We could, alternatively, omit the trait implementations for char.
- All tuples composed of FromAnyBytes types will themselves implement FromAnyBytes.
- All tuples composed of ToBytes types without padding can implement ToBytes. (Providing such implementations in the standard library may require compiler assistance.)
- C-style enum types (with no fields in any variant) marked with #[repr(C)] or #[repr($INT)] may derive ToBytes.
- Note that some structs may have "surprise" padding at the end and as such should not implement ToBytes. For example: struct MyType(u32, u8).
- While it is theoretically possible to derive ToBytes and/or FromAnyBytes for generic structs which are generic over types that are ToBytes and/or FromAnyBytes, this is left to future work.
- Transmute deals with in-memory data in-place, and thus does not have any provisions to perform translations between native endianness and non-native endianness.
- There is no way to unsafe impl either FromAnyBytes or ToBytes for a type that doesn't meet the requirements.
- Raw pointers could potentially implement both ToBytes and FromAnyBytes, and references or Option of references could potentially implement ToBytes. There may be uses for such implementations, but they also seem potentially error-prone. We propose to evaluate them further and consider such implementations in the future, but to not provide such implementations in the initial version.
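Two of the notes above can be checked with size_of in a const context. This is a sketch under stated assumptions: the Header type is hypothetical, u32 is assumed to have 4-byte alignment, and the "size equals sum of field sizes" test is only a stand-in for whatever check a real derive would perform.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// `u8` followed by `u32` would normally get 3 implicit padding bytes.
/// Declaring them as a real field leaves no uninitialized bytes in the layout.
#[allow(dead_code)]
#[repr(C)]
struct Header {
    tag: u8,
    _pad: [u8; 3],
    len: u32,
}

/// Hypothetical stand-in for a derive-time check: a struct has no padding
/// if its size equals the sum of its field sizes.
const HEADER_HAS_NO_PADDING: bool =
    size_of::<Header>() == size_of::<u8>() + size_of::<[u8; 3]>() + size_of::<u32>();

/// `Option<NonZeroU32>` represents `None` as the all-zero pattern, so it is
/// exactly as large as `u32` and every bit pattern is a valid value.
const OPTION_NONZERO_IS_WORD_SIZED: bool =
    size_of::<Option<NonZeroU32>>() == size_of::<u32>();
```

The same comparison fails for struct MyType(u32, u8) under #[repr(C)], which is the "surprise" trailing padding case mentioned above.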
Naming
The names for these traits are still subject to bikeshedding. There were several criteria used to select each trait name. First, the names should make their usages recognizable out of context although not necessarily sufficiently clear without prior exposure. It should be clear through the names how the two marker traits contrast with each other as well as the two further traits examined below. The FromAnyBytes trait should convey that any combination of bytes the same length as size_of::<T>() is a valid representation of type T in memory. The ToBytes trait should convey that it is a well-defined operation to view the raw memory representation of the marked type.
Note that the working assumption is that these types will exist in the std::mem namespace.
Other names that were considered include:
- FromValidBytes / AsValidBytes
- FromValidBytes / ToValidBytes
- SafeFromBytes / SafeToBytes
- FromBytes / AsBytes
- SafeTransmuteFrom / SafeTransmuteTo
- FromAnyBytes / ToBytesInPlace
Introduce traits for safely transmutable types.
Next, we introduce a trait FromBytes, and the methods for the ToBytes trait.
FromBytes represents a type that may be transmuted from a byte array; the type need not be byte-complete (that is, it need not implement FromAnyBytes), and the safe transmutation may fail with FromBytesError (defined in the following section).
use std::mem::{align_of, size_of};

trait FromBytes {
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError>;
}
impl<T: FromAnyBytes> FromBytes for T {
    // Inline to allow optimizing away the length and alignment checks.
    #[inline]
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError> {
        if bytes.len() < size_of::<Self>() {
            return Err(FromBytesError::InsufficientBytes);
        }
        if bytes.as_ptr().align_offset(align_of::<Self>()) != 0 {
            return Err(FromBytesError::InsufficientAlignment);
        }
        // Length and alignment were checked above, and any bit pattern is valid.
        Ok(unsafe { &*(bytes.as_ptr() as *const Self) })
    }
}
trait ToBytes {
    #[inline]
    fn to_bytes(&self) -> &[u8] {
        let pointer = self as *const Self as *const u8;
        // `size_of_val` avoids requiring `Self: Sized` in this default method.
        unsafe {
            std::slice::from_raw_parts(pointer, std::mem::size_of_val(self))
        }
    }
    /// Safely cast this type in-place to another type, returning a reference
    /// to the same memory.
    fn cast<T: FromBytes>(&self) -> Result<&T, FromBytesError> { /*...*/ }
    /// Safely cast this type in-place to another type, returning a mutable
    /// reference to the same memory. This requires `Self` to satisfy
    /// `FromAnyBytes`, because writes through the returned mutable reference
    /// will mutate `Self` without validation.
    fn cast_mut<T: FromBytes>(&mut self) -> Result<&mut T, FromBytesError>
        where Self: FromAnyBytes { /*...*/ }
}
Users can also manually implement FromBytes for a non-byte-complete type. For instance, the standard library will implement FromBytes for bool as follows:
impl FromBytes for bool {
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError> {
        match bytes.get(0) {
            // Only the bit patterns 0 and 1 are valid for `bool`.
            Some(&b) if b == 1 || b == 0 => Ok(unsafe { &*(bytes.as_ptr() as *const Self) }),
            Some(_) => Err(FromBytesError::InvalidValue),
            None => Err(FromBytesError::InsufficientBytes),
        }
    }
}
Notes on manually implementing FromBytes:
- In the case where the slice passed to from_bytes contains more than the number of bytes required to represent the type, the extra bytes should be ignored. This allows converting a slice without first manually re-slicing it to the length of the type.
- from_bytes should process exactly size_of::<T>() bytes, and return Err(FromBytesError::InsufficientBytes) if supplied with fewer.
- These APIs should uphold the invariant that ValueType::from_bytes(value.to_bytes()) == Ok(&value).
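To make the round-trip invariant concrete, here is a self-contained sketch of the traits for u32 and bool only. It makes several assumptions that are not part of the proposal: the marker traits are written as unsafe traits (the proposal has the compiler verify the requirements instead), FromBytesError is redeclared locally, and cast is a free function whose body is one plausible reading of the elided method.

```rust
use std::mem::{align_of, size_of};

#[derive(Debug, PartialEq, Eq, Copy, Clone)]
enum FromBytesError {
    InsufficientAlignment,
    InsufficientBytes,
    InvalidValue,
}

/// Marker for byte-complete types (`unsafe` is an assumption of this sketch).
unsafe trait FromAnyBytes {}
unsafe impl FromAnyBytes for u32 {}

trait FromBytes {
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError>;
}

impl<T: FromAnyBytes> FromBytes for T {
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError> {
        if bytes.len() < size_of::<T>() {
            return Err(FromBytesError::InsufficientBytes);
        }
        if (bytes.as_ptr() as usize) % align_of::<T>() != 0 {
            return Err(FromBytesError::InsufficientAlignment);
        }
        // SAFETY: length and alignment were checked, and `T` is byte-complete.
        Ok(unsafe { &*(bytes.as_ptr() as *const T) })
    }
}

impl FromBytes for bool {
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError> {
        match bytes.first() {
            // SAFETY: `bool` has size and alignment 1, and the byte is 0 or 1.
            Some(&b) if b <= 1 => Ok(unsafe { &*(bytes.as_ptr() as *const bool) }),
            Some(_) => Err(FromBytesError::InvalidValue),
            None => Err(FromBytesError::InsufficientBytes),
        }
    }
}

/// Padding-free types (again `unsafe` by assumption of this sketch).
unsafe trait ToBytes: Sized {
    fn to_bytes(&self) -> &[u8] {
        let pointer = self as *const Self as *const u8;
        // SAFETY: implementors promise that `Self` contains no padding bytes.
        unsafe { std::slice::from_raw_parts(pointer, size_of::<Self>()) }
    }
}
unsafe impl ToBytes for u32 {}
unsafe impl ToBytes for bool {}

/// Hypothetical body for `cast`: view the source as bytes, then validate.
fn cast<S: ToBytes, T: FromBytes>(source: &S) -> Result<&T, FromBytesError> {
    T::from_bytes(source.to_bytes())
}
```

With these definitions, u32::from_bytes(value.to_bytes()) returns Ok(&value), while casting a bool to u32 fails with InsufficientBytes because the source is only one byte long.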
Introduce a type representing errors when safely transmuting from bytes
The FromBytesError type used above (name subject to bike-shedding) represents the types of errors that can occur when transmuting from bytes to a concrete type:
use std::error::Error;
use std::fmt::Display;

#[non_exhaustive]
#[derive(Debug, PartialEq, Eq, Copy, Clone)]
enum FromBytesError {
    InsufficientAlignment,
    InsufficientBytes,
    InvalidValue,
}
impl Display for FromBytesError { /*...*/ }
impl Error for FromBytesError {}
Note that FromBytesError intentionally does not contain specific information on the errors, such as the invalid value or the number of bytes required.
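For illustration, the elided Display implementation could look like the sketch below. The message wording and the describe helper are assumptions; the proposal does not specify either.

```rust
use std::error::Error;
use std::fmt;

#[non_exhaustive]
#[derive(Debug, PartialEq, Eq, Copy, Clone)]
enum FromBytesError {
    InsufficientAlignment,
    InsufficientBytes,
    InvalidValue,
}

impl fmt::Display for FromBytesError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The message wording is an assumption of this sketch.
        let message = match self {
            FromBytesError::InsufficientAlignment => "byte slice is not sufficiently aligned",
            FromBytesError::InsufficientBytes => "byte slice is too short",
            FromBytesError::InvalidValue => "bytes do not encode a valid value",
        };
        f.write_str(message)
    }
}

impl Error for FromBytesError {}

/// Hypothetical caller: because the type implements `Error`, it converts
/// into `Box<dyn Error>` through the `?` operator.
fn describe(result: Result<u8, FromBytesError>) -> Result<u8, Box<dyn Error>> {
    Ok(result?)
}
```

Since the variants carry no payload, the type stays Copy and cheap to compare, at the cost of less detailed diagnostics.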
Alternatives
FromBytesError could omit the InsufficientAlignment and InsufficientBytes variants, in favor of asserts, if we consider those developer errors. This may be preferable if we expect most such errors to get optimized away, and expect most developers to use .unwrap() or similar rather than handling these errors. In this case, we could make from_bytes never error on a type implementing FromAnyBytes, either by providing separate functions for FromAnyBytes and FromBytes (the latter returning Option), or by giving FromBytes an associated error type and using type ! as the error for types implementing FromAnyBytes. This would substantially improve ergonomics for the common case.
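The associated-error-type variant might look roughly like the following sketch. Everything here is an assumption made for illustration: std::convert::Infallible stands in for !, which is not yet stable as a type, the impls are written per type rather than as a blanket impl, and assert_fits, InvalidValue and read_word are hypothetical names.

```rust
use std::convert::Infallible;
use std::mem::{align_of, size_of};

/// Hypothetical variant of `FromBytes` with an associated error type.
trait FromBytes: Sized {
    type Error;
    /// Panics if `bytes` is too short or misaligned for `Self`.
    fn from_bytes(bytes: &[u8]) -> Result<&Self, Self::Error>;
}

/// Size and alignment problems are treated as developer errors.
fn assert_fits<T>(bytes: &[u8]) {
    assert!(bytes.len() >= size_of::<T>(), "insufficient bytes");
    assert!((bytes.as_ptr() as usize) % align_of::<T>() == 0, "insufficient alignment");
}

impl FromBytes for u32 {
    type Error = Infallible;
    fn from_bytes(bytes: &[u8]) -> Result<&Self, Infallible> {
        assert_fits::<u32>(bytes);
        // SAFETY: size and alignment were asserted; any bit pattern is a valid `u32`.
        Ok(unsafe { &*(bytes.as_ptr() as *const u32) })
    }
}

#[derive(Debug, PartialEq, Eq)]
struct InvalidValue;

impl FromBytes for bool {
    type Error = InvalidValue;
    fn from_bytes(bytes: &[u8]) -> Result<&Self, InvalidValue> {
        assert_fits::<bool>(bytes);
        match bytes[0] {
            // SAFETY: `bool` has size and alignment 1, and the byte is 0 or 1.
            0 | 1 => Ok(unsafe { &*(bytes.as_ptr() as *const bool) }),
            _ => Err(InvalidValue),
        }
    }
}

/// With an uninhabited error type, the `Err` arm is ruled out statically,
/// so no `.unwrap()` is needed.
fn read_word(bytes: &[u8]) -> &u32 {
    match <u32 as FromBytes>::from_bytes(bytes) {
        Ok(word) => word,
        Err(never) => match never {},
    }
}
```

The trade-off is that a short or misaligned buffer now panics instead of returning an error, which is only acceptable if those conditions really are programmer mistakes.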
Safe Unions
Unions whose fields all implement both FromAnyBytes and ToBytes can potentially allow reads of their fields without requiring unsafe, since writing to one field and reading from another acts as a transmute operation, and these traits make transmutes safe.
However, when a union's fields have differing lengths (referred to here as "unbalanced unions"), initializing a shorter field does not necessarily zero out the remainder of the union. This means initializing a union with a shorter field and then reading a longer field leads to reading from uninitialized memory. To make this well defined, we propose adding a new repr, #[repr(zero_init)], which initializes the remainder of the union to zero when initializing any field. Thus, safe Rust can allow reading fields of unbalanced unions if and only if the union type implements ToBytes and FromAnyBytes and is #[repr(zero_init)].
Depending on complexity and consensus during the pre-RFC process, we may propose repr(zero_init) as part of this RFC, or as a separate follow-on RFC. In the latter case, safe reads of union fields would not be part of the initial RFC.
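The sketch below shows both cases in today's Rust, where the reads still require unsafe. The union and function names are hypothetical, and the second function only emulates the proposed #[repr(zero_init)] by hand, by initializing the longest field to zero before writing the shorter one.

```rust
#[repr(C)]
#[derive(Clone, Copy)]
union WordOrBytes {
    word: u32,
    bytes: [u8; 4],
}

/// Balanced union: both fields cover all 4 bytes, so reading either one is
/// always defined, yet the read still needs `unsafe` today.
fn word_as_bytes(word: u32) -> [u8; 4] {
    let u = WordOrBytes { word };
    unsafe { u.bytes }
}

#[repr(C)]
#[derive(Clone, Copy)]
union ByteOrWord {
    byte: u8,
    word: u32,
}

/// Unbalanced union: emulate zero-initialization manually so that the bytes
/// not covered by `byte` have a defined value before `word` is read.
fn byte_as_word(byte: u8) -> u32 {
    let mut u = ByteOrWord { word: 0 };
    u.byte = byte; // writes 1 byte; the other 3 stay zero
    unsafe { u.word }
}
```

Under the proposal, a union like ByteOrWord marked #[repr(zero_init)] would get the zeroing automatically, and both reads could be written without unsafe.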
Alternatives
While reading uninitialized memory from an unbalanced union whose fields implement ToBytes and FromAnyBytes is rarely the correct thing to do, it could be argued that it is not unsafe, and thus could be allowed in safe Rust. Therefore, an alternative is possible where we simply allow reads from such unions in safe Rust. This would provide a definition for previously undefined behavior in Rust.
Possible future extension: safe copying casts to support types with padding
We could potentially provide a ToBytesCopy trait or similar, with methods that support copying into a separate byte slice, or copying into another type. Such a trait could have an automatic implementation for any type implementing ToBytes, but could additionally support manual implementations for types that have padding. Such manual implementations could then copy the fields and zero the padding.
The initial version of this proposal does not define such a trait.
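As a rough idea of what a manual implementation for a padded type could look like, consider the sketch below. The method name to_bytes_copy, its Vec<u8> return type, and the Record type are all assumptions, and the hard-coded offset relies on #[repr(C)] with a 4-byte-aligned u32.

```rust
/// Hypothetical trait; name and signature are assumptions of this sketch.
trait ToBytesCopy {
    /// Returns a fully initialized copy of the value's bytes, padding zeroed.
    fn to_bytes_copy(&self) -> Vec<u8>;
}

#[repr(C)]
struct Record {
    tag: u8,
    // 3 implicit padding bytes
    len: u32,
}

impl ToBytesCopy for Record {
    fn to_bytes_copy(&self) -> Vec<u8> {
        // Start from zeroes so that the padding bytes have a defined value.
        let mut out = vec![0u8; std::mem::size_of::<Record>()];
        out[0] = self.tag;
        // With `#[repr(C)]` and a 4-byte-aligned `u32`, `len` sits at offset 4.
        out[4..8].copy_from_slice(&self.len.to_ne_bytes());
        out
    }
}
```

Because the fields are copied one by one into a zeroed buffer, the uninitialized padding inside the original value is never read.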
Acknowledgments
Shout out to the following crates for paving the way with many good ideas: