ATTENTION: This has been superseded by version 2 of the proposal.
-------------------------------------------------------------------------------------------
I've been working with Josh Triplett on a design for safe transmute. Please let us know what you think.
Safe(r) Transmute
Transmuting a buffer of bytes to a type and vice versa in Rust is extremely dangerous, so much so that the docs for std::mem::transmute are essentially a long list of how to avoid doing so. However, transmuting is sometimes necessary. For instance, in extremely performance-sensitive use cases, it may be necessary to transmute from bytes instead of explicitly deserializing and copying bytes from a buffer into a struct.
Causes of Unsafety and Undefined Behavior (UB)
At the core of understanding the safety properties of transmutation is understanding Rust's layout properties (i.e., how Rust represents types in memory). The best resource I've found for understanding this is Alexis Beingessner's blog post on the matter.
The following are the reasons that transmutation from a buffer of bytes is generally unsafe:
- Wrong Size: A buffer of bytes might not contain the correct number of bytes to encode a given type. Referring to uninitialized fields of a struct is UB. Of course, this assumes that the size of a given type is known ahead of time which is not always the case.
- Illegal Representations: Safe transmutation of a slice of bytes to a type T is only possible if every possible value of those bytes corresponds to a valid value of type T. For example, this property doesn't hold for bool or for most enums. While size_of::<bool>() == 1, a bool can only legally be either 0b1 or 0b0; transmuting 0b10 to bool is UB.
- Non-Deterministic Layout: Certain types might not have a deterministic layout in memory. The Rust compiler is allowed to rearrange the layout of any type that does not have a well-defined layout associated with it. Explicitly setting the layout of a type is done through #[repr(..)] attributes. To be deterministic, both the order of the fields of a complex type and the exact values of their offsets from the beginning of the type must be well known. This is generally only possible by marking a complex type #[repr(C)] and recursively ensuring that all fields of the struct are composed of types with deterministic layout.
- Alignment: Types must be "well-aligned", meaning that their address in memory must be a multiple of a certain interval (a power of 2). For example, the alignment of u32 is 4, meaning that a valid u32 must always start at a memory address evenly divisible by 4. Transmuting a slice of bytes to a type T when the slice is not properly aligned for T is UB.
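As a rough illustration of the checks these points imply, here is a minimal sketch. The helper names size_and_align_ok and checked_bool are made up for this example and are not part of the proposal; the sketch also assumes a target where u32 has an alignment of 4.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper: true if `bytes` is long enough and suitably aligned
/// to be viewed as a `T`. It checks size and alignment only and says nothing
/// about whether the bit pattern is a legal value of `T`.
fn size_and_align_ok<T>(bytes: &[u8]) -> bool {
    bytes.len() >= size_of::<T>() && (bytes.as_ptr() as usize) % align_of::<T>() == 0
}

/// Hypothetical helper: converts a byte to `bool` only when it holds one of
/// the two legal bit patterns.
fn checked_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None, // e.g. 0b10: transmuting this to `bool` would be UB
    }
}

fn main() {
    // Backing the buffer with u32s guarantees that its start is aligned for u32.
    let words = [0u32; 2];
    let bytes: &[u8] = unsafe { std::slice::from_raw_parts(words.as_ptr() as *const u8, 8) };

    assert!(size_and_align_ok::<u32>(bytes));
    assert!(!size_and_align_ok::<u32>(&bytes[1..])); // misaligned by one byte
    assert!(!size_and_align_ok::<u32>(&bytes[..3])); // too short

    assert_eq!(checked_bool(0b1), Some(true));
    assert_eq!(checked_bool(0b10), None);
}
```

Note that these checks cover size, alignment, and value validity, but not layout determinism, which cannot be checked at runtime and has to come from the type's definition.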
Transmuting from a type T to a slice of bytes can also be unsafe or cause UB:
- Padding: Since padding bytes (i.e., bytes internally inserted to ensure all elements of a complex type have proper alignment) are not initialized, viewing them is UB.
- Non-Deterministic Layout: The same issues that apply when transmuting from bytes to a type T apply when going in the other direction.
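To make the padding point concrete, here is a small sketch. The Header struct and the helper functions are invented for this example, and the numbers assume a target where u32 has an alignment of 4.

```rust
use std::mem::size_of;

// Hypothetical example type: a one-byte tag followed by a four-byte length.
#[repr(C)]
#[allow(dead_code)]
struct Header {
    tag: u8,
    len: u32,
}

/// Bytes occupied by the fields themselves, not counting padding.
fn header_field_bytes() -> usize {
    size_of::<u8>() + size_of::<u32>()
}

/// Bytes the compiler inserted so that `len` starts at a multiple of 4.
fn header_padding_bytes() -> usize {
    size_of::<Header>() - header_field_bytes()
}

fn main() {
    // The padding bytes sit between `tag` and `len` and are never initialized,
    // so exposing all size_of::<Header>() bytes as a &[u8] would read
    // uninitialized memory.
    println!(
        "size = {}, fields = {}, padding = {}",
        size_of::<Header>(),
        header_field_bytes(),
        header_padding_bytes()
    );
}
```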
Suggested Improvements
Introduce a marker trait for safely transmutable types.
We first introduce the trait Transmutable (name subject to bike-shedding) that represents any type where all properly aligned and sized byte patterns are legal (from here on referred to as "byte complete" types).
All core types that are byte complete implement Transmutable. This includes u8 and usize but does not include basic types like bool that need further validation before being safely transmuted. User-defined types can safely opt into Transmutable using #[derive(Transmutable)] as long as they are recursively composed only of Transmutable types, they have a deterministic layout (i.e., they are repr(C)), and they contain no padding bytes. The compiler will return an error when the type fails to meet any of the necessary conditions for being Transmutable.
The following should be noted:
- A struct that requires internal padding can become a struct that can derive(Transmutable) by explicitly including padding fields.
- Manual impl Transmutable is not allowed.
- The user must opt into a complex type being Transmutable because this has implications for the public API of the type. Adding a new non-Transmutable private field to a type, and thus making the type itself non-Transmutable, is a breaking change.
- While deriving Transmutable for [T; N] where T is itself Transmutable is theoretically possible, this is left to future work.
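The first note (explicit padding fields) might look something like the sketch below. ExplicitHeader, its _reserved field, and the has_no_implicit_padding check are all hypothetical names for illustration; the proposed derive does not exist, so the sketch only approximates one of its conditions by comparing sizes.

```rust
use std::mem::size_of;

// Hypothetical example type. The `_reserved` field takes the place of the
// three padding bytes the compiler would otherwise insert before `len`.
#[repr(C)]
#[allow(dead_code)]
struct ExplicitHeader {
    tag: u8,
    _reserved: [u8; 3], // explicit padding: ordinary, initialized bytes
    len: u32,
}

/// True when the struct is exactly as large as the sum of its fields,
/// i.e. the compiler added no implicit padding.
const fn has_no_implicit_padding() -> bool {
    size_of::<ExplicitHeader>() == size_of::<u8>() + size_of::<[u8; 3]>() + size_of::<u32>()
}

// Evaluated at compile time: the build fails if padding sneaks back in.
const _: () = assert!(has_no_implicit_padding());

fn main() {
    println!("ExplicitHeader is {} bytes", size_of::<ExplicitHeader>());
}
```

A real derive would presumably perform a check along these lines itself, together with the checks on field types and repr.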
The following types should automatically be marked as Transmutable:
- u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize, f32, f64, (), all SIMD types that are byte-complete, and [T; N] for all of those types (but not for arbitrary Transmutable types).
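For anyone who wants to experiment before any compiler support exists, the list above can be approximated in a library. This is only a sketch under assumptions: the trait is declared unsafe here because nothing checks the impls (the proposal instead forbids manual impls), SIMD types are left out, and as_bytes is a hypothetical function that shows how a bound on the trait could be used.

```rust
use std::mem::size_of;

/// Stand-in for the proposed marker trait. It is `unsafe` in this sketch only
/// because there is no compiler-checked derive backing it.
unsafe trait Transmutable {}

macro_rules! impl_transmutable {
    ($($t:ty),* $(,)?) => {
        $(
            unsafe impl Transmutable for $t {}
            // Arrays of the listed types only, not of arbitrary Transmutable types.
            unsafe impl<const N: usize> Transmutable for [$t; N] {}
        )*
    };
}

impl_transmutable!(u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize, f32, f64, ());

/// Hypothetical helper: views any `Transmutable` value as its raw bytes.
fn as_bytes<T: Transmutable>(value: &T) -> &[u8] {
    // Sound under the sketch's assumption that implementors have no padding.
    unsafe { std::slice::from_raw_parts(value as *const T as *const u8, size_of::<T>()) }
}

fn main() {
    assert_eq!(as_bytes(&[1u8, 2, 3]), &[1u8, 2, 3][..]);
    assert_eq!(as_bytes(&0u64).len(), 8);
    // as_bytes(&true) does not compile: `bool` is not `Transmutable`.
}
```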
Introduce a trait for types that can be transformed to/from bytes
Next, we introduce a trait called ToFromBytes (name subject to bike-shedding).
This trait represents a type that can go to and from bytes in a way that may fail. All Transmutable types would implement this trait. (Note: FromBytesError is explained in the following sections.)
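use std::mem::{align_of, size_of};

trait ToFromBytes {
    fn to_bytes(&self) -> &[u8];
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError>;
}

impl<T: Transmutable> ToFromBytes for T {
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError> {
        // Extra trailing bytes are allowed; too few is an error.
        if bytes.len() < size_of::<Self>() {
            return Err(FromBytesError::InsufficientBytes);
        }
        // The buffer must already be aligned for Self.
        if bytes.as_ptr().align_offset(align_of::<Self>()) != 0 {
            return Err(FromBytesError::InsufficientAlignment);
        }
        // Length and alignment were checked above, and every bit pattern of a
        // Transmutable type is valid, so reborrowing the bytes as Self is sound.
        Ok(unsafe { &*(bytes.as_ptr() as *const Self) })
    }

    fn to_bytes(&self) -> &[u8] {
        let pointer = self as *const Self as *const u8;
        // Transmutable types contain no padding, so every byte is initialized.
        unsafe { std::slice::from_raw_parts(pointer, size_of::<Self>()) }
    }
}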
Users can implement ToFromBytes for their own types as well. The standard library will implement this for bool:
impl ToFromBytes for bool {
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError> {
        match bytes.get(0) {
            // Only 0 and 1 are valid bit patterns for bool; its alignment is 1,
            // so no alignment check is needed.
            Some(&b) if b == 1 || b == 0 => Ok(unsafe { &*(bytes.as_ptr() as *const bool) }),
            Some(_) => Err(FromBytesError::InvalidValue),
            None => Err(FromBytesError::InsufficientBytes),
        }
    }

    fn to_bytes(&self) -> &[u8] {
        let pointer = self as *const Self as *const u8;
        unsafe { std::slice::from_raw_parts(pointer, size_of::<Self>()) }
    }
}
The following should be noted:
- While the above to_bytes implementation is applicable for all types with deterministic layout and no padding, there is no default implementation of to_bytes.
- to_bytes returns a borrowed slice, so even a manual implementation of the trait cannot construct a slice of bytes that does not match the in-memory representation of the structure. In particular, this means a type with internal padding bytes cannot implement ToFromBytes. Supporting such types would require a trait that either constructs an owned (or Cow) slice, or a trait that writes bytes to a mutable slice supplied as a parameter. This pre-RFC does not attempt to specify any such trait, leaving it to future work.
- In the case where the slice contains more than the number of bytes required to represent the type, the extra bytes are simply ignored.
- When implementing ToFromBytes, from_bytes should process size_of::<T>() bytes and return an error if supplied with fewer, and to_bytes should return a slice of exactly size_of::<T>() in length. These APIs should also uphold that Value::from_bytes(value.to_bytes()) == Ok(&value).
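The notes above can be exercised with a small self-contained sketch. To keep it compilable on its own, it implements ToFromBytes directly for u32 instead of going through a Transmutable blanket impl, trims FromBytesError down to the two variants it needs, and uses a hypothetical words_as_bytes helper to obtain a buffer with known alignment.

```rust
use std::mem::{align_of, size_of};

#[derive(Debug, PartialEq)]
enum FromBytesError {
    InsufficientAlignment,
    InsufficientBytes,
}

trait ToFromBytes {
    fn to_bytes(&self) -> &[u8];
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError>;
}

// Direct impl for u32 only; a stand-in for the proposed blanket impl.
impl ToFromBytes for u32 {
    fn to_bytes(&self) -> &[u8] {
        // u32 has no padding, so all size_of::<u32>() bytes are initialized.
        unsafe { std::slice::from_raw_parts(self as *const u32 as *const u8, size_of::<u32>()) }
    }

    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError> {
        if bytes.len() < size_of::<u32>() {
            return Err(FromBytesError::InsufficientBytes);
        }
        if (bytes.as_ptr() as usize) % align_of::<u32>() != 0 {
            return Err(FromBytesError::InsufficientAlignment);
        }
        // Length and alignment are checked; every bit pattern is a valid u32.
        Ok(unsafe { &*(bytes.as_ptr() as *const u32) })
    }
}

/// Hypothetical helper: a byte view whose start is guaranteed to be aligned for u32.
fn words_as_bytes(words: &[u32]) -> &[u8] {
    unsafe {
        std::slice::from_raw_parts(words.as_ptr() as *const u8, words.len() * size_of::<u32>())
    }
}

/// The round-trip property from the last note.
fn round_trips(value: u32) -> bool {
    u32::from_bytes(value.to_bytes()) == Ok(&value)
}

fn main() {
    assert!(round_trips(0xDEAD_BEEF));

    let words = [7u32, 9];
    let bytes = words_as_bytes(&words);
    // Eight bytes supplied, four consumed: the extra bytes are ignored.
    assert_eq!(u32::from_bytes(bytes), Ok(&7));
    assert_eq!(u32::from_bytes(&bytes[..3]), Err(FromBytesError::InsufficientBytes));
    assert_eq!(u32::from_bytes(&bytes[1..]), Err(FromBytesError::InsufficientAlignment));
}
```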
Introduce a type representing errors when transmuting from bytes
Next, we introduce a FromBytesError (name subject to bike-shedding) which represents the types of errors that can occur when transmuting from bytes to a concrete type.
#[non_exhaustive]
#[derive(Debug)]
enum FromBytesError {
    InsufficientAlignment,
    InsufficientBytes,
    InvalidValue,
}

impl Error for FromBytesError { ... }
impl Display for FromBytesError { ... }
Question: Should FromBytesError contain specific information on where the errors occurred? For instance, should FromBytesError::InsufficientBytes include the number of bytes required and the number given?
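To give the question something concrete to react to, here is one possible shape for a more detailed error. This is purely a sketch and not a decision: the required and given fields, the message wording, and the check_len helper are all assumptions made for this example.

```rust
use std::error::Error;
use std::fmt;

#[non_exhaustive]
#[derive(Debug, PartialEq)]
enum FromBytesError {
    InsufficientAlignment,
    // Hypothetical payload: how many bytes were needed and how many were supplied.
    InsufficientBytes { required: usize, given: usize },
    InvalidValue,
}

impl fmt::Display for FromBytesError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InsufficientAlignment => write!(f, "buffer is not sufficiently aligned"),
            Self::InsufficientBytes { required, given } => {
                write!(f, "need {} bytes, got {}", required, given)
            }
            Self::InvalidValue => write!(f, "bytes do not encode a valid value"),
        }
    }
}

impl Error for FromBytesError {}

/// Hypothetical helper showing where the extra information would come from.
fn check_len(bytes: &[u8], required: usize) -> Result<(), FromBytesError> {
    if bytes.len() < required {
        return Err(FromBytesError::InsufficientBytes { required, given: bytes.len() });
    }
    Ok(())
}

fn main() {
    let err = check_len(&[0u8; 2], 4).unwrap_err();
    println!("{}", err);
}
```

The trade-off is a larger error type and a wider API surface in exchange for messages that are easier to act on.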
Safe Unions
Lastly, unions which are composed purely of Transmutable types will allow safe access to their fields, since writing to and reading from the union is well defined no matter how one interprets it.
Question: Can we safely allow access to union fields if every field is Transmutable but the fields have different sizes? Is it possible, in safe code, to end up with a union that only has the bytes of a shorter field initialized and has uninitialized data in the remainder?
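For reference, this is roughly what such a union looks like in today's Rust, where the field read still needs an unsafe block. The Word union and bytes_of function are hypothetical names, and the sketch sidesteps the open question above by giving both fields the same size.

```rust
// Hypothetical example: both fields are four bytes and byte complete.
#[repr(C)]
#[derive(Clone, Copy)]
union Word {
    value: u32,
    bytes: [u8; 4],
}

fn bytes_of(value: u32) -> [u8; 4] {
    let word = Word { value };
    // Writing `value` initializes all four bytes, and every byte pattern is a
    // valid [u8; 4], so this read is well defined. Under the proposal, the
    // `unsafe` block here would no longer be required.
    unsafe { word.bytes }
}

fn main() {
    // The result matches the native-endian byte representation of the integer.
    assert_eq!(bytes_of(0x0102_0304), 0x0102_0304u32.to_ne_bytes());
}
```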
Acknowledgments
- Shout out to the following crates for paving the way with many good ideas: