- Feature Name: arbitrary_bytes_safe_trait
- Start Date: (fill me in with today’s date, YYYY-MM-DD)
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)
Summary
Introduce an unsafe marker trait that guarantees that any byte sequence of `size_of::<T>()` bytes is a valid instance of `T`, and a derive for that trait.
Motivation
In deserializing input from untrusted sources (IPC, disk, network, etc.), it is sometimes desirable to interpret a sequence of bytes as a particular type. In general this is unsafe, but for a large subset of types (those for which every bit pattern is valid, such as the integer types), it is safe. Currently, code that wishes to do this must use `unsafe`. It would be a big ergonomics win if the users and authors of deserialization APIs could deserialize such types without needing to reason about memory safety themselves.
Guide-level Explanation
Introduce a marker trait, `pub unsafe trait ArbitraryBytesSafe {}` (name to be bikeshedded), that indicates that any sequence of `size_of::<T>()` bytes is a valid instance of `T`.
Provide four (safe!) associated functions with default impls on `ArbitraryBytesSafe`:
- `fn transmute_ref<T>(t: &T) -> &Self`
- `fn transmute_mut<T: ArbitraryBytesSafe>(t: &mut T) -> &mut Self`
- `fn transmute_slice_ref<T>(slc: &[T]) -> &Self`
- `fn transmute_slice_mut<T: ArbitraryBytesSafe>(slc: &mut [T]) -> &mut Self`
The first two functions ensure at compile time that `size_of::<T>() == size_of::<Self>()`. The last two functions check at runtime that `slc.len() * size_of::<T>() == size_of::<Self>()`.
Provide a custom derive for `ArbitraryBytesSafe`.
Reference-level Explanation
TODO: Explain how `transmute_XXX` is implemented.
Note that, when obtaining an immutable reference, `T` (the source type) doesn’t need to be `ArbitraryBytesSafe` because it isn’t mutated. When obtaining a mutable reference, it does need to be `ArbitraryBytesSafe`: bytes written through the returned `&mut Self` might not form a valid `T`, and only this trait bound guarantees that every bit pattern written is also a valid `T`.
The custom derive uses the following rules to determine whether a type is `ArbitraryBytesSafe`:
- The primitives `uXXX` and `iXXX` are safe, including `usize` and `isize`.
- Arrays are safe if their element types are safe.
- Structs and tuple structs are safe if all of their fields are safe and it is guaranteed that there is no internal padding. This is true if `repr(packed)` is used, or if `repr(C)` is used and the alignment of every field is met without any padding (e.g., if a `u32` follows two `u16`s). For tuple structs, this is also true if `repr(transparent)` is used. Anonymous tuples are never safe because there is no way to specify a `repr` on them.
- Enums are safe if they are C-like, have either `repr(C)` or `repr(i*)`/`repr(u*)`, and every possible discriminant value corresponds to a variant.
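The struct rules can be illustrated with hand-written impls standing in for the derive's output. The `const` assertions spell out the padding arithmetic each rule relies on; the type names (`Header`, `Padded`, `Packed`, `Meters`) are hypothetical. Enums are left out because a qualifying `repr(u8)` enum would need all 256 discriminant values as variants.

```rust
use std::mem::size_of;

pub unsafe trait ArbitraryBytesSafe {}
unsafe impl ArbitraryBytesSafe for u8 {}
unsafe impl ArbitraryBytesSafe for u16 {}
unsafe impl ArbitraryBytesSafe for u32 {}

// repr(C), and the u32 follows two u16s, so it is already 4-byte aligned:
// no padding. The derive would accept this type.
#[repr(C)]
pub struct Header {
    pub kind: u16,
    pub flags: u16,
    pub len: u32,
}
const _: () = assert!(size_of::<Header>() == 2 + 2 + 4);
unsafe impl ArbitraryBytesSafe for Header {}

// repr(C) with a u8 followed by a u32: 3 padding bytes follow `tag`,
// so the derive would reject this type (no impl is written).
#[repr(C)]
pub struct Padded {
    pub tag: u8,
    pub value: u32,
}
const _: () = assert!(size_of::<Padded>() == 1 + 3 + 4); // 3 = padding

// repr(packed) removes the padding (fields may then be unaligned): accepted.
#[repr(C, packed)]
pub struct Packed {
    pub tag: u8,
    pub value: u32,
}
const _: () = assert!(size_of::<Packed>() == 1 + 4);
unsafe impl ArbitraryBytesSafe for Packed {}

// repr(transparent) tuple struct: same layout as its single field: accepted.
#[repr(transparent)]
pub struct Meters(pub u32);
unsafe impl ArbitraryBytesSafe for Meters {}
```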
Rationale and Alternatives
This API allows for clean, safe deserialization APIs (that don’t require `unsafe` code internally) such as:
- For reading from a stream, `fn read<T: ArbitraryBytesSafe>(&mut self) -> T`
- For zero-copy parsing of input, `fn parse<'a>(input: &'a [u8]) -> Result<Parsed<'a>, ParseErr>`
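A minimal sketch of the stream-reading API, assuming a hypothetical in-memory `ByteReader` cursor. It returns `Option<T>` rather than `T` so that short input is reported instead of panicking, and it copies with `ptr::read_unaligned` so the buffer need not be aligned for `T`. The single `unsafe` block is justified entirely by the `ArbitraryBytesSafe` bound, which is exactly the reasoning the proposal aims to move out of individual deserializers.

```rust
use std::mem::size_of;
use std::ptr;

pub unsafe trait ArbitraryBytesSafe: Sized {}
unsafe impl ArbitraryBytesSafe for u8 {}
unsafe impl ArbitraryBytesSafe for u16 {}
unsafe impl ArbitraryBytesSafe for u32 {}
unsafe impl ArbitraryBytesSafe for u64 {}

/// Hypothetical cursor over an in-memory byte buffer.
pub struct ByteReader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> ByteReader<'a> {
    pub fn new(buf: &'a [u8]) -> Self {
        ByteReader { buf, pos: 0 }
    }

    /// Reads the next `size_of::<T>()` bytes as a `T` (native byte order).
    /// Returns `None` if fewer bytes remain.
    pub fn read<T: ArbitraryBytesSafe>(&mut self) -> Option<T> {
        let n = size_of::<T>();
        let src = self.buf.get(self.pos..self.pos.checked_add(n)?)?;
        // SAFETY: `src` holds exactly `size_of::<T>()` initialized bytes, and
        // `T: ArbitraryBytesSafe` means any such bytes form a valid `T`.
        // `read_unaligned` copies the value out, so no alignment is required.
        let value = unsafe { ptr::read_unaligned(src.as_ptr().cast::<T>()) };
        self.pos += n;
        Some(value)
    }
}
```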
It also has uses beyond deserializing untrusted input such as:
- Generating random instances of things for testing
- Debugging/inspecting arbitrary regions of program memory
Alternatives:
- Do not have `ArbitraryBytesSafe::transmute_XXX`, and simply allow unsafe code to use `ArbitraryBytesSafe` as a marker trait that gives it the guarantees it needs to do unsafe things, such as creating an uninitialized `T` and then copying bytes into it manually. Downside: you lose the guarantee that the sizes match.
- Instead of a custom derive, use a normal macro that is placed around the type definition and can decide whether or not to emit `unsafe impl ArbitraryBytesSafe for <type> {}`. This is less ergonomic, but also easier to implement as a first pass.
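A sketch of the macro alternative, using a hypothetical `arbitrary_bytes_safe!` `macro_rules!` macro that handles only named-field structs. Since `macro_rules!` cannot inspect field types, it cannot decide on its own whether to emit the impl; this version instead always emits it alongside compile-time checks (every field type implements the trait, and the struct has no padding), so an unsuitable type fails to compile rather than silently getting no impl.

```rust
pub unsafe trait ArbitraryBytesSafe {}
unsafe impl ArbitraryBytesSafe for u16 {}
unsafe impl ArbitraryBytesSafe for u32 {}

/// Hypothetical macro: wraps a struct definition, forces `repr(C)`, and emits
/// the `unsafe impl` together with compile-time checks that justify it.
macro_rules! arbitrary_bytes_safe {
    (
        $(#[$meta:meta])*
        $vis:vis struct $name:ident {
            $($fvis:vis $field:ident : $ty:ty),* $(,)?
        }
    ) => {
        $(#[$meta])*
        #[repr(C)]
        $vis struct $name {
            $($fvis $field: $ty),*
        }

        const _: () = {
            // Every field type must be ArbitraryBytesSafe. `check_fields` is
            // never called, but its body is still type-checked.
            #[allow(dead_code)]
            fn requires_safe<T: ArbitraryBytesSafe>() {}
            #[allow(dead_code)]
            fn check_fields() {
                $(requires_safe::<$ty>();)*
            }
            // No padding: the struct is exactly as large as its fields combined.
            assert!(::core::mem::size_of::<$name>() == 0 $(+ ::core::mem::size_of::<$ty>())*);
        };

        unsafe impl ArbitraryBytesSafe for $name {}
    };
}

arbitrary_bytes_safe! {
    pub struct Header {
        pub kind: u16,
        pub flags: u16,
        pub len: u32,
    }
}
```

Wrapping a struct such as `{ tag: u16, value: u32 }` in the same macro would fail the size assertion (2 padding bytes), and a field of a type without an impl would fail the `requires_safe` bound.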
Prior Art
Thanks to @dtolnay for pointing out the `Pod` trait, which is similar to `ArbitraryBytesSafe`, although it performs more checks at runtime (in particular, size and alignment checking). It also provides a large number of utility functions for `Pod` types.
The recent `FromBits`/`IntoBits` pre-RFC discusses a different but related case: conversions between types in which, although arbitrary bit patterns may not be valid, every valid bit pattern of one type corresponds to a valid bit pattern of the other type, so the conversion is nonetheless safe.
Unresolved Questions
- How do the `transmute_XXX` functions guarantee that the alignment of their inputs is at least as large as the alignment requirement of `Self`? Some options might be:
  - At compile time, check not only that the sizes match, but also that the alignment of `T` is at least as large as the alignment of `Self`.
  - Have `transmute_XXX` check at runtime. The user will be encouraged to check themselves, in which case the second check (inside `transmute_XXX`) will be redundant and can be optimized out.
  - Introduce `transmute_XXX_aligned`, which guarantees that `align_of::<T>() >= align_of::<Self>()` at compile time, while `transmute_XXX` checks at runtime.
  - The same as the previous bullet, except that `transmute_XXX` are unsafe, and the documentation states that the only precondition is that alignment is guaranteed by the caller.
  - Require that only types with an alignment of 1 can be `ArbitraryBytesSafe`.
  - Provide a `transmute<T>(t: &T) -> Self` that returns `Self` by value, and so has no alignment requirements.
- Should the requirement be that the input is the same size as `Self`, or just that it’s no smaller than `Self`? In other words, is it valid to transmute a reference to an object of `n` bytes to a reference to an object of `m < n` bytes?
- Should we add a restriction that `ArbitraryBytesSafe` types have trivial drops? My intuition is that this is not needed.
- Should `transmute_XXX` be associated functions of `ArbitraryBytesSafe`, or library functions instead (or something else entirely)?
- Does `transmute` imply unsafety (since `mem::transmute` is the canonical unsafe function)? Is there a better word to use?
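The by-value `transmute` option listed under the alignment question could look roughly like the sketch below. `ptr::read_unaligned` copies the bytes out without requiring any alignment, and an inline `const` block (Rust 1.79+) enforces the size check at compile time. As with the reference-returning functions, reading a `T` that contains padding bytes would still read uninitialized memory, so this sketch assumes padding-free inputs.

```rust
use std::mem::size_of;
use std::ptr;

pub unsafe trait ArbitraryBytesSafe: Sized {
    /// Hypothetical by-value variant: copies the bytes of `t` into a new
    /// `Self`, so no alignment relationship between `T` and `Self` is needed.
    fn transmute<T>(t: &T) -> Self {
        // Compile-time size check, evaluated when this function is instantiated.
        const { assert!(size_of::<T>() == size_of::<Self>()) };
        // SAFETY: sizes match, `read_unaligned` has no alignment requirement,
        // and any bytes are a valid `Self`. Assumes `T` has no padding bytes.
        unsafe { ptr::read_unaligned((t as *const T).cast::<Self>()) }
    }
}

unsafe impl ArbitraryBytesSafe for u8 {}
unsafe impl ArbitraryBytesSafe for u32 {}
unsafe impl<T: ArbitraryBytesSafe, const N: usize> ArbitraryBytesSafe for [T; N] {}
```

Here `u32::transmute(&bytes)` works for a `[u8; 4]` at any address, which is the case the reference-returning `u32::transmute_ref` could not accept under the compile-time alignment option.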