- Feature Name: arbitrary_bytes_safe_trait
- Start Date: (fill me in with today’s date, YYYY-MM-DD)
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)
Summary
Introduce an unsafe marker trait that guarantees that any byte sequence of size_of::<T>() bytes is a valid instance of T, and a derive for that trait.
Motivation
When deserializing input from untrusted sources (IPC, disk, network, etc.), it is sometimes desirable to interpret a sequence of bytes as a particular type. In general, this is unsafe, but for a large subset of types, it is safe. Currently, code that wishes to do this needs to be unsafe. It would be a big ergonomics win if the users and authors of deserialization APIs could deserialize such types without needing to reason about memory safety themselves.
Guide-level Explanation
Introduce a marker trait, pub unsafe trait ArbitraryBytesSafe {} (name to be bikeshedded) that indicates that any sequence of size_of::<T>() bytes is a valid instance of T.
Provide four (safe!) associated functions with default impls on ArbitraryBytesSafe:
- fn transmute_ref<T>(t: &T) -> &Self
- fn transmute_mut<T: ArbitraryBytesSafe>(t: &mut T) -> &mut Self
- fn transmute_slice_ref<T>(slc: &[T]) -> &Self
- fn transmute_slice_mut<T: ArbitraryBytesSafe>(slc: &mut [T]) -> &mut Self
The first two functions ensure at compile time that size_of::<T>() == size_of::<Self>(). The last two functions check at runtime that slc.len() * size_of::<T>() == size_of::<Self>().
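A minimal sketch of how the two kinds of size check could look, under several assumptions that are not part of this RFC: the source type is restricted to bytes (u8) rather than an arbitrary T, the functions return Option instead of a bare reference, and alignment is checked at runtime (one of the options listed under Unresolved Questions). The names SameSize, cast_checked and Header are hypothetical.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

/// Hypothetical helper: evaluating `CHECK` fails to compile if the sizes differ.
struct SameSize<A, B>(PhantomData<(A, B)>);

impl<A, B> SameSize<A, B> {
    const CHECK: () = assert!(size_of::<A>() == size_of::<B>());
}

/// Marker: every sequence of `size_of::<Self>()` bytes is a valid `Self`.
pub unsafe trait ArbitraryBytesSafe: Sized {
    /// Size is checked at compile time. Alignment is checked at runtime
    /// (an assumption of this sketch), hence the `Option`.
    fn transmute_ref<const N: usize>(t: &[u8; N]) -> Option<&Self> {
        let () = SameSize::<[u8; N], Self>::CHECK;
        cast_checked::<Self>(&t[..])
    }

    /// Size and alignment are both checked at runtime.
    fn transmute_slice_ref(slc: &[u8]) -> Option<&Self> {
        cast_checked::<Self>(slc)
    }
}

/// Hypothetical shared helper performing the runtime checks and the cast.
fn cast_checked<T: ArbitraryBytesSafe>(bytes: &[u8]) -> Option<&T> {
    let ptr = bytes.as_ptr();
    if bytes.len() != size_of::<T>() || (ptr as usize) % align_of::<T>() != 0 {
        return None;
    }
    // SAFETY: the length equals size_of::<T>(), the pointer is aligned for T,
    // the bytes are initialized, and the trait promises any bit pattern is valid.
    Some(unsafe { &*(ptr as *const T) })
}

/// Hypothetical example type: only byte fields, so no padding and alignment 1.
#[repr(C)]
pub struct Header {
    pub tag: [u8; 2],
    pub len: [u8; 2],
}

unsafe impl ArbitraryBytesSafe for Header {}
```

With these definitions, Header::transmute_ref(&[1u8, 2, 3, 4]) yields a reference whose tag is [1, 2], calling it with a five-byte array is rejected when the program is built, and Header::transmute_slice_ref on a three-byte slice returns None.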
Provide a custom derive for ArbitraryBytesSafe.
Reference-level Explanation
TODO: Explain how transmute_XXX is implemented.
Note that, when obtaining an immutable reference, T (the source type) doesn’t need to be ArbitraryBytesSafe because it isn’t mutated. When obtaining a mutable reference, it does need to be ArbitraryBytesSafe because there’s no guarantee that valid instances of Self correspond to the bits of valid instances of T without this trait bound.
The custom derive uses the following rules to determine whether a type is ArbitraryBytesSafe:
- The primitives uXXX and iXXX are safe, including usize and isize.
- Arrays are safe if their element types are safe.
- Structs and tuple structs are safe if all of their fields are safe and if it is guaranteed that there is no internal padding. This is true if repr(packed) is used or if repr(C) is used and the alignment of all fields is met without needing any padding (e.g., if a u32 follows two u16s). For tuple structs, this is also true if repr(transparent) is used. Anonymous tuples are never safe because there is no way to specify a repr on them.
- Enums are safe if they are C-like, have either repr(C) or repr(i*)/repr(u*), and if every possible discriminant value corresponds to a variant.
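The padding rule for structs can be illustrated with a small sketch. It assumes a derive could test "no internal padding" by comparing the struct's size with the sum of its field sizes. The types NoPadding, Padded and Packed and the helper fields_fill_struct are hypothetical, and the sizes in the comments assume a target where u32 has 4-byte alignment.

```rust
use std::mem::size_of;

// Two u16 fields fill the first four bytes, so the u32 is already aligned.
#[repr(C)]
pub struct NoPadding {
    pub a: u16,
    pub b: u16,
    pub c: u32,
}

// Padding bytes are inserted after `a` so that `b` is aligned.
#[repr(C)]
pub struct Padded {
    pub a: u8,
    pub b: u32,
}

// Same fields, but repr(packed) removes the padding.
#[repr(C, packed)]
pub struct Packed {
    pub a: u8,
    pub b: u32,
}

/// Hypothetical check: true when the fields account for every byte of the struct.
pub fn fields_fill_struct(struct_size: usize, field_sizes: &[usize]) -> bool {
    field_sizes.iter().sum::<usize>() == struct_size
}

// The same comparison as a compile-time assertion a derive might emit.
const _: () = assert!(size_of::<NoPadding>() == 2 * size_of::<u16>() + size_of::<u32>());
```

Under the stated assumption, NoPadding and Packed pass the check and Padded fails it, because its size is larger than the five bytes its fields occupy.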
Rationale and Alternatives
This API allows for clean, safe deserialization APIs (that don’t require unsafe impls internally) such as:
- For reading from a stream, fn read<T: ArbitraryBytesSafe>(&mut self) -> T
- For zero-copy parsing of input, fn parse<'a>(input: &'a [u8]) -> Result<Parsed<'a>, ParseErr>
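As an illustration of the zero-copy case, here is a sketch of what a parse function with that signature might look like. Everything beyond the signature is an assumption: the wire format (a four-byte header followed by a body whose length is stored big-endian), the Header, Parsed and ParseErr definitions, and the free function view_as, which stands in for transmute_slice_ref and contains the only unsafe code.

```rust
use std::mem::{align_of, size_of};

pub unsafe trait ArbitraryBytesSafe: Sized {}

/// Hypothetical stand-in for `transmute_slice_ref`, restricted to byte input.
fn view_as<T: ArbitraryBytesSafe>(bytes: &[u8]) -> Option<&T> {
    let ptr = bytes.as_ptr();
    if bytes.len() != size_of::<T>() || (ptr as usize) % align_of::<T>() != 0 {
        return None;
    }
    // SAFETY: size and alignment were checked, and T accepts any bit pattern.
    Some(unsafe { &*(ptr as *const T) })
}

/// Hypothetical header: byte-sized fields only, so no padding and alignment 1.
#[repr(C)]
pub struct Header {
    pub kind: u8,
    pub flags: u8,
    pub len: [u8; 2], // body length, big-endian (assumed format)
}

unsafe impl ArbitraryBytesSafe for Header {}

pub struct Parsed<'a> {
    pub header: &'a Header,
    pub body: &'a [u8],
}

#[derive(Debug, PartialEq)]
pub enum ParseErr {
    TooShort,
    BadLength,
}

/// Borrows the header and body directly from `input` without copying.
pub fn parse<'a>(input: &'a [u8]) -> Result<Parsed<'a>, ParseErr> {
    if input.len() < size_of::<Header>() {
        return Err(ParseErr::TooShort);
    }
    let (head, rest) = input.split_at(size_of::<Header>());
    let header = view_as::<Header>(head).ok_or(ParseErr::TooShort)?;
    let len = u16::from_be_bytes(header.len) as usize;
    let body = rest.get(..len).ok_or(ParseErr::BadLength)?;
    Ok(Parsed { header, body })
}
```

The author of parse writes no unsafe code. The single unsafe block lives in the helper that the trait would provide.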
It also has uses beyond deserializing untrusted input such as:
- Generating random instances of things for testing
- Debugging/inspecting arbitrary regions of program memory
Alternatives:
- Do not have ArbitraryBytesSafe::transmute_XXX, and simply allow unsafe code to use ArbitraryBytesSafe as a marker trait that provides it the guarantees it needs to do unsafe things such as create an uninitialized T and then copy bytes into it manually. Downside: you lose the guarantee that the sizes match.
- Instead of a custom derive, use a normal macro that is placed around the type definition and can decide whether or not to emit unsafe impl ArbitraryBytesSafe for <type> {}. This is less ergonomic, but also easier to implement as a first pass.
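To make the first alternative concrete, the following sketch shows what downstream unsafe code might look like if ArbitraryBytesSafe were only a marker trait. The function from_bytes is hypothetical. Note that its author has to write the size check by hand, which is the downside mentioned above.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

pub unsafe trait ArbitraryBytesSafe: Sized {}

unsafe impl ArbitraryBytesSafe for u32 {}

/// Hypothetical helper: builds a `T` by copying bytes into uninitialized storage.
pub fn from_bytes<T: ArbitraryBytesSafe>(bytes: &[u8]) -> Option<T> {
    // Nothing enforces this check; forgetting it would be undefined behavior.
    if bytes.len() != size_of::<T>() {
        return None;
    }
    let mut out = MaybeUninit::<T>::uninit();
    // SAFETY: exactly size_of::<T>() bytes are copied into `out`, the two
    // regions cannot overlap, and the marker trait promises that any bit
    // pattern is a valid T, so `out` is fully initialized afterwards.
    unsafe {
        ptr::copy_nonoverlapping(bytes.as_ptr(), out.as_mut_ptr().cast::<u8>(), size_of::<T>());
        Some(out.assume_init())
    }
}
```

Because the value is produced by copying, the input slice has no alignment requirement in this variant.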
Prior Art
Thanks to @dtolnay for pointing out the Pod trait, which is similar to ArbitraryBytesSafe, although it performs more checks at runtime (in particular, size and alignment checking). It also provides a large number of utility functions available to Pod types.
The recent FromBits/IntoBits pre-RFC discusses a different but related case - conversions between types in which, while arbitrary bit patterns may not be safe, all of the valid bit patterns of one type correspond to valid bit patterns of the other type, and so the conversion is nonetheless safe.
Unresolved Questions
- How do the transmute_XXX functions guarantee that the alignment of their inputs is at least as large as the alignment requirements of Self? Some options might be:
  - At compile time, check not only that the sizes match, but also that the alignment of T is at least as large as the alignment of Self
  - Have transmute_XXX check at runtime. The user will be encouraged to check themselves, in which case the second check (inside transmute_XXX) will be redundant and optimized out.
  - Introduce transmute_XXX_aligned which guarantees that align_of::<T>() >= align_of::<Self>() at compile time, while transmute_XXX checks at runtime
  - The same as the previous bullet, except that transmute_XXX are unsafe, and the documentation states that the only precondition is that alignments are guaranteed by the caller
  - Require that only types with an alignment of 1 can be ArbitraryBytesSafe.
  - Provide a transmute<T>(t: &T) -> Self that returns Self by value, and so has no alignment requirements
- Should the requirement be that the size of the input is the same size as Self, or just that it’s no smaller than Self? In other words, is it valid to transmute a reference to an object of n bytes to a reference to an object of m < n bytes?
- Should we add a restriction that ArbitraryBytesSafe types have trivial drops? My intuition is that this is not needed.
- Should transmute_XXX be associated functions of ArbitraryBytesSafe or instead library functions (or something else entirely)?
- Does transmute imply unsafety (since mem::transmute is the canonical unsafe function)? Is there a better word to use?
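For the alignment question above, the runtime-check option could be as small as the sketch below. The names is_aligned_for and Aligned are hypothetical, and the example assumes a target where u32 has 4-byte alignment.

```rust
use std::mem::align_of;

/// Hypothetical runtime check: is the start of `bytes` suitably aligned for `T`?
pub fn is_aligned_for<T>(bytes: &[u8]) -> bool {
    (bytes.as_ptr() as usize) % align_of::<T>() == 0
}

/// Byte buffer whose first byte is guaranteed to sit on a 4-byte boundary.
#[repr(C, align(4))]
pub struct Aligned(pub [u8; 8]);
```

Given let buf = Aligned([0; 8]), the sub-slice &buf.0[0..4] passes the check for u32 while &buf.0[1..5] does not. Both pass for u8, which is why restricting the trait to types with an alignment of 1 would make the question disappear.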