- Feature Name: arbitrary_bytes_safe_trait
- Start Date: (fill me in with today’s date, YYYY-MM-DD)
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)
Summary
Introduce an unsafe marker trait guaranteeing that any sequence of `size_of::<T>()` bytes is a valid instance of the implementing type `T`, along with a custom derive for that trait.
Motivation
When deserializing input from untrusted sources (IPC, disk, network, etc.), it is sometimes desirable to interpret a sequence of bytes as a particular type. In general this is unsafe, but for a large subset of types it is safe. Currently, code that wishes to do this must use `unsafe`. It would be a big ergonomics win if the authors and users of deserialization APIs could deserialize such types without needing to reason about memory safety themselves.
Guide-level Explanation
Introduce a marker trait, `pub unsafe trait ArbitraryBytesSafe {}` (name to be bikeshedded), that indicates that any sequence of `size_of::<T>()` bytes is a valid instance of `T`.
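A minimal sketch of the marker trait on its own, assuming hand-written impls that stand in for the derive. Because every byte pattern is valid, a generic all-zeroes constructor needs only one audited `unsafe` block. The `zeroed` helper and the `impl_abs!` macro are hypothetical illustrations, not part of the proposal.

```rust
/// # Safety
/// Every sequence of `size_of::<Self>()` bytes must be a valid `Self`.
pub unsafe trait ArbitraryBytesSafe {}

// Hypothetical helper macro: hand-written impls for the primitive integers.
macro_rules! impl_abs {
    ($($t:ty),*) => { $(unsafe impl ArbitraryBytesSafe for $t {})* };
}
impl_abs!(u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize);

// Arrays are safe whenever their element type is.
unsafe impl<T: ArbitraryBytesSafe, const N: usize> ArbitraryBytesSafe for [T; N] {}

/// Hypothetical helper (not part of the proposal): an all-zero `T`.
pub fn zeroed<T: ArbitraryBytesSafe>() -> T {
    // SAFETY: all-zero bytes are one particular byte sequence, and the trait
    // guarantees that every byte sequence of this length is a valid `T`.
    unsafe { std::mem::zeroed() }
}
```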
Provide four (safe!) associated functions with default impls on `ArbitraryBytesSafe`:
- `fn transmute_ref<T>(t: &T) -> &Self`
- `fn transmute_mut<T: ArbitraryBytesSafe>(t: &mut T) -> &mut Self`
- `fn transmute_slice_ref<T>(slc: &[T]) -> &Self`
- `fn transmute_slice_mut<T: ArbitraryBytesSafe>(slc: &mut [T]) -> &mut Self`
The first two functions ensure at compile time that `size_of::<T>() == size_of::<Self>()`. The last two functions check at runtime that `slc.len() * size_of::<T>() == size_of::<Self>()`.
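One possible shape for these default impls, as a sketch. The compile-time size check uses an associated-const assertion that is evaluated when the generic function is instantiated, and the slice variants compare `size_of_val(slc)` with `size_of::<Self>()`. Two choices here are assumptions rather than part of the proposal: failures panic, and alignment is checked at runtime (one of the options listed under Unresolved Questions). `AssertSameSize` and `assert_aligned` are hypothetical helpers.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of, size_of_val};

/// Referencing `AssertSameSize::<T, U>::OK` fails the build (when the generic
/// function using it is instantiated) unless the two sizes are equal.
#[allow(dead_code)]
struct AssertSameSize<T, U>(PhantomData<(T, U)>);

impl<T, U> AssertSameSize<T, U> {
    const OK: () = assert!(size_of::<T>() == size_of::<U>(), "size mismatch");
}

/// Assumed alignment policy: panic at runtime if `ptr` is misaligned for `U`.
fn assert_aligned<U>(ptr: *const U) {
    assert!(ptr as usize % align_of::<U>() == 0, "misaligned input");
}

/// # Safety
/// Every sequence of `size_of::<Self>()` bytes must be a valid `Self`.
pub unsafe trait ArbitraryBytesSafe: Sized {
    fn transmute_ref<T>(t: &T) -> &Self {
        let () = AssertSameSize::<T, Self>::OK; // compile-time size check
        let p = t as *const T as *const Self;
        assert_aligned(p);
        // SAFETY: sizes match, `p` is aligned, and any bytes form a valid `Self`.
        unsafe { &*p }
    }

    fn transmute_mut<T: ArbitraryBytesSafe>(t: &mut T) -> &mut Self {
        let () = AssertSameSize::<T, Self>::OK; // compile-time size check
        let p = t as *mut T as *mut Self;
        assert_aligned(p);
        // SAFETY: as above; `T: ArbitraryBytesSafe` means no write through the
        // result can leave `*t` holding an invalid `T`.
        unsafe { &mut *p }
    }

    fn transmute_slice_ref<T>(slc: &[T]) -> &Self {
        // Runtime size check: `size_of_val(slc) == slc.len() * size_of::<T>()`.
        assert_eq!(size_of_val(slc), size_of::<Self>(), "size mismatch");
        let p = slc.as_ptr() as *const Self;
        assert_aligned(p);
        // SAFETY: as for `transmute_ref`.
        unsafe { &*p }
    }

    fn transmute_slice_mut<T: ArbitraryBytesSafe>(slc: &mut [T]) -> &mut Self {
        assert_eq!(size_of_val(slc), size_of::<Self>(), "size mismatch");
        let p = slc.as_mut_ptr() as *mut Self;
        assert_aligned(p);
        // SAFETY: as for `transmute_mut`.
        unsafe { &mut *p }
    }
}

// Hand-written impls standing in for the derive.
unsafe impl ArbitraryBytesSafe for u8 {}
unsafe impl ArbitraryBytesSafe for u16 {}
unsafe impl ArbitraryBytesSafe for u32 {}
unsafe impl<T: ArbitraryBytesSafe, const N: usize> ArbitraryBytesSafe for [T; N] {}
```

With these impls, `<[u16; 4]>::transmute_ref(&[0u32; 2])` compiles, while a call whose source and target sizes differ fails to build once the function is instantiated.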
Provide a custom derive for `ArbitraryBytesSafe`.
Reference-level Explanation
TODO: Explain how `transmute_XXX` is implemented.
Note that when obtaining an immutable reference, `T` (the source type) doesn’t need to be `ArbitraryBytesSafe`, because it is never mutated. When obtaining a mutable reference, `T` does need to be `ArbitraryBytesSafe`: the caller may write any valid `Self` through the returned reference, and without this bound there is no guarantee that the resulting bytes form a valid instance of `T`.
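The asymmetry is easy to see with `bool`, which has the same size and alignment as `u8` but whose only valid byte values are 0 and 1, so it would not be `ArbitraryBytesSafe`. A minimal sketch using a hypothetical standalone function rather than the proposed trait:

```rust
/// Hypothetical helper: view a `&bool` as a `&u8`.
pub fn bool_as_byte(b: &bool) -> &u8 {
    // SAFETY: `bool` and `u8` have the same size and alignment (1), every valid
    // `bool` (0 or 1) is a valid `u8`, and the byte is only ever read.
    unsafe { &*(b as *const bool as *const u8) }
}

// The mutable analogue, `fn bool_as_byte_mut(b: &mut bool) -> &mut u8`, would be
// unsound: a caller could store 2 through the `&mut u8`, leaving the `bool`
// with a bit pattern that is not a valid `bool`. Requiring the *source* type of
// `transmute_mut` to be `ArbitraryBytesSafe` (which `bool` is not) rules this out.
```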
The custom derive uses the following rules to determine whether a type is `ArbitraryBytesSafe`:
- The primitive integer types `uXXX` and `iXXX` are safe, including `usize` and `isize`.
- Arrays are safe if their element types are safe.
- Structs and tuple structs are safe if all of their fields are safe and it is guaranteed that there is no padding, either between fields or after the last field. This is true if `repr(packed)` is used, or if `repr(C)` is used and every field’s alignment is met without any padding, including at the end (e.g., a `u32` following two `u16`s). It is also true if `repr(transparent)` is used. Anonymous tuples are never safe because there is no way to specify a `repr` on them.
- Enums are safe if they are C-like, have either `repr(C)` or `repr(i*)`/`repr(u*)`, and every possible discriminant value corresponds to a variant.
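To illustrate how a derive might enforce the struct rule, the sketch below hand-writes what an expansion could plausibly contain for a `repr(C)` struct: a bound check on every field type, and a compile-time assertion that the struct is exactly as large as the sum of its fields, which rules out both internal and trailing padding. The expansion is an assumption, not a specification of the derive, and `Header`/`Padded` are hypothetical types.

```rust
use std::mem::size_of;

/// # Safety
/// Every sequence of `size_of::<Self>()` bytes must be a valid `Self`.
pub unsafe trait ArbitraryBytesSafe {}
unsafe impl ArbitraryBytesSafe for u16 {}
unsafe impl ArbitraryBytesSafe for u32 {}

/// Two `u16`s (offsets 0 and 2) followed by a `u32` (offset 4): no padding.
#[repr(C)]
pub struct Header {
    pub kind: u16,
    pub flags: u16,
    pub len: u32,
}

// Assumed expansion of `#[derive(ArbitraryBytesSafe)]` for `Header`:
// 1. Every field type must be `ArbitraryBytesSafe` (checked by the type system).
const _: fn() = || {
    fn field_is_safe<T: ArbitraryBytesSafe>() {}
    field_is_safe::<u16>(); // kind
    field_is_safe::<u16>(); // flags
    field_is_safe::<u32>(); // len
};
// 2. No padding: the struct is exactly as large as its fields combined.
const _: () = assert!(size_of::<Header>() == size_of::<u16>() + size_of::<u16>() + size_of::<u32>());
// 3. Only then emit the impl.
unsafe impl ArbitraryBytesSafe for Header {}

/// Counterexample: a `u8` followed by a `u32` leaves 3 bytes of padding,
/// so the derive would have to refuse this type.
#[repr(C)]
pub struct Padded {
    pub tag: u8,
    pub value: u32,
}
```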
Rationale and Alternatives
This API allows for clean, safe deserialization APIs (that don’t require `unsafe` internally) such as:
- For reading from a stream: `fn read<T: ArbitraryBytesSafe>(&mut self) -> T`
- For zero-copy parsing of input: `fn parse<'a>(input: &'a [u8]) -> Result<Parsed<'a>, ParseErr>`
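A sketch of both APIs, assuming a hypothetical free function `read_value` in place of the `read` method, returning `io::Result` because a stream can run short. The `unsafe` blocks mark the spots that `ArbitraryBytesSafe` (or a by-value variant of `transmute_XXX`) would absorb so that library authors no longer write them. The definitions of `Parsed` and `ParseErr`, the `Header` layout, and the `ref_from_prefix` helper are illustrative.

```rust
use std::io::{self, Read};
use std::mem::{align_of, size_of};

/// # Safety
/// Every sequence of `size_of::<Self>()` bytes must be a valid `Self`.
pub unsafe trait ArbitraryBytesSafe {}
unsafe impl ArbitraryBytesSafe for u32 {}

/// Hypothetical stand-in for `read`: consume exactly `size_of::<T>()` bytes.
pub fn read_value<T: ArbitraryBytesSafe, R: Read>(reader: &mut R) -> io::Result<T> {
    let mut buf = vec![0u8; size_of::<T>()];
    reader.read_exact(&mut buf)?;
    // SAFETY: `buf` holds exactly `size_of::<T>()` initialized bytes, any such
    // bytes form a valid `T`, and `read_unaligned` has no alignment requirement.
    Ok(unsafe { std::ptr::read_unaligned(buf.as_ptr() as *const T) })
}

/// Illustrative header built from byte arrays, so its alignment is 1.
#[repr(C)]
pub struct Header {
    pub magic: [u8; 4],
    pub len: [u8; 4], // payload length, little-endian
}
// Hand-written stand-in for `#[derive(ArbitraryBytesSafe)]`.
unsafe impl ArbitraryBytesSafe for Header {}

pub struct Parsed<'a> {
    pub header: &'a Header,
    pub payload: &'a [u8],
}

#[derive(Debug, PartialEq)]
pub enum ParseErr {
    TooShort,
}

/// Split off a `&T` prefix if `bytes` is long enough and suitably aligned.
fn ref_from_prefix<'a, T: ArbitraryBytesSafe + 'a>(bytes: &'a [u8]) -> Option<(&'a T, &'a [u8])> {
    if bytes.len() < size_of::<T>() || bytes.as_ptr() as usize % align_of::<T>() != 0 {
        return None;
    }
    let (head, rest) = bytes.split_at(size_of::<T>());
    // SAFETY: `head` is `size_of::<T>()` bytes, aligned for `T`, and any bytes form a valid `T`.
    Some((unsafe { &*(head.as_ptr() as *const T) }, rest))
}

/// Zero-copy parse: `header` and `payload` both borrow from `input`.
pub fn parse<'a>(input: &'a [u8]) -> Result<Parsed<'a>, ParseErr> {
    let (header, rest) = ref_from_prefix::<Header>(input).ok_or(ParseErr::TooShort)?;
    let len = u32::from_le_bytes(header.len) as usize;
    let payload = rest.get(..len).ok_or(ParseErr::TooShort)?;
    Ok(Parsed { header, payload })
}
```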
It also has uses beyond deserializing untrusted input, such as:
- Generating random instances of things for testing
- Debugging/inspecting arbitrary regions of program memory
Alternatives:
- Do not provide `ArbitraryBytesSafe::transmute_XXX`, and simply allow unsafe code to use `ArbitraryBytesSafe` as a marker trait that provides the guarantees it needs to do unsafe things, such as creating an uninitialized `T` and then copying bytes into it manually. Downside: you lose the guarantee that the sizes match.
- Instead of a custom derive, use a normal macro that is placed around the type definition and decides whether or not to emit `unsafe impl ArbitraryBytesSafe for <type> {}`. This is less ergonomic, but also easier to implement as a first pass.
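A `macro_rules!` sketch of the macro-based alternative, with hypothetical names throughout. It differs from the description above in one respect: rather than deciding whether to emit the impl, it always emits it together with compile-time checks (every field type must be `ArbitraryBytesSafe`, and the struct must have no padding), so an unsuitable type is rejected with a build error. It only handles plain structs with named fields.

```rust
use std::mem::size_of;

/// # Safety
/// Every sequence of `size_of::<Self>()` bytes must be a valid `Self`.
pub unsafe trait ArbitraryBytesSafe {}
unsafe impl ArbitraryBytesSafe for u16 {}
unsafe impl ArbitraryBytesSafe for u32 {}

/// Hypothetical wrapper macro: forces `repr(C)`, then emits the `unsafe impl`
/// alongside compile-time checks that reject unsuitable structs.
macro_rules! arbitrary_bytes_safe {
    (
        $(#[$attr:meta])*
        $vis:vis struct $name:ident { $($fvis:vis $field:ident : $ty:ty),* $(,)? }
    ) => {
        $(#[$attr])*
        #[repr(C)]
        $vis struct $name { $($fvis $field: $ty),* }

        // Check 1: every field type must itself be `ArbitraryBytesSafe`.
        const _: fn() = || {
            fn field_is_safe<T: ArbitraryBytesSafe>() {}
            $(field_is_safe::<$ty>();)*
        };

        // Check 2: no padding anywhere, i.e. the size equals the sum of the fields.
        const _: () = assert!(size_of::<$name>() == 0 $(+ size_of::<$ty>())*);

        unsafe impl ArbitraryBytesSafe for $name {}
    };
}

arbitrary_bytes_safe! {
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Record { pub id: u32, pub kind: u16, pub flags: u16 }
}
```

A field that is not `ArbitraryBytesSafe`, or a layout such as a `u32` followed by a single `u16` (6 bytes of fields in an 8-byte struct), would instead fail to compile.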
Prior Art
Thanks to @dtolnay for pointing out the `Pod` trait, which is similar to `ArbitraryBytesSafe`, although it performs more checks at runtime (in particular, size and alignment checking). It also provides a large number of utility functions for `Pod` types.
The recent `FromBits`/`IntoBits` pre-RFC discusses a different but related case: conversions between types in which, although arbitrary bit patterns may not be valid, every valid bit pattern of one type corresponds to a valid bit pattern of the other, so the conversion is nonetheless safe.
Unresolved Questions
- How do the `transmute_XXX` functions guarantee that their inputs are at least as aligned as `Self` requires? Some options might be:
  - At compile time, check not only that the sizes match, but also that the alignment of `T` is at least as large as the alignment of `Self`.
  - Have `transmute_XXX` check at runtime. Users will be encouraged to check alignment themselves, in which case the second check (inside `transmute_XXX`) is redundant and can likely be optimized out.
  - Introduce `transmute_XXX_aligned`, which guarantees that `align_of::<T>() >= align_of::<Self>()` at compile time, while `transmute_XXX` checks at runtime.
  - The same as the previous bullet, except that `transmute_XXX` are unsafe, and the documentation states that their only precondition is that the caller guarantees correct alignment.
  - Require that only types with an alignment of 1 can be `ArbitraryBytesSafe`.
  - Provide a `transmute<T>(t: &T) -> Self` that returns `Self` by value, and so has no alignment requirements.
- Should the requirement be that the input is the same size as `Self`, or just that it is no smaller than `Self`? In other words, is it valid to transmute a reference to an object of n bytes into a reference to an object of m < n bytes?
- Should we add a restriction that `ArbitraryBytesSafe` types have trivial drops? My intuition is that this is not needed.
- Should `transmute_XXX` be associated functions of `ArbitraryBytesSafe`, or instead free library functions (or something else entirely)?
- Does the name `transmute` imply unsafety (since `mem::transmute` is the canonical unsafe function)? Is there a better word to use?
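Two of the alignment options above can be sketched as free functions (hypothetical names, not proposed API): `transmute_ref_aligned` rejects a possibly under-aligned source at compile time, following the first option, and `transmute_value` returns by value through `ptr::read_unaligned`, following the last option, so the source alignment never matters.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};
use std::ptr;

/// # Safety
/// Every sequence of `size_of::<Self>()` bytes must be a valid `Self`.
pub unsafe trait ArbitraryBytesSafe {}
unsafe impl ArbitraryBytesSafe for u16 {}
unsafe impl ArbitraryBytesSafe for u64 {}
unsafe impl<T: ArbitraryBytesSafe, const N: usize> ArbitraryBytesSafe for [T; N] {}

/// Layout checks, evaluated at compile time when a function using them is instantiated.
#[allow(dead_code)]
struct Layout<Src, Dst>(PhantomData<(Src, Dst)>);

impl<Src, Dst> Layout<Src, Dst> {
    const SAME_SIZE: () = assert!(size_of::<Src>() == size_of::<Dst>(), "size mismatch");
    const ALIGN_OK: () = assert!(align_of::<Src>() >= align_of::<Dst>(), "source may be under-aligned");
}

/// First option: size and alignment both checked at compile time, no runtime check.
pub fn transmute_ref_aligned<T, U: ArbitraryBytesSafe>(t: &T) -> &U {
    let () = Layout::<T, U>::SAME_SIZE;
    let () = Layout::<T, U>::ALIGN_OK;
    // SAFETY: sizes match; `t` is aligned for `T`, hence for `U` (alignments are
    // powers of two); and any bytes form a valid `U`.
    unsafe { &*(t as *const T as *const U) }
}

/// Last option: return by value, so the source's alignment never matters.
pub fn transmute_value<T, U: ArbitraryBytesSafe>(t: &T) -> U {
    let () = Layout::<T, U>::SAME_SIZE;
    // SAFETY: sizes match, any bytes form a valid `U`, and `read_unaligned`
    // copies them without requiring `U`'s alignment.
    unsafe { ptr::read_unaligned(t as *const T as *const U) }
}
```

Under the compile-time route, viewing a `[u8; 4]` as a `u32` is rejected even when a particular array happens to be 4-byte aligned at run time; the by-value route accepts it at the cost of a copy.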