Provide a way to avoid UB when accessing the byte representation of a type with padding

Sometimes there is a need to send data between processes without paying the cost of serialization and deserialization. If the data is POD, the cheapest way to do that is to transmute the data to bytes, transfer them, then transmute them back. There are crates that help with this operation, such as bytemuck and zerocopy.

If a struct contains pointers to the heap, this isn't possible for obvious reasons. But another problem arises when the type contains padding. Padding is uninitialized, so constructing a &[u8] pointing to it is UB.
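To make the padding problem concrete, here is a minimal sketch. The `Message` type and the `padding_bytes` helper are made up for illustration; `#[repr(C)]` is assumed so the layout is predictable.

```rust
use std::mem::size_of;

// Hypothetical message type. With #[repr(C)], `value` must be 4-byte aligned,
// so the compiler inserts 3 padding bytes after `tag`.
#[repr(C)]
pub struct Message {
    pub tag: u8,
    pub value: u32,
}

/// Number of bytes in `Message` that belong to no field.
pub fn padding_bytes() -> usize {
    size_of::<Message>() - (size_of::<u8>() + size_of::<u32>())
}
```

On common targets `size_of::<Message>()` is 8 while the fields only account for 5 bytes. The remaining 3 bytes are padding, and those are the bytes that a `&[u8]` over the whole struct must not cover.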

If I'm in full control of the implementation of the transfer, I can solve this issue. For example, I can operate on a (*mut u8, usize) pair instead of slices, and the resulting transfer implementation will be sound when working on types with padding. However, if I want to use std or the async ecosystem for network communication, I can no longer work with such types because all the APIs require &[u8].
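A rough sketch of that pointer-based approach, assuming I control both ends. `Message`, `raw_parts` and `receive` are hypothetical names, and `receive` stands in for whatever the real transport does. It relies on `ptr::copy_nonoverlapping` being documented as an untyped copy that preserves the initialization state of each byte, so no `&[u8]` over padding is ever created.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

// Hypothetical message type with 3 padding bytes after `tag`.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Message {
    pub tag: u8,
    pub value: u32,
}

/// Pointer and length of the value's object representation.
/// No slice is formed, so padding bytes are never claimed to be initialized.
pub fn raw_parts<T>(value: &T) -> (*const u8, usize) {
    (value as *const T as *const u8, size_of::<T>())
}

/// Stand-in for the receiving side of a transfer.
///
/// # Safety
/// `src` must be valid for reads of `len` bytes, and those bytes must form
/// a valid `T` (padding bytes may be uninitialized).
pub unsafe fn receive<T: Copy>(src: *const u8, len: usize) -> T {
    assert_eq!(len, size_of::<T>());
    let mut out = MaybeUninit::<T>::uninit();
    unsafe {
        // Untyped byte copy: uninitialized padding stays uninitialized.
        ptr::copy_nonoverlapping(src, out.as_mut_ptr() as *mut u8, len);
        out.assume_init()
    }
}
```

Calling `raw_parts(&msg)` and passing the result to `receive::<Message>` round-trips the value. The catch is exactly the one described above: nothing in std or the async ecosystem accepts the pointer/length pair.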

Note that transmuting a struct with padding to bytes and sending them over the network will work in practice. While this doesn't make UB any less scary, it may lead to projects using this approach for performance reasons because there is no sound way to do the same without paying a performance penalty. And using only types without padding has severe ergonomic downsides: enums with payloads are practically unusable, and taking a reference to a packed struct's field is not allowed.
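For comparison, this is roughly what the padding-free alternative looks like when padding is spelled out as a real field. The `Message` layout and the `as_bytes` helper are illustrative assumptions, not something taken from bytemuck or zerocopy.

```rust
use std::mem::size_of;
use std::slice;

// Hypothetical message type where the padding is an explicit, initialized field.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Message {
    pub tag: u8,
    pub _pad: [u8; 3],
    pub value: u32,
}

// Compile-time check that the fields account for every byte of the struct.
const _: () = assert!(size_of::<Message>() == 1 + 3 + 4);

/// View the message as initialized bytes.
pub fn as_bytes(msg: &Message) -> &[u8] {
    // Sound only because the check above shows there is no implicit padding
    // and every field is a plain integer or byte array.
    unsafe { slice::from_raw_parts(msg as *const Message as *const u8, size_of::<Message>()) }
}
```

This works, but every constructor has to fill in `_pad` by hand, and the trick does not extend to enums with payloads, which is the ergonomic cost I mean.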

How can this situation be improved? I don't think there is a fundamental reason why this can't be done in a sound way. If the underlying system calls do not actually require bytes to be initialized, it could make sense to expose unsafe pointer-based methods on high-level interfaces. Unfortunately, in this case working with wrappers such as BufWriter or tokio's Framed would still be unsound unless the pointer APIs are propagated all the way.

Another way to solve this would be to add a zero-cost way to freeze the bytes so that they are arbitrary but not undefined. I found this related discussion and this PR, but their current status is unclear.

AFAIK this is just one of many use cases for the "safe transmute project" that's been set up recently.

It seems that project is dead. Set up 8 months ago, and it's just a blank template - links in the readme are either broken or have no content.

Oh, there's an RFC the readme didn't mention.

There has been recent activity on Zulip.

(FWIW, the type &'_ mut? [MaybeUninit<u8>] can also be used, and has the advantage of having a lifetime attached and of being non-NULL)
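A small sketch of what that could look like, with a made-up `Message` type and a hypothetical `as_uninit_bytes` helper. The `T: Copy` bound is my own conservative choice to rule out interior mutability.

```rust
use std::mem::{size_of, MaybeUninit};
use std::slice;

// Hypothetical message type with 3 padding bytes after `tag`.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Message {
    pub tag: u8,
    pub value: u32,
}

/// Borrow the object representation of `value` as possibly-uninitialized bytes.
/// The returned slice carries the lifetime of `value` and is never null.
pub fn as_uninit_bytes<T: Copy>(value: &T) -> &[MaybeUninit<u8>] {
    // `MaybeUninit<u8>` is allowed to hold uninitialized bytes, so covering
    // padding is fine here, unlike with `&[u8]`.
    unsafe { slice::from_raw_parts(value as *const T as *const MaybeUninit<u8>, size_of::<T>()) }
}
```

Reading an individual byte still needs `assume_init`, which is only sound for bytes that belong to a field. An I/O API would also have to accept `&[MaybeUninit<u8>]` for this to help with the original problem.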

You can also use *mut [u8], though that type is currently not terribly well-supported by the standard library.

Independent of the Rust type system, one question that I do not know the answer to is whether OS APIs such as write permit uninitialized data. They usually don't talk about that... (I recall a UCG discussion along those lines but cannot find it right now.)