[Pre-RFC v2] Safe Transmute

To add more relevant cases:

  • There might be a safety invariant because the type is some form of proof token

Sounds like auto traits, but presumably that's not it; would be good to reword this.

Should, or cannot? (Compiler checked?)

Not defined anywhere?

It would be good to segregate exactly what the language and the library component of this proposal is so that it becomes easier to see what requires compiler support and what does not.

No need to mention this; we can just benchmark.

Why not:

match bytes {
    [0, ..] => Ok(&false),
    [1, ..] => Ok(&true),
    [_, ..] => Err(FromBytesError::InvalidValue),
    [] => Err(FromBytesError::InsufficientBytes),
}

(Does it have to point into the byte slice? If so why? And is that a guarantee?)

Using ! seems like a good idea to gain some more type safety with infallible unwraps.

Would prefer separating the proposals, both because safe unions may be substantially more invasive in typeck/borrowck/operational semantics, and so that we don't need to block safe transmutes on this.

This doesn't seem sound to me. It seems to be relying on having a correct implementation of from_bytes to legitimize a transmute on Ok(...). However, the trait is not unsafe.

Seems like it's making the same assumption as above. Give it a bad impl ValidateBytes for bool and you can now trigger UB in from_bytes.

Must read re. validation:

https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/


If I understand what you mean by that, that sounds like a specific instance of "you shouldn't derive FromAnyBytes or implement FromBytes because you don't want to let people construct the type at all except in specific ways". That's one of many reasons we expect types to explicitly declare those traits rather than implicitly having the compiler provide them for all types that could qualify.

Sure. What term would you use in prose to describe things like impl<T: SomeTrait> OtherTrait for T?

"cannot" if there's a supported way to have a "derive-only, no manual implementations" trait. "should not" otherwise.

Is there a way to make a trait that you can derive but you cannot manually implement?

True. I think we can just replace every instance of it with either "aggregate type" or just "type".

The only language/compiler support components are 1) enforcing the requirements to derive the traits, and 2) defining ToBytes only for tuples that don't have padding, if we do so.

How does that make documentation unnecessary or undesirable?

The goal there is to make sure, for instance, that casting u32 to [u16; 2] requires no runtime checks.
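
For instance (a sketch of the intended end result, not the proposed API): [u16; 2] has the same size as u32, a weaker alignment requirement, and accepts every bit pattern, so the checked conversion should compile down to nothing more than this cast:

fn view_as_u16s(x: &u32) -> &[u16; 2] {
    // SAFETY: same size (4 bytes), align_of::<[u16; 2]>() == 2 <= 4, and every
    // bit pattern is a valid [u16; 2]; the element values seen depend on
    // native endianness.
    unsafe { &*(x as *const u32 as *const [u16; 2]) }
}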

For the mutable version, it has to. For the non-mutable version, technically you don't have to, but it seems wasteful not to.

I think at this point I'm leaning slightly more towards having from_bytes and try_from_bytes, the former infallible (returning &T) and the latter fallible (returning Result).

I don't object to separating the proposals, but can you elaborate on why you think unions would be substantially more invasive? Does the #[zero_init] proposal together with fields that are FromAnyBytes and ToBytes not suffice to make union field reads safe?

A few answers to this:

First, I do absolutely favor encoding requirements in the type system whenever reasonably possible. If const generics and const functions become sufficiently powerful to do so, I'd love to express the size requirement in the type system (can only convert &[u8; size_of::<T>()] to &T, for instance). And if you can find a way to encode "sufficiently aligned byte slice" safely in the type system, I'm all for it. But in the absence of that, we have to check for size and alignment and then transmute. Given that, it feels less error-prone to me to only implement the size and alignment checks in one place, rather than expecting any manual implementation of FromBytes to reimplement those checks correctly themselves.

Second and separately, without necessarily encoding this in the type system, can you give a concrete example of how you'd suggest "parsing" bytes into bool or NonZeroU32 and providing a function from bytes to that type that duplicates as little as possible between such functions? How would you suggest eliminating the error-prone boilerplate for "is it long enough" and "is it aligned enough", as well as the actual transmute, from user-written code like trait impls?

The proposal has definitely taken prior art into consideration, especially from the two crates listed at the bottom: zerocopy (which is authored by Joshua) and safe_transmute. We can definitely add a section making this more explicit.

As for the Compatible<T> proposal - to my eyes that proposal is compatible (no pun intended :wink:) with ours. The difference is that we split the idea into types that are representable as bytes in a well-defined way, and types where you can take any appropriately aligned and sized slice of bytes and view it as that type. This split better captures types that are only one or the other (e.g., bool can be represented as a byte, but not every byte is a valid bool; structs with padding are the opposite).

This allows more fine-grained control over which guarantees a given transmute needs. For instance, you can still transmute from a type like bool to another type. If we didn't split them and required the type to be both formable from any byte pattern AND viewable as bytes, then types like bool or structs with padding would be excluded, since they only meet one of the two requirements.

We also want to allow transmuting references (both mutable and immutable) of a given type to references of another type. These have different restrictions than transmuting owned values. Without the more fine-grained view of a type's guarantees, we wouldn't be able to distinguish types that support different kinds of casts.

If you have specifics on where the proposal falls short of or differs from the Compatible proposal, we'd love to see them so we can address them directly.


Yep, that's right. To take an extreme variant of this, if you can make a bad Id then you can brick the type system.

Right; would be good to add to your list. :slight_smile:

Well so there are two aspects here (it's like Copy):

  • Syntactic: You can #[derive(...)]
  • Semantic: There are restrictions (including when manually written out)

So perhaps: "derivable implementation with restrictions"?

You can have the compiler use #[allow_internal_unstable(...)] in the expansion to refer to a perma-unstable trait, relying on that to make sure only the expansion can implement it (see std::marker::StructuralEq, and cc @pnkfelix & @petrochenkov).
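
For comparison, the purely library-level approximation of that shape is the usual sealed-supertrait pattern (illustrative names below). The perma-unstable-trait route is what lets downstream derive expansions still provide the impl, which sealing alone can't do:

mod sealed {
    // Only this crate can name this supertrait, so only impls written (or
    // generated) here can satisfy the bound below.
    pub trait DeriveOnly {}
}

pub trait FromAnyBytes: sealed::DeriveOnly {}

// What a derive expansion would emit for a qualifying type:
pub struct Example(pub u32);
impl sealed::DeriveOnly for Example {}
impl FromAnyBytes for Example {}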

Right, but spelling out how the requirements are enforced (perhaps with an experimental prototype implementation in the coherence checking code -- see link in previous thread) would help clarify whether new lang items need to be added and exactly what the algorithm for checking the requirements is.

(I'm just saying it is unnecessary in the RFC text because it's not a very interesting detail for users, and it could also change depending on how LLVM develops.)

Alright. So if there's no guarantee then we can iterate based on perf and benchmarks, but it would be good to state explicitly that this is not a requirement (and, conversely, may not be depended upon).

In my view, TryFrom is good precedent here for using !. Moreover, it seems useful to state, as a bound, that the conversion cannot fail.
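
For reference, that precedent in action: a widening conversion picks up TryFrom through the blanket impl over From, with Infallible (the stable stand-in for !) as its error type, so the failure case is statically dead code:

use std::convert::{Infallible, TryFrom};

fn widen(x: u32) -> u64 {
    match u64::try_from(x) {
        Ok(v) => v,
        Err(never) => match never {}, // Infallible has no values
    }
}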

Invasive in terms of compiler implementation just to check those things and to then give operational semantics to repr(zero_init). It's less clear to me how and where exactly the tweaks to the compiler need to be done.

Not sure const generics are mature enough to experiment with this yet (having them would really make prototyping easier).

The way you encoded from_bytes in impl<T: ValidateBytes> FromBytes for T seems like it has reusable components, so those could be extracted into unsafe functions and given elaborate safety docs. That should reduce boilerplate and make it less error-prone.
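
Something along these lines (a sketch with made-up names, not the RFC's API): one checked helper carrying the safety docs, and per-type conversions reduced to their validation logic:

use core::mem::{align_of, size_of};

pub enum FromBytesError { InsufficientBytes, Misaligned, InvalidValue }

/// Checks length and alignment, then reinterprets the start of `bytes` as `T`.
///
/// # Safety
/// The caller must guarantee that the first `size_of::<T>()` bytes form a
/// valid `T` (either `T` accepts any bit pattern, or the bytes were validated).
pub unsafe fn cast_prefix<T>(bytes: &[u8]) -> Result<&T, FromBytesError> {
    if bytes.len() < size_of::<T>() {
        return Err(FromBytesError::InsufficientBytes);
    }
    if (bytes.as_ptr() as usize) % align_of::<T>() != 0 {
        return Err(FromBytesError::Misaligned);
    }
    // SAFETY: length and alignment checked above; validity of the bit pattern
    // is the caller's obligation per this function's contract.
    Ok(unsafe { &*(bytes.as_ptr() as *const T) })
}

// A validating conversion for `bool`, reduced to its value check:
pub fn bool_from_bytes(bytes: &[u8]) -> Result<&bool, FromBytesError> {
    match bytes.first() {
        Some(0) | Some(1) => {
            // SAFETY: the first byte is 0 or 1, the only valid `bool` patterns.
            unsafe { cast_prefix::<bool>(bytes) }
        }
        Some(_) => Err(FromBytesError::InvalidValue),
        None => Err(FromBytesError::InsufficientBytes),
    }
}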

Hi, "alternatives" could mention

trait ToBytes {
    // Allows transmuting in the presence of padding.
    // Would only be useful if there were library methods taking
    // &[MaybeUninit<u8>]: write, memcpy, ...
    fn to_bytes(&self) -> &[MaybeUninit<u8>];
}

No doubt.

If you have specifics on where the proposal falls short of or differs from the Compatible proposal, we'd love to see them so we can address them directly.

For example (one of probably many), with this proposal one can't perform a zero-cost safe transmute from bool to (bool,), #[repr(transparent)] struct B(bool);, #[repr(C)] struct B(bool);, etc. One can go from bool to [u8], but going from [u8] to, e.g., (bool,) would require run-time checks. This "goal" or "constraint" is mentioned in the previous RFC, but not covered by this feature. The "future possibilities" section does not show how to extend this feature in a backward-compatible way to support that, and I don't think it would be possible.

I could share the notes I made of the proposal if you want, but fixing the nitpicks won't make the design direction satisfy that particular constraint.

It might well be that this constraint is not worth satisfying, but if that's the intent the proposal should argue why it isn't worth satisfying.

Minor notes:

  • UCS-4 is a deprecated term at best; I'm not sure it was ever even formally defined. The correct term would be UTF-32, or just mention that char represents a code point and not some particular encoding of the code point.
  • IIRC, tuples do not have a guaranteed layout order yet. And I don't think we want to guarantee that tuples are laid out in source order, either, because we want to be able to size-optimize them just like named types. (This is one of the problems with trying to add tuple arity abstractions; a cons-list encoding breaks this permission.) So, unfortunately, tuples aren't able to be FromAnyBytes/ToBytes in any case.

Some reactions to this proposal. Sorry if the point-by-point style is hard to read! There's some overlap with other comments.

Do you mean the derive(...) macro will emit errors? I assume there is no such checking when the trait is implemented manually. Or should the trait be "magical" through special treatment in the language and compiler?

So it needs to be an unsafe trait. I didn’t find this specified in the rest of the proposal.

I assume this means methods with a default impl in the trait. So also an unsafe trait, since those defaults use unsafe {} and assume some properties of Self.

… for sizes N up to 32, until const generics are stabilized, like other standard library impls for arrays.

Sounds fine to me. (Though I would call it "code point value as u32" rather than UCS-4.) char is already documented to represent a Unicode scalar value and be four bytes in size, and implements Into<u32> and TryFrom<u32>.

This does, however, expose native endianness, which is perhaps less obvious than it is for integer types?

… for arity up to 12, like other standard library impls for tuples.

From the point of view of the standard library this would definitely require a magic HasNoPadding unsafe trait that's automatically implemented by relevant types (including tuples). This seems a bit esoteric; I don't know if the language team could be convinced to add this.

I think that, at the very least, the documentation for these new APIs needs to call out very prominently that they return different results for a given input based on the target CPU's native endianness.

Existing APIs like u32::to_ne_bytes do so in the method’s name. Maybe this would be appropriate here as well?
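
For instance, with the existing integer APIs the suffix in the method name is the only thing telling you which bytes you'll see:

fn main() {
    let x: u32 = 0x0102_0304;
    assert_eq!(x.to_le_bytes(), [0x04, 0x03, 0x02, 0x01]);
    assert_eq!(x.to_be_bytes(), [0x01, 0x02, 0x03, 0x04]);
    // to_ne_bytes matches whichever of the two the target CPU uses.
    assert!(x.to_ne_bytes() == x.to_le_bytes() || x.to_ne_bytes() == x.to_be_bytes());
}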

What does that mean? What happens if a crate writes such an impl?

If it’s already an unsafe impl, I don’t think further enforcement to prevent non-derived impls is useful. And it seems rather tricky to implement.

This seems to assume a FromBytes::from_bytes_mut method, which is missing in the proposed trait definition of FromBytes.

This seems like a significant departure from current language rules, that probably would need its own RFC. It may also be difficult to achieve with LLVM.

Will do; we can add a note that types shouldn't implement FromAnyBytes if they only want to allow construction via specific interfaces.

That's not what I was referring to. I said "automatic implementation" in the RFC to refer to cases where we implement a trait for every type that implements another trait. We can use a different term, but what term would you suggest for that, specifically?

Sounds reasonable to me. That would prevent manual implementation of ToBytes, and if we switch to the ValidateBytes approach then we'll also want to prevent manual implementation of FromBytes.

It's important to users that these conversions have minimal overhead; that has come up in discussions of this multiple times. We'll want to accompany the RFC with demonstrations that the compiler can completely optimize away the checks.

If we switch to the ValidateBytes approach, it'll no longer be possible for from_bytes to return a different reference (e.g. to static values).

If we don't, then how about something like "It is technically possible for a manual from_bytes implementation to return a reference to a static value rather than to the slice; doing so will not break any guarantees, but seems unlikely to provide any benefit."?

Returning types like Result<&T, !> seems likely to just lead to extensive use of into_ok. If we can statically make a conversion infallible, let's just return &T directly.

I'd still be somewhat interested in proposals that would allow statically encoding the concept of "sufficiently aligned slice of bytes", even if they're not feasible with our current type system.

We're seriously considering just making ValidateBytes an unsafe trait instead, which would mean you'd need to write an unsafe implementation for any non-byte-complete types you want to convert. (I don't expect that to come up nearly as often as the derive case.) Then the error-prone boilerplate becomes an internal detail of a sealed FromBytes trait.

But if we don't do that, then yes, we should factor out helper functions.
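
For concreteness, a rough sketch of the direction we're considering (illustrative names; the sealing of FromBytes and the FromAnyBytes path are omitted): ValidateBytes is the only unsafe, manually-written piece, and the blanket impl is the single place that checks size/alignment and transmutes.

use core::mem::{align_of, size_of};

pub enum FromBytesError { InsufficientBytes, Misaligned, InvalidValue }

// Unsafe to implement: returning `true` is a promise that these bytes are a
// valid `Self`.
pub unsafe trait ValidateBytes {
    fn validate(bytes: &[u8]) -> bool;
}

pub trait FromBytes {
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError>;
}

// The blanket impl is the only place that checks size/alignment and performs
// the transmute, so no user code ever repeats that boilerplate.
impl<T: ValidateBytes> FromBytes for T {
    fn from_bytes(bytes: &[u8]) -> Result<&T, FromBytesError> {
        if bytes.len() < size_of::<T>() {
            return Err(FromBytesError::InsufficientBytes);
        }
        if (bytes.as_ptr() as usize) % align_of::<T>() != 0 {
            return Err(FromBytesError::Misaligned);
        }
        if !T::validate(&bytes[..size_of::<T>()]) {
            return Err(FromBytesError::InvalidValue);
        }
        // SAFETY: size and alignment checked above; the unsafe-trait contract
        // of `validate` vouches that these bytes are a valid `T`.
        Ok(unsafe { &*(bytes.as_ptr() as *const T) })
    }
}

// SAFETY: 0 and 1 are the only valid bit patterns for `bool`.
unsafe impl ValidateBytes for bool {
    fn validate(bytes: &[u8]) -> bool {
        matches!(bytes.first(), Some(0) | Some(1))
    }
}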

Really don't want to deal with MaybeUninit in any way, nor push a new set of writing primitives that would accept it. But we could certainly mention that alternatives include solutions that would define currently undefined behavior to allow reading uninitialized memory or padding bytes.

(Given CAD97's observation below, tuples wouldn't actually have defined layout, but I'll address the other points.)

We believe that going from bool to (for instance) #[repr(transparent)] struct B(bool), or [bool; 1], should work with the compiler optimizing away all the runtime checks. We're working on confirming that, and the RFC will definitely include demonstrations of that.

We're absolutely open to considering alternative formulations, especially formulations that allow the type system to help distinguish fallible and infallible conversions. I'd like to be able to convert [u8; 4] to u32 and get back a u32 rather than a Result<u32, FromBytesError>. But how can we encode the necessary alignment requirement on the [u8; 4] to allow that?
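
To make the asymmetry concrete (a sketch, not proposed API): the owned conversion is already infallible in std because the bytes are copied; it's the in-place reference case that needs "sufficiently aligned" in the type system to avoid a runtime check.

fn owned(bytes: [u8; 4]) -> u32 {
    // Infallible today: the bytes are copied into the integer, so alignment
    // never comes into play.
    u32::from_ne_bytes(bytes)
}

fn by_ref(bytes: &[u8; 4]) -> Option<&u32> {
    // The in-place view is what would need "sufficiently aligned" encoded in
    // the type to become infallible; without that it has to be checked.
    if (bytes.as_ptr() as usize) % core::mem::align_of::<u32>() == 0 {
        // SAFETY: exactly 4 bytes, alignment checked, and every bit pattern
        // is a valid u32.
        Some(unsafe { &*(bytes.as_ptr() as *const u32) })
    } else {
        None
    }
}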

As for going from [u8] to bool, the point of this proposal is to define a safe transmute operation. Going from [u8] to bool without checking the values would be an unsafe transmute operation; we already have that, as std::mem::transmute. Unsafe helpers for more ergonomic unsafe transmutes would be an entirely different proposal.

No objection to using UTF-32 in place of UCS-4 here.

I wasn't aware of that! In that case, we should drop implementations for tuples (though we can still allow deriving them for tuple structs that use repr(C)).

That also means (2) no longer applies here, which removes a substantial amount of complexity.

Yes, much like the error you get if you try to derive(Debug) for a type whose fields don't implement Debug.

Per other discussion in this thread, I'd like to prevent the possibility of implementing those traits manually, rather than deriving them.

I don't think they need to be unsafe traits, if the compiler enforces that you can only derive them for types that meet the requirements. (Of course, if they can only be derived and never manually implemented, it doesn't really matter if they're unsafe traits or not since the only impls will get generated by the derive.)

Sure. Or unless we come up with a way to use const generics in the standard library to define such traits before they're stable. But either way, this would use the same mechanisms as other traits for [T; N], yes.

That alone wouldn't preclude implementing it as a fixed 4-byte UTF-8 buffer, which would have advantages and disadvantages.

Per CAD97, we need to just drop the impls for tuples, so we don't need to worry about either this or the magic no-padding requirement.

I wouldn't phrase it as "different results for a given input", but it certainly seems fine to explicitly point out that they return references to bytes in memory and thus to the in-memory representation in native endianness.

See the second post in this thread.

We mentioned it as an alternative for completeness. We don't plan to go that route, it just seemed important to document as a potential alternative.

We believe that going from bool to (for instance) #[repr(transparent)] struct B(bool), or [bool; 1], should work with the compiler optimizing away all the runtime checks. We're working on confirming that, and the RFC will definitely include demonstrations of that.

Even if this happens to work, it would need to be a guaranteed compiler optimization, because this is a performance oriented feature (if you don't need this optimization, you can just copy all fields manually, which is already safe).

As for going from [u8] to bool, the point of this proposal is to define a safe transmute operation. Going from [u8] to bool without checking the values would be an unsafe transmute operation; we already have that, as std::mem::transmute. Unsafe helpers for more ergonomic unsafe transmutes would be an entirely different proposal.

According to your proposal, going from bool -> Bool would still require going from bool to [u8] and then from [u8] to Bool because the FromBytes and ToBytes traits go through u8s. This is not only unergonomic (requiring two API calls) but also unsound, since it exhibits undefined behavior.

But how can we encode the necessary alignment requirement on the [u8; 4] to allow that?

The Compatible<T> proposal solves this problem by using ! as the error type in this conversion.
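
Roughly this shape (CastFrom is a made-up name standing in for that idea, not the actual Compatible<T> definition), using Infallible as the stable stand-in for !:

use std::convert::Infallible;

trait CastFrom<T>: Sized {
    type Error;
    fn cast_from(value: T) -> Result<Self, Self::Error>;
}

// [u8; 4] -> u32 cannot fail, and the error type records that statically.
impl CastFrom<[u8; 4]> for u32 {
    type Error = Infallible;
    fn cast_from(value: [u8; 4]) -> Result<Self, Self::Error> {
        Ok(u32::from_ne_bytes(value))
    }
}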

When I read the signature of cast_mut it seemed obvious to me that FromBytes would have an additional method:

trait FromBytes {
    fn from_bytes(bytes: &[u8]) -> Result<&Self, FromBytesError>;
    fn from_bytes_mut(bytes: &mut [u8]) -> Result<&mut Self, FromBytesError>;
}

I assumed this was the intent, and that its absence from FromBytes was an oversight.

But from_bytes_mut is neither of the two solutions you’re considering. Did you reject that option? Why?

See "Update 2" in that second post. The sample code there includes from_bytes_mut.

Doesn't that make separate validation unnecessary? Or is it only to help manual impls avoid duplicating some logic? If so, adding a trait seems overkill; those impls can refactor the common logic into a private free function by themselves.


Isn't that called a "Blanket Implementation" currently? I'm nearly positive I've seen that terminology used for this.


I wasn't clear on if "blanket impl" meant impl<T: OtherTrait> SomeTrait for T or impl<T> SomeTrait for T.

I don't think I understand what you mean by this. from_bytes and from_bytes_mut must validate before transmuting, for a non-byte-complete type.

It's not just "refactoring the common logic", it's also trying to make this solution safer, such as by preventing incorrect implementations of FromBytes entirely. At the moment, the solution we're leaning towards would allow you to implement ValidateBytes for non-byte-complete types, and then would not allow you to implement FromBytes manually at all; you either get the implementation for types that implement ValidateBytes or the implementation for types that implement FromAnyBytes.

Separate was the key word. By that I mean having a ValidateBytes trait at all (or even just a validate_bytes method in the FromBytes trait), rather than having from_bytes and from_bytes_mut methods in impls do validation.

But I see now. Having those manual impls not need to repeat the transmutes is interesting.

Declaring it unidiomatic to manually implement some traits sounds fine, but I don't see the point of adding language and compiler special cases to actually enforce that rule. Making them unsafe traits seems enough to signal that implementers who do it anyway accept responsibility for it.

Sounds like an "implied implementation" since we're talking "implication constraints" (if Implements(A: B) then Implements(C: D)).

Is that true? Some benchmarks to confirm it could be helpful. Provided that it is true, then I don't see why not.

I have not had time to sit and focus on reading this whole RFC carefully, and I will try to do so soon. However, I'll say right now that I would love if bytemuck were included in the prior art.

It's a much, much simpler interface than all of this. That naturally means that there are some uncommon cases where it might fall down a bit, but the common case is kept very plain and simple for the end user. I think that having a good common case is essential here. I know many folks on the Community Discord who don't even want to touch zero-copy or safe-transmute simply because of the API complexity, and I've been able to get them to adopt bytemuck by having a clear and easy API.


  • uninit/padding bytes can at best have "unrepeatable reads" semantics
  • supporting that generally seems unrealistic due to LLVM behaviour
  • may declare it not-UB to pass such memory to syscalls and memcpy instead

So I'd say it's a different alternative than the one I'd been referring to.