Safe conversions for DSTs


#1

I was wondering if we could introduce an unsafe Transmute<T> trait that would indicate when it is safe to transmute from one type to another, and then provide a way to do that conversion in safe code. The idea is that even though you can’t convert directly between two unsized types, you should be able to convert between pointers to those types just fine. So Box<T> is convertible to Box<U> where T: Transmute<U>.
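
A minimal sketch of how such a trait might look as a library, not a definitive design. The `cast` method is an assumption added here so that the pointer metadata (the slice length) can be carried across for unsized types; `transmute_ref` and `transmute_box` are illustrative names as well.

```rust
/// Hypothetical trait: implementing it asserts that a pointer to `Self`
/// may be reinterpreted as a pointer to `U` behind shared or owning pointers.
pub unsafe trait Transmute<U: ?Sized> {
    /// Casts the raw pointer, preserving any slice-length metadata.
    fn cast(ptr: *mut Self) -> *mut U;
}

// `str` has the same layout as `[u8]`, and every `str` is a valid `[u8]`.
unsafe impl Transmute<[u8]> for str {
    fn cast(ptr: *mut str) -> *mut [u8] {
        ptr as *mut [u8]
    }
}

/// Safe conversion between shared references.
pub fn transmute_ref<T, U>(r: &T) -> &U
where
    T: ?Sized + Transmute<U>,
    U: ?Sized,
{
    // SAFETY: the `unsafe impl` promises the cast pointer is valid for reads as `U`.
    unsafe { &*<T as Transmute<U>>::cast(r as *const T as *mut T) }
}

/// Safe conversion between owning pointers.
pub fn transmute_box<T, U>(b: Box<T>) -> Box<U>
where
    T: ?Sized + Transmute<U>,
    U: ?Sized,
{
    // SAFETY: the impl also promises identical size and alignment,
    // so the allocation can later be freed as a `U`.
    unsafe { Box::from_raw(<T as Transmute<U>>::cast(Box::into_raw(b))) }
}
```

With this in place, `transmute_ref::<str, [u8]>("hi")` would yield the two bytes of the string without any `unsafe` at the call site.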

There’s a couple issues I encountered in trying to test this out:

  1. Should we also have a TryTransmute trait for fallible conversions? For example, &str can safely be converted to &[u8], but the conversion from &[u8] to &str is fallible.
  2. Some conversions (like str to [u8]) are only safe for immutable references/smart pointers like &T or Rc<T>, or when the value is owned outright with Box<T> or &move T, and some work in all cases including &mut T. I was thinking about having TransmuteOwned, TransmuteShared, TransmuteMut variants, but then with fallible conversions we also need TryTransmuteOwned, TryTransmuteShared, and TryTransmuteMut.
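
For the fallible case, one possible shape (an assumption, not an agreed design) is a trait that pairs the pointer cast with a validity check. The names `TryTransmuteShared`, `check`, `cast`, and `try_transmute_ref` are hypothetical; only `std::str::from_utf8` is an existing API.

```rust
use std::str::Utf8Error;

/// Hypothetical trait: a shared pointer to `Self` may be reinterpreted as `U`
/// whenever `check` succeeds.
pub unsafe trait TryTransmuteShared<U: ?Sized> {
    type Error;
    /// Returns `Ok(())` if this particular value is also a valid `U`.
    fn check(&self) -> Result<(), Self::Error>;
    /// Casts the raw pointer, preserving any slice-length metadata.
    fn cast(ptr: *const Self) -> *const U;
}

// Bytes are a valid `str` only if they are well-formed UTF-8.
unsafe impl TryTransmuteShared<str> for [u8] {
    type Error = Utf8Error;

    fn check(&self) -> Result<(), Utf8Error> {
        std::str::from_utf8(self).map(|_| ())
    }

    fn cast(ptr: *const [u8]) -> *const str {
        ptr as *const str
    }
}

/// Safe, fallible conversion between shared references.
pub fn try_transmute_ref<T, U>(r: &T) -> Result<&U, <T as TryTransmuteShared<U>>::Error>
where
    T: ?Sized + TryTransmuteShared<U>,
    U: ?Sized,
{
    <T as TryTransmuteShared<U>>::check(r)?;
    // SAFETY: `check` succeeded, and the `unsafe impl` promises that this
    // makes the cast pointer valid for reads as `U`.
    Ok(unsafe { &*<T as TryTransmuteShared<U>>::cast(r) })
}
```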

Here’s a table listing conversions and what types of references they are safe for.

From    | To              | &T (shared) | &mut T (mutable) | Box<T> (owned)
str     | [u8]            | ✓           | ✗                | ✓
[u8]    | str             | fallible    | fallible         | fallible
T       | ManuallyDrop<T> |             | ✓ (I think)      | ✓ (I think)
T       | Cell<T>         |             |                  |
Cell<T> | T               |             |                  |

The first row says that &str can be safely converted to &[u8], and similarly Box<str> can be safely converted to Box<[u8]>, but &mut str cannot be safely converted to &mut [u8], because then you could modify the bytes and make them invalid UTF-8, rendering the original str invalid.
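
The standard library already reflects this split for str and [u8], which the following sketch illustrates: `str::as_bytes` and `into_boxed_bytes` are safe, while `as_bytes_mut` is an `unsafe fn`. The wrapper function names are made up for the example.

```rust
/// Shared: safe, because the bytes cannot be modified through `&[u8]`.
pub fn shared_bytes(s: &str) -> &[u8] {
    s.as_bytes()
}

/// Owned: safe, because the original `str` no longer exists afterwards.
pub fn owned_bytes(s: Box<str>) -> Box<[u8]> {
    s.into_boxed_bytes()
}

/// Mutable: needs `unsafe`, because the caller must keep the bytes valid UTF-8.
pub fn uppercase_ascii(s: &mut str) {
    // SAFETY: only ASCII bytes are replaced, and only by other ASCII bytes,
    // so the contents remain valid UTF-8.
    let bytes = unsafe { s.as_bytes_mut() };
    for b in bytes.iter_mut() {
        if b.is_ascii_lowercase() {
            *b = b.to_ascii_uppercase();
        }
    }
}
```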

If people have suggestions for types to convert between, I’d love to add to this list to get a better sense of what conversions are out there.


#2

One that I have often wanted: &[u8; N] -> &[u64; N/8] and similar for other types


#3

There is a longish history of design work around this (would be nice to finally get a proposal accepted!), much of which is linked from: https://github.com/rust-lang/rfcs/issues/270


#4

T -> Cell<T> is an interesting one since I believe it’s valid for mut and owned, but would be invalid for shared references.
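
For the mutable case, std already exposes this conversion as `Cell::from_mut`, together with `Cell::as_slice_of_cells` for slices. A small sketch (the function name `bump_all` is made up for the example):

```rust
use std::cell::Cell;

/// Increments every element through `Cell`s obtained from a unique borrow.
pub fn bump_all(values: &mut [u32]) {
    // `&mut [u32]` -> `&Cell<[u32]>` -> `&[Cell<u32>]`, all in safe code.
    let cells: &[Cell<u32>] = Cell::from_mut(values).as_slice_of_cells();
    for cell in cells {
        cell.set(cell.get() + 1);
    }
}
```

There is no counterpart taking `&T`: other shared references to the same value would not expect it to change underneath them.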


#5

For as_ref() of my wrapper struct I need to cast

*mut struct {foo; Vec<T>}

as

*mut struct {foo; &[T]}

It works (since Vec happens to have slice-compatible layout at the beginning), but it’s sooo hacky.


#6

Am I right that that would behave differently depending on the platform’s byte order? I guess it wouldn’t be an unsafe conversion, but could lead to logic errors and unportable code.
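
To make the byte-order concern concrete, here is a copying sketch (not the in-place reinterpretation being discussed) built on the existing `u64::from_le_bytes` and `u64::from_ne_bytes`. The function names are illustrative.

```rust
/// Decodes 8-byte chunks as little-endian words: same result on every platform.
pub fn words_le(bytes: &[u8]) -> Vec<u64> {
    bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(chunk);
            u64::from_le_bytes(buf)
        })
        .collect()
}

/// Decodes 8-byte chunks in native order: this is what a pointer cast would
/// observe, and the result differs between little- and big-endian targets.
pub fn words_ne(bytes: &[u8]) -> Vec<u64> {
    bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(chunk);
            u64::from_ne_bytes(buf)
        })
        .collect()
}
```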

Thanks for linking that issue! After a quick look at the RFC PR, it looks like Coercible and Transmute are pretty much the same, and HasPrefix is an interesting extension.

I think the way forward, in terms of getting a proposal accepted, is to start experimenting in a crate outside of std. I don’t think anything here actually needs language support.

Thanks for the suggestion, I’ll add it to the list! I guess T -> UnsafeCell<T> also works, but T -> RefCell<T> doesn’t because it has extra fields.


#7

@kornel that does seem hacky! I guess Vec<T> could implement HasPrefix<Box<[T]>> from the RFC that @glaebhoerl linked. It would have to have some way of guaranteeing the compiler won’t reorder the fields though.


#8

Copying my post from https://github.com/rust-lang/rust/issues/49792#issuecomment-379638786:

Two recent internals threads with thoughts around this area:

It seems to me like there’s a general common theme here of “safe, but a bit weird and rather transmute-y” conversions: this thread’s uN <=> [u8; N/8], the first thread’s u16x8 <=> u32x4 and u32 <=> f32, some parts of as like u32 <=> i32 that currently don’t have a method version, and extended versions of that like &'a u32 <=> &'a i32 that are never exposed as safe today (but could be).

So here’s a sketch of an idea using #[marker] traits:

#[marker] unsafe trait InplaceReinterpretAs<T> {}
unsafe impl<T> InplaceReinterpretAs<T> for T {}
unsafe impl InplaceReinterpretAs<[u8; 4]> for u32 {}
unsafe impl InplaceReinterpretAs<i32> for u32 {}
unsafe impl InplaceReinterpretAs<u32> for i32 {}
unsafe impl<T, U> InplaceReinterpretAs<*const U> for *const T {}
unsafe impl<T, U> InplaceReinterpretAs<*mut U> for *mut T {}
unsafe impl InplaceReinterpretAs<u16x8> for u32x4 {}
unsafe impl InplaceReinterpretAs<u32x4> for u16x8 {}

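use std::{mem, ptr};

// `Sized` is needed because the default bodies take and return `Self` by value.
#[marker] unsafe trait ReinterpretAs<T>: Sized {
    // Because it's a marker trait, these cannot be overridden,
    // and thus their behaviour is always predictable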
    fn reinterpret(self) -> T {
        unsafe {
            let r = ptr::read_unaligned(&self as *const Self as *const T);
            mem::forget(self);
            r
        }
    }
    unsafe fn reinterpret_unchecked(x: T) -> Self {
        let r = ptr::read_unaligned(&x as *const T as *const Self);
        mem::forget(x);
        r
    }
}
unsafe impl<T, U> ReinterpretAs<U> for T where T: InplaceReinterpretAs<U> {}
unsafe impl<'a, T, U> ReinterpretAs<&'a U> for &'a T where T: InplaceReinterpretAs<U> {}
unsafe impl<'a, T, U> ReinterpretAs<&'a mut U> for &'a mut T where T: InplaceReinterpretAs<U> {}
unsafe impl ReinterpretAs<u32> for [u8;4] {} // not ok in-place, but fine as memcpy

Certainly std is generally averse to introducing these using traits, but I think the recursiveness of the scenario makes the trait version more compelling than normal in this case. If one can turn a u32 into a [u8; 4] safely, why not also be able to turn a &[u32] into a &[[u8; 4]] safely?

(Name inspired by C++'s reinterpret_cast, obviously.)


#9

One other thing I wanted to add: there is a distinction between conversions that alter the pointer metadata, and those that do not. Some examples of conversions that alter the pointer metadata:

  • upcasting a trait object to a supertrait object
  • downcasting a trait object to its concrete type (fallible)
  • downcasting a slice to a sized array (fallible)
  • casting between thin and fat trait object pointers

It would be interesting to hear if there are more conversions like this. If there are, it may be useful to handle them more generally with a trait.
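
Several of these already exist in std as separate, unrelated APIs, which is part of why a common trait might be appealing. A short sketch using `TryFrom` for slice-to-array and `Any::downcast_ref` for trait-object-to-concrete-type (the function names are illustrative):

```rust
use std::any::Any;
use std::convert::TryFrom;

/// Slice -> fixed-size array reference: fallible, drops the length metadata.
pub fn first_four(bytes: &[u8]) -> Option<&[u8; 4]> {
    let head = bytes.get(..4)?;
    <&[u8; 4]>::try_from(head).ok()
}

/// Trait object -> concrete type: fallible, drops the vtable metadata.
pub fn as_u32(value: &dyn Any) -> Option<&u32> {
    value.downcast_ref::<u32>()
}

/// Thin -> fat pointer: infallible, adds the vtable metadata.
pub fn as_any(value: &u32) -> &dyn Any {
    value
}
```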


#10

Well, I guess that depends on where you get the bytes. If you just have a file that you want to read bytewise and do something to, it might not matter. In my case, I wanted to operate word by word instead of byte by byte because it was a lot faster…


#11

That in particular doesn’t work because of alignment issues.


#12

Well, that’s true, though it depends on platform…


#13

Sure, the only point I was trying to make is that it’s generally not safe to do, and as such it shouldn’t be a safe operation that appears to “just work”, whereas in reality it might well invoke undefined behavior, and an unsuspecting user could make it crash, then have trouble debugging it…


#14

I agree, but it would be nice to have it exposed in some way other than a straight transmute.


#15

But what’s wrong with the transmute in that case? It needs to be unsafe in any case, and transmute is just one function call.


#16

That’s a great question :)

I actually made this mistake not too long ago. Do you see the bug?

// assume `bytes.len() % 8 == 0`
fn bytes_to_u64(bytes: &[u8]) -> &[u64] {
    unsafe {
        let len = bytes.len();
        let raw = bytes.as_ptr();
        slice::from_raw_parts(raw as *const u64, len)
    }
}

It might be unsafe regardless, but we can still make it less error-prone…


#17

The two bugs I can spot right away are len not being divided by size_of::<u64>() / size_of::<u8>() and the aforementioned alignment issue.

Sure, we should try to make things as fault-tolerant as possible. That’s the point of using Rust, after all :) However, again, I don’t think adding these coercions would improve the error rate, exactly because they require a nontrivial amount of thinking in order to work correctly even if you add some syntactic sugar on top. Sure, we could eliminate the len / 8 bug, but we could do that with a plain function as well.
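
Such a plain function might look like the sketch below, which checks both the length and the alignment at run time and returns `None` instead of invoking undefined behaviour. The name `bytes_as_u64s` is made up; the result still depends on the platform's byte order.

```rust
use std::mem::{align_of, size_of};
use std::slice;

/// Views `bytes` as `u64` words, or returns `None` if the slice is
/// misaligned or its length is not a multiple of 8.
pub fn bytes_as_u64s(bytes: &[u8]) -> Option<&[u64]> {
    let aligned = (bytes.as_ptr() as usize) % align_of::<u64>() == 0;
    let whole_words = bytes.len() % size_of::<u64>() == 0;
    if !(aligned && whole_words) {
        return None;
    }
    // SAFETY: the pointer is aligned for `u64`, the new length covers exactly
    // the same bytes, and every bit pattern is a valid `u64`.
    Some(unsafe {
        slice::from_raw_parts(
            bytes.as_ptr() as *const u64,
            bytes.len() / size_of::<u64>(),
        )
    })
}
```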


#18

Yes, later proposals actually called it that. (I originally took “Coercible” from Haskell but it turned out Haskell and Rust mean very different things by “coerce”, so let’s banish the word as far as possible to avoid further confusion.) IIRC gereeter’s RFCs (also linked) were much better fleshed-out than my own.


#19

@scottmcm

One important axiom mentioned in those threads is transitivity. That is, if you can “reinterpret” a [u8; 4] into a u32, and a u32 into an i32, then one should be able to “reinterpret” a [u8; 4] into an i32 without explicitly providing an impl for that.

I don’t see any of your impls covering transitivity, and in fact I don’t know how that could be done using a pure library-based solution.
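
A simplified sketch of the problem, using a hypothetical safe `Reinterpret` trait rather than the marker traits above. A blanket transitive impl is rejected because the intermediate type is unconstrained, so a library can only offer composition where the caller names that type.

```rust
/// Hypothetical, simplified stand-in for the reinterpretation traits.
pub trait Reinterpret<T> {
    fn reinterpret(self) -> T;
}

impl Reinterpret<u32> for [u8; 4] {
    fn reinterpret(self) -> u32 {
        u32::from_ne_bytes(self)
    }
}

impl Reinterpret<i32> for u32 {
    fn reinterpret(self) -> i32 {
        self as i32
    }
}

// A blanket impl such as
//     impl<A, B, C> Reinterpret<C> for A where A: Reinterpret<B>, B: Reinterpret<C> {}
// does not compile: `B` is an unconstrained type parameter (E0207).

/// Explicit two-step composition; the caller must spell out the middle type `B`.
pub fn reinterpret_via<B, A, C>(a: A) -> C
where
    A: Reinterpret<B>,
    B: Reinterpret<C>,
{
    <B as Reinterpret<C>>::reinterpret(<A as Reinterpret<B>>::reinterpret(a))
}
```

A call would then read `reinterpret_via::<u32, _, i32>(bytes)`, which works but is exactly the explicit step that transitivity is meant to remove.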


#20

Just wanted to point folks here to my recent post about types which are safe to be deserialized from arbitrary byte sequences: Pre-RFC: Trait for deserializing untrusted input. I think it’s effectively just a special case of InplaceReinterpretAs<T> for [u8], but there’s some discussion there of both ergonomics and also alignment and size issues. Glad to see that everybody seems to be discussing this stuff at once :)

EDIT: I realized that a trait that always converts from [u8] is more powerful in one critical way - since it’s not parametric, you can derive it. That’s a big part of the proposal of my trait, ArbitraryBytesSafe. You could then do something like unsafe impl<T: ArbitraryBytesSafe> InplaceReinterpretAs<&T> for &[u8] (and the same for &mut), and use #[derive(ArbitraryBytesSafe)] to get a safe implementation of InplaceReinterpretAs for your type.
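
A rough sketch of what the non-parametric trait could enable, written without the derive. Only the trait name `ArbitraryBytesSafe` comes from the proposal; its exact definition, the `view_as` helper, and the run-time size and alignment checks are assumptions made for this example.

```rust
use std::mem::{align_of, size_of};

/// Assumed contract: every sequence of `size_of::<Self>()` bytes is a valid `Self`.
pub unsafe trait ArbitraryBytesSafe: Sized {}

unsafe impl ArbitraryBytesSafe for u32 {}
unsafe impl ArbitraryBytesSafe for [u8; 4] {}

/// Views `bytes` as a `T`, or returns `None` on a size or alignment mismatch.
pub fn view_as<T: ArbitraryBytesSafe>(bytes: &[u8]) -> Option<&T> {
    let right_size = bytes.len() == size_of::<T>();
    let aligned = (bytes.as_ptr() as usize) % align_of::<T>() == 0;
    if !(right_size && aligned) {
        return None;
    }
    // SAFETY: size and alignment were checked above, and the trait contract
    // says any byte pattern of that size is a valid `T`.
    Some(unsafe { &*(bytes.as_ptr() as *const T) })
}
```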