pre-RFC FromBits/IntoBits

gnzlbg · March 16, 2018, 6:03pm

Feature Name: from_bits

Start Date: (fill me in with today’s date, YYYY-MM-DD)

RFC PR: (leave this empty)

Rust Issue: (leave this empty)

Summary and motivation

This RFC proposes to add two traits to the std library convert module, FromBits and IntoBits, as well as implementations of these traits for some of the std library types.

These traits are used to implement bit-pattern preserving conversions between types. Currently, the easiest way to perform these conversions is via unsafe code by means of mem::transmute.

These two traits allow users to express for which pairs of types every bit-pattern of the input type is also a valid bit-pattern of the output type, and thus a safe, infallible, and lossless conversion exists.

Motivation

The std library From and Into traits are used to express infallible conversions. In the context of numeric types, std follows the convention that these conversions must preserve numeric values:

assert_eq!(f64::from(13_i32), 13.0_f64);

Another common operation for these types is to perform a conversion that, instead of preserving the numeric value, preserves the bit-pattern of the value. The floating-point types have some inherent methods for this:

assert_eq!(f32::from_bits(0x5F3759DF_u32), 1.3211836e19_f32);

However, these methods are not generic, and as a consequence the following fails to compile:

assert_eq!(f32::from_bits(0x0_i32), 0.0_f32);
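For comparison, here is a minimal sketch of how the i32 case can be written today with existing std APIs only. The helper names f32_from_i32_bits and f32_from_i32_value are made up for this illustration and are not part of std or of this proposal.

```rust
/// Hypothetical helper (not in std): reinterpret the bits of an `i32` as an `f32`.
fn f32_from_i32_bits(x: i32) -> f32 {
    // An `as` cast between same-width integers keeps the bit-pattern,
    // so the inherent `f32::from_bits(u32)` can take over from there.
    f32::from_bits(x as u32)
}

/// Hypothetical helper: the value-preserving conversion of the same input,
/// shown only for contrast.
fn f32_from_i32_value(x: i32) -> f32 {
    x as f32
}
```

Every additional source type needs its own helper of this shape, which is the duplication the rest of this section is concerned with.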

This isn’t that big of a deal if one only has a couple of types to convert to. For example, we could add a f32::from_bits_i32 method and call it a day. However, in the context of SIMD vector types, bitwise preserving conversions are incredibly common. For converting between architecture-specific vector types like __m256, __m256i, and __m256d, the number of _::from_bits_{...} conversion functions would remain reasonably small. But this is a list of all portable packed SIMD vector types whose bit-pattern often needs to be converted to __m256: i8x32, u8x32, i16x16, u16x16, i32x8, u32x8, f32x8, i64x4, u64x4, and f64x4.

Now consider adding the same number of bitwise conversions for __m256i and __m256d, and then think about 64-bit, 128-bit, and 512-bit wide portable vectors and their architecture-specific types. The total number of _::from_bits_xyz methods quickly exceeds 50.

Users might be tempted to reach for unsafe { mem::transmute(...) } in these cases, but not having to write any unsafe code is actually one of the main advantages of the portable packed SIMD vector types because, as opposed to the std::arch intrinsics, their API is safe. This is how one ARM NEON stdsimd test looks without these traits:

unsafe {
    let a = i16x8::new(1, 2, 3, 4, 5, 6, 7, 8);
    let b = i16x8::new(8, 7, 6, 5, 4, 3, 2, 1);
    let e = i16x8::new(9, 9, 9, 9, 9, 9, 9, 9);
    let r: i16x8 = mem::transmute(vaddq_s16(mem::transmute(a), mem::transmute(b)));
    assert_eq!(r, e);
}

and the same test with the traits:

let a = i16x8::new(1, 2, 3, 4, 5, 6, 7, 8);
let b = i16x8::new(8, 7, 6, 5, 4, 3, 2, 1);
let e = i16x8::new(9, 9, 9, 9, 9, 9, 9, 9);
let r: i16x8 = vaddq_s16(a.into_bits(), b.into_bits()).into_bits();
assert_eq!(r, e);

This RFC is one potential solution to this problem.

Guide-level explanation

With the traits proposed by this RFC, the currently-rejected snippet of code shown above:

assert_eq!(f32::from_bits(0x0_i32), 0.0_f32);

would compile and produce the correct result. The following currently-rejected snippets of code would also work correctly:

assert_eq!((0x0_u32).into_bits(), 0.0_f32);
assert_eq!((0x0_i32).into_bits(), 0.0_f32);

Reference

This RFC introduces two traits to core::convert analogous to From/Into that are used to provide a safe wrapper over bitwise preserving conversions:

pub trait FromBits<T>: marker::Sized {
    fn from_bits(t: T) -> Self;
}

pub trait IntoBits<T>: marker::Sized {
    fn into_bits(self) -> T;
}

// FromBits implies IntoBits:
impl<T, U> IntoBits<U> for T
where
    U: FromBits<T>,
{
    fn into_bits(self) -> U {
        U::from_bits(self)
    }
}

// FromBits (and thus IntoBits) is reflexive
impl<T> FromBits<T> for T {
    fn from_bits(t: Self) -> Self {
        t
    }
}

as well as implementations for the following equally-sized types:

impl FromBits<i8> for u8;
impl FromBits<u8> for i8;
impl FromBits<i16> for u16;
impl FromBits<u16> for i16;
impl FromBits<u32> for f32;
impl FromBits<f32> for u32;
impl FromBits<i32> for f32;
impl FromBits<f32> for i32;
impl FromBits<u32> for i32;
impl FromBits<i32> for u32;
impl FromBits<u64> for f64;
impl FromBits<f64> for u64;
impl FromBits<i64> for f64;
impl FromBits<f64> for i64;
impl FromBits<u64> for i64;
impl FromBits<i64> for u64;
impl FromBits<isize> for usize;
impl FromBits<usize> for isize;
impl FromBits<i128> for u128;
impl FromBits<u128> for i128;
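A minimal, self-contained sketch of how the traits and a few of the listed implementations could look when written out. The impl bodies are an assumption (the RFC only lists the signatures); they use existing safe std operations (f32::from_bits, f32::to_bits, and same-width `as` casts) rather than mem::transmute.

```rust
pub trait FromBits<T>: Sized {
    fn from_bits(t: T) -> Self;
}

pub trait IntoBits<T>: Sized {
    fn into_bits(self) -> T;
}

// FromBits implies IntoBits.
impl<T, U> IntoBits<U> for T
where
    U: FromBits<T>,
{
    fn into_bits(self) -> U {
        U::from_bits(self)
    }
}

// FromBits (and thus IntoBits) is reflexive.
impl<T> FromBits<T> for T {
    fn from_bits(t: T) -> Self {
        t
    }
}

// Assumed bodies for a handful of the listed impls.
impl FromBits<u32> for f32 {
    fn from_bits(t: u32) -> Self {
        f32::from_bits(t) // resolves to the inherent std method
    }
}
impl FromBits<f32> for u32 {
    fn from_bits(t: f32) -> Self {
        t.to_bits()
    }
}
impl FromBits<i32> for u32 {
    fn from_bits(t: i32) -> Self {
        t as u32 // same-width cast keeps the bits
    }
}
impl FromBits<u32> for i32 {
    fn from_bits(t: u32) -> Self {
        t as i32
    }
}
impl FromBits<i32> for f32 {
    fn from_bits(t: i32) -> Self {
        f32::from_bits(t as u32)
    }
}
impl FromBits<f32> for i32 {
    fn from_bits(t: f32) -> Self {
        t.to_bits() as i32
    }
}
```

With these definitions, `let f: f32 = 0x3F80_0000_u32.into_bits();` yields 1.0. One caveat, as far as I can tell: a path call such as f32::from_bits(x) picks the inherent f32::from_bits(u32) before the trait method, so for other source types the trait has to be named explicitly, e.g. `<f32 as FromBits<i32>>::from_bits(0)`, or the call has to go through into_bits() with a type annotation.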

Drawbacks

It adds a new pair of traits to std which might be painful.

Coherence

If crate A exposes the type AT, and crate B exposes the type BT, crate C cannot implement FromBits<A::AT> for B::BT.

Rationale and alternatives

Equally-sized types restriction

The proposed implementations are restricted to equally-sized types.

This is, however, not a requirement, since, for example, the following implementation would also be safe:

impl FromBits<i32> for i64;

The problem is that there are many ways to extend an i32 into an i64, e.g., zero-extend, sign-extend, etc.
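To make the ambiguity concrete, here is a small sketch of the two most common extensions written with plain `as` casts. The function names sign_extend and zero_extend are only illustrative.

```rust
/// Sign-extension: the numeric value is preserved and the upper 32 bits
/// are copies of the sign bit.
fn sign_extend(x: i32) -> i64 {
    x as i64
}

/// Zero-extension: the low 32 bits are preserved and the upper 32 bits are zero.
fn zero_extend(x: i32) -> i64 {
    (x as u32) as i64
}
```

For non-negative inputs both functions agree; for -1 the first returns -1 and the second returns 0xFFFF_FFFF, so a single FromBits<i32> for i64 impl would have to pick one of them.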

FromBits is not Bijective

That is: FromBits<T> for U does not imply FromBits<U> for T. This is by design.

In the context of stdsimd we have vector masks, like b8x8, a 64-bit wide type containing eight 8-bit masks, where the bits of each mask are either all set or all cleared. That is, each lane can only contain two values: 0 or u8::max_value(). Therefore, FromBits<b8x8> for u8x8 is a safe and correct operation, since every valid bit-pattern of the mask is a valid u8x8 bit-pattern. However, its inverse, FromBits<u8x8> for b8x8, is not correct, since there are many u8x8 bit-patterns that aren’t valid b8x8 bit-patterns.
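The asymmetry can be sketched with two stand-in types. Mask8x8 and U8x8 below are hypothetical array-backed substitutes for b8x8 and u8x8 (the real stdsimd types are not defined like this), and the method names are made up for the example.

```rust
/// Hypothetical stand-in for `b8x8`: eight lanes, each either 0x00 or 0xFF.
/// The field is private so the invariant cannot be broken from outside.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Mask8x8([u8; 8]);

/// Hypothetical stand-in for `u8x8`: any lane value is allowed.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct U8x8(pub [u8; 8]);

impl Mask8x8 {
    pub fn new(lanes: [bool; 8]) -> Self {
        Mask8x8(lanes.map(|set| if set { u8::MAX } else { 0 }))
    }

    /// Mask to integer vector: every mask is a valid integer vector,
    /// so this direction can be infallible.
    pub fn into_u8x8(self) -> U8x8 {
        U8x8(self.0)
    }

    /// Integer vector to mask: only some inputs are valid masks,
    /// so this direction has to be able to fail.
    pub fn try_from_u8x8(v: U8x8) -> Option<Self> {
        if v.0.iter().all(|&lane| lane == 0 || lane == u8::MAX) {
            Some(Mask8x8(v.0))
        } else {
            None
        }
    }
}
```

Only the first direction fits the infallible FromBits signature; the second would need a fallible trait, which this RFC does not propose.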

Prior art

A version of these traits is currently used in stdsimd to provide easy .into_bits() conversions between both portable packed SIMD vector types themselves and against the architecture-specific vector types.

Unresolved questions

TBD.

zackw · March 16, 2018, 6:20pm

I think it would make the RFC more compelling if you gave some SIMD examples - cases where "users might be tempted to reach for unsafe { mem::transmute(...) }" as you put it.

jmst · March 16, 2018, 9:01pm

This seems way too restrictive.

A much more general design would be to have an auto trait that signals that all bit patterns are valid for a data type (and that all fields are recursively public and their types visible), and then allow coercing &T into &[u8; size_of::<T>()], and similarly for &mut, if T implements the trait (and also &[T] into &[u8] and &[T; n] into &[u8; n * size_of::<T>()]). It would also provide a fallible conversion in the other direction, returning None if the alignment is wrong, as well as safe transmute primitives.

Note that since the trait would only apply to data types with all recursively public and visible fields, they can’t change their definition anyway without breaking compatibility, so there would be no risk of the trait silently disappearing in a non-breaking update of a crate.

An even more general design would be a T: Compatible<U> trait that would work even for non-all-bits-valid subfields as long as both T and U have such a field in the same place, where the trait in the first design is equivalent to T: Compatible<[u8; size_of::<T>()]>. This latter design would allow coercing publicly-constructible newtypes to the original type, converting &T to/from &[T; 1], adding/removing typestate, and addressing many other similar needs.
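A rough sketch of the first design, under stated assumptions: the marker trait is called AllBitsValid here (a made-up name) and is implemented by hand for a few primitives, whereas the idea above is that the compiler would provide it automatically. The byte view is returned as a slice instead of an array because the array length cannot be written generically in this sketch.

```rust
use std::mem::{align_of, size_of};
use std::slice;

/// Hypothetical marker: every bit-pattern is a valid value and the type
/// contains no padding bytes. Hand-implemented here, not compiler-derived.
pub unsafe trait AllBitsValid: Copy {}

unsafe impl AllBitsValid for u8 {}
unsafe impl AllBitsValid for u32 {}
unsafe impl AllBitsValid for f32 {}

/// View a value as its raw bytes.
pub fn bytes_of<T: AllBitsValid>(value: &T) -> &[u8] {
    // SAFETY: `T` has no padding, `u8` has alignment 1, and the slice
    // covers exactly the `size_of::<T>()` bytes owned by `value`.
    unsafe { slice::from_raw_parts(value as *const T as *const u8, size_of::<T>()) }
}

/// Fallible reverse direction: `None` if the length or the alignment is wrong.
pub fn try_from_bytes<T: AllBitsValid>(bytes: &[u8]) -> Option<&T> {
    let misaligned = (bytes.as_ptr() as usize) % align_of::<T>() != 0;
    if bytes.len() != size_of::<T>() || misaligned {
        return None;
    }
    // SAFETY: length and alignment were checked above, and every
    // bit-pattern is a valid `T`.
    Some(unsafe { &*(bytes.as_ptr() as *const T) })
}
```

Round-tripping a u32 through bytes_of and try_from_bytes gives back a reference to the same value, while a three-byte slice is rejected.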

ExpHP · March 16, 2018, 9:25pm

Why are the traits unsafe? Normally an unsafe trait provides additional guarantees that can be relied on by unsafe code. The only guarantees I see here are “the methods are safe to call,” which is already guaranteed by the fact that the methods are not unsafe.

mark-i-m · March 16, 2018, 11:35pm

Because impl-ing them is unsafe. You the impl-er must guarantee that the two types can actually be converted that way safely.

vitalyd · March 17, 2018, 2:02am

I’d say the unsafety hinges on the concrete values that are being converted, not the whole notion itself; as long as two types have an overlap in some bit patterns, that intersection is safe. That implies that it’s the method that’s unsafe, and not the trait.

scottmcm · March 17, 2018, 2:38am

I think the focus on “bits” here is too limiting. Everything listed is already available in safe code via either as or a dedicated std method, so it’s not really opening up new ground. Why not also do things like &'a i32 to &'a u32?

For the unsafety: I would expect that either

it’s an unsafe marker trait (no methods) and a corresponding safe free function that’s bound by the marker, or
it’s a safe trait with a safe method that is often implemented using unsafe

This also seems like a similar problem statement to these crates, which work totally differently… https://docs.rs/plain/ https://docs.rs/pod/

Centril · March 17, 2018, 3:04am

As @ExpHP wrote, an unsafe trait should be used when, and means that, there are some documented non-executable invariants that are being relied upon in unsafe { .. }. This does not seem to be the case here, so the traits should not be unsafe.

gnzlbg · March 17, 2018, 8:52am

Makes sense, I've removed these. FWIW in stdsimd these traits are safe.

@mark-i-m the trait methods are safe, so their implementation must maintain safety.

@scottmcm

Everything listed is already available in safe code via either as

as is not implementable for user-defined types. An alternative would be to provide one trait that allows using as with user-defined types.

Why not also do things like &'a i32 to &'a u32?

I just implemented the trait for the types I cared about, but the list isn't comprehensive. Nothing prevents the addition of more implementations later. If you want to propose more, ideally, you would specify exactly for which types you want this implemented. If it makes sense I'll edit the list.

Things like:

impl<'a> FromBits<&'a i32> for &'a u32 { }
impl<'a> FromBits<&'a mut i32> for &'a mut u32 { }

sound good to me.
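A minimal sketch of what the bodies of such reference conversions could look like, written as free functions so the example stands alone. The names as_u32_ref and as_u32_mut are made up for the illustration.

```rust
/// Reinterpret a shared reference to `i32` as a reference to `u32`.
pub fn as_u32_ref(x: &i32) -> &u32 {
    // SAFETY: `i32` and `u32` have the same size and alignment, and every
    // bit-pattern of `i32` is a valid `u32`.
    unsafe { &*(x as *const i32 as *const u32) }
}

/// Reinterpret a mutable reference to `i32` as a mutable reference to `u32`.
pub fn as_u32_mut(x: &mut i32) -> &mut u32 {
    // SAFETY: as above, and additionally every `u32` written through the
    // result is a valid `i32`, so the original binding stays valid.
    unsafe { &mut *(x as *mut i32 as *mut u32) }
}
```

Note that the &mut case needs the conversion to be valid in both directions, because values written through the new reference are later read back as the original type.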

@jmst

A much more general design seems to have an auto trait that signals that all bit patterns are valid for a data type

These traits support cases for which not all bit-patterns are valid, like the reference above (a null reference would be invalid). In any case, I don't think I fully understand what you proposed, so maybe you could reformulate it in a different way? AFAICT you propose a trait that says whether transmuting into [u8; size_of::<T>()] is safe, and therefore, all types for which this is safe can be safely transmuted into one another. But I am not sure if this is what you meant.

@zackw I will add some examples of this for the RFC. This is a part of the portable SIMD rfc that's not written yet, so I couldn't just copy it over here.

gnzlbg · March 17, 2018, 9:13am

Right now all usages in stdsimd require that the size of both types must be equal. I left that requirement provisionally out here, but I think these traits should maintain it.

The example of i32 to u64 shows the issue: there are many ways to do that, two of them often useful: zext and sext.

These two traits would, however, only be able to express one of them. AFAICT if these traits require the same size, then because of the bit-pattern preserving requirement there is only one way to do this, and this way must be “invertible”, so one only needs to implement the trait for one permutation and a blanket impl could be used to provide the other.

gnzlbg · March 17, 2018, 10:07am

So plain looks larger in scope than this, allowing also to convert to/from u8 arrays and slices, and pod does look even larger in scope than plain. @scottmcm I think that this meshes also with the comment that @jmst made, and it does make sense to consider a path forward here that can be seamlessly extended to those use cases, or that at least those use cases can build on.

I find the approach of having a single unsafe trait appealing:

/// Marker trait that indicates that all 
/// bit-patterns of a type are valid. 
unsafe trait Raw {
   fn from_raw<T: Raw>(x: T) -> Self 
     where mem::size_of::<T>() == mem::size_of::<Self>() {
       unsafe { mem::transmute(x) }
   }
   fn into_raw<T: Raw>(self) -> T 
     where mem::size_of::<T>() == mem::size_of::<Self>() {
       unsafe { mem::transmute(self) }
   }
}
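The where clauses above are not expressible in stable Rust today, so here is a sketch of the same idea that compiles, under two assumptions of mine: the conversion is a free function (raw_cast, a made-up name) instead of a pair of provided methods, and the size equality is checked at run time rather than in the type system.

```rust
use std::mem;

/// Marker trait that indicates that all bit-patterns of a type are valid
/// (and, for this sketch, that the type has no padding).
pub unsafe trait Raw: Copy {}

unsafe impl Raw for u32 {}
unsafe impl Raw for i32 {}
unsafe impl Raw for f32 {}
unsafe impl Raw for u64 {}

/// Hypothetical stand-in for `from_raw`/`into_raw`.
/// Panics if the two types differ in size.
pub fn raw_cast<T: Raw, U: Raw>(x: T) -> U {
    assert_eq!(mem::size_of::<T>(), mem::size_of::<U>(), "size mismatch");
    // SAFETY: the sizes are equal (checked above) and every bit-pattern
    // is a valid `U` by the contract of `Raw`.
    unsafe { mem::transmute_copy::<T, U>(&x) }
}
```

For example, raw_cast::<u32, f32>(0x3F80_0000) returns 1.0, while raw_cast::<u32, u64>(1) panics on the size check.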

jmst · March 17, 2018, 6:06pm

If that's necessary, then a T: Compatible<U> "auto" trait seems better (the idea is that the compiler automatically implements T: Compatible<U> if T can be safely transmuted to U and obviously vice versa - perhaps "Transmutable" could be an option as a name as well).

It would work like FromBits (but with an external safe transmute method conditional to the trait being implemented), except the compiler would automatically provide it.

The upside is that unlike FromBits, which seems completely infeasible to implement for all possible pairs, it would always be available when it's appropriate. The downside is that it requires a compiler change.

gnzlbg · March 17, 2018, 8:42pm

@zackw I’ve added an example to the RFC.

kornel · March 17, 2018, 8:42pm

This doesn’t work for slices/arrays/vectors.

When I really need casting/transmuting is for conversions between [T] and [u8].

So I’d like a trait such as trait Primitive: Copy that is (auto?) implemented for all types that can be harmlessly cast to and from a bunch of bits, and then on top of that it will be safe to implement helper functions converting T, as well as [T], etc.

gnzlbg · March 17, 2018, 8:46pm

I don't understand how that would work. Suppose I define the following type:

// This struct can only contain one value (42, 42)
struct Foo(i32, i32);

impl Foo {
    pub fn new(x: i32, y: i32) -> Self {
        assert!(x == 42 && y == 42);
        Foo(x, y)
    }
}

which, as you can see, can only contain one value, Foo(42, 42), an invariant maintained by the new constructor, which can be relied upon by unsafe code for correctness.

How would the compiler know that this type is not "compatible" with (i32, i32) ?

gnzlbg · March 17, 2018, 8:47pm

How would you transmute two regions of memory of different sizes into each other?

kornel · March 17, 2018, 9:00pm

I do that with slice.as_ptr() and slice::from_raw_parts, with the length adjusted by size_of.

T to u8 AFAIK would always be safe (the other way round can fail due to required alignment).

I’m also interested in conversion between [T] and [Newtype<T>], and [T] to [[T; 3]]. I don’t mind using unsafe and upholding invariants of slices, but I want to support arbitrary user-supplied T type for which this is safe.
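A sketch of the two infallible slice conversions mentioned here, assuming a hand-implemented marker trait named Primitive (as suggested earlier in the thread) and a #[repr(transparent)] wrapper named Newtype. Both names and both helper functions are illustrative only.

```rust
use std::mem::size_of_val;
use std::slice;

/// Hypothetical marker: no padding bytes and no invalid bit-patterns.
pub unsafe trait Primitive: Copy {}
unsafe impl Primitive for u16 {}
unsafe impl Primitive for u32 {}

/// `&[T]` to `&[u8]`: same memory, length scaled by `size_of::<T>()`.
pub fn as_byte_slice<T: Primitive>(xs: &[T]) -> &[u8] {
    // SAFETY: `u8` has alignment 1, `T` has no padding, and the byte
    // length is exactly the size of the original slice.
    unsafe { slice::from_raw_parts(xs.as_ptr() as *const u8, size_of_val(xs)) }
}

/// A wrapper with the same layout as its single field.
#[repr(transparent)]
#[derive(Debug, PartialEq)]
pub struct Newtype<T>(pub T);

/// `&[T]` to `&[Newtype<T>]`: same memory, same length.
pub fn as_newtype_slice<T>(xs: &[T]) -> &[Newtype<T>] {
    // SAFETY: `#[repr(transparent)]` gives `Newtype<T>` the same size and
    // alignment as `T`, and the field is public, so there is no hidden invariant.
    unsafe { slice::from_raw_parts(xs.as_ptr() as *const Newtype<T>, xs.len()) }
}
```

Both directions shown here can never fail; the reverse of the first one (bytes back to [T]) has to check alignment and length.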

gnzlbg · March 18, 2018, 8:57am

@kornel What I meant is, how do you deal with this (assuming alignment would be right):

pub struct Foo(u8, u8);
let xs = [0_u8; 1];
let Foos: &[Foo] = (&xs).into_bits();   // ?

That is, I don’t understand how safe infallible conversions between slices could work because the length of a slice is not part of its type, so at best such a conversion would need to be fallible and return an Option (and would need to be accounted for by TryFromBits, or similar traits, not discussed here). In the case above, I don’t see a clear way of safely “transmuting” a slice of 1 u8 to a slice of Foo where each Foo has a size of 2 u8s. So what am I missing?
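For illustration, a fallible version of the conversion above could look roughly like this. The function try_as_foos is hypothetical, and Foo is given #[repr(C)] plus public fields here so that its layout (size 2, alignment 1) is fixed and alignment does not enter the picture.

```rust
use std::slice;

#[repr(C)] // fixes the layout: two bytes, alignment 1, no padding
#[derive(Debug, PartialEq)]
pub struct Foo(pub u8, pub u8);

/// Hypothetical fallible conversion, in the spirit of a `TryFromBits`:
/// `None` if the bytes do not divide evenly into `Foo`s.
pub fn try_as_foos(xs: &[u8]) -> Option<&[Foo]> {
    if xs.len() % 2 != 0 {
        return None;
    }
    // SAFETY: `Foo` has size 2 and alignment 1, all bit-patterns are valid,
    // and the new length covers exactly the same bytes.
    Some(unsafe { slice::from_raw_parts(xs.as_ptr() as *const Foo, xs.len() / 2) })
}
```

The one-element slice from the snippet above is rejected, while a four-byte slice converts to two Foo values.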

jmst · March 18, 2018, 2:25pm

The compiler sees that the fields are not public, and types with any private field would not be compatible with anything except trivially their own exact type (and maybe the same type with different generic parameters, if the private fields had compatible types and if an attribute to enable that was manually placed on the type).

gnzlbg · March 18, 2018, 2:27pm

I see. The question then becomes: how do we offer this functionality for types with private fields? FromBits/IntoBits allow this. That is, given the previous example:

// This struct can only contain one value (42, 42)
struct Foo(i32, i32);
impl Foo {
    pub fn new(x: i32, y: i32) -> Self {
        assert!(x == 42 && y == 42);
        Foo(x, y)
    }
}

// This struct can contain all permutation pairs of 42, 7: (42, 42)
// (7, 7), (7, 42), (42, 7)
struct Bar(u32, u32);
impl Bar {
    pub fn new(x: u32, y: u32) -> Self {
        assert!((x == 42 || x == 7) && (y == 42 || y == 7));
        Bar(x, y)
    }
}

impl FromBits<Foo> for Bar { ... } // This is safe and correct
// impl FromBits<Bar> for Foo { ... } // can't be implemented

Transmuting from Foo to Bar is ok while the opposite is not true.
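A compilable sketch of this example with the elided impl body filled in. The body is an assumption: it converts field by field with same-width `as` casts, which keep the bit-pattern, instead of using mem::transmute.

```rust
pub trait FromBits<T>: Sized {
    fn from_bits(t: T) -> Self;
}

/// Invariant: both fields are always 42.
pub struct Foo(i32, i32);
impl Foo {
    pub fn new(x: i32, y: i32) -> Self {
        assert!(x == 42 && y == 42);
        Foo(x, y)
    }
}

/// Invariant: each field is either 42 or 7.
#[derive(Debug, PartialEq)]
pub struct Bar(u32, u32);
impl Bar {
    pub fn new(x: u32, y: u32) -> Self {
        assert!((x == 42 || x == 7) && (y == 42 || y == 7));
        Bar(x, y)
    }
}

// Every valid `Foo` is also a valid `Bar`, so this direction is infallible.
impl FromBits<Foo> for Bar {
    fn from_bits(foo: Foo) -> Self {
        // Assumed body: same-width integer casts preserve the bits.
        Bar(foo.0 as u32, foo.1 as u32)
    }
}
```

Because the impl lives next to the type definitions, it can read the private fields and rely on the invariants, which is what an automatically derived trait could not know about.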

We use this property of FromBits/IntoBits in stdsimd a lot actually. For example, transmuting from b8x8 (a boolean mask of 8 8-bit integers) into i8x8 is ok, but the opposite isn't (not all i8x8 bit patterns are valid b8x8 ones). Also, all fields of SIMD types are private. You might use a native 128-bit type on one architecture, but emulate it using 4x 32-bit types in another, so the exact internal representation isn't something we want to expose.
