[Pre-RFC] Thin pointers with inlined metadata

Previous discussions: Idea: `Thin` wrapper of DSTs


Summary

This RFC adds Thin<T>, a wrapper that stores T's pointer metadata inline, so that pointers to Thin<T> are thin even if T is !Sized.

Motivation

Pointers to dynamically sized types (DSTs) are fat pointers, and they are not FFI-compatible, which prevents some common types like &str, &[T], and &dyn Trait from being passed across FFI boundaries.
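As a quick illustration of what "fat" means here, the sketch below measures a few reference types in machine words. The helper names `words` and `pointer_words` are made up for this example, and the two-word size of fat pointers is how current rustc lays them out rather than something this proposal depends on.

```rust
use std::any::Any;
use std::mem::size_of;

/// Size of the type `P`, measured in machine words (hypothetical helper).
pub fn words<P>() -> usize {
    size_of::<P>() / size_of::<usize>()
}

/// Word counts for a thin reference and for the three fat references
/// mentioned above: `&u8`, `&str`, `&[i32]`, `&dyn Any`.
pub fn pointer_words() -> [usize; 4] {
    [
        words::<&u8>(),      // thin: data address only
        words::<&str>(),     // fat: data address + length
        words::<&[i32]>(),   // fat: data address + length
        words::<&dyn Any>(), // fat: data address + vtable address
    ]
}
```

On current compilers the first entry is one word and the other three are two words each; the extra word is exactly the metadata that Thin<T> proposes to move behind the pointer.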

1. Passing pointers to DSTs across the FFI boundary is hard

Currently, it is difficult to use DSTs in FFI-compatible functions (even behind a pointer). For example, using &str, &[T], or &dyn Trait in an extern "C" function is flagged as not FFI-safe.

extern "C" fn foo(
    str_slice: &str, //~ ERROR not FFI-compatible
    int_slice: &[i32], //~ ERROR not FFI-compatible
    opaque_obj: &dyn std::any::Any, //~ ERROR not FFI-compatible
) { /* ... */ }

Instead, users have to wrap these types in #[repr(C)] structs:

/// FFI-compatible wrapper struct of `&[T]`
#[repr(C)]
pub struct Slice<'a, T> {
    len: usize,
    ptr: NonNull<()>,
    _marker: PhantomData<&'a [T]>,
}

/// FFI-compatible wrapper struct of `&str`
#[repr(C)]
pub struct StrSlice<'a> {
    len: usize,
    bytes: NonNull<u8>,
    _marker: PhantomData<&'a str>,
}

/// FFI-compatible wrapper of `&dyn Trait`
#[repr(C)]
pub struct DynTrait<'a> {
    vtable: NonNull<()>,
    ptr: NonNull<()>,
    _marker: PhantomData<&'a dyn Trait>,
}
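To show how much glue such a wrapper needs before it is usable, here is a minimal sketch of the slice case with conversions in both directions. The methods `from_slice` and `as_slice` and the function `sum_i32` are hypothetical names chosen for this example; they are not taken from abi_stable or any other crate.

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

/// FFI-compatible stand-in for `&'a [T]`, using the same fields as above.
#[repr(C)]
pub struct Slice<'a, T> {
    len: usize,
    ptr: NonNull<()>,
    _marker: PhantomData<&'a [T]>,
}

impl<'a, T> Slice<'a, T> {
    /// Split a fat slice reference into its length and data address.
    pub fn from_slice(slice: &'a [T]) -> Self {
        Slice {
            len: slice.len(),
            ptr: NonNull::from(slice).cast::<()>(),
            _marker: PhantomData,
        }
    }

    /// Reassemble the fat reference on the other side of the boundary.
    pub fn as_slice(&self) -> &'a [T] {
        // SAFETY: `ptr` and `len` were taken from a valid `&'a [T]`
        // in `from_slice`, and the lifetime `'a` is preserved.
        unsafe { std::slice::from_raw_parts(self.ptr.cast::<T>().as_ptr(), self.len) }
    }
}

/// Hypothetical FFI entry point that takes the wrapper by value.
pub extern "C" fn sum_i32(values: Slice<'_, i32>) -> i32 {
    values.as_slice().iter().sum()
}
```

Every DST that should cross the boundary needs its own variant of this pattern, which is the boilerplate this proposal is trying to avoid.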

Luckily, the abi_stable crate provides a series of FFI-compatible types like RSlice<'a, T>, RSliceMut, RStr<'a>, and an attribute macro sabi_trait that makes ABI-stable trait objects (which are also FFI-compatible).

However, that is tedious and non-exhaustive, because a library writer cannot enumerate all compound DSTs (e.g. ADTs with a DST field).

2. Slices cannot be unsized to trait objects

Suppose there is a dyn-safe trait MyTrait, and it is implemented for [T]. However, it is not possible to convert an &[T] to an &dyn MyTrait, because unsizing to a trait object requires the source type to be Sized and [T] is not.

trait MyTrait {
    fn foo(&self);
}

impl<T> MyTrait for [T] {
    fn foo(&self) { /* ... */ }
}

fn as_my_trait<T>(x: &[T]) -> &dyn MyTrait {
    x //~ ERROR the size for values of type `[T]` cannot be known at compilation time
}
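For contrast, the same coercion is accepted today when the source type is a sized array. The sketch below uses a hypothetical `count` method instead of `foo` so that the result can be checked; the function name `array_as_my_trait` is also made up for this example.

```rust
/// Variant of `MyTrait` with a method whose result is easy to observe
/// (the `count` method is an assumption of this sketch).
pub trait MyTrait {
    fn count(&self) -> usize;
}

impl<T, const N: usize> MyTrait for [T; N] {
    fn count(&self) -> usize {
        N
    }
}

/// Compiles: `[T; N]` is `Sized`, so the reference is thin and the
/// vtable pointer can take the place of the (absent) metadata.
pub fn array_as_my_trait<T, const N: usize>(x: &[T; N]) -> &dyn MyTrait {
    x
}
```

The only difference from the rejected version is that &[T; N] is a thin pointer, which is the property Thin<[T]> aims to provide for slices of unknown length.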

Guide-level explanation

To overcome the obstacles above, we introduce a Thin<T> wrapper that stores the metadata and a (sized) value inside and thus keeps pointers to Thin<T> thin.
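The idea can be approximated on stable Rust for the slice case. The following is only a sketch under my own assumptions: `Inline`, `Header`, and `ThinSliceRef` are hypothetical names, and a handle struct around a raw pointer stands in for `&Thin<[T]>`, since a real reference to an unsized-but-thin type needs the language support proposed below.

```rust
use std::marker::PhantomData;
use std::ptr::{addr_of, NonNull};

/// Sized storage: a length header followed by the elements.
#[repr(C)]
pub struct Inline<T, const N: usize> {
    len: usize,
    #[allow(dead_code)] // only read through raw pointers
    data: [T; N],
}

/// The same layout with the element count forgotten.
/// It is never constructed, only used as a pointer target.
#[repr(C)]
#[allow(dead_code)]
struct Header<T> {
    len: usize,
    data: [T; 0],
}

/// One-word handle playing the role of `&'a Thin<[T]>`.
#[repr(transparent)]
pub struct ThinSliceRef<'a, T> {
    ptr: NonNull<Header<T>>,
    _marker: PhantomData<&'a [T]>,
}

impl<T, const N: usize> Inline<T, N> {
    pub fn new(data: [T; N]) -> Self {
        Inline { len: N, data }
    }

    /// Forget `N` in the type; the length stays available in the header.
    pub fn as_thin(&self) -> ThinSliceRef<'_, T> {
        ThinSliceRef {
            ptr: NonNull::from(self).cast(),
            _marker: PhantomData,
        }
    }
}

impl<'a, T> ThinSliceRef<'a, T> {
    /// Rebuild the fat `&[T]` by reading the length stored behind the pointer.
    pub fn as_slice(&self) -> &'a [T] {
        let p = self.ptr.as_ptr();
        // SAFETY: `p` was derived from a live `&'a Inline<T, N>`, which has
        // the same `repr(C)` prefix as `Header<T>` and holds `len` elements.
        unsafe {
            let len = (*p).len;
            let data = addr_of!((*p).data) as *const T;
            std::slice::from_raw_parts(data, len)
        }
    }
}
```

The handle is a single word, so it can be passed where a thin pointer is expected, and the length is recovered by dereferencing it. Thin<T> generalizes this so that the header can hold any metadata type, not only a slice length.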

Passing DST pointers across FFI boundaries

extern "C" fn foo(
    str_slice: &Thin<str>, // ok because `&Thin<str>` is thin
    int_slice: &Thin<[i32]>, // ok because `&Thin<[i32]>` is thin
    opaque_obj: &Thin<dyn std::any::Any>, // ok because `&Thin<dyn std::any::Any>` is thin
) { /* ... */ }

// Construct the DST values on the stack
let str_slice: &Thin<str> = thin_str!("something");
let int_slice: &Thin<[i32]> = &Thin::new_unsized([1, 2, 3]);
let opaque_obj: &Thin<dyn std::any::Any> = &Thin::new_unsized(String::from("hello"));
// Pass the thin DSTs across the FFI boundary
unsafe {
    foo(str_slice, int_slice, opaque_obj);
}

Making trait objects of slices

trait MyTrait {
    fn foo(&self);
}

impl<T> MyTrait for Thin<[T]> {
    fn foo(&self) { /* ... */ }
}

// Construct a thin `Thin<[i32]>` on the stack
let value: &Thin<[i32]> = &Thin::new_unsized([1, 2, 3]);
// Coerce it to a trait object
let dyn_value: &dyn MyTrait = value; // ok because pointers to `Thin<[i32]>` are thin
// Calls `<Thin<[i32]> as MyTrait>::foo`
dyn_value.foo();

Unify normal and thin containers

Consider these existing thin containers:

  • List<T> in rustc, which is a thin [T] with the metadata (length) stored at the head;
  • ThinVec<T>, which puts the length and capacity together with its contents on the heap;
  • ThinBox<T>, which is like Box<T> but puts the metadata together with the value on the heap;
  • thin_trait_object, an attribute macro that makes a thin trait object (by manually constructing the vtable).

With Thin, they can be rewritten as:

  • List<T> -> &Thin<[T]>
  • ThinVec<T> -> technically Box<(usize, Thin<[MaybeUninit<T>]>)> (in representation)
  • ThinBox<T> -> Box<Thin<T>>
  • BoxedTrait -> Box<Thin<dyn Trait>>

with much less boilerplate code.

Reference-level explanation

Add ValueSized to the sized hierarchy

Regarding the sized hierarchy, Thin is more than PointeeSized but not MetaSized:

  • it is not MetaSized because the metadata is not carried by the pointer itself;
  • it is more than PointeeSized because we actually know its size by reading the metadata stored inside.

We need to add a new trait to the sized hierarchy, named ValueSized, to indicate a type whose size is known by reading its value, as mentioned in RFC 3729 (comments).

// mod core::marker;

/// Indicates that a type's size is known from reading its value.
/// 
/// Different from `MetaSized`, this requires pointer dereferences.
pub trait ValueSized: PointeeSized {}

// Change the bound of `MetaSized: PointeeSized` to `MetaSized: ValueSized`
pub trait MetaSized: ValueSized {}
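As an analogy that already exists on stable Rust, a NUL-terminated C string behaves like a value-sized type: the pointer is thin and the size is found by reading the value itself. The function name `c_string_size` below is made up for this sketch, and CStr is used here only for illustration, not as part of the proposed API.

```rust
use std::ffi::{c_char, CStr};

/// Size in bytes, terminator included, of the NUL-terminated string at `ptr`.
///
/// # Safety
/// `ptr` must point to a valid NUL-terminated byte sequence.
pub unsafe fn c_string_size(ptr: *const c_char) -> usize {
    // The thin pointer carries no length, so the size is recovered by
    // scanning the pointed-to bytes up to the first NUL.
    unsafe { CStr::from_ptr(ptr) }.to_bytes_with_nul().len()
}
```

This is the "requires pointer dereferences" part of the doc comment above: unlike a MetaSized type, nothing about the size can be learned from the pointer alone.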

Public APIs

The public APIs of Thin consist of 2 parts:

  • Thin<T, U>, which is a (maybe unsized) value of T that carries metadata of type U alongside it. Typically, U = T, or U is some type such that T: Unsize<U>.
  • EraseMetadata<T>, which is a wrapper of a (maybe unsized) T that ignores the metadata of T. E.g., both &EraseMetadata<dyn Trait> and &EraseMetadata<[u8]> have the same size as a thin pointer &().

// mod core::thin;

/// Wrapping a DST `T` with its metadata inlined,
/// then the pointers of `Thin<T>` are thin.
///
/// The generic type `U` is for two-stage construction of
/// `Thin`, i.e., `Thin<T, U> where T: Unsize<U>` must be
/// constructed first, then coerced (unsized) to `Thin<U>`
/// (aka `Thin<U, U>`).
#[repr(C)]
pub struct Thin<T: Pointee, U: Pointee = T> {
    metadata: U::Metadata,
    data: EraseMetadata<T>,
}

// The size is known via reading its metadata.
impl<U: Pointee> ValueSized for Thin<U> {}

/// A wrapper that ignores the metadata of a type.
#[lang = "erase_metadata"]
#[repr(transparent)]
pub struct EraseMetadata<T: Pointee>(T);

// The size is unknown because the metadata is erased.
impl<T: Pointee> PointeeSized for EraseMetadata<T> {}

// Value accesses
impl<U: Pointee> ops::Deref for Thin<U> {
    type Target = U;
    fn deref(&self) -> &U;
}
impl<U: Pointee> ops::DerefMut for Thin<U> {
    fn deref_mut(&mut self) -> &mut U;
}

Value constructions

A sized Thin<T> can be constructed directly with Thin::<T>::new. For an unsized (MetaSized) U, constructing a Thin<U> on the stack or on the heap generally takes 3 steps:

  • construct a sized value of Thin<T, U> via Thin::<T, U>::new_unsized (where T: Unsize<U>).
  • obtain a pointer (e.g., &, &mut, Box, Rc, Arc) to Thin<T, U> via the pointer type's constructor.
  • coerce the pointer to Thin<T, U> into a pointer to Thin<U>.

Here are the APIs related to value constructions mentioned above:

impl<T: Sized> Thin<T> {
    /// Create a sized `Thin<T>` value, which is a simple wrapper of `T`
    pub fn new(value: T) -> Thin<T> {
        Self {
            metadata: (), // Sized type `T` has an empty metadata
            data: EraseMetadata(value),
        }
    }
}

impl<T: Sized, U: Pointee> Thin<T, U> {
    /// Create a sized `Thin<T, U>` value with metadata of unsized type `U`,
    /// which can be coerced (unsized) to `Thin<U>`
    pub fn new_unsized(value: T) -> Self
    where
        T: Unsize<U>,
    {
        Self {
            metadata: ptr::metadata(&value as &U),
            data: EraseMetadata(value),
        }
    }
    /// Consume the `Thin<T>` and return the inner wrapped value of `T`
    pub fn into_inner(self) -> T {
        self.data.0
    }
}

/// `Thin<T, U>` has the same layout as `Thin<U>`, so that it can be coerced
/// (unsized) to `Thin<U>`
impl<T: Sized, U: Pointee> Unsize<Thin<U>> for Thin<T, U> 
where
    T: Unsize<U>
{}
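The three construction steps follow the same shape as the unsizing coercions that ordinary smart pointers already support. The sketch below shows that existing pattern on stable Rust, not the proposed Thin API; the function names `boxed_slice` and `shared_debug` are hypothetical.

```rust
use std::fmt::Debug;
use std::rc::Rc;

/// Sized array -> `Box` -> unsized slice.
pub fn boxed_slice() -> Box<[i32]> {
    // Step 1: construct a sized value.
    let sized: [i32; 3] = [1, 2, 3];
    // Step 2: obtain a pointer to it via the pointer's constructor.
    let pointer: Box<[i32; 3]> = Box::new(sized);
    // Step 3: coerce the pointer; the length becomes pointer metadata.
    let unsized_pointer: Box<[i32]> = pointer;
    unsized_pointer
}

/// The same three steps with `Rc` and a trait object.
pub fn shared_debug() -> Rc<dyn Debug> {
    let sized = String::from("hello");
    let pointer: Rc<String> = Rc::new(sized);
    let unsized_pointer: Rc<dyn Debug> = pointer;
    unsized_pointer
}
```

With Thin<T, U>, the difference would be in where the metadata ends up: new_unsized records it inside the value in step 1, so the coercion in step 3 changes only the type and the pointer stays one word wide.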

Drawbacks

The name Thin clashes with an existing item that has a different meaning: the trait core::ptr::Thin, which denotes types with Metadata = ().


Does this require new language support, or is it just a library change? If it's just a library change, you should submit an ACP.

Never mind, I see the EraseMetadata type now.

I don't think Thin<[T]> can be unsized into dyn Trait as described, as dyn Trait is currently MetaSized for all trait Trait, meaning that the compiler needs to be able to make a DynMetadata<dyn Trait> pointer metadata that "contains" the correct layout for the source type when unsizing to dyn Trait, and it is not possible to make a DynMetadata<dyn Trait> which gives the correct layout for all Thin<[T]> (since Thin<[T]> with the same metadata have different layouts).

We could maybe add a new concept of ?MetaSized trait objects, e.g. dyn Trait + ?MetaSized or dyn Trait + PointeeSized, that are not MetaSized, and thus can be unsized from Thin<_> if it implements Trait (or extern types or any other T: Thin).

Thank you for the reminder! That's related to extending dyn Trait from MetaSized to ValueSized.

Probably dyn Trait + ValueSized? Instead of storing the size of the type directly in DynMetadata<dyn Trait>, it would store a function pointer that can be invoked to get the size of the value.

I'm afraid the new concept ValueSized is still too broad for Thin, as ValueSized is closely tied to open questions such as: how do !Freeze types behave? Can the size change after an arbitrary ValueSized value is created? (E.g., for a ValueSized version of a C null-terminated string, whose size is determined by reading the contents up to \0, what happens if a character in the middle of the string is overwritten with \0?)

Maybe non-const-MetaSized fits Thin<U> better, where the metadata is stored somewhere inside U (but unknown at compile time). This would also require DynMetadata<T: const MetaSized>, so that dyn Trait + MetaSized (unlike dyn Trait, which has an implicit const MetaSized bound) cannot be made into a DynMetadata<dyn Trait + MetaSized>.

Then dyn Trait + MetaSized can still be non-const-MetaSized, where the vtable of dyn Trait + MetaSized stores the offset of the real metadata within the underlying non-const-MetaSized value, instead of the compile-time known size.