Idea: trait-based initialization

Currently, the API for MaybeUninit<T> is quite a mess when dealing with arrays and slices. There are (unstable) associated functions for initializing slices, marking slices as initialized, marking arrays as initialized, creating uninitialized arrays, etc. This API is also inconsistent, as for example assume_init is a method, but array_assume_init is an associated function.

I therefore propose to augment MaybeUninit with a trait-based initialization API in core::mem. This API is based around

pub unsafe trait Uninitialized {
    type Initialized: Pointee<Metadata = <Self as Pointee>::Metadata> + ?Sized;
}

which is implemented for types which may be uninitialized and are layout-compatible with their initialized form:

unsafe impl<T> Uninitialized for MaybeUninit<T> {
    type Initialized = T;
}

unsafe impl<T> Uninitialized for [MaybeUninit<T>] {
    type Initialized = [T];
}

unsafe impl<T, const N: usize> Uninitialized for [MaybeUninit<T>; N] {
    type Initialized = [T; N];
}

Based on this trait are the following free-standing functions:

pub unsafe fn assume_init<T>(location: T) -> T::Initialized
where
    T: Uninitialized;

pub unsafe fn assume_init_ref<T>(location: &T) -> &T::Initialized
where
    T: Uninitialized + ?Sized;

pub unsafe fn assume_init_mut<T>(location: &mut T) -> &mut T::Initialized
where
    T: Uninitialized + ?Sized;

/// Corresponds with `MaybeUninit::write`.
pub fn initialize<T>(location: &mut T, value: T::Initialized) -> &mut T::Initialized
where
    T: Uninitialized;

/// Corresponds with `MaybeUninit::write_slice` and `MaybeUninit::write_slice_cloned`
/// (can be optimized when `T::Initialized: Copy` using specialization).
pub fn initialize_from<T>(location: &mut T, value: &T::Initialized) -> &mut T::Initialized
where
    T: Uninitialized + ?Sized,
    T::Initialized: Clone;

With this new API, we only have a few functions for all use cases, which makes the whole initialization story more consistent. This is backwards-compatible, although a few existing methods could be deprecated as redundant. As a side effect, these functions also work for user-defined types, in case one wants partially initialized types or the like.
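To make the idea concrete, here is a minimal sketch that compiles on stable Rust. It is not the proposed `core::mem` API itself: the `Pointee` bound is unstable, so this version assumes `Self: Sized` and therefore leaves out slices. `PartialPoint` and `Point` are hypothetical user types, added only to illustrate the "works for user types" remark.

```rust
use std::mem::{ManuallyDrop, MaybeUninit};
use std::ptr;

/// # Safety
/// `Self` must be valid while (partially) uninitialized and must have the
/// same size, alignment and layout as `Self::Initialized`.
/// Assumption of this sketch: `Sized` only, no `Pointee` bound.
pub unsafe trait Uninitialized: Sized {
    type Initialized;
}

unsafe impl<T> Uninitialized for MaybeUninit<T> {
    type Initialized = T;
}

unsafe impl<T, const N: usize> Uninitialized for [MaybeUninit<T>; N] {
    type Initialized = [T; N];
}

/// # Safety
/// `location` must be fully initialized.
pub unsafe fn assume_init<T: Uninitialized>(location: T) -> T::Initialized {
    // Prevent a destructor of `T` from running on the moved-out value.
    let location = ManuallyDrop::new(location);
    // SAFETY: layout compatibility is promised by the trait, initialization by the caller.
    unsafe { ptr::read((&*location as *const T).cast::<T::Initialized>()) }
}

/// # Safety
/// `location` must be fully initialized.
pub unsafe fn assume_init_mut<T: Uninitialized>(location: &mut T) -> &mut T::Initialized {
    // SAFETY: same reasoning as in `assume_init`.
    unsafe { &mut *(location as *mut T).cast::<T::Initialized>() }
}

/// Corresponds with `MaybeUninit::write`: overwrites without dropping the old contents.
pub fn initialize<T: Uninitialized>(location: &mut T, value: T::Initialized) -> &mut T::Initialized {
    let target = (location as *mut T).cast::<T::Initialized>();
    // SAFETY: `target` is valid for writes and is initialized right after the write.
    unsafe {
        target.write(value);
        &mut *target
    }
}

// Hypothetical user types; `repr(C)` keeps both layouts identical.
#[repr(C)]
pub struct PartialPoint {
    pub x: MaybeUninit<f64>,
    pub y: MaybeUninit<f64>,
}

#[repr(C)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

unsafe impl Uninitialized for PartialPoint {
    type Initialized = Point;
}
```

The by-value conversion goes through `ptr::read` because `mem::transmute` rejects generic types whose sizes the compiler cannot compare.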


Assuming the implementations work, which I think they should, I honestly think this would be great. It solves a clear problem in a compact, well-defined manner.

I haven't thought about whether the proposed API would be exactly the one, but :+1: to the motivation.

libs-api seems like they'd be interested too. See this comment from after a recent meeting about the API surface getting to be a bit of a mess: Added `Box::take()` method by Kixunil · Pull Request #93653 · rust-lang/rust · GitHub

Aside: those Uninitialized impls make me think of ToOwned. Dunno if there's a reasonable parallel there.

And don't forget the zeroed constructors too!

I just noticed initialize_from does not work with unsized types as Clone requires Sized. One could of course make those functions trait methods, but I rather like having the trait just be a marker trait, so I can't really think of a solution there.
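For illustration, one way around the `Clone: Sized` restriction is a slice-specific function that clones element by element. This is only a sketch; the name `initialize_slice_from` is made up here and is not part of any existing API.

```rust
use std::mem::MaybeUninit;

/// Clones every element of `src` into `dst` and returns the now-initialized slice.
/// Panics if the lengths differ. If a `clone` call panics, the elements written
/// so far are leaked, which is safe but not ideal.
pub fn initialize_slice_from<'a, T: Clone>(
    dst: &'a mut [MaybeUninit<T>],
    src: &[T],
) -> &'a mut [T] {
    assert_eq!(dst.len(), src.len(), "source and destination lengths differ");
    for (slot, value) in dst.iter_mut().zip(src) {
        // `MaybeUninit::write` overwrites without dropping the old contents.
        slot.write(value.clone());
    }
    // SAFETY: every element was written above, and `MaybeUninit<T>` has the
    // same layout as `T`, so the slice can be reinterpreted as `[T]`.
    unsafe { &mut *(dst as *mut [MaybeUninit<T>] as *mut [T]) }
}
```

The bound sits on the element type `T`, which is `Sized`, rather than on `[T]`, so the compiler accepts it.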

There's also fn uninit<T: Uninitialized>() -> T and fn as_mut_ptr<T: ?Sized + Uninitialized>(location: &mut T) -> *mut T::Initialized. And maybe also fn new<T: Uninitialized>(val: T::Initialized) -> T. I think we can replace the whole API surface of MaybeUninit with this, which is great!

Another idea: I had a look at bytemuck, which has a similar API. It handles slices with different functions than statically-sized types. Copying that API would result in two more functions, but would be quite consistent, as creating uninitialized slices with e.g. Box::new_uninit_slice requires different arguments than statically-sized uninitialized types. The API would look like this:

pub unsafe trait Uninitialized: Sized {
    type Initialized;
}

unsafe impl<T> Uninitialized for MaybeUninit<T> {
    type Initialized = T;
}

unsafe impl<U: Uninitialized, const N: usize> Uninitialized for [U; N] {
    type Initialized = [U::Initialized; N];
}

// I left out the `zeroed` function because of naming conflicts.
pub fn uninit<U: Uninitialized>() -> U;

pub unsafe fn assume_init<U: Uninitialized>(location: U) -> U::Initialized;
pub unsafe fn assume_init_ref<U: Uninitialized>(location: &U) -> &U::Initialized;
pub unsafe fn assume_init_mut<U: Uninitialized>(location: &mut U) -> &mut U::Initialized;
pub unsafe fn assume_init_slice<U: Uninitialized>(slice: &[U]) -> &[U::Initialized];
pub unsafe fn assume_init_slice_mut<U: Uninitialized>(slice: &mut [U]) -> &mut [U::Initialized];

pub fn initialize<U: Uninitialized>(location: &mut U, value: U::Initialized) -> &mut U::Initialized;
pub fn initialize_slice<U>(slice: &mut [U], values: &[U::Initialized]) -> &mut [U::Initialized]
where
    U: Uninitialized,
    U::Initialized: Clone;
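A rough sketch of how two of these signatures could be implemented on stable Rust, assuming the `Sized` variant of the trait. The function bodies are my guess at a reasonable implementation, not something taken from bytemuck or from std.

```rust
use std::mem::MaybeUninit;

/// # Safety
/// `Self` must be valid while uninitialized and layout-compatible with
/// `Self::Initialized`.
pub unsafe trait Uninitialized: Sized {
    type Initialized;
}

unsafe impl<T> Uninitialized for MaybeUninit<T> {
    type Initialized = T;
}

// The recursive impl covers nested arrays such as `[[MaybeUninit<u8>; 2]; 3]`.
unsafe impl<U: Uninitialized, const N: usize> Uninitialized for [U; N] {
    type Initialized = [U::Initialized; N];
}

pub fn uninit<U: Uninitialized>() -> U {
    // SAFETY: implementors promise that `U` is valid while uninitialized.
    unsafe { MaybeUninit::<U>::uninit().assume_init() }
}

/// # Safety
/// Every element of `slice` must be fully initialized.
pub unsafe fn assume_init_slice_mut<U: Uninitialized>(slice: &mut [U]) -> &mut [U::Initialized] {
    // SAFETY: the element layouts match, and the slice length is preserved by the cast.
    unsafe { &mut *(slice as *mut [U] as *mut [U::Initialized]) }
}
```

With the recursive array impl, `U` can itself be an array, so a slice of rows can be converted in one call.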

The other option is trait methods:

pub unsafe trait Uninitialized {
    type Initialized: Pointee<Metadata = <Self as Pointee>::Metadata> + ?Sized;

    fn uninit() -> Self
    where
        Self: Sized;
    fn zeroed() -> Self
    where
        Self: Sized;

    unsafe fn assume_init(self) -> Self::Initialized
    where
        Self: Sized;
    unsafe fn assume_init_ref(&self) -> &Self::Initialized;
    unsafe fn assume_init_mut(&mut self) -> &mut Self::Initialized;

    fn initialize(&mut self, val: Self::Initialized) -> &mut Self::Initialized
    where
        Self: Sized;
    fn initialize_from(&mut self, val: &Self::Initialized) -> &mut Self::Initialized;
}

The trait usage is unclear to me. Which invariant must an implementor guarantee that requires it to be unsafe? The sketch only shows it for variants of MaybeUninit but either the intent is to seal it (in which case it could be safe but hidden/unstable) or I'd like to see an example of how this could be applied to my own type.

This doesn't deal with initialization itself. How would I write something that initializes an array with a custom sequence? Let's note that this would be 100% easier if MaybeUninit<[T]> was a proper type in the first place. Then we would have an unambiguous normal form for inputs to initializer functions, and this would work for all the above cases (there are already safe converters that normalize array-of-uninit to uninit-of-array etc.):

/// Implementor guarantees that `init` returns the same memory.
unsafe trait InitializeSeed<T: ?Sized> {
    fn init(self, mem: &mut MaybeUninit<T>) -> &mut T;
}

This, of course, doesn't make sense for all T: ?Sized, only for types whose Metadata operations do not require inspecting the pointee memory (which includes sized types and slices, but also a few more such as dyn-trait objects(!)). The actual problem is the same one your Uninitialized trait tries to solve. However, I would argue that relaxing to a trait bound on T is more proper than introducing an artificial new trait that only applies to MaybeUninit types, since fundamentally 'working with an uninitialized memory representation' is a property of the type parameter, not of the MaybeUninit wrapper.
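As a sketch of what a seed for "an array with a custom sequence" might look like under today's rules: `MaybeUninit<T>` currently requires `T: Sized`, so the `?Sized` bound is dropped here. `FromFn` is a hypothetical helper invented for this example.

```rust
use std::mem::MaybeUninit;

/// # Safety
/// Implementor guarantees that `init` fully initializes `mem` and returns
/// a reference to that same memory.
pub unsafe trait InitializeSeed<T> {
    fn init(self, mem: &mut MaybeUninit<T>) -> &mut T;
}

/// Hypothetical seed: element `i` of the array is produced by calling `f(i)`.
pub struct FromFn<F>(pub F);

unsafe impl<T, F, const N: usize> InitializeSeed<[T; N]> for FromFn<F>
where
    F: FnMut(usize) -> T,
{
    fn init(self, mem: &mut MaybeUninit<[T; N]>) -> &mut [T; N] {
        let mut next = self.0;
        let first = mem.as_mut_ptr().cast::<T>();
        for i in 0..N {
            // SAFETY: `i < N`, so the write stays inside the array.
            // If `next` panics, the elements written so far are leaked.
            unsafe { first.add(i).write(next(i)) };
        }
        // SAFETY: all `N` elements were written by the loop above.
        unsafe { &mut *mem.as_mut_ptr() }
    }
}
```

A caller would then write something like `FromFn(|i: usize| i * i).init(&mut mem)` and get back a `&mut [usize; N]`.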


A different approach, relying on generativity, would be to simply require functions to produce a certificate of initialization.

/// Asserts that the place tagged with the lifetime `'r` was initialized.
/// Note: this type is _invariant_ in `'r`.
struct Init<'r, T>(PhantomData<fn(&'r mut T) -> &'r mut T>);

impl<'r, T> Init<'r, T> {
    /// Proof by assertion of the caller.
    /// # Safety
    /// Caller must have initialized the value.
    pub unsafe fn new_unchecked() -> Self { … }
    /// Proof by construction.
    pub fn new(_: &'r mut MaybeUninit<T>, _: T) -> Self { … }
    /// … several more proofs by construction may be available
    /// e.g. fn zeroed() if we have a Pod trait. Or `bytemuck`
    /// can itself provide a safe constructor for that case.
}

/// Example of another safe initialization pattern:
impl<'r, T, const N: usize> Init<'r, [T; N]> {
    pub fn from_copy(_: &'r mut MaybeUninit<[T; N]>, _: T) -> Self { … }
}

impl<T> MaybeUninit<T> {
    /// Correctness: any function capable of producing `Init` must have
    /// initialized the place. Within `f` the lifetime appears 'unique'
    /// (or fresh) so it's not possible to reuse a proof.
    pub fn init(self, f: impl FnOnce(&mut T) -> Init<T>) -> T {…}
}
/// If we can make this work.
impl<T: ?Sized + TBD> MaybeUninit<T> {
    pub fn init_mut(&mut self, f: impl FnOnce(&mut T) -> Init<T>)
        -> &mut T
    {…}
}
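To see how the certificate idea could type-check, here is a small self-contained variation, not the exact API above. It hands the closure a branded `Place<'r, T>` (a hypothetical wrapper) instead of a plain reference, because a plain `&'r mut MaybeUninit<T>` can also be obtained by shortening a longer-lived reference to some other place. The free function `init_with` stands in for the `MaybeUninit::init` method.

```rust
use std::marker::PhantomData;
use std::mem::MaybeUninit;

/// Hypothetical branded handle to an uninitialized place; invariant in `'r`.
/// In a real API the fields would be private to the defining module.
pub struct Place<'r, T> {
    slot: &'r mut MaybeUninit<T>,
    _brand: PhantomData<fn(&'r ()) -> &'r ()>,
}

/// Certificate that the place branded with `'r` was initialized.
pub struct Init<'r, T>(PhantomData<fn(&'r mut T) -> &'r mut T>);

impl<'r, T> Place<'r, T> {
    /// Proof by construction: consuming the place is the only safe way
    /// to obtain a certificate carrying the same brand.
    pub fn write(self, value: T) -> Init<'r, T> {
        self.slot.write(value);
        Init(PhantomData)
    }
}

/// The closure must work for every `'r`, so the certificate it returns
/// can only come from the `Place` it was handed.
pub fn init_with<T>(f: impl for<'r> FnOnce(Place<'r, T>) -> Init<'r, T>) -> T {
    let mut slot = MaybeUninit::<T>::uninit();
    let _certificate = f(Place { slot: &mut slot, _brand: PhantomData });
    // SAFETY: a certificate branded for `slot` exists, so `slot` was written.
    unsafe { slot.assume_init() }
}
```

Usage would look like `let s: String = init_with(|place| place.write(String::from("ok")));`, with no `unsafe` at the call site.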

The trait is meant to be implemented for types which are allowed not to be initialized, so MaybeUninit and arrays of MaybeUninit. It could be sealed, but that is not necessary, as long as implementing it is unsafe. My proposal only addresses the inconsistencies in the API, as we have three types of uninitialized data (single, array, slice), three forms of assume_init for each of them (two for slices) and two ways of generating the two statically sized types (uninit, zeroed, uninit_array, zeroed_array), for a total of 12 functions, some of them methods and others associated (MaybeUninit::assume_init is a method, MaybeUninit::array_assume_init is an associated function). My proposal would reduce that to 7 or 9 free-standing functions for all cases.

Your idea, however, is more about new ways to prove initialization (in a very elegant way, I might add), rather than a simple reorganization of the existing API.
