Another Experiment To Make Unsafe Rust Safer: Preventing UB In MaybeUninit With Compile Time Error

Hello, I'm doing this experiment. What do you guys think? :]

use std::mem::MaybeUninit;
use std::marker::PhantomData;

pub struct Uninit;
pub struct Init;

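// repr(transparent) guarantees that each wrapper has exactly the layout of
// MaybeUninit<T>, which the `State` -> `MaybeUninit<T>` pointer casts below rely on.
#[repr(transparent)]
pub struct UninitStorage<T>(MaybeUninit<T>);

#[repr(transparent)]
pub struct InitStorage<T>(MaybeUninit<T>);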

impl<T> Drop for InitStorage<T> {
    fn drop(&mut self) {
        unsafe {
            self.0.assume_init_drop();
        }
    }
}

pub struct UninitGuard<T, State> {
    storage: State,
    _marker: PhantomData<T>,
}

#[diagnostic::on_unimplemented(
    message = "Illegal access: Memory is not initialized yet!",
    label = "Attempted to operate on data here, but the status is still `Uninit`",
    note = "Call `.write(val)` on the UninitGuard first before accessing its references or pointers."
)]
pub trait IsInitialized<State> {}

#[diagnostic::do_not_recommend]
impl<T> IsInitialized<InitStorage<T>> for UninitGuard<T, InitStorage<T>> {}

impl<T> UninitGuard<T, UninitStorage<T>> {
    pub fn new() -> Self {
        Self {
            storage: UninitStorage(MaybeUninit::uninit()),
            _marker: PhantomData,
        }
    }

    pub fn zeroed() -> Self {
        Self {
            storage: UninitStorage(MaybeUninit::zeroed()),
            _marker: PhantomData,
        }
    }

    pub const fn uninit() -> Self {
        Self {
            storage: UninitStorage(MaybeUninit::uninit()),
            _marker: PhantomData,
        }
    }

    pub fn write(self, val: T) -> UninitGuard<T, InitStorage<T>> {
        let mut storage = MaybeUninit::uninit();
        storage.write(val);
        UninitGuard {
            storage: InitStorage(storage),
            _marker: PhantomData,
        }
    }
}

impl<T, State> UninitGuard<T, State> 
where
    Self: IsInitialized<State>,
{
    pub fn as_ptr(&self) -> *const T {
        let storage = unsafe { &*( &self.storage as *const State as *const MaybeUninit<T> ) };
        storage.as_ptr()
    }

    pub fn as_mut_ptr(&mut self) -> *mut T {
        let storage = unsafe { &mut *( &mut self.storage as *mut State as *mut MaybeUninit<T> ) };
        storage.as_mut_ptr()
    }

    pub fn get_ref(&self) -> &T {
        let storage = unsafe { &*( &self.storage as *const State as *const MaybeUninit<T> ) };
        unsafe { storage.assume_init_ref() }
    }

    pub fn get_mut(&mut self) -> &mut T {
        let storage = unsafe { &mut *( &mut self.storage as *mut State as *mut MaybeUninit<T> ) };
        unsafe { storage.assume_init_mut() }
    }

    pub fn assume_init(self) -> T {
        let mut this = std::mem::ManuallyDrop::new(self);
        let storage = unsafe { &mut *( &mut this.storage as *mut State as *mut MaybeUninit<T> ) };
        unsafe { storage.assume_init_read() }
    }

    pub fn replace(&mut self, val: T) -> T {
        let storage = unsafe { &mut *( &mut self.storage as *mut State as *mut MaybeUninit<T> ) };
        let old = unsafe { storage.assume_init_read() };
        storage.write(val);
        old
    }
}

impl<T> UninitGuard<T, InitStorage<T>> {
    pub const fn new_init(val: T) -> Self {
        Self {
            storage: InitStorage(MaybeUninit::new(val)),
            _marker: PhantomData,
        }
    }
}

// Intentionally fails to compile: `guard` is still `Uninit`, so `assume_init` is not available.
fn a() {
    let guard = UninitGuard::<String, _>::new();
    let tes = guard.assume_init();
}

fn main() {

}

It causes a compile-time error if we call assume_init, take a reference, or get a pointer while the memory is still uninitialized:

   Compiling playground v0.0.1 (/playground)
error[E0599]: Illegal access: Memory is not initialized yet!
   --> src/main.rs:115:21
    |
 19 | pub struct UninitGuard<T, State> {
    | -------------------------------- method `assume_init` not found for this struct because it doesn't satisfy `_: IsInitialized<UninitStorage<String>>`
...
115 |     let tes = guard.assume_init();
    |                     ^^^^^^^^^^^ Attempted to operate on data here, but the status is still `Uninit`
    |
note: trait bound `UninitGuard<String, UninitStorage<String>>: IsInitialized<UninitStorage<String>>` was not satisfied
   --> src/main.rs:68:11
    |
 66 | impl<T, State> UninitGuard<T, State> 
    |                ---------------------
 67 | where
 68 |     Self: IsInitialized<State>,
    |           ^^^^^^^^^^^^^^^^^^^^ unsatisfied trait bound introduced here
    = note: Call `.write(val)` on the UninitGuard first before accessing its references or pointers.
note: the trait `IsInitialized` must be implemented
   --> src/main.rs:29:1
    |
 29 | pub trait IsInitialized<State> {}
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For more information about this error, try `rustc --explain E0599`.
error: could not compile `playground` (bin "playground") due to 1 previous error

What escape hatches can you guys spot in this code?

If this experiment succeeds, it could be a valuable addition to the standard library, because it prevents many kinds of UB related to uninitialized memory.

To be frank, this doesn’t help whatsoever. UninitGuard::write might as well just ignore the self parameter -- its contents get overwritten anyway -- in which case you take a T and return a fancy guard that just lets you use the inner data as a T. If you already have a T, you don’t need to do that.

IMO, the best safe equivalent of MaybeUninit<T> is Option<T>.
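A minimal sketch of that idea, assuming a String payload: None stands in for the uninitialized state, Option::insert plays the role of write, and take() acts as a runtime-checked assume_init. The helper name option_lifecycle is hypothetical.

```rust
/// Hypothetical helper: the MaybeUninit lifecycle, expressed with Option.
pub fn option_lifecycle() -> String {
    // None plays the role of "uninitialized": there is nothing to misread.
    let mut slot: Option<String> = None;

    // Like write(): Option::insert stores the value and returns &mut to it.
    let s = slot.insert(String::from("hello"));
    s.push_str(", world");

    // Like assume_init(), but checked at runtime: take() moves the value out
    // and leaves None behind.
    let value = slot.take().expect("slot was filled above");
    assert!(slot.is_none());
    value
}
```

The cost is one tag and one check per access; in exchange, no unsafe code is needed anywhere.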


I don't understand what you mean yet. Could you elaborate?

The goal of this guard is to prevent calling assume_init while the memory is uninitialized, because that is UB. The compile-time error tells us to call .write first, so that the memory is initialized and safe to use.

Likewise, it prevents taking a reference or pointer, because reading uninitialized memory, or creating a reference to it, is UB.

The point of this guard is to make using uninitialized memory safer by preventing calling assume_init on it, reading it, and writing to it while the memory is still uninitialized. Option<MaybeUninit<T>> does not prevent calling assume_init, or taking a reference or pointer and then reading or writing through it, while the MaybeUninit is still uninitialized.

Option<T> is not related, because it is not uninitialized memory. The point of using uninitialized memory is to avoid the cost of initializing it first.

After .write is called, the guard has done its job: no guard is needed anymore, because the memory is now initialized and can be used freely and safely.

In every program where it is possible to call your version of write(), which moves the "storage" and returns a new value of a different type, it is possible, and simpler, to not use MaybeUninit at all.
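For local variables, Rust's definite-initialization check already gives this compile-time guarantee without MaybeUninit. A small sketch (the function pick is a made-up example): the variable is declared first and assigned later, and any read on a path where it was never assigned is rejected at compile time.

```rust
/// Hypothetical example of deferred initialization of a local.
pub fn pick(flag: bool) -> String {
    // Declared but not yet initialized; the compiler tracks this statically.
    let s: String;
    if flag {
        s = String::from("yes");
    } else {
        s = String::from("no");
    }
    // Reading `s` is accepted only because every path assigns it;
    // removing the else branch would turn this into error E0381.
    s
}
```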

MaybeUninit is used in situations where

These things cannot be implemented using your write().


Yeah, I'm currently adding support for arrays, structs, enums, etc., i.e. a new write function that writes partially.
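For reference, field-by-field initialization of a struct with plain MaybeUninit follows a pattern described in the standard library's MaybeUninit docs: take a raw pointer to each field with std::ptr::addr_of_mut! and write through it, so no reference to uninitialized memory is ever created. A sketch with a hypothetical Point type:

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

/// Hypothetical struct used only for illustration.
#[derive(Debug, PartialEq)]
pub struct Point {
    name: String,
    x: i32,
}

pub fn make_point() -> Point {
    let mut slot = MaybeUninit::<Point>::uninit();
    let ptr = slot.as_mut_ptr();
    unsafe {
        // addr_of_mut! yields a raw pointer to the field without creating a
        // reference to uninitialized memory; ptr::write does not drop the old bytes.
        addr_of_mut!((*ptr).name).write(String::from("origin"));
        addr_of_mut!((*ptr).x).write(0);
        // SAFETY: every field has been written exactly once.
        slot.assume_init()
    }
}
```

A safe partial-write API would have to track which fields are written, including what to drop if initialization stops halfway (here, a panic between the two writes would leak name, which is not UB but is not ideal either).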

As for the must-be-the-same-type requirement, I will try experimenting with a trait or an enum.

This can be used as a performance optimization when we want to avoid the extra CPU and memory cost of assigning a default value first.
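One common place this saving shows up, sketched with a hypothetical squares function: write directly into a Vec's spare capacity (Vec::spare_capacity_mut returns &mut [MaybeUninit<T>]) and then call set_len, so no placeholder values are stored first. For a simple map like this one, collecting an iterator is usually just as good; the pattern matters more when something else fills the buffer.

```rust
/// Hypothetical example: fill a Vec without storing placeholder values first.
pub fn squares(n: usize) -> Vec<u64> {
    let mut v: Vec<u64> = Vec::with_capacity(n);
    // The uninitialized part of the allocation, as &mut [MaybeUninit<u64>].
    let spare = &mut v.spare_capacity_mut()[..n];
    for (i, slot) in spare.iter_mut().enumerate() {
        // MaybeUninit::write initializes the slot without reading or dropping it.
        slot.write((i as u64) * (i as u64));
    }
    // SAFETY: the first n elements are now initialized, and n <= capacity.
    unsafe { v.set_len(n) };
    v
}
```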

Thank you for the technical feedback.

MaybeUninit<T> is a union (a tagless enum) between a dataless variant called uninit and a variant with T data called value.

Option<T> is an enum between a dataless variant called None and a variant with T data called Some.

In other words, (ignoring optimizations and focusing on semantics,) Option<T> is very similar to a (bool, MaybeUninit<T>) pair, where the boolean tag indicates whether the second field is initialized. The initialization is checked at runtime instead of compile time.
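The (bool, MaybeUninit<T>) view can be written out by hand. This sketch uses a hypothetical TaggedSlot type; its get method performs the same runtime check that Option<T> does for you, and the tag is cleared before the old value is dropped so the invariant survives a panicking destructor.

```rust
use std::mem::MaybeUninit;

/// Hypothetical hand-rolled Option: a bool tag plus possibly-uninitialized storage.
/// Invariant: `value` is initialized exactly when `init` is true.
pub struct TaggedSlot<T> {
    init: bool,
    value: MaybeUninit<T>,
}

impl<T> TaggedSlot<T> {
    pub fn empty() -> Self {
        TaggedSlot { init: false, value: MaybeUninit::uninit() }
    }

    pub fn set(&mut self, val: T) {
        if self.init {
            // Clear the tag first so a panicking destructor cannot cause a double drop.
            self.init = false;
            // SAFETY: the tag was true, so the old value is initialized.
            unsafe { self.value.assume_init_drop() };
        }
        self.value.write(val);
        self.init = true;
    }

    /// The runtime check that Option<T> performs for you.
    pub fn get(&self) -> Option<&T> {
        if self.init {
            // SAFETY: init == true means `value` is initialized.
            Some(unsafe { self.value.assume_init_ref() })
        } else {
            None
        }
    }
}

impl<T> Drop for TaggedSlot<T> {
    fn drop(&mut self) {
        if self.init {
            // SAFETY: the tag says the value is initialized; it is dropped once.
            unsafe { self.value.assume_init_drop() };
        }
    }
}
```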

With typestate, you have two separate types, one which is dataless and one which has T data. This is already possible with () and T respectively, even if wrappers around MaybeUninit<T> could serve the same purpose.
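A sketch of that typestate with () as the dataless state and T as the initialized one, using a hypothetical Slot wrapper. It needs neither MaybeUninit nor unsafe; Slot<()> also gets get(), but reading a () cannot go wrong.

```rust
/// `Slot<()>` is the dataless "uninitialized" state; `Slot<T>` holds a value.
pub struct Slot<S>(S);

impl Slot<()> {
    pub fn empty() -> Self {
        Slot(())
    }

    /// Consumes the empty slot; the returned type proves the value is present.
    pub fn write<T>(self, val: T) -> Slot<T> {
        Slot(val)
    }
}

impl<T> Slot<T> {
    pub fn get(&self) -> &T {
        &self.0
    }

    pub fn into_inner(self) -> T {
        self.0
    }
}
```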

Incremental initialization could be interesting.


For looking at prior art: checked incremental initialization of an array is basically a Vec<T> of some sort. See also the arrayvec crate. It even provides a method to get a [T; N] if the ArrayVec is fully initialized. That is, there is nothing to do in that case; an existing crate with hundreds of millions of downloads provides a solution.
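To make the comparison concrete, here is a much-reduced, std-only sketch of the same idea. It is not arrayvec's actual code or API; PartialArray and its methods are hypothetical. A length field records how many elements are initialized, as_slice exposes the initialized prefix, and into_array returns the full [T; N] only when every slot has been written.

```rust
use std::mem::{ManuallyDrop, MaybeUninit};

/// Hypothetical fixed-capacity buffer (not arrayvec's API).
/// Invariant: buf[..len] is initialized, buf[len..] is not.
pub struct PartialArray<T, const N: usize> {
    buf: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> PartialArray<T, N> {
    pub fn new() -> Self {
        // An array of MaybeUninit can be created without initializing any element.
        PartialArray { buf: [const { MaybeUninit::uninit() }; N], len: 0 }
    }

    /// Appends a value, or hands it back if the buffer is already full.
    pub fn push(&mut self, val: T) -> Result<(), T> {
        if self.len == N {
            return Err(val);
        }
        self.buf[self.len].write(val);
        self.len += 1;
        Ok(())
    }

    /// The initialized prefix; no need to wait until every slot is written.
    pub fn as_slice(&self) -> &[T] {
        // SAFETY: buf[..len] is initialized and MaybeUninit<T> has the same layout as T.
        unsafe { std::slice::from_raw_parts(self.buf.as_ptr().cast::<T>(), self.len) }
    }

    /// Returns the whole array only if every slot is initialized; otherwise gives `self` back.
    pub fn into_array(self) -> Result<[T; N], Self> {
        if self.len < N {
            return Err(self);
        }
        // ManuallyDrop stops our Drop impl from dropping the elements being moved out.
        let this = ManuallyDrop::new(self);
        // SAFETY: all N slots are initialized; [MaybeUninit<T>; N] has the layout of [T; N].
        Ok(unsafe { std::ptr::read((&this.buf as *const [MaybeUninit<T>; N]).cast::<[T; N]>()) })
    }
}

impl<T, const N: usize> Drop for PartialArray<T, N> {
    fn drop(&mut self) {
        for slot in &mut self.buf[..self.len] {
            // SAFETY: only the initialized prefix is dropped, each element once.
            unsafe { slot.assume_init_drop() };
        }
    }
}
```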

Checked incremental initialization of an enum or struct is called the “builder pattern”. There are a variety of crates for builders… I dunno which one is most popular, but, say, bon has over 30 million downloads.
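A hand-written sketch of such a builder, with hypothetical Server and ServerBuilder types. Unset fields hold (), and build() only exists once both fields are set, so forgetting one is a compile-time error. Builder crates generate code along these lines for you; this is not any crate's API.

```rust
/// Hypothetical target type.
#[derive(Debug, PartialEq)]
pub struct Server {
    name: String,
    port: u16,
}

/// `()` marks a field that has not been set yet (the dataless state).
pub struct ServerBuilder<N, P> {
    name: N,
    port: P,
}

impl ServerBuilder<(), ()> {
    pub fn new() -> Self {
        ServerBuilder { name: (), port: () }
    }
}

impl<P> ServerBuilder<(), P> {
    pub fn name(self, name: String) -> ServerBuilder<String, P> {
        ServerBuilder { name, port: self.port }
    }
}

impl<N> ServerBuilder<N, ()> {
    pub fn port(self, port: u16) -> ServerBuilder<N, u16> {
        ServerBuilder { name: self.name, port }
    }
}

impl ServerBuilder<String, u16> {
    // Only callable once both fields are set; otherwise it is a compile-time error.
    pub fn build(self) -> Server {
        Server { name: self.name, port: self.port }
    }
}
```

The fields can be set in any order, and setting the same field twice is also rejected, because the method is no longer available in the new state.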


By the way, some big-picture advice, if you want it: in this thread and your previous one, you’ve thought up a type which makes some existing unsafe type harder to misuse. That’s good, but the part that’s missing, in both cases, is looking at it from the opposite direction: how powerful the new type is relative to existing safe types. In order for a new idea in this area to be useful, it needs to be both harder to misuse than existing unsafe options and more powerful (in some way) than existing safe options.


I did some experiments; here is what I found:

.write() can be changed to take &mut self and return () instead of consuming self. There would also be a new method, .init(), that performs the first initialization, so write does not always return a new type: only .init() returns a new type, and .write() reuses it. But I cannot find a way to get a compile-time error without init returning a new type, because returning a different type is exactly what produces the compile-time error.

I did find another way that keeps a single type, by using an enum. The full code:

use std::mem::MaybeUninit;

pub struct Uninit<T>(MaybeUninit<T>);
impl<T> Uninit<T> {
    pub fn new() -> Self {
        Self(MaybeUninit::uninit())
    }

    pub fn zeroed() -> Self {
        Self(MaybeUninit::zeroed())
    }
    
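    fn init(&mut self, val: T) -> Init<T> {
        self.0.write(val);
        // Move the now-initialized storage out, leaving `self.0` uninitialized again.
        // This avoids casting `Uninit<T>` to `Init<T>`, whose layouts are not
        // guaranteed to match (neither type is #[repr(transparent)]).
        Init(std::mem::replace(&mut self.0, MaybeUninit::uninit()))
    }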
}

pub struct Init<T>(MaybeUninit<T>);
impl<T> Init<T> {
    pub fn ptr(&self) -> *const T {
        self.0.as_ptr()
    }

    pub fn mut_ptr(&mut self) -> *mut T {
        self.0.as_mut_ptr()
    }

    pub fn reff(&self) -> &T {
        unsafe { self.0.assume_init_ref() }
    }

    pub fn mut_ref(&mut self) -> &mut T {
        unsafe { self.0.assume_init_mut() }
    }

    pub fn assume_init(self) -> T {
        let mut this = std::mem::ManuallyDrop::new(self);
        unsafe { this.0.assume_init_read() }
    }

    pub fn replace(&mut self, val: T) -> T {
        let old = unsafe { self.0.assume_init_read() };
        self.0.write(val);
        old
    }
}

impl<T> Drop for Init<T> {
    fn drop(&mut self) {
        unsafe {
            self.0.assume_init_drop();
        }
    }
}

enum Guard<T> {
    Uninit(Uninit<T>),
    Init(Init<T>),
}

pub struct UninitGuard<T> {
    inner: Guard<T>,
}

impl<T> UninitGuard<T> {
    pub fn new() -> Self {
        Self {
            inner: Guard::Uninit(Uninit::new()),
        }
    }

    // Initializes the storage on the first call. If the guard is already
    // initialized, this does nothing and `val` is simply dropped.
    pub fn init(&mut self, val: T) {
        if let Guard::Uninit(inner) = &mut self.inner {
            let new = inner.init(val);
            self.inner = Guard::Init(new);
        }
    }

    // Runs `closure` with access to the initialized storage, or returns an error
    // if `init` has not been called yet. This is a runtime check (one branch per call).
    pub fn initialized_scope<U>(&mut self, closure: U) -> Result<(), &'static str>
    where
        U: FnOnce(&mut Init<T>),
    {
        match &mut self.inner {
            Guard::Init(inner) => {
                closure(inner);
                Ok(())
            }
            Guard::Uninit(_) => Err("not initialized, call init() first"),
        }
    }
}

But there are also cons. There is a branch each time .initialized_scope() is called. If we do multiple writes inside that scope, it costs only 1 branch for n writes; if we cannot, it costs n branches, which may negate the benefit of MaybeUninit. An unsafe method using std::hint::unreachable_unchecked() could be added to remove the branch once it is guaranteed that .init() has already been called, but I cannot find a way to make it available only for the Guard::Init variant, so it can also be called on Guard::Uninit and cause UB.
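Rust enum variants are not separate types, so a method cannot be made available on only one variant. The usual options are shown in this sketch with a hypothetical MaybeValue enum: match once and then do many operations on the inner &mut T (one branch for n writes), or provide an unsafe method whose safety contract requires the Full variant, implemented with std::hint::unreachable_unchecked.

```rust
use std::hint::unreachable_unchecked;

/// Hypothetical runtime-checked slot.
pub enum MaybeValue<T> {
    Empty,
    Full(T),
}

impl<T> MaybeValue<T> {
    /// Checked access: one branch per call.
    pub fn get_mut(&mut self) -> Option<&mut T> {
        match self {
            MaybeValue::Full(v) => Some(v),
            MaybeValue::Empty => None,
        }
    }

    /// Unchecked access: the optimizer may remove the branch.
    ///
    /// # Safety
    /// `self` must be `MaybeValue::Full`; calling this on `Empty` is UB.
    pub unsafe fn get_mut_unchecked(&mut self) -> &mut T {
        match self {
            MaybeValue::Full(v) => v,
            // SAFETY: the caller guarantees this arm is never reached.
            MaybeValue::Empty => unsafe { unreachable_unchecked() },
        }
    }
}

/// One branch for n writes: check once, then work on the inner `&mut T`.
pub fn add_n(slot: &mut MaybeValue<u64>, n: u64) -> bool {
    match slot.get_mut() {
        Some(v) => {
            for _ in 0..n {
                *v += 1;
            }
            true
        }
        None => false,
    }
}
```

The unsafe method is callable on both variants; the type system cannot stop a caller from using it on Empty, which is exactly the limitation described above.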

I also cannot support arrays without creating a new, different type, because the number of indices written so far has to be stored somewhere and updated when calling .write_partial(). A trait can only carry compile-time constants, not runtime values, so it cannot be updated at runtime.

Which one is better?

  • compile-time error: a different type after init, and write reuses that type
  • runtime check via branching

And

  • store the written length inside the same struct using an Option
  • create a different type for types that have a length, e.g. arrays

Yeah, I thought earlier that an enum would zero its memory :[

What I learned so far:

  • an enum does not zero memory
  • heap types do not zero memory, stack arrays do
  • tuples depend on the element type: heap types are not zeroed, stack arrays are

Yeah, Option uses branching, i.e. a runtime check. Plus, an Option cannot be initialized partially.

This approach has no branching, but it needs 2 types, because that is what lets the compiler catch the mistake at compile time.

I think arrayvec is more high-level.

My goal is a lossless, safer abstraction over MaybeUninit, so that it can still be used as a building block for data structures and low-level programming.

But now I'm unsure which to choose: a compile-time error with zero branching, at the cost of 2 different types (the type change is what causes the compile-time error, so no branching is needed), or 1 type with runtime branching.

Likewise for arrays: 1 type with an optional written length, or a dedicated array type named UninitArray (MaybeUninit<[ array ]>) with a method that returns a slice of everything below the written length, without waiting for the array to be fully written.

Yeah, I'm still thinking about this :]. My last experiment was about aliasing UB, refined based on earlier feedback. Could you check it again?

For now, I'm also trying to catch other sources of UB at compile time, such as std::ptr::copy_nonoverlapping (i.e. memcpy), std::mem::transmute, etc., just to collect them first. If that goes well, I'll share it.

What are some other sources of UB in unsafe Rust? I want to experiment with more common UB mistakes, but I'm not sure what else is out there.

You’re approaching this problem too abstractly. Both of the possibilities you have named are usually already possible in safer ways (static checks with ordinary functions returning values of different types, and dynamic checks with Option). MaybeUninit is used in cases where neither of those approaches apply.

You cannot design a better MaybeUninit by just thinking about what additional checks would make it safer to use. What you need to do is:

  1. Find specific existing code that uses MaybeUninit.
  2. Think about how that code uses MaybeUninit and what it actually needs.
  3. Design something that is safer and make sure that code can be rewritten to use it.
  4. Apply this to other code too, in order to show that it is useful in more than one situation.

Validate your design by showing that it can improve existing real-world unsafe code.

This is what ArrayVec does. You don’t need to write this type because it already exists.


"Static checks with ordinary functions returning values of different types, and dynamic checks with Option": that is not true unless we first build that abstraction for MaybeUninit, because different types + MaybeUninit, or Option + MaybeUninit, doesn't magically prevent MaybeUninit footguns without the right abstraction. That is what I am trying to do: build a safer abstraction for MaybeUninit that can be reused, instead of building the same abstraction over and over.

I will try to build a simple data structure using this custom MaybeUninit.

As for ArrayVec: after reading the code, yes, it already has what is needed for MaybeUninit + arrays. So I will drop array support, because that already exists.

Yes, I am not suggesting that you should put MaybeUninit inside an Option; I am saying that Option<T> is the replacement for MaybeUninit<T> for cases where a single runtime flag is sufficient.

In general, things that improve on MaybeUninit don't mention MaybeUninit or have "uninit" in the name, because MaybeUninit is the last resort.


... and, in order to replace the unsafe abstraction, it needs to be no less powerful and no less performant.

(But there is value in it even if it does not completely replace the unsafe case, provided it is more powerful than existing safe options, as you said.)