Less Unsafe


#1

I’m starting this topic to discuss ways to make unsafe code less “unsafe”. The basic idea is to introduce semi-safe abstractions to:

  1. Create “guard rails” that limit unsafety.
  2. Make unsafe code more ergonomic. The harder it is to write and understand code, the harder it is to get right.

Is this something people are interested in?


My first proposal would be an Uninit<T> wrapper to make dealing with uninitialized memory less unsafe:

#![feature(untagged_unions)]

use std::{mem, ptr};

// Assuming the correct layout... [repr(transparent)]?
pub union Uninit<T: Sized> {
    inited: T,
    uninited: (),
}

impl<T: Sized> Uninit<T> {

    /// Create a new uninitialized slot.
    pub fn new() -> Self {
        Uninit {
            uninited: (),
        }
    }

    /// Initialize this uninitialized slot. Returns a mutable reference to value.
    ///
    /// # Safety:
    ///
    /// Calling this is always safe. However, if the `Uninit` has already been initialized,
    /// it will forget the existing value without dropping it. 
    pub fn initialize(&mut self, value: T) -> &mut T {
        unsafe {
            self.inited = value;
            &mut self.inited
        }
    }

    /// Get a constant pointer to the contents of this slot.
    ///
    /// # Safety:
    ///
    /// 1. While safe, the returned pointer may point into uninitialized memory.
    /// 2. You *must not* create any mutable references from this pointer.
    /// 3. You *must not* mutably dereference this pointer or any aliases thereof
    ///     while referencing (mutably or immutably) this slot.
    /// 4. You *must not* dereference or create any references from this pointer or any
    ///     aliases thereof while mutably referencing this slot.
    pub fn as_ptr(&self) -> *const T {
        unsafe { &self.inited as *const T }
    }

    /// Get a mutable pointer to the contents of this slot.
    ///
    /// # Safety:
    ///
    /// 1. While safe, the returned pointer may point into uninitialized memory.
    /// 2. You *must not* mutably dereference or create any mutable references from 
    ///     this pointer or any aliases thereof while referencing (mutably or immutably)
    ///     this slot.
    /// 3. You *must not* dereference or create any references from this pointer or any
    ///     aliases thereof while mutably referencing this slot.
    pub fn as_mut_ptr(&mut self) -> *mut T {
        unsafe { &mut self.inited as *mut T }
    }


    /// Create an uninitialized reference from an existing mutable pointer.
    ///
    /// # Unsafe:
    ///
    /// 1. `*mut T` must be properly aligned, allocated, etc.
    /// 2. You *must not* dereference `ptr` (or any alias thereof) for the duration of `'a`.
    /// 3. You must not create any references (`&`/`&mut`) that alias `ptr` for the duration
    ///     of `'a` (i.e., no transmuting `ptr` to get around rule 2).
    pub unsafe fn from_raw<'a>(ptr: *mut T) -> &'a mut Uninit<T> {
        &mut *(ptr as *mut Uninit<T>)
    }

    /// Drop the contents of this slot.
    ///
    /// # Unsafe:
    ///
    /// This method is unsafe because the value stored in this slot may not be initialized.
    pub unsafe fn deinitialize(&mut self)  {
        ptr::drop_in_place(&mut self.inited);
    }

    /// Get a shared reference to the contents of this slot.
    ///
    /// # Unsafe:
    ///
    /// This method is unsafe because the value stored in this slot may not be initialized.
    pub unsafe fn get_unchecked(&self) -> &T {
        &self.inited
    }

    /// Get a mutable reference to the contents of this slot.
    ///
    /// # Unsafe:
    ///
    /// This method is unsafe because the value stored in this slot may not be initialized.
    pub unsafe fn get_mut_unchecked(&mut self) -> &mut T {
        &mut self.inited
    }

    /// Extract the (initialized) value stored in this uninitialized slot.
    ///
    /// # Unsafe:
    ///
    /// This method is unsafe because the value stored in this slot may not be initialized.
    pub unsafe fn unwrap_unchecked(self) -> T {
        self.inited
    }
}

impl<T: Sized> Vec<Uninit<T>> {
    pub unsafe fn assume_initialized_unchecked(self) -> Vec<T> {
        mem::transmute(self)
    }
}

Changelog:

  • Added uninited variant.
  • Added Sized bound.
  • Added as_mut_ptr and as_ptr.
  • Removed from_raw/from_raw_mut distinction. &Uninit is totally useless.
  • Added poorly-worded, sloppy, incomplete documentation.
  • Change assert_initialized to assume_initialized_unchecked based on @naicode’s comment.

Next steps for the unsafe code guidelines
#2

Bringing uninitialised types into the type system is something I’ve been mulling over recently. Right now all we have is an ugly hack in mem::uninitialized(), which is very easy to shoot yourself in the foot with. I’d much prefer tools that don’t require a candle and a prayer, and I’d say this fits the bill. :slight_smile:

A couple of thoughts:

  • I think ! fits better than () here, and we should probably limit this to Sized types too:
union Uninitialized<T: Sized> {
    inited: T,
    uninited: !,
}
  • How would this work with Drop types? Or would we just not support them at first/at all (by limiting to Copy types)?

#3

I think ! fits better than () here

To me, that means that the uninited state is impossible which kind of defeats the point. As-is, the union is always technically initialized to some valid state (either () or T). This even allows us to avoid calling mem::uninitialized().

should probably limit this to Sized types too

Definitely.

How would this work with Drop types? Or would we just not support them at first/at all (by limiting to Copy types)?

Drop types are fine; not dropping is generally considered a logic bug, not a safety issue. Not supporting Drop types would severely limit the usefulness of this feature.


#4

How would this work with Drop types? Or would we just not support them at first/at all (by limiting to Copy types)?

I think the ManuallyDrop rules are probably sufficient at least in the short term.


#5

Uninitialized memory comes up a lot when wrapping a C API that has out-parameters. Take for example read. You give it a buffer and a size, and it fills in up to that many bytes of the buffer and tells you how many it actually wrote. Right now the most straightforward way to work with that involves Vec and therefore heap allocation:

unsafe {
    let mut buf: Vec<u8> = Vec::with_capacity(1024)
    let count = libc::read(fd, buf.as_mut_ptr() as *mut c_void, buf.capacity());
    if count < 0 {
        Err(io::error::last_os_error())
    } else {
        buf.set_len(count);
        buf.shrink_to_fit(); // optional
        Ok(buf)
    }
}

The stdlib doesn’t actually bother with this; Read::read takes a &mut [u8] from the caller, so you’re working with initialized memory.

Another similar example is pthread_sigmask; here the output buffer is always completely filled in, so you can do

unsafe {
    let oldmask: libc::sigset_t = mem::uninitialized();
    if libc::pthread_sigmask(how, 
            &mask as *const libc::sigset_t,
            &mut oldmask as *mut libc::sigset_t) {
        Err(std::io::last_os_err())
    } else {
        Ok(oldmask)
    }
}

How would you handle these with Uninit<T>? I particularly care about the read case, because I’d really like to be able to write code like this:

let mut buf: Buffer::new(1024); // stack allocation of 1024 uninitialized bytes
while file.read(&mut buf)? > 0 {
    // in here `buf` acts like a slice with length
    // equal to the value `read` returned
}

#6

IIRC, the standard library specializes where possible to avoid initializing memory (e.g., when you call reader.read_to_end(...)) but that’s not really a good solution.

How would you handle these with Uninit<T>

For the read case, I wouldn’t. take a look at ArrayVec. However, for nice stack allocation, we’ll need either some alloca API or a way to parametrize types over integers.

For the pthread case, you could write:

let uninit_mask: Uninit<libc::pthread_signask> = Uninit::new();
sigempty(uninit_mask.as_mut_ptr());
let mask = uninit_mask.assert_initialized();

However, I’m not really convinced it’s worth it in this case.


#7

I’m definitely :thumbsup:on this idea. I think such abstractons are very useful, and I also think that projects ought to structure themselves in this way to the extent possible. The garbage recollection stuff in Crossbeam is another example – it does have some unsafe bits, but they are clearly identified and it handles a lot of things for you. In the standard library, things like RawVec or Unique play a similar role.

I didn’t give a detailed look over your code, but one thing I think would be really important in such a project is clearly calling out the “proof obligations” for each unsafe fn etc (i.e., precisely what must the caller ensure to make things safe?).


#8

@zackw

After thinking about this, I think we may need a buffer abstraction with support for uninitialized bytes. However, I’m not really sure what it should look like.


@nikomatsakis

one thing I think would be really important in such a project is clearly calling out the “proof obligations”

That’s a very good point. I’ve added some to the OP but getting them both correct and concise is going to take some work.


#9

I often have the feeling that writing unsafe rust code can sometimes be more tricky then writing C code. There are often many nit-bity bits you do not necessary have to get right to make your tests run, but which are still not unimportant. That’s why I’m totaly a fan of your proposal and already existing parts like Unique. I think we definitely should have more of them.

Btw. while I know that the proposed Uninit is just a scratch. There is one functione name which bugs me a lot: assert_initialized. I think something like assuming_initialized or convert_assuming_initialized would be much better. The problem I have with assert is that assert statements normally check implicit assumptions, while assert_initialized does a operation assuming something without any check…