Canvas unsafe code in the wild

I didn’t get to the most interesting parts of rayon yet, which would be rayon-core, but I did some investigation into the parallel iterators. These are mostly safe code (yay safe abstractions!), but there is a bit of unsafe code dealing with uninitialized memory. Specifically, when constructing a vector, we reserve enough space for the desired length and then fill in disjoint parts of it in parallel (with some assertions to make sure that everything gets written):
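Stripped of the parallelism, the basic shape of this pattern looks roughly like the following minimal sequential sketch. It is not rayon's code, and fill_squares is a hypothetical name: it reserves capacity, writes each slot with ptr::write, and only calls set_len once every slot is initialized.

```rust
use std::ptr;

/// Hypothetical example: fill `v` with the squares of `0..n` using the
/// reserve / ptr::write / set_len pattern (sequentially, for clarity).
fn fill_squares(v: &mut Vec<u64>, n: usize) {
    v.truncate(0); // length 0: the slots reserved below are uninitialized
    v.reserve(n); // capacity >= n, length still 0
    let base = v.as_mut_ptr();
    for i in 0..n {
        // SAFETY: i < n <= capacity, and `ptr::write` neither reads nor
        // drops the uninitialized value previously in the slot.
        unsafe { ptr::write(base.add(i), (i as u64) * (i as u64)) };
    }
    // SAFETY: every slot in 0..n was initialized by the loop above.
    unsafe { v.set_len(n) };
}

fn main() {
    let mut v = vec![42];
    fill_squares(&mut v, 5);
    assert_eq!(v, vec![0, 1, 4, 9, 16]);
}
```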

The collect() code.

The main interface boundary here is the following (safe) function:

pub fn collect_into<PAR_ITER, T>(mut pi: PAR_ITER, v: &mut Vec<T>)
    where PAR_ITER: ExactParallelIterator<Item = T>,
          T: Send

This will:

  • reserve pi.len() slots in v (after truncating it to length 0, so the reserved slots are uninitialized)
  • allow the parallel iterator to drive the code:
    • the parallel iterator can either split a consumer (into two consumers) or call into_fold on it (to turn it into a reducer)
  • Consumers hold an &mut [ITEM] (contents uninitialized!)
    • each split uses split_at_mut
  • Reducers convert that into an IterMut<'c, ITEM> (again, contents uninitialized!)
    • each item that is pushed is written with ptr::write, which does not drop the uninitialized old value in the slot
    • if the iterator has not been fully drained when reducing is done, we abort
  • At the very end, we will call set_len to adjust the length of the vector
  • In general, we take a “trust-but-verify” approach toward the parallel iterator:
    • it tells us how many items it will write, but we count them ourselves
    • if at some point we get too many, we will panic
    • this may leak the values that were written thus far, since set_len is never called
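The steps above can be sketched roughly as follows, using only std. This is a hypothetical approximation, not rayon's actual code, and it swaps in two things: std::thread::scope instead of rayon's own join machinery, and Vec::spare_capacity_mut, which hands out the uninitialized slots as &mut [MaybeUninit<T>] rather than &mut [T]. The names collect_into_sketch, fill, fill_leaf and LEAF_SIZE are made up, and the produce closure stands in for the parallel iterator. Where the real code aborts on an undrained iterator, this sketch panics; that stays sound here because set_len is never reached.

```rust
use std::mem::MaybeUninit;
use std::ops::Range;
use std::thread;

// Hypothetical leaf size below which we stop splitting.
const LEAF_SIZE: usize = 8;

/// Hypothetical stand-in for `collect_into`: `produce(range)` promises to
/// yield exactly `range.len()` items, but we count them instead of trusting it.
fn collect_into_sketch<T, I, F>(v: &mut Vec<T>, len: usize, produce: F)
where
    T: Send,
    I: Iterator<Item = T>,
    F: Fn(Range<usize>) -> I + Sync,
{
    v.clear(); // length 0: the slots reserved below are uninitialized
    v.reserve(len);
    // Expose the uninitialized slots as `&mut [MaybeUninit<T>]`.
    fill(&mut v.spare_capacity_mut()[..len], 0, &produce);
    // SAFETY: `fill` returned without panicking, so each of the first `len`
    // slots was written exactly once.
    unsafe { v.set_len(len) };
}

/// "Consumer": split the slots in half with `split_at_mut` until small enough.
fn fill<T, I, F>(slots: &mut [MaybeUninit<T>], offset: usize, produce: &F)
where
    T: Send,
    I: Iterator<Item = T>,
    F: Fn(Range<usize>) -> I + Sync,
{
    if slots.len() <= LEAF_SIZE {
        let range = offset..offset + slots.len();
        fill_leaf(slots, produce(range));
    } else {
        let mid = slots.len() / 2;
        let (left, right) = slots.split_at_mut(mid);
        // A panic in either half propagates out of `thread::scope`.
        thread::scope(|s| {
            s.spawn(move || fill(left, offset, produce));
            fill(right, offset + mid, produce);
        });
    }
}

/// "Reducer": walk an `IterMut` over the slots, writing one item per slot.
fn fill_leaf<T, I: Iterator<Item = T>>(slots: &mut [MaybeUninit<T>], items: I) {
    let mut targets = slots.iter_mut();
    for item in items {
        // Too many items: panic. Items written so far are leaked, not dropped.
        let slot = targets.next().expect("producer yielded too many items");
        slot.write(item);
    }
    // Too few items: panic rather than let uninitialized slots be exposed.
    assert!(targets.next().is_none(), "producer yielded too few items");
}

fn main() {
    let mut v = Vec::new();
    collect_into_sketch(&mut v, 100, |r: Range<usize>| r.map(|i| i * 2));
    assert_eq!(v, (0..100).map(|i| i * 2).collect::<Vec<usize>>());

    // A producer that yields fewer items than promised is caught, and the
    // vector's length never moves past 0.
    let mut w: Vec<usize> = Vec::new();
    let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
        collect_into_sketch(&mut w, 100, |r: Range<usize>| r.take(1))
    }));
    assert!(result.is_err());
    assert_eq!(w.len(), 0);
}
```

Counting per leaf, rather than trusting the iterator's reported length, is what lets set_len be the single point where the uninitialized slots are declared initialized.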

Historically:

  • we used to use *mut everywhere
  • we rewrote it to use slices (split_at_mut) and iterators, because that was much cleaner

So, to sum up:

  • lots of &mut T and also slice::IterMut<T> where the referent is not initialized
  • potentially leaking data (if something goes awry)
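To make the leaking point concrete, here is a small hypothetical demonstration (write_then_publish and Tracked are invented for illustration, not taken from rayon). The producer overshoots, so we panic after two values have already been written but before set_len is reached. The vector still has length 0, and those two values are never dropped. Leaking like this is memory-safe, just wasteful.

```rust
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Tracked` values have been dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Tracked;

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical helper: write `produced` items into `expected` reserved
/// slots, panicking on a count mismatch before `set_len` is reached.
fn write_then_publish(v: &mut Vec<Tracked>, expected: usize, produced: usize) {
    v.clear();
    v.reserve(expected);
    let spare = v.spare_capacity_mut();
    let mut slots = spare[..expected].iter_mut();
    for _ in 0..produced {
        let slot = slots.next().expect("too many items");
        slot.write(Tracked);
    }
    assert!(slots.next().is_none(), "too few items");
    // SAFETY: exactly `expected` slots were written above.
    unsafe { v.set_len(expected) };
}

fn main() {
    let mut v = Vec::new();
    // Three items for two slots: two are written, then the third panics.
    let result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
        write_then_publish(&mut v, 2, 3)
    }));
    assert!(result.is_err());
    assert_eq!(v.len(), 0); // set_len was never called
    drop(v);
    // The two written values were never dropped: they leaked.
    assert_eq!(DROPS.load(Ordering::SeqCst), 0);
}
```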