I didn’t get to the most interesting parts of rayon yet, which would be rayon-core, but I did some investigation into the parallel iterators. These are mostly safe code (yay safe abstractions!) but there is a bit of unsafe code dealing with uninitialized memory. Specifically, when constructing a vector, we reserve space for the desired length and then fill in parts of it in parallel (with some assertions to make sure that everything gets written):
The collect() code.
The main interface boundary here is the following (safe) function:
pub fn collect_into<PAR_ITER, T>(mut pi: PAR_ITER, v: &mut Vec<T>)
    where PAR_ITER: ExactParallelIterator<Item = T>,
          T: Send
This will:
- resize `v` to reserve `pi.len()` slots (but truncate to 0, so they are uninitialized)
- allow the parallel iterator to drive the code:
  - parallel iterator can either `split` (into two consumers) or `into_fold` (into a reducer)
- Consumers hold an `&mut [ITEM]` – contents uninitialized!
  - each split uses `split_at_mut`
- Reducers convert that into an `IterMut<'c, ITEM>` – again, contents uninitialized!
  - each item that is pushed uses `ptr::write`
  - if, when reducing is done, the iterator is not drained, we abort
- At the very end, we will call `set_len` to adjust the length of the vector
- In general we have a “trust-but-verify” approach of the parallel iterator:
  - it tells us how many items it will write, but we count them ourselves
  - if at some point we get too many, we will panic
  - this may leak the values that were written thus far, since `set_len` is never called
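To make the flow above concrete, here is a minimal sequential sketch of the same reserve / split / write / verify / `set_len` sequence. It is not rayon's code: `SlotWriter` and `collect_into_sketch` are hypothetical names, the two halves are filled one after the other rather than in parallel, and the slots are typed as `MaybeUninit<T>` (via `Vec::spare_capacity_mut`) instead of the plain `&mut [ITEM]` described above. That last change is my substitution, made so the example does not itself create references to uninitialized `T`.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical stand-in for a consumer/reducer pair: it owns a window of
/// uninitialized slots and counts how many of them it has written.
struct SlotWriter<'c, T> {
    slots: std::slice::IterMut<'c, MaybeUninit<T>>,
    written: usize,
}

impl<'c, T> SlotWriter<'c, T> {
    fn new(slots: &'c mut [MaybeUninit<T>]) -> Self {
        SlotWriter { slots: slots.iter_mut(), written: 0 }
    }

    /// Write one item into the next free slot; panic if no slot is left.
    fn push(&mut self, item: T) {
        let slot = self.slots.next().expect("too many values pushed");
        // `ptr::write` does not read or drop the (uninitialized) old contents.
        unsafe { ptr::write(slot.as_mut_ptr(), item) };
        self.written += 1;
    }

    /// Return the number of items written, panicking if a slot was left empty.
    fn finish(mut self) -> usize {
        assert!(self.slots.next().is_none(), "too few values pushed");
        self.written
    }
}

/// Sequential sketch of the flow: reserve, split, fill, verify, `set_len`.
fn collect_into_sketch<T: Clone>(items: &[T], v: &mut Vec<T>) {
    let len = items.len();
    v.clear(); // truncate to 0 (drops any old elements)
    v.reserve(len); // capacity >= len, slots uninitialized
    let total = {
        let spare = &mut v.spare_capacity_mut()[..len];
        let mid = len / 2;
        // Each split hands out a disjoint window of the reserved slots.
        let (left, right) = spare.split_at_mut(mid);
        let (src_left, src_right) = items.split_at(mid);
        let mut left_writer = SlotWriter::new(left);
        let mut right_writer = SlotWriter::new(right);
        for x in src_left {
            left_writer.push(x.clone());
        }
        for x in src_right {
            right_writer.push(x.clone());
        }
        left_writer.finish() + right_writer.finish()
    };
    // Trust but verify: only claim the slots once the count matches.
    assert_eq!(total, len);
    // Safety: slots 0..len were all initialized above.
    unsafe { v.set_len(len) };
}
```

If any of the assertions (or a `clone`) panics, `set_len` is never reached, so whatever was already written into the reserved slots is leaked rather than dropped, which matches the behaviour described in the last bullet.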
Historically:
- we used to use `*mut` everywhere
- we rewrote it to use slices (`split_at_mut`) and iterators
- because it was much cleaner
So, to sum up:
- lots of `&mut T` and also `slice::IterMut<T>` where the referent is not initialized
- potentially leaking data (if something goes awry)
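The leak in the second point can be observed directly. The sketch below is an illustration rather than rayon code: `Tracked`, `DROPS`, and `drops_after_fill` are hypothetical names, and the slots are again accessed as `MaybeUninit` through `Vec::spare_capacity_mut`. It writes values into reserved slots and then drops the vector either with or without calling `set_len`, counting how many destructors run.

```rust
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many `Tracked` values have been dropped so far.
static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical element type whose destructor is observable.
struct Tracked {
    _id: usize,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Writes `n` values into reserved slots of a fresh vector. With
/// `commit == false` the vector is dropped without `set_len`, as happens
/// when a panic interrupts the collect. Returns how many destructors ran.
fn drops_after_fill(n: usize, commit: bool) -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    {
        let mut v: Vec<Tracked> = Vec::with_capacity(n);
        for (i, slot) in v.spare_capacity_mut()[..n].iter_mut().enumerate() {
            // Initialize the slot in place without dropping old contents.
            unsafe { ptr::write(slot.as_mut_ptr(), Tracked { _id: i }) };
        }
        if commit {
            // Safety: slots 0..n were all initialized in the loop above.
            unsafe { v.set_len(n) };
        }
        // `v` is dropped here; it only drops elements below its length.
    }
    DROPS.load(Ordering::SeqCst) - before
}
```

With `commit == true` all `n` destructors run when the vector is dropped. With `commit == false` none do: the buffer itself is freed, but the values written into it are forgotten. That is memory-safe, but anything those values own (heap allocations, file handles, locks) is leaked.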