I didn’t get to the most interesting parts of rayon yet, which would be rayon-core, but I did some investigation into the parallel iterators. These are mostly safe code (yay safe abstractions!) but there is a bit of unsafe code dealing with uninitialized memory. Specifically, when constructing a vector, we will initialize it to the desired length and then fill in parts of it in parallel (with some assertions to make sure that everything gets written):
The collect() code.
The main interface boundary here is the following (safe) function:
pub fn collect_into<PAR_ITER, T>(mut pi: PAR_ITER, v: &mut Vec<T>)
where PAR_ITER: ExactParallelIterator<Item = T>,
T: Send
This will:
- resize v to reserve pi.len() slots (but truncate to 0, so they are uninitialized)
- allow the parallel iterator to drive the code:
  - the parallel iterator can either split (into two consumers) or into_fold (into a reducer)
- Consumers hold an &mut [ITEM] (contents uninitialized!)
  - each split uses split_at_mut
- Reducers convert that into an IterMut<'c, ITEM> (again, contents uninitialized!)
  - each item that is pushed uses ptr::write
  - if, when reducing is done, the iterator is not drained, we abort
- At the very end, we will call set_len to adjust the length of the vector
- In general we have a “trust-but-verify” approach to the parallel iterator:
  - it tells us how many items it will write, but we count them ourselves
  - if at some point we get too many, we will panic
    - this may leak the values that were written thus far, since set_len is never called
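To make the steps above concrete, here is a minimal, sequential sketch of the same shape. It is not rayon's code: the names `fill` and `collect_into_sketch` are hypothetical, a plain closure stands in for the parallel iterator, and I have swapped the `&mut [ITEM]` over uninitialized memory for `&mut [MaybeUninit<T>]` (via `Vec::spare_capacity_mut`), which is how one would likely write this today.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for a consumer: either splits its slice in two
/// (like `split`) or writes every slot it owns (like a reducer).
/// Returns how many items were actually written.
fn fill<T, F: Fn(usize) -> T>(slots: &mut [MaybeUninit<T>], start: usize, make: &F) -> usize {
    if slots.len() <= 2 {
        let mut written = 0;
        for (i, slot) in slots.iter_mut().enumerate() {
            // Writing into an uninitialized slot: nothing is dropped here.
            slot.write(make(start + i));
            written += 1;
        }
        written
    } else {
        let mid = slots.len() / 2;
        // Each split hands out two disjoint halves.
        let (left, right) = slots.split_at_mut(mid);
        fill(left, start, make) + fill(right, start + mid, make)
    }
}

/// Hypothetical sketch of the overall flow, assuming `make(i)` produces item `i`.
fn collect_into_sketch<T>(len: usize, make: impl Fn(usize) -> T, v: &mut Vec<T>) {
    v.clear();      // truncate to 0
    v.reserve(len); // make room for `len` uninitialized slots
    let written = {
        let slots = &mut v.spare_capacity_mut()[..len];
        fill(slots, 0, &make)
    };
    // Trust-but-verify: count the writes ourselves before exposing them.
    assert_eq!(written, len, "wrong number of items written");
    // Safety: exactly `len` slots were initialized above.
    unsafe { v.set_len(len) };
}
```

Calling `collect_into_sketch(5, |i| i * 10, &mut v)` should leave `v` holding the five produced values in order. Note that if `make` panics partway through, the items already written are never dropped, because `set_len` is never reached.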
Historically:
- we used to use *mut everywhere
- we rewrote it to use slices (split_at_mut) and iterators
  - because it was much cleaner
So, to sum up:
- lots of &mut T and also slice::IterMut<T> where the referent is not initialized
- potentially leaking data (if something goes awry)
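The leak in that last point is easy to observe in isolation. The following is a small sketch, not taken from rayon: `Noisy`, `DROPS`, and `write_items` are hypothetical names, and skipping `set_len` on purpose stands in for a panic that happens before the length is adjusted.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many `Noisy` values have been dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical item type whose destructor is observable.
struct Noisy {
    _id: u32,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Writes `n` items into the vector's spare capacity. When `commit` is
/// false, `set_len` is skipped, mimicking a failure before the final step.
fn write_items(n: usize, commit: bool) -> Vec<Noisy> {
    let mut v: Vec<Noisy> = Vec::with_capacity(n);
    for (i, slot) in v.spare_capacity_mut()[..n].iter_mut().enumerate() {
        slot.write(Noisy { _id: i as u32 });
    }
    if commit {
        // Safety: the first `n` slots were initialized just above.
        unsafe { v.set_len(n) };
    }
    v
}
```

Dropping the vector returned by `write_items(3, false)` frees the buffer but runs no destructors, since the vector still believes it is empty; with `commit` set to true, all three destructors run. Leaking like this is safe in Rust's sense, but it is worth knowing about when the items own resources.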