Disclaimer: This post is unrelated to the previous post mentioning MaybeInvalid.
The unsafe code guidelines contain a "deliberate UB" section mentioning the SeqLock issue, i.e. the SeqLock algorithm isn’t compatible with the Rust memory model. SeqLock relies on a racy read whose result is only known to be valid after a subsequent check.
The document mentions two possible solutions:
- (a) adopt LLVM's handling of memory races (the problematic read would then merely return undef instead of being UB due to a data race)
- (b) add a bytewise atomic memcpy and use that instead of the non-atomic volatile load.
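For context, the idea behind (b) can already be written soundly today when the payload is a fixed set of machine words, by making each word an atomic. Below is a minimal sketch; `WordSeqLock` and its two-word payload are assumptions for illustration, not something taken from the guidelines or the RFC:

```rust
use std::sync::atomic::{fence, AtomicU64, AtomicUsize, Ordering::*};

/// Hypothetical SeqLock over a fixed two-word payload. Every data word is an
/// atomic, so a read that overlaps a write is not a data race.
pub struct WordSeqLock {
    seq: AtomicUsize,
    data: [AtomicU64; 2],
}

impl WordSeqLock {
    pub fn new(value: [u64; 2]) -> Self {
        Self {
            seq: AtomicUsize::new(0),
            data: [AtomicU64::new(value[0]), AtomicU64::new(value[1])],
        }
    }

    /// Intended for a single writer. Not `unsafe` in this sketch: concurrent
    /// writers could produce torn values, but never UB, since all accesses
    /// are atomic.
    pub fn write(&self, value: [u64; 2]) {
        self.seq.fetch_add(1, Relaxed); // odd: write in progress
        fence(Release);
        self.data[0].store(value[0], Relaxed);
        self.data[1].store(value[1], Relaxed);
        self.seq.fetch_add(1, Release); // even: write finished
    }

    pub fn read(&self) -> [u64; 2] {
        loop {
            let s1 = self.seq.load(Acquire);
            // Possibly torn snapshot; only returned if the check below passes.
            let value = [self.data[0].load(Relaxed), self.data[1].load(Relaxed)];
            fence(Acquire);
            let s2 = self.seq.load(Relaxed);
            if s1 & 1 == 0 && s1 == s2 {
                return value;
            }
        }
    }
}
```

A bytewise atomic memcpy would extend this word-by-word approach to arbitrary `T`, which is where the proposals discussed here come in.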
There is currently an open RFC (RFC 3301) for solution (b). I would like to explore a path closer to solution (a).
This path would materialize as the following types and functions:
// in core::mem
#[lang = "maybe_invalid"]
#[derive(Copy)]
#[repr(transparent)]
pub union MaybeInvalid<T> {
    invalid: (),
    value: ManuallyDrop<T>,
}
impl<T> MaybeInvalid<T> {
    /// Safety: the read that produced `self` must not have raced with a write.
    pub unsafe fn assume_valid(self) -> T { /* .. */ }
}
// in core::ptr
pub unsafe fn read_maybe_invalid<T>(ptr: *const T) -> MaybeInvalid<T> { /* .. */ }
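To get a feel for the shape of this API, here is a user-land stand-in built on `MaybeUninit` and `ptr::read_volatile`. It is only a sketch under a loud assumption: it mimics the signatures, not the semantics. Under today's memory model a racing `read_volatile` is still UB at the read itself, which is exactly what the proposal would change.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical user-land approximation of the proposed `MaybeInvalid<T>`.
/// A wrapper struct is used here instead of a lang-item union.
#[repr(transparent)]
pub struct MaybeInvalid<T>(MaybeUninit<T>);

impl<T> MaybeInvalid<T> {
    /// Safety: the read that produced `self` must not have raced with a
    /// write, and the bytes must form a valid `T`.
    pub unsafe fn assume_valid(self) -> T {
        unsafe { self.0.assume_init() }
    }
}

/// Safety: `ptr` must be non-null, aligned, and readable for
/// `size_of::<T>()` bytes. In current Rust (unlike in the proposal) the read
/// must also not race with a write.
pub unsafe fn read_maybe_invalid<T>(ptr: *const T) -> MaybeInvalid<T> {
    // Reading as `MaybeUninit<T>` places no validity requirement on the bytes.
    MaybeInvalid(unsafe { ptr::read_volatile(ptr.cast::<MaybeUninit<T>>()) })
}
```

In a non-racing context the round trip is trivial: `unsafe { read_maybe_invalid(&x).assume_valid() }` yields a copy of `x`.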
Concretely, this would defer the UB of a racy read to the MaybeInvalid::assume_valid call. A SeqLock implementation would then become:
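use core::cell::UnsafeCell;
use core::ptr;
use core::sync::atomic::{fence, AtomicUsize, Ordering::*};

pub struct SeqLock<T> {
    seq: AtomicUsize,
    data: UnsafeCell<T>,
}
unsafe impl<T: Copy + Send> Sync for SeqLock<T> {}
// `T: Copy` is required because `read` hands out bitwise copies of the data.
impl<T: Copy> SeqLock<T> {
    /// Safety: Only call from one thread.
    pub unsafe fn write(&self, value: T) {
        self.seq.fetch_add(1, Relaxed); // odd: write in progress
        fence(Release);
        unsafe { ptr::write(self.data.get(), value) };
        self.seq.fetch_add(1, Release); // even: write finished
    }
    pub fn read(&self) -> T {
        loop {
            let s1 = self.seq.load(Acquire);
            // May race with `write`; the result is not trusted yet.
            let data = unsafe { ptr::read_maybe_invalid(self.data.get()) };
            fence(Acquire);
            let s2 = self.seq.load(Relaxed);
            // Even and unchanged sequence number: no write overlapped the read.
            if s1 & 1 == 0 && s1 == s2 {
                return unsafe { data.assume_valid() };
            }
        }
    }
}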
For SeqLock, MaybeInvalid::assume_valid would be called only after checking that no data race occurred. Compared to RFC 3301, I see the following advantages:
- The API is simpler: one type, one method, one function.
- It doesn't reuse MaybeUninit, as initialization is not the issue here, avoiding confusion.
- Writes remain non-racing plain writes, so assume_valid cannot return torn values. This is in my opinion the biggest advantage, as it removes an entire class of problems (for example the Drop issue of RFC 3301).
- The semantics are closer to what the SeqLock algorithm is built on: a read which may be invalid, and is assumed valid after an atomic check.
- This model is de facto already supported by LLVM, so there would be no change on this side.
Of course, the drawbacks are significant:
- It requires modifying the Rust memory model, which is, I assume, quite a blocker by itself.
- Modifying the Rust memory model might impact interoperability with C/C++, as the Rust memory model is inherited from C++.
- (I realized this while writing the SeqLock implementation) Not only racy reads but also plain writes would have to interact with fences as atomics do; the consequences of including plain writes here may be bigger than I expected.
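To make the fence concern concrete: in the current model, a release fence and an acquire fence only synchronize through atomic accesses sequenced around them. The sketch below shows that rule with standard atomics only (the function name is hypothetical). In the SeqLock above, the data accesses sit in the position of the atomic accesses here, which is why plain writes and racy reads would need the same treatment.

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicU64, Ordering::*};
use std::sync::Arc;
use std::thread;

/// Message passing where all ordering comes from the two fences.
/// It only works because the accesses to `ready` are atomic.
pub fn fence_message_passing() -> u64 {
    let data = Arc::new(AtomicU64::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            data.store(42, Relaxed);
            fence(Release); // orders the store above before the flag store
            ready.store(true, Relaxed);
        })
    };

    // Spin until the relaxed load observes the flag set by the producer.
    while !ready.load(Relaxed) {
        std::hint::spin_loop();
    }
    fence(Acquire); // pairs with the release fence via the `ready` accesses
    let value = data.load(Relaxed);

    producer.join().unwrap();
    value
}
```

Once the consumer has observed `ready == true` and passed its acquire fence, the store of 42 happens-before the final load, so the function returns 42.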
I'm not an expert in the domain, so I may be completely off the mark here. Moreover, the fence issue made me reconsider my preference for MaybeInvalid over RFC 3301. But this post at least has the merit of discussing option (a) of the unsafe code guidelines.
