Interior mutability and safety of ownership transfer in Rust

I am currently working on a concurrent hash map design that uses RCU-like synchronization to permit non-blocking readers with blocking writers. The underlying data structure uses a single-allocation buffer that stores both metadata and KV-pairs inline, similar to the layout used by hashbrown. In particular, there is no additional level of pointer indirection for entries.

When the buffer needs to grow, the following sequence occurs:

  1. lock a mutex (to exclude other writers)
  2. allocate a new buffer
  3. rehash and migrate all KV-pairs to the new buffer (using ptr::read and ptr::write)
  4. atomically swap the pointer to the active buffer
  5. perform an RCU synchronization so that all threads that could have observed the old buffer are quiescent
  6. reclaim the memory of the old buffer (without dropping the KV-pairs)
  7. unlock the mutex

Semantically, ownership of the KV-pairs transfers from the old buffer to the new buffer at step 4. However, between steps 4 and 5 it becomes possible for readers to observe the same KV-pair in two different buffers simultaneously. New readers will access the new buffer, while readers that started earlier may still access the old buffer. Both buffers therefore contain the same values at different memory locations, although logically the ownership has already moved to the new buffer.

My concern is that, without additional constraints on the key and value types, this should be unsound under Rust’s ownership and aliasing model. One possible solution would be to require: K: Copy, V: Copy. This would avoid the issue entirely, but it is far too restrictive and would make the container unusable for many practical use cases.

Another option would be to require: K: Clone, V: Clone and explicitly clone each KV-pair during migration. This should be sound, but it would also be inefficient in many cases. For example, there is no obvious reason that a Vec<T> should need to be cloned just to migrate it between buffers.

It seems to me that the root cause of the problem is interior mutability. Rust does not appear to have a way to express true “read-only” types. For example, doing a shallow copy of a Vec<T> and performing an “atomic” ownership switch between buffers seems intuitively sound to me, although it may not be according to Rust’s ownership rules. Similarly, a shallow copy of an Arc<T> seems acceptable as well because the interior mutability (i.e., the reference counters) is contained behind an additional pointer indirection.

However, the design clearly breaks for something like:

ConcurrentHashMap<Arc<str>, AtomicUsize>

Here, a "reader" may perform an atomic store or update through a shared KV-pair reference while the writer thread concurrently migrates the table. Since migration involves reading the old memory while the atomic is being modified, this would introduce a data race. In other words, the problematic cases seem to arise when the value type contains interior mutability without an additional level of indirection.

So my main question is:

Is my reasoning correct, or am I missing something fundamental here?

More specifically:

  • Is the unsoundness caused by the fact that a !Copy value temporarily exists at two distinct memory locations while shared references to both locations may exist?
  • Or is the real issue solely the presence of interior mutability that allows readers to modify memory concurrently with the migration process?

More generally, this design seems sound to me in languages that support true constness (e.g., C or Zig), where a reader would be guaranteed not to mutate the underlying memory. From what I can tell, this particular pattern appears to become problematic only in Rust.

So I just stumbled over the unstable Freeze trait. This seems like a perfect trait bound for the K and V types in order to prevent data races for types with interior mutability (e.g. HashMap<Arc<str>, AtomicUsize>) during buffer migration.

Personally, I think a language level &const T would be even better - or at least more flexible - but for now this seems suitable to me.

That said, the question remains if this idea of mine is sound within the constraints of Rust's ownership model in general. For example, is it safe to have an old Buffer A containing a "non-owning" Vec<T> accessed via shared reference by some threads while simultaneously a new Buffer B containing a bit-wise copy of the same Vec<T> and logical ownership of it is accessed via shared references by other threads.

I think this is tangentially related, but I have wanted a way to give out "read halves" of, say, an RwLock<T> that can only lock for reading rather than being forced to give full access to anything needing any access. The ability to give out such limited accessors to shared state would be one way to do this.

However, I also question the sensibility of putting atomics or other mutable shared state inside of a ConcurrentMap; if you want to write, get write access.

1 Like

If "non-owning" includes &mut Vec<T>, then no. What if the Vec<T> is at capacity and both copies call push? Then you get a double free :boom:. More generally &mut means exclusive, so &mut v1[0] and &mut v2[0] is UB on its own.

If by "non-owning" you mean &Vec<T>, you will need better languish lawyers then me.

No, this is not what I mean. In my case there are two bitwise-equal copies of the same Vec (pointer, len and capacity are all equal), but one is essentially a "non-owning" ManuallyDrop<Box<Vec<T>>> and the other in an "owning" Box<Vec<T>>. This is quite typical in unsafe low-level data structures dealing with manually allocated memory*, but it is usually not possible for both instances to be observable externally.

(*) temporarily, inside a single function

Usually one requires exclusive access with &mut _ (e.g. any of Vec<_>'s potentially reallocating functions) which avoids the the question. I don't grok your approach well enough to comment much on it.

That is correct. There is no way to require a lack of interior mutability, or other related functionality,[1] with a bound.

No, Freeze does not guarantee there is no interior mutability in the type. It only guarantees there is no shallow immutability in the inline values. There can still be interior mutability behind indirection. For example, Box<RefCell<_>>: Freeze.

TBF, a lot of people have the same misconception. The documentation should put this clarification in flashing red lights.

internally, but not through an indirection


  1. using globals, using thread locals, using the FS to store state, ... ↩︎

1 Like

If i understand correctly this is enough here, as any data behind an indirection won't be copied by ptr::read/write and so won't be duplicated during the reallocation.

Yes, to me that seems to be the case. "Shallow" interior mutability is a problem that can lead to data races, but interior mutability behind an indirection layer should be fine if the duplication of values is in itself not unsound.

But it seems my description was not as clear as I had thought it was, so I'll try to explain it a bit better (hopefully).

So, a simplified version of the buffer type could look like this:

struct Buffer<K, V> {
    len: usize,
    pairs: [(K, V)], // Note: This field makes Buffer a DST!
}

A pseudo-code function for growing a buffer would roughly look like this:

struct HashMap<K, V> {
    buf: AtomicPtr<Buffer<K, V>>,
    rcu: Rcu,
}

impl<K, V> HashMap<K, V>
where
    K: Hash + Eq + Freeze,
    V: Freeze,
{
    unsafe fn grow_buffer(&self) {
        let old: &Buffer<K, V> = &*self.buf.load(Ordering::Acquire);
        let new: *mut Buffer<K, V> = Buffer::alloc_buffer(old.len * 2);

        // semantically "move" all (K, V) from old to new
        for i in 0..old.len {
            let src: *const (K, V) = &old.pairs[i] as _;
            // get  a raw pointer to the (K,V) pair through a raw pointer
            let dst: *mut (K, V) = &raw mut (*new.pairs[i]);
            // (K, V) exists in old and new at the same time!
            // readers can still access `old` and get shared references
            // into it.
            dst.write(src.read());
        }

        // after this swap *new* readers will see the same values that exist
        // in `old` but at a different memory location (only shared references)
        let _ = self.buf.swap(new, Ordering::Release);
        // This will block until no reader that started before the atomic swap is
        // accessing `old` any more, i.e., there can no longer be *any* shared
        // references into `old`.
        self.rcu.synchronize();
        // ... since there are no longer any references, `old` can be freed.
        // this call frees the buffer's memory without dropping any `(K, V)`.
        Buffer::<K, V>::free(old as *mut Buffer<K, V>);
    }
}

I hope this pseudo code is maybe a little easier to follow than my initial description. I doubt this would compile but it's close enough to what the real code would look like.

Of course, the actual code would be significantly more complex with lots of UnsafeCells and MaybeUninits sprinkled throughout.

Note that calling Clone::clone() requires providing it a reference of type &Self, so, if K and V are caller-chosen types, this is equivalent to handing out references to the caller (or to a caller-provided function, at least). I am not familiar enough with your use case to say whether this is an unsoundness, but it's something to keep in mind whenever you consider using a Clone bound with unsafe code.