Rc and internal mutability

-- @CAD97, @SimonSapin, @RalfJung rust-lang/pull#6251 (discussion)

-- @RalfJung rust-lang/rust-clippy#4774 (comment)

struct RcBox<T: ?Sized> {
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}

pub struct Rc<T: ?Sized> {
    ptr: NonNull<RcBox<T>>,
    phantom: PhantomData<RcBox<T>>,
}
impl<T: ?Sized> Rc<T> {
    pub fn into_raw(this: Self) -> *const T {
        let ptr: *const T = &*this;
        mem::forget(this);
        ptr
    }

    pub unsafe fn from_raw(ptr: *const T) -> Self {
        let offset = data_offset(ptr);

        // Reverse the offset to find the original RcBox.
        let fake_ptr = ptr as *mut RcBox<T>;
        let rc_ptr = set_data_ptr(fake_ptr, (ptr as *mut u8).offset(-offset));

        Self::from_ptr(rc_ptr)
    }

    #[inline]
    pub fn into_raw_non_null(this: Self) -> NonNull<T> {
        // safe because Rc guarantees its pointer is non-null
        unsafe { NonNull::new_unchecked(Rc::into_raw(this) as *mut _) }
    }

-- liballoc/rc.rs (8960acf)

  • (A)Rc are shared pointers. That part is obvious.
  • (A)Rc expose mutability when they are runtime-checked unique. That part is stable (get_mut).
  • (A)Rc expose shared mutability when unsafe-guaranteed no other pointers are "used" during the mutation. That part is unstable (get_mut_unchecked).
  • (A)Rc are represented as ptr::NonNull<{ strong: UnsafeCell<usize>, weak: UnsafeCell<usize>, value: T }>.

There are three potential solutions to the immediate issue I see here (that Rc::from_raw(Rc::into_raw(_)) gives "immutable/shared provenance" to the RcBox pointer, making get_mut unsound):

  • The aliasing model is wrong, liballoc is correct.
  • Fix the implementation of into_raw to not use shared references. Unfortunately, I think this would require &raw or some other way of offsetting pointers without manifesting a reference.
  • Admit (A)Rc is an internal mutability type and store the value as UnsafeCell<T> in RcBox/ArcInner. (We really should unify those naming schemes.)

I think the third solution is the best here. The correctness of (A)Rc::get_mut(_unchecked) is really subtle without the use of UnsafeCell and interacts with minutia of the not-yet-specified aliasing model. If we do use UnsafeCell, it's trivial what the requirements of these functions are, rather than being scattered around the module to never accidentally change the provenance of the pointer to the inner blob.

1 Like

I don't think that helps -- assuming the implementation of into_raw stays as it. It goes through deref, which returns &T, which is read-only.

So you'll need to adjust into_raw to not go through deref. And if you do that, I think there is no reason to add an UnsafeCell.

Like just using ptr::offset by the size_of the header value?

And also aligning the result of the offset. This relies on the layout of RcBox/ArcInner.

Something like this should work.

let ptr = inner_ptr.add(size_of::<usize>());
let align_offset = ptr.align_offset(align_of::<T>());
let ptr: *const u8 = ptr.add(align_offset);

Would using the offset from the return value of core::alloc::Layout::extend be less error-prone?

Arc and Rc already manually offset their pointers as part of their from_raw implementations and as part of this calculate the offset of the data from the RcBox itself. Since liballoc can control the exact layout of the inner RcBox that is not a problem. What's stopping it from using the same logic in reverse (or rather forward in memory) to perform the initial offsetting?

// For Rc:
pub fn into_raw(this: Self) -> *const T {
    // Re-use the existing method of calculating the offset.
    // The deref is only temporary and does not create the final ptr.
    let offset = data_offset(&*this);
    let raw = NonNull::into_raw(this.ptr);
    unsafe {
        (raw as *const u8).offset(offset) as *const T
    }
}

Or am I missing something here? (Except that the code will be slightly more complicated for offsetting the pointer in the case of T: !Sized).

1 Like

Yes, I didn't realize that Layout had that! Cool.

Specifically, here are the computations that are used for offsetting:

unsafe fn data_offset<T: ?Sized>(ptr: *const T) -> isize {
    // Align the unsized value to the end of the `RcBox`.
    // Because it is ?Sized, it will always be the last field in memory.
    data_offset_align(align_of_val(&*ptr))
}

/// Computes the offset of the data field within `RcBox`.
///
/// Unlike [`data_offset`], this doesn't need the pointer, but it works only on `T: Sized`.
fn data_offset_sized<T>() -> isize {
    data_offset_align(align_of::<T>())
}

#[inline]
fn data_offset_align(align: usize) -> isize {
    let layout = Layout::new::<RcBox<()>>();
    (layout.size() + layout.padding_needed_for(align)) as isize
}

Current from_raw:

    pub unsafe fn from_raw(ptr: *const T) -> Self {
        let offset = data_offset(ptr);

        // Reverse the offset to find the original RcBox.
        let fake_ptr = ptr as *mut RcBox<T>;
        let rc_ptr = set_data_ptr(fake_ptr, (ptr as *mut u8).offset(-offset));

        Self::from_ptr(rc_ptr)
    }

I agree with @HeroicKatora here and I think we can just re-implement into_raw to use a manual offset as well, implemented something like

    pub fn into_raw(this: Self) -> *const T {
        let ptr: *mut RcBox<T> = NonNull::as_ptr(this.ptr);
        mem::forget(this);

        let offset = data_offset(&*ptr.value);
        (ptr as *mut u8).offset(offset) as *const u8 as *const T
    }

Has the idea of changing Stacked Borrows so that creating a reference only has an effect when it is first used (and thus never does if converted to a raw pointer immediately) been decided against?

I haven't heard of any suggestions or explicitly rejected alternatives along these lines. I would guess that's because you can't know when a reference is "used" without analyzing all transitively called functions, and making all reference optimizations depend on that would be bad. And I don't see how that would help with the issues in this thread anyway.

(at least, I assume by "used" you mean something more interesting than just passing the reference to another function, which is already a semantically significant operation in Stacked Borrows)

2 Likes

Does this mean that my (A)RcBorrow::upgrade is unsound? It's a not too uncommon pattern to allow upgrading from &T where the T is statically known to be behind an (A)Rc to an (A)Rc<T>, and that's what that library is providing a type for.

The implementation currently uses

pub struct $RcBorrow<'a, T: ?Sized> {
    raw: ptr::NonNull<T>,
    marker: PhantomData<&'a $Rc<T>>
}

impl<'a, T: ?Sized> From<&'a $Rc<T>> for $RcBorrow<'a, T> {
     fn from(v: &'a $Rc<T>) -> $RcBorrow<'a, T> {
        $RcBorrow {
            raw: (&**v).into(),
            marker: PhantomData,
        }
    }
}

impl<'a, T: ?Sized> $RcBorrow<'a, T> {
    /// Convert this borrowed pointer into an owned pointer.
    pub fn upgrade(this: Self) -> $Rc<T> {
        unsafe { $Rc::clone(&ManuallyDrop::new($Rc::from_raw(this.raw.as_ptr()))) }
    }

    /// Convert this borrowed pointer into a standard reference.
    pub fn downgrade(this: Self) -> &'a T {
        unsafe { &*this.raw.as_ptr() }
    }
}

As I understand it, we've determined that this way of implementing into_raw is incorrect when using the current Stacked Borrows rules, so my "as_raw" implementation would also be unsound to turn into a &$Rc<T> to clone, as it's lost write provenance over the location of T. (And a unique $Rc<T> can be used to write.)

If I understand correctly, the way to make this sound would be to

  • Provide a $Rc::<T>::as_raw(&self) -> *const T function that uses pointer manipulation instead of Deref,
  • Use that in the implementation of $RcBorrow<'_, T>, and
  • Say that the implicit case of &T -> $Rc<T> where T is known statically to only be allocated behind $Rc is actually unsound (unsafe when used to write).

I'm ok with that for this library, but I know &T -> $Rc<T> is used in the wild.

cc @RalfJung

Rc::get_mut_unchecked probably should require no other Rc and Weak pointers to the same value. It's possible to cause unsafety without violating "Any other Rc or Weak pointers to the same value must not be dereferenced for the duration of the returned borrow" requirement.

#![feature(get_mut_unchecked)]

use std::rc::Rc;

unsafe fn f<'a>(mut r: Rc<&'a str>, s: &'a str) {
    *Rc::get_mut_unchecked(&mut r) = s;
}

fn main() {
    let x = Rc::new("Hello, world!");
    {
        let s = String::from("Replaced");
        unsafe { f(Rc::clone(&x), &s) };
    }
    println!("{}", x);
}

Rc::into_raw probably should return a pointer without involving deref however (similarly to how Weak already does it in its as_raw method).

I don't think Rc should be modified to use UnsafeCell, as this changes variance, when the current variance is fine, and this is not necessary when Rc::into_raw could be changed instead.

1 Like

I opened a PR to adjust the implementation of into_raw. The "as_raw issue" remains. I also raised the as_mut_unchecked concern on the tracking issue.

1 Like

Now that we are at it, do use ManuallyDrop rather than forget, then.

The miri implementation can just execute code to find uses (i.e. actual pointer reads and writes).

As for optimizations, if a pointer is just passed around but never read/written from, then aliasing is irrelevant, so it should be fine to have arbitrary aliasing information for it (although LLVM might make assumptions that invalidate this).

It's kind of an intuitive principle: if you never dereference a pointer, then it's equivalent to an integer.

That said, stricter rules might be better to reduce accidental bugs later (i.e. someone incorrectly changing code to use a previously unused invalid reference).

It's not just about what miri can execute, it's also about what rustc can assume without whole-program analysis. Maybe I don't understand what you're suggesting, but afaik most of the optimizations described in the Stacked Borrows blog posts become invalid if the model has to start caring about whether certain kinds of "uses" occurred somewhere in a transitively called function. Unless you bake those "uses" into function types with an effect system or something.

1 Like

rustc can just do optimizations as normal, and it should not cause problems because while the reference has incorrect aliasing assumptions, it is never read/written in practice so it effectively does not exist.

Obviously if a transitively called function started to read/write the reference, then it would be UB, but it's the unsafe code's responsibility to ensure this doesn't happen.

Yeah I don't understand the suggestion at all. How can rustc "do optimizations as normal" on a reference without knowing whether it's a "real" reference or a reference that "effectively does not exist"?

rustc just assumes that it's a real reference. Since the reference is never read/written to, the fact that it's a reference with incorrect aliasing assumptions does not matter (or more precisely, you can only do optimizations for which this statement is true, but this should be all optimizations).

That's what the current compiler does and why the code in the standard library works.