Rc and internal mutability

CAD97 · December 10, 2019, 5:56pm

CAD97:

Just to note: this [(A)Rc::get_mut_unchecked] exposes shared internal mutability to (A)Rc where the stable API only exposes (runtime checked) unshared internal mutability (of the payload) [(A)Rc::get_mut].

RalfJung:

@CAD97 I am not entirely sure what you mean... I do not see any interior mutability here. That term is specifically tied to UnsafeCell in Rust, and there is none here.

Would you say that the fact that we can create a raw pointer into a Box "exposes shared internal mutability"? Because this operation is very similar.

SimonSapin:

I think another way to phrase @CAD97’s comment/concern is: given let some_box: Box<u32> = /*…*/; let ptr = &mut *some_box;, under what conditions if any is it safe to write through ptr while some_box still exists (and therefore "shares" the same u32 value by having another pointer to it).

Compare with Arc::get_mut which ensures there’s only one reference to the value and therefore none of this kind of "sharing".

RalfJung:

SimonSapin:

under what conditions if any is it safe to write through ptr while some_box still exists (and therefore "shares" the same u32 value by having another pointer to it).

That goes deep into the aliasing model -- and is not specific to Box or Arc at all, I would say.

Your example uses a safe reference for ptr so you won't be able to do anything illegal, but assuming you meant &mut *some_box as *mut _ , then the answer that Stacked Borrows gives is that you may use ptr until you do anything with some_box the next time -- including things like moving it or borrowing from it. But of course Stacked Borrows is just an experiment, not the final answer.
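A minimal sketch of the raw-pointer variant described above, assuming the Stacked Borrows reading given in this reply (which is itself described as an experiment). The function name write_through_raw is hypothetical and only serves the illustration: the raw pointer is used once, before some_box is touched again, and never afterwards.

```rust
/// Hypothetical helper: writes `new` through a raw pointer derived from
/// the box, then reads the value back through the box itself.
fn write_through_raw(initial: u32, new: u32) -> u32 {
    let mut some_box: Box<u32> = Box::new(initial);
    // Derive a raw pointer from a mutable reborrow of the box contents.
    let ptr = &mut *some_box as *mut u32;
    // `some_box` has not been used since `ptr` was derived, so this write
    // falls inside the window described in the reply above.
    unsafe { *ptr = new };
    // `some_box` is used again here; `ptr` is not used after this point.
    *some_box
}

fn main() {
    assert_eq!(write_through_raw(1, 2), 2);
}
```

Swapping the last two steps (reading some_box first, then writing through ptr) would be the ordering the reply says Stacked Borrows rejects.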

SimonSapin:

RalfJung:

not specific to Box or Arc at all

Yeah, a code sample is just a way to express something without terminology.

RalfJung:

assuming you meant &mut *some_box as *mut _

Oops, yes I did.

Ok I understand you might not have a final answer yet for some cases of aliasing rules. So the question is: does documenting Arc::get_mut_unchecked as safe to use while other Arc references to the same value exist (just not deref’ed) constrain a future decision we might want to make for aliasing rules? As opposed to documenting that it may only be used if only one Arc and no Weak exist.

RalfJung:

Oh, now I see what you are getting at. Good question.

This can only become a problem if Arc/Rc is ever understood by the optimizer to mean something. With Box, it is basically a primitive type and also uses Unique, hence the strict rules; for Arc that seems unlikely.

SimonSapin:

while other Arc references to the same value exist (just not deref’ed)

Deref would certainly be bad, but even cloning creates a reference to the ArcInner , right? None of these should be allowed.

I think we'd be on the safe side if we said that other Arc might exist but nothing may be done with them (not even a move), similar to Box.

-- @CAD97, @SimonSapin, @RalfJung rust-lang/pull#6251 (discussion)

-- @RalfJung rust-lang/rust-clippy#4774 (comment)

struct RcBox<T: ?Sized> {
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}

pub struct Rc<T: ?Sized> {
    ptr: NonNull<RcBox<T>>,
    phantom: PhantomData<RcBox<T>>,
}
impl<T: ?Sized> Rc<T> {
    pub fn into_raw(this: Self) -> *const T {
        let ptr: *const T = &*this;
        mem::forget(this);
        ptr
    }

    pub unsafe fn from_raw(ptr: *const T) -> Self {
        let offset = data_offset(ptr);

        // Reverse the offset to find the original RcBox.
        let fake_ptr = ptr as *mut RcBox<T>;
        let rc_ptr = set_data_ptr(fake_ptr, (ptr as *mut u8).offset(-offset));

        Self::from_ptr(rc_ptr)
    }

    #[inline]
    pub fn into_raw_non_null(this: Self) -> NonNull<T> {
        // safe because Rc guarantees its pointer is non-null
        unsafe { NonNull::new_unchecked(Rc::into_raw(this) as *mut _) }
    }

-- liballoc/rc.rs (8960acf)

(A)Rc are shared pointers. That part is obvious.
(A)Rc expose mutability when they are runtime-checked unique. That part is stable (get_mut).
(A)Rc expose shared mutability when unsafe-guaranteed no other pointers are "used" during the mutation. That part is unstable (get_mut_unchecked).
(A)Rc are represented as ptr::NonNull<{ strong: UnsafeCell<usize>, weak: UnsafeCell<usize>, value: T }>.
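As a small illustration of the stable, runtime-checked case from the list above: Rc::get_mut hands out &mut T only while the Rc is unique. The helper name try_increment is hypothetical; only Rc::new, Rc::clone and Rc::get_mut come from std.

```rust
use std::rc::Rc;

/// Hypothetical helper: increments the payload if `rc` is currently unique.
/// Returns whether the mutation happened.
fn try_increment(rc: &mut Rc<u32>) -> bool {
    match Rc::get_mut(rc) {
        // Unique: the runtime check passed and we may mutate.
        Some(value) => {
            *value += 1;
            true
        }
        // Shared: another Rc (or Weak) exists, so no mutable access.
        None => false,
    }
}

fn main() {
    let mut a = Rc::new(1u32);
    assert!(try_increment(&mut a)); // unique, so the payload becomes 2
    let b = Rc::clone(&a);
    assert!(!try_increment(&mut a)); // shared with `b`, so refused
    drop(b);
    assert!(try_increment(&mut a)); // unique again, so the payload becomes 3
    assert_eq!(*a, 3);
}
```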

There are three potential solutions to the immediate issue I see here (that Rc::from_raw(Rc::into_raw(_)) gives "immutable/shared provenance" to the RcBox pointer, making get_mut unsound):

The aliasing model is wrong, liballoc is correct.
Fix the implementation of into_raw to not use shared references. Unfortunately, I think this would require &raw or some other way of offsetting pointers without manifesting a reference.
Admit (A)Rc is an internal mutability type and store the value as UnsafeCell<T> in RcBox/ArcInner. (We really should unify those naming schemes.)

I think the third solution is the best here. The correctness of (A)Rc::get_mut(_unchecked) is really subtle without the use of UnsafeCell and interacts with minutia of the not-yet-specified aliasing model. If we do use UnsafeCell, it's trivial what the requirements of these functions are, rather than being scattered around the module to never accidentally change the provenance of the pointer to the inner blob.
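For concreteness, the pattern in question looks roughly like the sketch below. It only shows the shape of the calls (round trip through into_raw/from_raw, then get_mut); whether the final write is permitted under a given aliasing model is exactly what this post is asking, so the sketch takes no position on that. The function name round_trip_then_mutate is hypothetical.

```rust
use std::rc::Rc;

/// Hypothetical helper: round-trips an `Rc` through a raw pointer and then
/// mutates the payload through the stable, runtime-checked `get_mut`.
fn round_trip_then_mutate(value: u32) -> u32 {
    let rc = Rc::new(value);
    // Ownership moves into the raw pointer; the strong count stays at 1.
    let raw: *const u32 = Rc::into_raw(rc);
    // SAFETY: `raw` came from `Rc::into_raw` and is converted back exactly once.
    let mut rc = unsafe { Rc::from_raw(raw) };
    // The Rc is unique, so `get_mut` succeeds and hands out `&mut u32`.
    if let Some(v) = Rc::get_mut(&mut rc) {
        *v += 1;
    }
    *rc
}

fn main() {
    assert_eq!(round_trip_then_mutate(41), 42);
}
```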

RalfJung · December 10, 2019, 6:23pm

I don't think that helps -- assuming the implementation of into_raw stays as it is. It goes through deref, which returns &T, which is read-only.

So you'll need to adjust into_raw to not go through deref. And if you do that, I think there is no reason to add an UnsafeCell.

withoutboats · December 10, 2019, 6:31pm

Like just using ptr::offset by the size_of the header value?

RustyYato · December 10, 2019, 7:01pm

And also aligning the result of the offset. This relies on the layout of RcBox/ArcInner.

Something like this should work.

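// The header holds two counters (strong and weak), so skip both.
let ptr = inner_ptr.add(2 * size_of::<usize>());
// Then round up to the alignment of the payload.
let align_offset = ptr.align_offset(align_of::<T>());
let ptr: *const u8 = ptr.add(align_offset);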

SimonSapin · December 10, 2019, 7:04pm

Would using the offset from the return value of core::alloc::Layout::extend be less error-prone?
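A rough sketch of what that could look like, under two assumptions that are not from the thread: the header is modeled as [usize; 2] (standing in for the strong and weak counters), and the inner struct is laid out like a repr(C) struct, which is the layout rule Layout::extend follows. The names data_offset_via_extend and Overaligned are hypothetical.

```rust
use std::alloc::Layout;

/// Hypothetical helper: offset of the payload after a two-counter header,
/// computed with `Layout::extend` instead of manual padding arithmetic.
fn data_offset_via_extend<T>() -> usize {
    // Stand-in for the `strong` and `weak` counters.
    let header = Layout::new::<[usize; 2]>();
    // `extend` returns the combined layout and the offset of the appended field.
    let (_combined, offset) = header
        .extend(Layout::new::<T>())
        .expect("layout overflow");
    offset
}

/// A payload type with alignment larger than the header's size.
#[allow(dead_code)]
#[repr(align(32))]
struct Overaligned(u8);

fn main() {
    // No padding needed: the payload starts right after the two counters.
    assert_eq!(data_offset_via_extend::<u8>(), 2 * std::mem::size_of::<usize>());
    // Padding needed: the payload is pushed out to its 32-byte alignment.
    assert_eq!(data_offset_via_extend::<Overaligned>(), 32);
}
```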

HeroicKatora · December 10, 2019, 7:17pm

Arc and Rc already manually offset their pointers as part of their from_raw implementations and as part of this calculate the offset of the data from the RcBox itself. Since liballoc can control the exact layout of the inner RcBox that is not a problem. What's stopping it from using the same logic in reverse (or rather forward in memory) to perform the initial offsetting?

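// For Rc:
pub fn into_raw(this: Self) -> *const T {
    // Re-use the existing method of calculating the offset.
    // The deref is only temporary and does not create the final ptr.
    let offset = unsafe { data_offset(&*this) };
    // `NonNull` exposes its raw pointer through `as_ptr`.
    let raw = NonNull::as_ptr(this.ptr);
    // Do not run the destructor: ownership moves into the raw pointer.
    mem::forget(this);
    // This cast only compiles for `T: Sized`; see the caveat below.
    unsafe { (raw as *const u8).offset(offset) as *const T }
}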

Or am I missing something here? (Except that the code will be slightly more complicated for offsetting the pointer in the case of T: !Sized).

RustyYato · December 10, 2019, 8:35pm

Yes, I didn't realize that Layout had that! Cool.

CAD97 · December 10, 2019, 8:56pm

Specifically, here are the computations that are used for offsetting:

unsafe fn data_offset<T: ?Sized>(ptr: *const T) -> isize {
    // Align the unsized value to the end of the `RcBox`.
    // Because it is ?Sized, it will always be the last field in memory.
    data_offset_align(align_of_val(&*ptr))
}

/// Computes the offset of the data field within `RcBox`.
///
/// Unlike [`data_offset`], this doesn't need the pointer, but it works only on `T: Sized`.
fn data_offset_sized<T>() -> isize {
    data_offset_align(align_of::<T>())
}

#[inline]
fn data_offset_align(align: usize) -> isize {
    let layout = Layout::new::<RcBox<()>>();
    (layout.size() + layout.padding_needed_for(align)) as isize
}

Current from_raw:

    pub unsafe fn from_raw(ptr: *const T) -> Self {
        let offset = data_offset(ptr);

        // Reverse the offset to find the original RcBox.
        let fake_ptr = ptr as *mut RcBox<T>;
        let rc_ptr = set_data_ptr(fake_ptr, (ptr as *mut u8).offset(-offset));

        Self::from_ptr(rc_ptr)
    }

I agree with @HeroicKatora here and I think we can just re-implement into_raw to use a manual offset as well, implemented something like

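    pub fn into_raw(this: Self) -> *const T {
        let ptr: *mut RcBox<T> = NonNull::as_ptr(this.ptr);
        mem::forget(this);

        // SAFETY: `this` was forgotten, not dropped, so `ptr` still points to a live `RcBox<T>`.
        unsafe {
            let offset = data_offset(&(*ptr).value);
            // For `T: ?Sized` the result must keep the pointer metadata (e.g. via `set_data_ptr`).
            (ptr as *mut u8).offset(offset) as *const T
        }
    }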
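The same idea can be tried on a toy type. This is a sketch under stated assumptions, not the liballoc code: ToyBox, value_ptr and write_via_value_ptr are hypothetical names, the struct is given repr(C) for a predictable layout, and it uses std::ptr::addr_of_mut!, which became available after this thread and computes a field address without creating a reference.

```rust
use std::cell::Cell;
use std::ptr;

/// Toy stand-in for `RcBox<T>` (hypothetical, sized payloads only).
#[allow(dead_code)]
#[repr(C)]
struct ToyBox<T> {
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}

/// Returns a pointer to the payload without going through `&T` or `&mut T`.
///
/// # Safety
/// `inner` must point to a live, properly aligned `ToyBox<T>`.
unsafe fn value_ptr<T>(inner: *mut ToyBox<T>) -> *mut T {
    // Pure pointer arithmetic: no intermediate reference is created here.
    unsafe { ptr::addr_of_mut!((*inner).value) }
}

/// Allocates a `ToyBox`, writes the payload through the derived pointer,
/// reads it back, and frees the allocation.
fn write_via_value_ptr(initial: u32, new: u32) -> u32 {
    let inner: *mut ToyBox<u32> = Box::into_raw(Box::new(ToyBox {
        strong: Cell::new(1),
        weak: Cell::new(1),
        value: initial,
    }));
    // SAFETY: `inner` comes from `Box::into_raw` and is freed exactly once below.
    unsafe {
        let value = value_ptr(inner);
        value.write(new);
        let result = value.read();
        drop(Box::from_raw(inner));
        result
    }
}

fn main() {
    assert_eq!(write_via_value_ptr(1, 2), 2);
}
```

Because the payload pointer is derived from the pointer to the whole allocation rather than from a shared reference, the later write does not depend on how a model treats pointers derived from &T.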

bill_myers · December 10, 2019, 9:57pm

Has the idea of changing Stacked Borrows so that creating a reference only has an effect when it is first used (and thus never does if converted to a raw pointer immediately) been decided against?

Ixrec · December 10, 2019, 10:02pm

I haven't heard of any suggestions or explicitly rejected alternatives along these lines. I would guess that's because you can't know when a reference is "used" without analyzing all transitively called functions, and making all reference optimizations depend on that would be bad. And I don't see how that would help with the issues in this thread anyway.

(at least, I assume by "used" you mean something more interesting than just passing the reference to another function, which is already a semantically significant operation in Stacked Borrows)

CAD97 · December 14, 2019, 8:11pm

Does this mean that my (A)RcBorrow::upgrade is unsound? It's a not too uncommon pattern to allow upgrading from &T where the T is statically known to be behind an (A)Rc to an (A)Rc<T>, and that's what that library is providing a type for.

The implementation currently uses

pub struct $RcBorrow<'a, T: ?Sized> {
    raw: ptr::NonNull<T>,
    marker: PhantomData<&'a $Rc<T>>
}

impl<'a, T: ?Sized> From<&'a $Rc<T>> for $RcBorrow<'a, T> {
     fn from(v: &'a $Rc<T>) -> $RcBorrow<'a, T> {
        $RcBorrow {
            raw: (&**v).into(),
            marker: PhantomData,
        }
    }
}

impl<'a, T: ?Sized> $RcBorrow<'a, T> {
    /// Convert this borrowed pointer into an owned pointer.
    pub fn upgrade(this: Self) -> $Rc<T> {
        unsafe { $Rc::clone(&ManuallyDrop::new($Rc::from_raw(this.raw.as_ptr()))) }
    }

    /// Convert this borrowed pointer into a standard reference.
    pub fn downgrade(this: Self) -> &'a T {
        unsafe { &*this.raw.as_ptr() }
    }
}

As I understand it, we've determined that this way of implementing into_raw is incorrect when using the current Stacked Borrows rules, so my "as_raw" implementation would also be unsound to turn into a &$Rc<T> to clone, as it's lost write provenance over the location of T. (And a unique $Rc<T> can be used to write.)

If I understand correctly, the way to make this sound would be to

Provide a $Rc::<T>::as_raw(&self) -> *const T function that uses pointer manipulation instead of Deref,
Use that in the implementation of $RcBorrow<'_, T>, and
Say that the implicit case of &T -> $Rc<T> where T is known statically to only be allocated behind $Rc is actually unsound (unsafe when used to write).

I'm ok with that for this library, but I know &T -> $Rc<T> is used in the wild.
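For reference, the first two steps could be sketched with APIs that std gained after this thread: Rc::as_ptr and Rc::increment_strong_count. This is an illustration only, not the RcBorrow implementation; upgrade_via_raw is a hypothetical name, and the sketch is restricted to sized T.

```rust
use std::rc::Rc;

/// Hypothetical helper: produces a new owned `Rc<T>` from a borrowed one by
/// going through a raw pointer obtained without `Deref`.
fn upgrade_via_raw<T>(borrowed: &Rc<T>) -> Rc<T> {
    // `Rc::as_ptr` returns the payload pointer without consuming the `Rc`.
    let raw: *const T = Rc::as_ptr(borrowed);
    // SAFETY: `raw` points into a live `Rc` allocation (kept alive by
    // `borrowed`), and the extra strong count is taken before `from_raw`
    // claims ownership of one count.
    unsafe {
        Rc::increment_strong_count(raw);
        Rc::from_raw(raw)
    }
}

fn main() {
    let original = Rc::new(String::from("shared"));
    let upgraded = upgrade_via_raw(&original);
    assert!(Rc::ptr_eq(&original, &upgraded));
    assert_eq!(Rc::strong_count(&original), 2);
}
```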

Tom-Phinney · December 14, 2019, 8:18pm

cc @RalfJung

xfix · December 15, 2019, 12:22am

Rc::get_mut_unchecked probably should require no other Rc and Weak pointers to the same value. It's possible to cause unsafety without violating "Any other Rc or Weak pointers to the same value must not be dereferenced for the duration of the returned borrow" requirement.

#![feature(get_mut_unchecked)]

use std::rc::Rc;

unsafe fn f<'a>(mut r: Rc<&'a str>, s: &'a str) {
    *Rc::get_mut_unchecked(&mut r) = s;
}

fn main() {
    let x = Rc::new("Hello, world!");
    {
        let s = String::from("Replaced");
        unsafe { f(Rc::clone(&x), &s) };
    }
    println!("{}", x);
}

Rc::into_raw probably should return a pointer without involving deref however (similarly to how Weak already does it in its as_raw method).

I don't think Rc should be modified to use UnsafeCell, as this changes variance, when the current variance is fine, and this is not necessary when Rc::into_raw could be changed instead.
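To make the variance point concrete, here is a small sketch (the function name shorten is hypothetical). Because Rc<T> is covariant in T, an Rc<&'static str> can be passed where an Rc<&'a str> is expected, which is also what lets the call to f in the example above type-check. Wrapping the payload in a cell type would make it invariant and reject this conversion.

```rust
use std::rc::Rc;

/// Compiles only because `Rc<T>` is covariant in `T`: the `'static`
/// lifetime inside the payload is shortened to the caller-chosen `'a`.
fn shorten<'a>(r: Rc<&'static str>) -> Rc<&'a str> {
    r
}

fn main() {
    let long_lived: Rc<&'static str> = Rc::new("Hello, world!");
    let short_lived: Rc<&str> = shorten(Rc::clone(&long_lived));
    // Both handles point at the same allocation.
    assert_eq!(*short_lived, "Hello, world!");
    assert_eq!(Rc::strong_count(&long_lived), 2);
}
```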

CAD97 · December 15, 2019, 11:12pm

I opened a PR to adjust the implementation of into_raw. The "as_raw issue" remains. I also raised the get_mut_unchecked concern on the tracking issue.

dhm · December 16, 2019, 11:46am

While we are at it, use ManuallyDrop rather than forget.

bill_myers · December 18, 2019, 1:42pm

The miri implementation can just execute code to find uses (i.e. actual pointer reads and writes).

As for optimizations, if a pointer is just passed around but never read/written from, then aliasing is irrelevant, so it should be fine to have arbitrary aliasing information for it (although LLVM might make assumptions that invalidate this).

It's kind of an intuitive principle: if you never dereference a pointer, then it's equivalent to an integer.

That said, stricter rules might be better to reduce accidental bugs later (i.e. someone incorrectly changing code to use a previously unused invalid reference).

Ixrec · December 18, 2019, 1:47pm

It's not just about what miri can execute, it's also about what rustc can assume without whole-program analysis. Maybe I don't understand what you're suggesting, but afaik most of the optimizations described in the Stacked Borrows blog posts become invalid if the model has to start caring about whether certain kinds of "uses" occurred somewhere in a transitively called function. Unless you bake those "uses" into function types with an effect system or something.

bill_myers · December 18, 2019, 1:51pm

rustc can just do optimizations as normal, and it should not cause problems because while the reference has incorrect aliasing assumptions, it is never read/written in practice so it effectively does not exist.

Obviously if a transitively called function started to read/write the reference, then it would be UB, but it's the unsafe code's responsibility to ensure this doesn't happen.

Ixrec · December 18, 2019, 1:54pm

Yeah I don't understand the suggestion at all. How can rustc "do optimizations as normal" on a reference without knowing whether it's a "real" reference or a reference that "effectively does not exist"?

bill_myers · December 18, 2019, 2:03pm

rustc just assumes that it's a real reference. Since the reference is never read/written to, the fact that it's a reference with incorrect aliasing assumptions does not matter (or more precisely, you can only do optimizations for which this statement is true, but this should be all optimizations).

That's what the current compiler does and why the code in the standard library works.
