Why clonefromrawing and intorawing is better than refcount manipulation

Soni · June 4, 2021, 7:39pm

The current Wake uses refcount manipulation and copies the original raw pointer around, which for a human reader has the wrong provenance. (altho the compiler/optimizer doesn't seem to care, which is fine but this isn't about the compiler.)

A better alternative would be to use clonefromrawing and intorawing. The RawWaker clone would get a clone of the pointer, rather than the same pointer as the parent. Well the pointer would have the same value but like, it'd also have the correct provenance rather than the wrong provenance. This is conceptually cleaner.

Anyway, new code is still gonna have an "arc_clone_from_raw" or "rc_clone_from_raw" so the refcount manipulation methods don't actually solve anything and even encourage error-prone code (especially in the Arc case, where someone might attempt to combine these with atomics). Do we want/need ppl to roll their own ones of these?

jhpratt · June 4, 2021, 7:57pm

As with your now-closed and unlisted post from yesterday, what are you even asking?

Soni · June 4, 2021, 9:46pm

As with the other post,

    unsafe fn clone_waker<W: Wake + Send + Sync + 'static>(waker: *const ()) -> RawWaker {
        RawWaker::new(
            Arc::into_raw(unsafe { Arc::clone_from_raw(waker as *const W) }) as *const (),
            &RawWakerVTable::new(clone_waker::<W>, wake::<W>, wake_by_ref::<W>, drop_waker::<W>),
        )
    }

is better than

    unsafe fn clone_waker<W: Wake + Send + Sync + 'static>(waker: *const ()) -> RawWaker {
        unsafe { Arc::increment_strong_count(waker as *const W) };
        RawWaker::new(
            waker as *const (),
            &RawWakerVTable::new(clone_waker::<W>, wake::<W>, wake_by_ref::<W>, drop_waker::<W>),
        )
    }

and as such we should have Arc::clone_from_raw and deprecate refcount manipulation.

jhpratt · June 4, 2021, 9:52pm

These methods were just stabilized. There are presumably use cases that don't necessitate deprecating them — you're only looking at one use case. The only thing I see you actually bringing up is introducing an additional method, and even that I find to have minimal justification; I personally find the latter snippet clearer as to what's actually happening. Even so, you're free to send a PR making the method unstably public and let the libs team decide on it if you're so inclined.

Soni · June 4, 2021, 9:56pm

Sorry, how is incrementing a refcount clearer than cloning, in a method called "clone_waker"?

jhpratt · June 4, 2021, 10:02pm

I simply do not see the need for the method, and you have not convinced me in any way.

it'd also have the correct provenance rather than the wrong provenance

What is objectively wrong about the current approach? Copying a pointer is a perfectly valid operation.

Soni · June 4, 2021, 11:39pm

The current approach requires more cognitive load to audit.

jhpratt · June 5, 2021, 1:15am

I disagree, but what does that have to do with provenance?

Soni · June 5, 2021, 1:18am

the clone has the same provenance as the original vs the clone has provenance as a clone of the original?

mbrubeck · June 5, 2021, 5:43am

I don't think there's any provenance difference between your examples. All of these operations are ultimately copying raw pointers under the hood, which preserves their provenance. I don't see any operations here that "retag" the pointers with new provenance as defined in stacked borrows.

Soni · June 5, 2021, 1:26pm

Maybe not, but a clone operation should probably use another clone operation, rather than take advantage of low-level implementation details (even if those are part of the public API).

Additionally, a drop operation should also take ownership of the pointer (with Arc::from_raw) and drop it (with drop()), for additional clarity.

Having clearer, and more useful operations (cloning from raw is applicable in more cases, as many ppl seem to roll their own), is better than low-level trickery.

dhm · June 5, 2021, 6:24pm

While I find clone_from_raw to be too specific, I do agree that .increment_ref_count is almost as specific, and that indeed it does make public implementation details of Arc. But I don't think that these implementation details are leaking, since both Arc and Rc publicly advertise their refcounting nature. Even with their own very names!

But, indeed, both are too specific to allow for general usage. The reason for the increment_ref_count addition, by the way, was that in order to write that logic oneself, one needed to write mem::forget(Arc::ref_from_raw(&ptr).clone()). And the issue at that point is that, moreover, Arc::ref_from_raw did (and still does) not exist, which is an issue given that inlining its implementation is quite subtle:

a ManuallyDrop::new(Arc::from_raw(ptr)) temporary is very error-prone / hard to get right.

So Arc::increment_ref_count(ptr) is just a very handy and thus welcome shortcut for:

mem::forget(Arc::clone(&*
    ManuallyDrop::new(Arc::from_raw(ptr))
))

(in that regard, your suggestion is just about replacing mem::forget with Arc::into_raw)

And, finally, the remaining question is whether that ugly and complex middle point with the ManuallyDrop could be considered for an API addition of its own. I believe it could.

In other words, the API that could make sense to add, should increment_ref_count end up being too specific / should there be an actual use case for it, would be the Arc::ref_from_raw(ptr) API:

impl<T : ?Sized> Arc<T> {
    /// # Safety
    ///
    ///   - `ptr` must originate from an `Arc::<T>::into_raw` call;
    ///
    ///   - during the whole `'lt`, the owned refcount must not drop
    ///     below 1: there must still be at least one `Arc` alive
    ///     or leaked.
    pub unsafe fn ref_from_raw<'lt>(
        &ptr: &'lt (*const T),
    ) -> impl 'lt + Deref<Target = Arc<T>>
    {
        ManuallyDrop(Arc::from_raw(ptr))
    }
}

Soni · June 5, 2021, 6:31pm

Why ref_from_raw? That seems even more error-prone, for example:

https://docs.rs/hexchat-plugin/0.2.14/src/hexchat_plugin/lib.rs.html#708-710

            // hook may unhook itself.
            // however, we don't wanna free it until it has returned.
            let f: Rc<CommandHookUd> = rc_clone_from_raw(ud as *const CommandHookUd);

This is extremely easy to get wrong if you use the hypothetical ref_from_raw. But clone_from_raw forces you to get it right.

Additionally, cloning is cheap, so it should be preferred for things like this. Also, isn't the pointer a pointer to the contents, anyway? So if you only need the contents aren't you supposed to just deref it? ...

dhm · June 5, 2021, 6:57pm

Fair enough, "eagerly cloning" can sometimes be less-error prone, and it actually does not come at such a high cost: the non-Clone part of Arc's API is actually rarely interesting. So I realize my suggested API, in practice, may be as restrictive / narrow as a preemptive clone (I guess I got carried away by its conceptual beauty ). In that regard, I personally agree that a clone_from_raw (with a #[must_use] for people misusing it in the obvious way) would have been a more general-purpose tool than increment_strong_count, although I think I would name it Arc::cloned_from_raw, and necessarily returning an owned Arc (letting the caller call into_raw or forget accordingly).

But the ship has sailed: increment_strong_count is the one out there, and it seems to me that the two functions overlap too much in functionality for the stdlib to be featuring both.

Soni · June 5, 2021, 6:59pm

Well, how hard is it to search all of crates.io for code that implements arc_clone_from_raw/rc_clone_from_raw functionality? Seems pretty hard, but we know of at least 2 crates off the top of our head... These aren't gonna go away with increment_strong_count, but they could with clone(d)_from_raw.

CAD97 · June 5, 2021, 9:58pm

Again, I'll note that clone_from_raw was discussed at increment_ref_count stabilization time, and the general opinion was that having both was fine.

I personally like clone_from_raw over the raw refcount manipulation, because it's marginally easier (ime) to specify purely in terms of owned/raw handles, and nailing down exactly how to document the safety interactions between [incr|decr]ement_ref_count and [into|from]_raw was tricky. Whereas clone_from_raw doesn't require any extra spec work beyond [into|from]_raw already do, since it gives you a new owned handle. Plus, then,

increment_ref_count = into_raw(clone_from_raw(ptr)); and
decrement_ref_count = drop(from_raw(ptr)),

and the handle management rules fall out cleanly. On top of that, it's not immediately obvious that decrement_ref_count potentially drops the wrapped T, whereas drop(from_raw(ptr)) definitely is.

That said, the trains have left the station, and [incr|decr]ement_ref_count are stable. This is fine, they're good helpers to have! We've cleared up the handle ownership documentation, and increment_ref_count is the exact semantics that RawWaker wants to have. increment_ref_count was stabilized to serve that specific use case, rather than block on adding clone_from_raw instead. decrement_ref_count was also stabilized as its dual.

clone_from_raw is a higher-level interface that we can and should also provide; it's just nobody has bothered to actually PR it yet.

One note for @Soni: (pointer) provenance has a specific meaning: what memory the pointer is allowed to read from, and what memory it's allowed to write to. I suspect you're also trying to extend this to what "raw handles" to the reference counted object it owns, but that's not how we've specified the ref count management, and I'd be loathe to tie it into provenance which is already a tricky subject.

Roughly, the raw ref counting semantics are specified as:

Each Arc<T> owns an owned strong handle to the refcounted object.
When the last owned strong handle is dropped, the object is considered dropped. (An owned handle will always actually perform this drop.)
In addition to the owned handles owned by the strongly typed Arc handles, there is a pool of raw owned handles globally available to draw from or add to. (Semantically, you can think of this pool as ghost state living on top of the physical ref count tracking.)
You may only draw from the raw pool given a pointer with write provenance over the ref count location and type-appropriate shared provenance over the refcounted object.
into_rawing/mem::forgetting a strongly typed owned handle moves it into the raw owned handle pool.
from_rawing a pointer with appropriate provenance moves a raw handle to a strongly typed handle.
increment_ref_count adds a new raw handle to the pool.
decrement_ref_count destroys a raw handle from the pool.
Arc::clone creates a new typed handle given an existing typed handle.
(theoretical) clone_from_raw creates a new typed handle given an existing raw handle.

Soni · June 6, 2021, 1:01am

Was stabilizing this really necessary for Wake? As far as we can tell, it really wasn't. It's not exposed anywhere. It could've been an unstable implementation detail.

Soni · June 7, 2021, 12:15am

There have been 3 previous attempts to add clone_from_raw. The first got closed because there wasn't a good enough argument for it. The second was a duplicate. The third got closed as a duplicate of refcount manipulation.

There's still an open feature request for clone_from_raw, also.

We wanted to find more crates that implement their own clone_from_raw but instead we found unsoundness/UB. >.>

CAD97 · June 7, 2021, 1:07am

Links? Looking at previous PRs would help understand why they were closed, and what a new PR would have to argue against to provide new information and arguments towards adding the function.

Which provides a good reason to provide such functionality, if people are trying and failing to implement it themselves! (I assume rc-box is fine? IIRC it was the last time I checked, around when increment_strong_count was stabilized, but I could've missed something.)

And just as a procedural thing: please, when you refer to the existence of external things, provide links. It saves so much time to provide the research you've already done rather than just asserting it without any proof, which leaves your audience having to search for it themselves (or to ignore you, or to assume you're making it up, depending on prior biases).

When you come and ask for something, you're in a position of knowledge that everyone else isn't, and you have to explain your position from first principles (ideally up front, rather than piecewise as people ask for details). The burden of proof falls on you when you're asking to make a change.

Soni · June 7, 2021, 2:19am

https://github.com/rust-lang/rust/pulls?q=is%3Apr+clone_raw+is%3Aclosed

We'd say it's slightly worse than that: Crate hexchat allows you to free the callback's whole Fn while it's actively running! (We went to look there because uh, we make hexchat-plugin, which uses Rc (and rc_clone_from_raw). We figured they'd be doing the same. x.x)

Topic		Replies	Views
Thread-safety of RawWakerVTables libs	4	605	November 2, 2022
Pre-RFC Idea: `RefWake` trait language design	3	165	October 1, 2024
Immortal Arc API? libs	39	2713	April 23, 2024
Arc and Pin? libs	11	2696	April 29, 2022
Rustic Rust for Rapid Prototyping language design	6	1503	April 21, 2023

Why clonefromrawing and intorawing is better than refcount manipulation

Related topics