Safe Intrusive Collections with Pinning

RalfJung · April 10, 2018, 12:00pm

In my last post, I talked about the new “pinned references” which guarantee that the data at the memory it points to will not, ever, be moved elsewhere. I explained how they enable giving a safe API to code that could previously only be exposed with unsafe, and how one could go about proving such a thing. This post is about another application of pinned references—another API whose safety relies on the pinning guarantees: Intrusive collections. It turns out that pinned references can almost be used for this, but not quite. However, this can be fixed by extending the guarantees provided by pinned references, as suggested by @cramertj.

The first part is going to explain how intrusive collections could benefit from Pin, were we to accept that additional guarantee. This part assumes some familiarity with the Pin API, but not with the formal model I introduced previously.

In the second part, I am going to briefly sketch a formal perspective on intrusive collections and the extended pinning guarantees. This builds on the formal notation I introduced in my last post.

Finally, I will discuss a variant of our “running example” intrusive collection that combines shared references and pinning. It turns out this variant is actually incompatible with the formal model from the last post, but the model can be extended to fix this. Hiwever, this extended model will actually call for a change to the Pin API (or rather, for a revert to an earlier version).

https://www.ralfj.de/blog/2018/04/10/safe-intrusive-collections-with-pinning.html

Anyway, as usual, please let me know what you think! Of course, I am particularly interested in thoughts on the entire “shared pinning” situation.

nikomatsakis · April 10, 2018, 2:02pm

Nice Ralf, you’re on a roll! I still haven’t had time to fully digest the last one

pythonesque · April 10, 2018, 2:03pm

This is amazing–I suspected we could do some intrusive stuff in Rust using pinning, but I did not think we’d be able to decouple intrusive elements from the collection lifetime in this way because of the Drop restrictions! That would seriously remove one of the biggest cases safe Rust currently can’t handle–references from an older collection to a newer one, where the reference does have a lifetime statically known not to outlive the thing it’s pointing to, but the enforcement mechanism is dynamic. One wonders whether some primitives could be constructed for making actually implementing the intrusive collections easier, but that’s less important.

It definitely seems like it would be a good idea to change the API now if we are ever going to do it. And it makes sense to do so, anyway, unless there are soundness holes with the shared one.

The thing I am most concerned about here is the extra requirement on Pin and PinBox; it seems likely that they can uphold the invariant you mention, but it pays to be really sure because Drop is sneaky. I am mostly reassured by the fact that Pin<T> will not be safe to construct, so all the obvious ways that data structures can currently discard memory without calling its destructors shouldn’t matter that much. It does, however, seem to mean that Box<T> irrevocably needs its own allocation now, for PinBox to be safe–it can never be freed without deallocating itself (or so it seems to me). That’s probably fine, but I am curious about how custom allocators interact with this? Also, I wonder if some transmutes in existence today violate this?

I am a bit worried that it will be hard to write APIs that satisfy the invariant, but then again it would probably not be safe if they didn’t, so that’s okay. I was also initially worried about the stack pinning API, but I suppose that must work if scoped threads do.

RalfJung · April 10, 2018, 2:08pm

Yeah I'm sorry . I felt I was in a haste because of the effect on Pin API stabilization.^^ (And by some measures, I am late )

What do you mean by this? mem::forget on a PinBox is fine. I also don't see the connection to custom allocators.

No existing transmutes can mention PinBox, so I'd be surprised. What do you have in mind?

pythonesque · April 10, 2018, 2:15pm

mem::forget is fine, yeah, I was talking about actually freeing the backing memory. I was thinking about scenarios where people are doing custom allocation of some sort–I imagine many types of custom allocators would “own” their own memory, but might not be aware of what types were in it. In those cases, it seems possible to me that they might think it was okay to completely free the memory once (e.g.) the thread they were in exited, if it was a thread local allocator (I believe this would be safe today if you insisted T was not Send and were able to hook into thread exit). I don’t have specific examples, though.

As for transmutes: of course none would mention PinBox, but people do transmute into Box on occasion, and I was under the impression Box can become PinBox in safe code? So I’d want to make sure that none of those cases was actually a problem (again, I don’t think it should be, but I’ve been burned by Drop way too much before to be confident about this).

RalfJung · April 10, 2018, 3:23pm

You are right that we have to be careful here. However, it seems to me that enforcing T: Unpin here should be enough, which is actually easier than enforcing T: !Send which, as you mentioned, is also necessary. Allocator APIs don't usually involve the type, after all... and arenas are just fine as long as they don't hand out pinned references. (Also at the All Hands we came to the conclusion that TypedArena should be fine as well because it actually runs drop.)

I think it is debatable whether such an approach would even be considered correct. As far as I am concerned, if you fully own some memory, you can change its type, e.g. from something non-Send to something Send, if you then properly initialize it at the new type.

Yes, it can---if you own the Box. The really crazy transmutes I am aware of involve things like &Box<T>, which you cannot use to create a PinBox.

I guess I agree that we have to be careful, but the fact that this is still a "local" change means (I think) that all existing correct code will remain correct. I have to admit though that allocators are special, as we actually are able to obtain pinned references into them no matter what.

I share your excitement though, so I hope we can get enough confidence in this to actually commit to it

pythonesque · April 10, 2018, 4:01pm

In fact, on reflection, I think that the allocator I described is safe anyway–though it does sort of violate this invariant, I don’t think anyone would be able to observe it doing so (because you still need a reference to the PinBox to be able to do any damage, and if it’s not Send nobody should have one by the time the thread self destructs). You might be able to Send a reference to its interior, but only if the thread to which you sent it was scoped to end before the thread enclosing it did. So in fact I still don’t have a concrete example of a case where Box's contents could be deallocated but someone could still see into it, and it may be that there is no such example (which would be very nice!).

As for whether such an approach could be correct at all: I suppose I see where you’re coming from, but I fear that stance might limit custom allocators too much (since there are quite a lot of usecases where you want to specialize allocation by type–for instance, I might want to make sure my Gc'd allocated pointers used their tag bits as markers, which would stop working if you could just turn it into a bag of bytes whenever you felt like). But that’s future work, I guess…

RalfJung · April 10, 2018, 4:08pm

I don't think this kind of reasoning is sound. There may be other resources than memory implicitly managed here. And even on the memory side, re-use of addresses us observable and pinned data can rely on its address never being re-used unless drop got called.

pythonesque · April 10, 2018, 4:14pm

Ah, good point, I didn’t think about address reuse invariants; you are correct that those are observable (and even were relied on at times in the past, even if mistakenly, for stuff like Rc not overflowing). I’m less certain that resources other than memory are that relevant in this case, but it doesn’t matter since we already have a counterexample.

cramertj · April 10, 2018, 7:53pm

@RalfJung Loving the posts! Thanks so much for writing this all up. I'm curious about this statement:

As we defined previously, Pin<'a, T>.shr just uses T.shr. That made it easy to justify Pin::deref. However, it also means &Pin and &&T are the same type. The two invariants are equivalent. We could literally write functions transmuting between the two, and we could justify them in my model.

This is hard for me to understand precisely as I don't know the fine details of your model, but from a memory ownership perspective I agree that &Pin<T> and &&T are the same type. However, I think that &Pin<T> provides you with the extra guarantee that the underlying T has been pinned at some point by someone. This doesn't give us any special privileges with respect to how we access T, but it can allow us to enforce user invariants. The fact that the T has been pinned by someone, combined with the impl of !Unpin means that that the underlying T can never be moved, which is all we need to guarantee.

glaebhoerl · April 10, 2018, 10:45pm

This is great! Props to @cramertj for having the presence of mind to think of this, and to @RalfJung for carrying through the investigation.

I only have three random thoughts.

It’s vaguely bothered me for a while that Rust doesn’t have a clear answer for how to safely implement things like Qt’s QObject hierarchies, which have both parent-child and child-parent pointers, without resorting to Rc and Weak pointers (which C++ doesn’t need to). This seems to be exactly what is needed. I don’t know if one would typically want to do such a thing, but it’s good to know how it’d now (hopefully, presumably) be possible!
Could RefCell add a bit to its borrow flags to keep track of whether its contents are pinned, slash would doing this make sense? A bit that once set, stays set, and if it is set, you can get Pins out of the RefCell but not &muts. (It seems we have wisely refrained from exposing a BorrowState enumerating all of the possible states a RefCell can be in.)
Could you use this to implement a doubly-linked list in safe code? Or at least, less-unsafe code? Not as a (publicly) intrusive collection, but something like the LinkedList API in std.

comex · April 11, 2018, 2:29am

Just wanted to leave a pointer to my proposal to remove Pin entirely in favor of a library-only variant of !Move. Pin<T> would become &mut T, and &T would also be available, so we could have immutable pinned references (plus &move in the future) without an explosion of new types, and without the poor ergonomics of, e.g., needing an explicit method call to convert PinMut<T> to Pin<T>.

Despite what you might expect, this approach would be compatible with the extra invariant proposed by @RalfJung. Basically, my proposal still includes PinBox, but with a different approach for it to provide the no-move guarantee:

trait Anchor {
    type Inner: ?Sized;
    // PinBox and friends would use this method to implement DerefMut;
    // it's unsafe because you should only call it if you can guarantee
    // that `self` will never move again.
    unsafe fn get_mut(&mut self) -> &mut Self::Inner;
}

Adding the invariant would ‘just’ be a change to the contract of Anchor to also have callers of get_mut (such as PinBox) ensure that the Anchor is dropped before its memory is reused.

As an illustration, to implement Entry under this scheme, just like any other immovable struct, you’d use two types – one to hold the actual data, and one that’s just a marker:

struct EntryAnchor<T> {
    x: T,
    objects: Cell<whatever>,
}

extern { type Entry<T>; }

// Boilerplate
impl<T> Anchor for EntryAnchor<T> {
    type Inner = Entry<T>;
    unsafe fn get_mut(&mut self) -> &mut Entry<T> {
        &mut *(self as *mut EntryAnchor<T> as *mut Entry<T>)
    }
}

And all methods using Entry would do the reverse cast from &mut Entry<T> to &mut EntryAnchor<T> (or &Entry<T> to &EntryAnchor<T>).

Note that the two-struct requirement is essentially just a stopgap until immovable structs are eventually added as a builtin-language feature. In the meantime, of course, we could use a macro to reduce boilerplate.

By the way, suppose we want to allow T to also be immovable. This requires a bit of extra work in both @RalfJung’s version and mine, but not too much. In his version, there would need to be an (unsafely-implemented) custom accessor method to go from Pin<Entry<T>> to Pin<T>. In mine, the definition would be changed to

struct EntryAnchor<A: Anchor> {
    x: A,
    objects: Cell<whatever>,
}

and the accessor method would look like:

impl<A: Anchor> Entry<A> {
    fn get(&mut self) -> &mut <A as Anchor>::Inner {
        // usual conversion that applies to every method:
        let anchor = &mut (self as *mut Entry<A> as *mut EntryAnchor<A>);
        // the actual work is just:
        unsafe { anchor.x.get_mut() }
    }
}

(edit: fixed above example)

…but doesn’t that make things unergonomic if the contained type doesn’t need to be immovable, since it’d force us to unnecessarily define an Anchor type? No, because I’m envisioning there would be a blanket impl of Anchor for non-immovable (i.e. Sized) types:

impl<T> Anchor for T {
    type Inner = Self;
    unsafe fn get_mut(&mut self) -> &mut Self { self }
}

so you could use EntryAnchor<MyStruct>, and it would just work without needing a separate MyStructAnchor.

steven099 · April 11, 2018, 3:03am

It seems like you'd have to prime it to get it to know it's pinned, like:

impl<T> RefCell<T> {
    fn set_pinned(self: Pin<Self>) { ... }
}

You'd have to make sure that RefCell<T>: !Unpin where T: !Unpin, or else pinning it would be unsound.

I'm not sure how you could statically enforce that set_pinned will be called once the RefCell itself is pinned, though (barring a generalized pinning 'constructor' that gets called automatically, like an anti-drop). I think having:

Pin<T>: Deref<Target=T> where T: ?Sized

means that RefCell by default has to represent an unpinned inclusion within pinned data. Edit: Of course, if the RefCell is a field of a pinned data structure, then getting a Pin<RefCell<T>> is already an unsafe operation, so ultimately you and not compiler are responsible for ensuring the soundness of how you use it internally or expose it.

RalfJung · April 11, 2018, 8:53am

You are making two contradictory statements here -- or rather, I think, you are using the term "the same" differently than I do. When I say "the same type", I mean that it is okay for safe code to cast between these types. For example, &'a Box<T> and &'a &'a T are the same type -- exposing a transmute back and forth between these two types to safe code is an okay thing to do.

Ownership goes well beyond mere memory, because types can have their own invariants with their own ideas of ownership. Think of using a newtype around () as a magic non-duplicable token that an API uses to ensure an action can only ever be performed by one party. Just because there is no "memory ownership" here doesn't mean there is no ownership. For this reason, duplicating a ZST is illegal in general, even though there are no aliasing rules that would prevent this. Also see this discussion in the unsafe code guidelines repo.

So, if I cast your statement in my terms, you are saying that &&T and &Pin<T> are not the same type. Exposing the following functions to safe code would not be okay:

fn shr_shr_to_pin<'a, 'b, T>(x: &'a &'b T) -> &'a Pin<'b, T> {
  mem::transmute(x)
}
fn shr_pin_to_shr<'a, 'b, T>(x: &'a Pin<'b, T>) -> &'a &'b T {
  mem::transmute(x)
}

Actually exploiting this kind of reasoning requires us to talk about shared pinned data. That's not something we can get for free, we have to explain why sharing this is okay -- after all, once sharing is involved, multiple parties could be acting on the same data, and we have to prove that they their interaction will not cause havoc.

"Shared pinned" is different from "pinned" for the same reason that "shared" is different from "owned".

That is a very interesting thought. So the bit can only be set if there are no outstanding borrows, and you also cannot get a Pin into the RefCell unless the bit is set. Also setting the bit, I suppose, will require a Pin<RefCell<T>>. This could actually work!

Funny enough, with such a construction, we could have borrow_pin take a & RefCell<T> because we know if the bit is set, the RefCell is pinned, even if the shared reference we obtain here doesn't express that in the type. That's like run-time tracking of the pin state, matching how RefCell also otherwise run-time-tracks something like the borrow checker state.

I don't think you need this. You just declare that the contents of the RefCell are considered pinned only once set_pinned was called. The one thing you have to be careful about is a method like fn get_pin(Pin<RefCell<T>>) -> Pin<T>; this would have to check or set the pin bit to make sure we don't hand out &mut again after get_pin was ever called.

The pin bit is different from the other borrow flags in the sense that, when I have &mut RefCell<T> I can ignore the borrow flags (and get_mut does exactly that) but I cannot ignore the pin bit.

Still, this is very neat

EDIT: However, it occurs to me that this does not need a shared reference type! Because the pin state is tracked at run-time, and not in the type, borrow_pin and borrow are perfectly happy to hand out pinnd data given a &RefCell<T> if the run-time pin bit says this is pinned.

I am slightly bothered by the fact that we do not have a case for type where both the owned and the pinned, or both the shared and the shared pinned, invariant are interesting. It seems like types that want to exploit pinning only have boring owned invariants, and rather trivial shared invariants. For example, Entry<T>'s shared invariant would literally be True -- meaning you cannot do anything useful with an &Entry<T>. Well, maybe it could allow access to an &T? Still, the shared invariant is extremely restricted once a shared pinned invariant exists because it has to work both when the entry is pinned and when it is not (due to the Pin::deref). And the owned invariant usually just says something like "this is still in the initial state, wait for it to get pinned".

So, having the shared pinned invariant seems necessary, but not remotely as interesting as the owned vs. shared invariants. (Types like RefCell or Rc really go crazy on exploiting their shared invariant, and some of that feeds back into the owned case as well, particularly for Rc.)

RalfJung · April 11, 2018, 9:20am

The invariant has been proposed by @cramertj, not by me.

Also, I commented in the other thread on the aspects specific to your proposal. Concerning the interaction with the dropping invariant, I am worried that there may be code out there which re-uses the memory covered by a !Sized type without calling drop. I think one could build a variant of my horrible union-free ManuallyDrop that does this. So while it may be possible to have an invariant like "Unsized types have drop called on them before their memory gets deallocated or otherwise invalidated", I am extremely worried that this is incompatible with other code out there that already breaks this invariant. The reason this works for Pin is that there is no code out there that actually produces a Pin, so we can still add new invariants to whoever creates a Pin. You are proposing to add new invariants to whoever creates a &mut T where T: !Sized. That's quite a bold move! I don't think we can realistically do that.

pythonesque · April 11, 2018, 11:52am

I'm not sure we can arrive at that conclusion yet. The pinning APIs have only been around for a very short time. Actually, the conclusion I'm starting to come to is that it feels like stabilizing this stuff too early might be a very big mistake, since we still haven't had much actual usage yet and are still in the early stages of exploring what sorts of things they enable.

In particular, I'm curious about how much unsafe code people are going to have to write to use these in practice... for example, without unpin basically the entire API surface of Pin<T> is unsafe, which presumably means that the vast majority of users would be unsafe code (and tricky unsafe, at that). That's fine from a safety perspective, but in order for it to actually be useful in terms of reducing the overall amount of unsafe code people have to write, I imagine there have to be reusable safe wrappers that expose additional functionality on top of Pin without introducing too much extra space or runtime overhead.

To be more concrete: how do I actually set up my node? I almost certainly do not want to create a box allocation for the entry node (that would be a pointless extra indirection for every link traversal, not to mention an extra allocator invocation). So presumably that means I have to PinBox<T> the entire thing (or stack Pin it) if I want to avoid unsafe. That's fine, but what now? Another advantage of intrusive data structures is being able to have multiple collections in one structure, and ideally even have some control over where the links are, so I probably don't want to just make Entry wrap my data node. That means I somehow need to be able to get from my Pin'd state to a Pin'd entry.

But (correct me if I'm wrong) I'm stuck at this point, right? The compiler can't help me get a Pin'd version of the entry, regardless of Pin::deref. I could do it dynamically (using RefCell) but that would entail extra runtime tracking, which I really don't want (even PinCell needs at least one bit here, which might in practice mean an extra word when alignment is taken into account; RefCell takes up a whole extra word for sure; and both of them can panic, which is not desirable in an embedded context).

The problem (it seems to me) is that I don't have any way to preserve the link between the Pin' on the outer and the Pin on the field. Would it be possible to preserve this relationship safely, without restricting the API?

Note that in this case I am statically pinned for the lifetime of the outer PinBox, so any additional tracking should presumably not be needed. I know that the Github comments say this cannot be done safely, for the obvious reason (we can't prevent pointing into owned pointers like Box without compiler support). I'm wondering whether there is any way at all we could avoid this in most safe code (for example, have some unsafe trait witnessing such indirection).

If not, I fear we're not really in a much better place than before when it comes to intrusive data structures; the former state of the art was "if you want to use this, we have lots of macros that generate unsafe code, and some closure based APIs that let you access it; or you can use more limited versions with extra overhead or more restrictive requirements" and this feels like it is just a slightly nicer version of the same thing.

(I should add: I suppose it is possible that the "wrapper struct" thing could be made to work somehow by having ways to control the precise layout within the structure, but it doesn't seem very modular to me and sounds really irritating to work with if you have a nontrivial number of entries, so you'd probably end up needing macros anyway to access most of your fields. Still, at least there you would only have macros for ergonomics rather than safety).

comex · April 11, 2018, 11:52am

You are proposing to add new invariants to whoever creates a &mut T where T: !Sized.

I don't think this is as bold as it sounds. After all, the invariant is incapable of being violated for a zero-sized type, since there's no data to overwrite before calling drop. Under my proposal, the types which would be exposed to existing code either are zero-sized, or are, er, panic!()-sized, which can't be distinguished from zero-sized without panicking.

system · March 25, 2019, 8:30am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
A Formal Look at Pinning language design	25	8038	June 16, 2019
[post] Safe pin projections through view types language design	12	1296	June 3, 2022
Async/Await series language design	30	8045	March 25, 2019
Comparison of wirhoutboats' and Niko Matsakis' recent blog series on improving pinning? language design	11	1508	November 11, 2024
Design safe collection API with compile-time reference stability in Rust libs	9	1056	May 12, 2024

Safe Intrusive Collections with Pinning

Related topics