Unsoundness in `Pin`

RalfJung · January 6, 2020, 9:42pm

We had some more conversations about this at lunch at MPI-SWS (the research institute I work at), musing about how we'd formally verify the generic Pin, and also the lang team had a design meeting about this. I'll write down some thoughts mostly coming from the "formal specification" perspective; I think @nikomatsakis and/or @withoutboats will write down some more practically minded notes (or it seems Niko did that while I was still typing^^).

As mentioned in my previous post, the big question for a more formal approach to this is: what exactly is the invariant of Pin? In other words, what is the property that makes some value v safely usable at type Pin? (This is the safety invariant of Pin, so I will say "a safe T" or "safe at/for T" when I mean "a value for which the safety invariant of T holds".) While thinking about this, to my surprise I realized that I had a hard time figuring that out for my proposal based on maker traits, while we actually came up with a reasonably-sounding answer for @withoutboats' proposal relying on coherence.

Marker-trait-based invariant

The basic idea for this is to have three unsafe traits, matching the three traits that are relevant for Pin: unsafe trait DerefPin: Deref {}, unsafe trait DerefMutPin: DerefPin + DerefMut {} and unsafe trait ClonePin: DerefPin + Clone {}.

The key trait is DerefPin, and the most important part of its unsafe contract (that the implementer has to prove) is to pick an invariant for what a "pinned P-ptr to Deref::Target" is. So, for example, the unsafe impl<'a, T> DerefPin for &mut 'a T would have to decide what, logically speaking, a pinned mutable reference to T is. It would pick something like "an exclusive borrow for lifetime 'a of a T that is in the pinned typestate" (this is following the formalism that I laid down ~~last year~~ a while ago in this post and this follow-up). It's as if the DerefPin trait had a "virtual" or "logical" field defining that invariant -- an "associated invariant". Furthermore, the DerefPin unsafe contract would demand that Deref::deref can be called for (pointers to) values satisfying this invariant, and that they will then return a value that is a safe Pin<&Deref::Target>. (Yes this is cyclic, but we can define the notion of safe pinned references in advance -- they correspond exactly to the PinMut and PinShr types that we had before the generic Pin was introduced). DerefMutPin and ClonePin would have similar contracts, referring to the invariant picked by DerefPin.

We would then have to add a trait bound to Pin, like struct Pin<P: DerefPin>, and then we could say that the invariant for Pin is exactly whatever the DerefPin impl picked.

I think this is solid, and it avoids talking about "the impl of this trait for this type", instead only ever referring to impls that are "in scope" through explicit bounds. There's "just" one problem: Pin::new. That function is safe, so we have to prove that it returns a safe Pin, and the only thing we have to go by is that P::Target: Unpin. This tells us that for P::Target's (safety) invariant, the "normal" and the pinned typestate are the same. But on its own that doesn't help a bit, when what we have to do is show that the value we return satisfies the invariant for "this is a pinned P-ptr to P::Target". I guess we could add an axiom to the contract for DerefPin that says something like "further, if Self::Target is Unpin (in the semantic sense, not specifically implementing some trait), then the normal invariant for P implies the pinned-P-ptr-to-Target invariant".

Pin::new_unchecked is fine though, it just puts the burden on the caller to prove whatever P requires to be proven for pinned-P-pointers.

So, overall, this would (formally) require adding a DerefPin bound everywhere, thus breaking everything, and also supporting Pin::new requires an extra axiom instead of just "falling out" of the construction. That seems awkward, and not nearly as clean as I had hoped.

Coherence-based invariant

Turns out stating this invariant is much simpler than I thought. It goes something like this: a value v is safe for Pin if all of the following conditions hold:

If P: Deref, then passing a reference to v to Deref::deref is safe to do (will not cause UB), and will return a value that is a safe shared pinned reference (basically, the return value is safe for PinShr<'_, Deref::Target>, if PinShr was still a thing). PinShr is defined like above, i.e., something like "a pointer pointing to some memory that is in the shared-pinned typestate for Deref::Target".
If P: DerefMut, then passing a reference to v to DerefMut::deref_mut is safe to do, and will return a value that is a safe mutable pinned reference (basically, PinMut<'_, Deref::Target>).
If P: Clone, then passing a reference to v to Clone::clone is safe to do, and will return a value that is safe for Pin. This is recursive, but the recursive occurrence is of the kind that I think is fine. (For the initiated: the recursive occurrence is, I think, in a "positive position", meaning the function of which we need a fixed point is monotone, and such functions do indeed have a suitable fixed point that we can use for our definition.)

With this, whoever calls new_unchecked has to do the appropriate verification of deref, deref_mut and clone for the specific value they use.
This clearly explains what went wrong at least in some of the exploits by @comex: whoever constructs a Pin<&T> has to prove the second clause above, and they did this by assuming it to be vacuously true because &T does not implement DerefMut. Except, they were wrong about this. Similar for Pin<&mut T> and Clone.

This also explains why Pin::new is okay: for Deref::Target: Unpin, the argument is a valid P so clearly calling Deref::deref and the other functions on a shared reference to that value is fine. PinShr<'a, T> is the same as &'a T, so the first clause is satisfied. Also, PinMut<'a, T> is the same as &'a mut T, so the second clause is satisfied. And the third clause is the "inductive case", and thus satisfied by the "induction hypothesis". (Yeah this sounds like magic, but I am pretty sure if we pick the least fixpoint this is exactly what will happen.)

Basically, the observation here is that the only pinned pointer types that "really matter" are shared and mutable references. Everything else only matters insofar as we can call as_ref and as_mut to get a pinned shared/mutable reference, and then go on with that.

However, even with the proposed negative impls for the #[fundamental] types, we'd still rely on the orphan checker to make sure that Rc and Arc don't implement DerefMut. That's probably okay, but it seems suboptimal at best to rely on this for safety -- who knows in which ways we will be able to extend/improve the orphan check in the future. So, I think we should require either a specific impl or a specific "negative impl" when reasoning about requirements like the one for Pin here. Then we don't rely on the details of the orphan rule any more. We'd still rely on coherence, but I would feel fairly good about this overall setup. I actually came to like this better than my marker-trait-based proposal. If this kind of coherence breaks, we have a totally different set of soundness issues...

We (at MPI-SWS) hope to formalize this some time soon-ish (and when I say "we" I really mean another PhD student and an intern, not me ). We don't want to actually formally verify the trait system, so we'll make some assumptions of the form "the Deref impl for this type is exactly that", or "this type does not implement Clone", but we should be able to verify everything else -- and this also makes it clear what exactly we rely on from the trait system. (One "fun" aspect of this will be that even Pin<&mut T> will be defined like above, so actually verifying the specific operations for that might be non-trivial. What we really want is to know that this is PinMut<T>, but we might have to do something silly like call <&mut T as DerefMut>::deref_mut to get there, even though that's a NOP... well, we will see, that's our problem. )

Topic		Replies	Views
The Pin unsoundness and `CoerceUnsized` language design	19	3968	January 30, 2020
`Pin` from `&mut` refs? libs	10	2533	January 28, 2019
Can `Pin` own its referent instead of borrowing mutably? language design	74	5472	March 25, 2019
Pin ergonomics	29	1459	October 28, 2024
A Formal Look at Pinning language design	25	8071	June 16, 2019

Unsoundness in `Pin`

Marker-trait-based invariant

Coherence-based invariant

Related topics