A Formal Look at Pinning

Recently, a new API for “pinned references” has landed as a new unstable feature in the standard library. The purpose of these references is to express that the data at the memory it points to will not, ever, be moved elsewhere. Others have written about why this is important in the context of async IO. The purpose of this post is to take a closer, more formal look at that API: We are going to take a stab at extending the RustBelt model of types with support for pinning.

https://www.ralfj.de/blog/2018/04/05/a-formal-look-at-pinning.html

I am looking forward to hearing your comments! In particular, I am curious if the made-up syntax for making the typestate invariants more precise was helpful.

34 Likes

Great blog post! Thanks so much for doing the formal side of this work. :smiley:

One advantage of this proposal is that it is local: Existing types (and new types) that are designed without considering Pin remain sound in the presence of this new typestate, even if we automatically derive the Unpin trait for these types.

This is really the most impressive thing about Rust's type system to me. We are able to add these very novel invariants using a bit of encapsulation & unsafe, without any additional language level features, and without impacting the rest of the ecosystem at all.

1 Like

@RalfJung I don’t personally do that much theory, but I certainly appreciate the greater confidence that your work affords the rest of us :slight_smile: Thanks!

4 Likes

In particular, I am curious if the made-up syntax for making the typestate invariants more precise was helpful.

It was for me! Good job!

But please consider putting an extra newline code blocks containing two multiline definitions.

In particular, if we changed the signature of init() to take &mut, we could easily cause UB by writing the following code:`

Could that please be more explicit: self: &mut Self? I first thought you meant self: &mut Pin<Self> and had forgotten .as_pin()

Edit: corrected illegal syntax, Ralf's post below contains the original...

Thanks for your encouraging feedback! :slight_smile: I was a little (well, a lot) worried that I would be way too formal and proceed way too quickly.

Just out of curiosity: If I managed to somehow use a JS-based LaTeX engine here and give this a more math-y flavor, do you think that would improve things or not? It's more readable IMHO, but it also moves away from Rust code.

Thanks for the suggestion, done.

I don't entirely follow, AFAIK both of the things you wrote are illegal syntax? What I meant is &mut self, which is equivalent to self: &mut Self. I will try to make this clear in the post.

I'm still learning rust, have no formal math background what so ever. (Although, I do like to watch math related youtube videos by the Numberphile, 3Blue1Brown, and some others.)

I had to go back and reread parts a few times, but it's a blog post I read in my own time. If you were to give a RustBelt talk about it would most certainly go over my head. That is assuming I was attending in person, VODs would allow me go back too, although that is much less ergonomic then scrolling up. So I would probably give up, like I did with that RustBelt talk... (I intend to see if I fare better after I finish Programming Rust.)

Uhm... I know how to pronounce LaTeX, I think... Do whatever people in Rust working groups find best?

You did!

1 Like

In particular, I am curious if the made-up syntax for making the typestate invariants more precise was helpful.

I read two "levels" of formal notation in your post, both simplified from the formal paper. I'll have to spend some more time to fully grasp the forall-based notation, but the T.own and T.shr syntax is very clear and understandable to me.

With that said, this is the first time I've understood Unpin. So yeah, the precision did allow me to grasp the core concepts more deeply. Thank you!

As someone without a PL/language theory background, the significance has only become apparent to me as I have come to Rust. I am glad you "took the gamble" to write with more formal syntax, because for me, this is both interesting and educational.

Perhaps a separate blog post on "formal syntax 101" for folks without formal training could serve as a "rosetta stone" for this and future blog posts where you would like to write with more precision in a "formal-lite" style. Personally, I would love to have a gentle on-ramp to understanding topics more formally like this.

I think this blog post was just the right speed for me. Many thanks, @RalfJung! I thought this was great.

4 Likes

It strikes me as interesting that the whole pinning API could be implemented with no language changes, but we introduce a new typestate to model it. It seems like the pin typestate should be a “composition” of other typestates or something, if that makes sense. Is there an intuition for why this is? Is it that the own and shr typestates actually express weaker assumptions that full blown rust ownership and borrowing?

@mark-i-m I think that could be because the language has unsafe escape hatches that the model doesn’t. There is no safe way to construct a Pin, rather you have to manually ensure you will follow the invariants it requires and call unsafe { Pin::new_unchecked(&mut foo) }. That is hidden from most users as PinBox does the guaranteeing for you, and once the exact invariants are pinned down:sunglasses: then there may be a stack pinning API to do this for you; but behind the scenes these are unsafely claiming to Pin that they have guaranteed its invariants. (There is also a safe Pin<T>::new(&mut T) where T: Unpin, but the invariants for Pin<T> where T: Unpin are identical to &mut T).

2 Likes

Good observation. This is very interesting indeed! It is testimony of a type system with powerful abstraction features. Notice that this is not the first time this happened: T.shr was introduced the same way! If we removed UnsafeCell and all its users, we could ditch T.shr and just say "shared data is read-only". And if the compiler was less keen on making optimizations, no language change whatsoever would be necessary to introduce interior mutability, just like no language change was necessary to introduce pinning.

There's actually something in common between &T and Pin<T> that makes them both candidates for a separate typestate, and that's a lack of free operations---where by "free" I mean operations that can be performed regardless of T. Notice that if you have Vec<T>, there are tons of things you can do even if you have no idea what T is (e.g. iteration, removing or reordering elements). But for &T, literally the only thing you can do is duplicating the reference (exploiting that &T: Copy). For Pin<T>, you don't even have that, there is essentially nothing left. This is a lot like a newtype, but shared and pinned references are both something like "universal newtypes", and the shared and pinned typestate let a type control what happens to it when it gets wrapped in that newtype.

Does this make sense?


Ah, that is interesting. I did not intend there to be two "levels". You also mentioned that Unpin made sense to you, which does use forall. I'd be curious if you could formulate a question concerning the notation you are still having trouble with?

Thanks for the suggestion! That sounds like a good idea. I'll put it on my list.

1 Like

I should use the opportunity to ask y’all something. :slight_smile: So in the post, I wrote

ptr.points_to_owned(data: U) where List<Bytes>: TryInto<U> :=
  exists |bytes| bytes.try_into() == Ok(data) && ptr.points_to_owned_bytes(bytes)

If I were to go full-on with Rust syntax, I would instead have written

impl Pointer {
  fn points_to_owned<U>(self, data: U) -> Prop where List<Bytes>: TryInto<U> {
    exists |bytes| bytes.try_into() == Ok(data) && self.points_to_owned_bytes(bytes)
  }
}

Do you think this is better or not? It’s more text to read, begs questions like "don’t we want &self" that are irrelevant here, surfaces the strange type Prop, but it’s also looking more familiar.

Semantic types would then be a trait

trait Type {
  fn own(self, bytes: List<Byte>) -> Prop
  // And more: shr, pin
}

and things like

Box<T>.own(ptr: Pointer) :=
  exists |inner_bytes| ptr.points_to_owned(inner_bytes) && T.own(inner_bytes)

would become

impl<T: Type> Type for Box<T> {
  fn own(self, ptr: Pointer) -> Prop {
    exists |inner_bytes| ptr.points_to_owned(inner_bytes) && T.own(inner_bytes)
  }
}

The second argument doesn’t match the trait definition; it is “magic” in that, when implementing the trait, it can have any type U such that List<Byte>: TryInto<U>.

There’s more “syntactic noise” here that I feared might distract from the content, but that noise is also familiar. What do you think?

1 Like

Maybe change -> Prop to -> proof Prop or something like that. Or maybe take a proof self. That would make the Type trait illegal Rust too, which I believe would be useful in the (probably rare) cases that you show it.

Either would do just fine for me, so long are there is a post about the syntax if you deem this formal-lite style worth while.

Oh, but these are not proofs. These are just definitions of propositions, like "what does it man to own a Box<T>". The proofs are kept in informal English in the post, and I plan to keep it that way.

1 Like

This probably would have made it clearer for me, but I'm not sure if I'm in the minority. Also a BNF thingy with the introduced forms would have been handy as a reference (it can be hard sometimes to infer these from the prose).

2 Likes

Yeah, that makes sense. Thanks :slight_smile:

@RalfJung SelfReferential should implement !Unpin, shouldn’t it? Unpin is an auto trait, and there are no negative impls for Unpin in the standard library (as of right now), so SelfReferential implements Unpin by default. Thus, I can cause undefined behavior in safe code with this type:

fn main() {
    let mut s = PinBox::new(SelfReferential::new());
    s.as_pin().init();
    println!("{:p} {:p}", &s.data, s.self_ref); // same address twice
    println!("{:?}", s.as_pin().read_ref()); // prints Some(42)
    let mut s = *Into::<Box<SelfReferential>>::into(s);
    let s = Pin::new(&mut s);
    println!("{:p} {:p}", &s.data, s.self_ref); // different addresses now!
    println!("{:?}", s.read_ref()); // reads from freed heap memory!!
}

@withoutboats I think raw pointers should implement !Unpin so that mistakes like this don’t slip by accidentally. Otherwise, the pinning API is “unsafe by default”, which I believe goes against the spirit of Rust. What’s unfortunate though is that there are very few hand-written self-referential structs, so most types with raw pointers would have to implement Unpin explicitly to become compatible with pinning and, by extension, futures. Still, types made of raw pointers are still relatively rare, so perhaps it’s not so bad.

Another option is to rely on a lint to catch the mistake; the lint would detect when a field of a struct that is a raw pointer is assigned a pointer (directly or indirectly) into the struct, and suggest that the type should implement !Unpin. If we choose this option, this lint ought to be in rustc (though its life could start in Clippy) so that everyone who needs to see it does see it. A problem with this lint is that there might be false positives, especially with indirect pointers. For example, consider this struct:

pub struct Foo<T> {
    items: Vec<T>,
    current: *const T, // points into items
}

It’s perfectly safe to move a Foo, but the lint would think that this type should implement !Unpin, even though it’s not necessary.

1 Like

Yes it should be. Unpin is only discussed way later in the post, so I did not want to bring it up here... but maybe you are right that this is more confusing than the alternative. I changed the post, thanks.

Sorry for the long delay, @RalfJung--I was on vacation.

Ah, that is interesting. I did not intend there to be two “levels”.

I think I probably should have said there was a part of what you said that I felt I understood, and a part that I felt I didn't. The part that I didn't understand seemed more "advanced", so I created two tiers in my mind.

All of that is to say I probably should have just said "I understood part of your post and will need to spend more time to fully grasp the rest". :slight_smile:

I’d be curious if you could formulate a question concerning the notation you are still having trouble with?

That's always a tough question--I'm not completely sure what it is, specifically that I don't know--after all, I don't know, right? :). So here's my best guess: I don't yet fully understand the for notation in HRTB, and I am assuming the forall in your notation here is analogous or somewhat analagous. (Not yet knowing either domain, I have no real basis for making that statement other than both involve "for" in a way that is unfamiliar to me.)

So upon seeing a possible second instance of for that I would like to understand, I will dig in on HRTB, then re-read your post to see if I can understand it more deeply. To the degree that I still have questions after that, then I will come back here to ask. (It's important to me to have done my homework before asking others to spend their time to help me.)

I hope that helps clarify my statements. And thanks again--this is interesting stuff.

I know. :slight_smile: You did a good job though :smiley:

Yes, for in HRTB and forall in my notation are closely related. They are both universal quantifiers. Both essentially just say "for all", hence the name. :wink: So, for example, forall |x, y| even x && even y -> even (x + y) means that the given statement about x and y holds for all possible choices of x and y.

This is besides the main point, but it seems to me that many of the mut-s in the example are redundant. All pointers can be *const, and more importantly, the signature of get_ref could just ask for a self: &Pin<SelfReferential>. Am I missing something?