Pin: meaning of invalidation

The Pin documentation says:

Memory can be “invalidated” by deallocation, but also by replacing a Some(v) by None, or calling Vec::set_len to “kill” some elements off of a vector. It can be repurposed by using ptr::write to overwrite it without calling the destructor first. None of this is allowed for pinned data without calling drop.

How does Vec::set_len invalidate memory? In the following code

let mut v = vec![PhantomPinned; 2];
let pinned = unsafe { Pin::new_unchecked(v.get_mut(1).unwrap()) };
unsafe { v.set_len(1) };

If I start pushing to the vec after line 3, then my pinned memory would be invalidated. That is obviously unsound. But is line 3 itself unsound? If I immediately abort the program on line 4, no memory would be invalidated.

In either case, I think the documentation should be updated to clarify what "invalidate" means ("memory is overwritten" / "memory can be overwritten in the future without unsafe code" / ...).

Rust's validity/soundness is based on theoretical rules, not on what code is generated in practice.

It's like how a bishop in chess can only move diagonally because the rules say so, not because you physically can't move the piece in another direction.

So set_len is immediate invalidation, even if the compiler doesn't actually generate code that overwrites the bytes.

To put it differently: The "invalidation" that Pin refers to is an entirely logical concept. It is about when ownership of the underlying memory is taken away from the pinned type and used for another purpose (it could be deallocated or other data could be stored there). set_len conceptually takes ownership from the element back into the vector. If you really want, you can define for yourself exactly when that ownership transfer happens (e.g. when the next element is stored there OR the vector gets reallocated), but you had better make sure you know exactly what you are doing then -- putting it at set_len is a very good default, since that is when this memory becomes up for grabs for other safe code (via push).
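
As a rough illustration of the "up for grabs" point, here is a minimal sketch using plain u64 elements instead of pinned data, so that nothing unsound happens. The function name slot_reused_after_set_len is made up for this example. It only compares addresses and never reads through a stale pointer.

```rust
/// Returns true if a safe `push` after `set_len(1)` lands in the same
/// slot that element 1 occupied before. Hypothetical helper for illustration.
fn slot_reused_after_set_len() -> bool {
    // Capacity of at least 2 guarantees the later push does not reallocate.
    let mut v: Vec<u64> = Vec::with_capacity(2);
    v.push(1);
    v.push(2);

    // Remember where element 1 lives (address only, never dereferenced later).
    let old_addr = &v[1] as *const u64 as usize;

    // SAFETY: 1 <= capacity, elements 0..1 are initialized, and u64 has no
    // destructor, so skipping the drop of element 1 is harmless here.
    unsafe { v.set_len(1) };

    // Entirely safe code now claims the slot that used to hold element 1.
    v.push(3);
    let new_addr = &v[1] as *const u64 as usize;

    old_addr == new_addr
}
```

If the element had been pinned, the push above is exactly the safe operation that would overwrite it without running its destructor, which is why treating set_len as the point of invalidation is the conservative choice.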

Soundness is a property of a safe function exposed to unknown code, so asking whether a single line of code is sound is like asking which color Monday has -- it's an ill-typed question.

Using terminology from Two Kinds of Invariants: Safety and Validity, basically every aspect of Pin is in the realm of "safety invariants", not "validity invariants". This means that violating these rules might not immediately cause UB in a way that Miri can detect or that could go wrong at runtime, it "just" means that invariants are broken and if control enters any code that depends on these invariants, it's your fault when that code goes wrong. The standard library will generally assume all code outside the standard library upholds these invariants all the time; exceptions are made only when that is specifically documented.

It already says that deallocation and replacement (aka overwriting) can mean invalidation. To make this fully precise requires a lot more mathematics than we can put into the documentation, I am afraid.

Though if you have concrete suggestions for improvements, we're always happy about PRs.

But since it's a safety invariant, unsafe code outside the stdlib is technically allowed to temporarily violate it, as long as no other code can observe this temporary violation and it's changed right back, right? Or only the stdlib is allowed to do this?

If only the stdlib is allowed to (temporarily) violate this safety invariant, is it because Pin is defined in the stdlib or because Vec is defined in the stdlib, or both?

Nope, violating the pin invariant may result in instantaneous undefined behavior in certain circumstances. For example, moving a pinned !Unpin self-referential struct may result in instantaneous undefined behavior by creating dangling references. This comes up most often, but not exclusively, in compiler-generated futures.
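
For readers who have not seen one, here is a minimal sketch of a self-referential !Unpin struct, assuming a raw pointer stands in for the internal reference. The names SelfRef, make_pinned and pointer_is_consistent are invented for this example and are not from the standard library. The sketch only compares pointers and never dereferences the stored one.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr;

/// Hypothetical self-referential type: `ptr_to_data` is meant to point at `data`.
struct SelfRef {
    data: u32,
    ptr_to_data: *const u32,
    // Opts out of `Unpin`, so safe code cannot move the value once pinned.
    _pin: PhantomPinned,
}

/// Heap-allocates and pins the value, then records the address of `data`.
fn make_pinned(data: u32) -> Pin<Box<SelfRef>> {
    let mut boxed = Box::pin(SelfRef {
        data,
        ptr_to_data: ptr::null(),
        _pin: PhantomPinned,
    });
    let addr: *const u32 = &boxed.data;
    // SAFETY: we only assign a field in place; the value is never moved out.
    unsafe { boxed.as_mut().get_unchecked_mut().ptr_to_data = addr };
    boxed
}

/// Checks that the stored pointer still matches the field's current address.
fn pointer_is_consistent(p: &Pin<Box<SelfRef>>) -> bool {
    ptr::eq(p.ptr_to_data, &p.data)
}
```

Moving the Pin<Box<SelfRef>> handle around is fine because the heap allocation stays put. Moving the SelfRef value itself out of that allocation would leave ptr_to_data pointing at the old location, which is the dangling-pointer situation described above.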

This is allowed only if the stdlib documents which parts are allowed to be temporarily violated. E.g. for &str we document that the validity invariant is just the same as for &[u8], and only the safety invariant requires UTF-8.
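
To make the &str case concrete, here is a small sketch of code that upholds the UTF-8 safety invariant before using the unchecked constructor. The helper name checked_str is made up for this example, and in real code std::str::from_utf8(bytes).ok() would do the same job without unsafe.

```rust
use std::str;

/// Hypothetical helper: returns the bytes as `&str` only if they are valid UTF-8.
fn checked_str(bytes: &[u8]) -> Option<&str> {
    if str::from_utf8(bytes).is_ok() {
        // SAFETY: the bytes were validated as UTF-8 on the line above, so the
        // safety invariant of `str` holds for the returned reference.
        Some(unsafe { str::from_utf8_unchecked(bytes) })
    } else {
        // Handing these bytes out as `&str` would break the safety invariant,
        // and any safe code relying on UTF-8 could then misbehave.
        None
    }
}
```

The point of the sketch is where the obligation sits: the unchecked function does not verify anything itself, so whoever calls it is responsible for the invariant that all safe str code is allowed to assume.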

If this is not documented, then future versions of the stdlib may turn some of these safety invariants into validity invariants.
