Rust 1.20 caused pinning to become incorrect

I wrote a comment https://github.com/rust-lang/rust/issues/32836#issuecomment-362714494 that would have been better placed here…

I don’t think anyone’s in love with the contextual defn of “correctness”, primarily because it’s brittle wrt the observational power of contexts. I’d be a lot happier with a semantic defn in terms of invariants.

The only thing I’d ask for is please can we give ourselves some wiggle room and define two invariants for rely-guarantee reasoning (code can rely on R and should guarantee G, where G implies R). That way there’s some room for strengthening R and weakening G. If we just have one invariant (i.e. R=G) we’re stuck never being able to change them!

I’m not sure I understand why the drop impl for String is called here? I’m assuming that Drops work like destructors in C++, though, which is: the only reason a destructor gets called on a member object is because the destructor of the containing object was called. Does Rust have a non-recursive implementation that skips over types without explicit Drop impl’s rather than generating Drop impl’s which do recursive calls?

In fact, something similar does exist in the wild - almost 3 years old, ~400,000 downloads:

https://crates.io/crates/nodrop

I don’t think NoDrop is incompatible with Pin, since the implmentation of Drop for NoDrop<T> (https://docs.rs/nodrop/0.1.12/src/nodrop/lib.rs.html#81) is:

    impl<T> Drop for NoDrop<T> {
        fn drop(&mut self) { ... }
    }

and doesn’t use may_dangle or anything else to disable the drop checker. So this doesn’t cause problems: the only reason why ManuallyDrop<T> does is because of the way it treats drop checking.

Oh, you’re right. That seems like an accident, though… like, NoDrop definitely satisfies the condition for #[may_dangle] to be permissible (namely, the destructor doesn’t do anything weird with borrowed T values), so it could be added, and I wouldn’t consider adding it to be a fundamental change in the assumptions the crate’s making. But that’s going back somewhat into the realm of the hypothetical.

2 Likes

I found another way to break Pin, using only the standard library and safe code, though it only works with unwinding enabled:

https://play.rust-lang.org/?gist=86b8a068bb072fba4aad47ddbf3e5ab8&version=nightly

…But you could make the case that it’s just a bug in the standard library.

Basically, VecDeque's Drop impl (which has may_dangle set) drops the two halves of its ring buffer separately:

            ptr::drop_in_place(front);
            ptr::drop_in_place(back);

If the drop of front panics, back never gets dropped, but the backing buffer still gets deallocated.

(This doesn’t work with structures or built-in arrays, because the compiler-generated drop glue will always try to drop all fields/elements, even if one drop call panics. And it doesn’t work for Vec because its Drop impl relies on the built-in [T] drop glue.)

6 Likes

#[may_dangle] is not stable yet, so NoDrop can’t use it. NoDrop is effectively deprecated, now that ManuallyDrop exists.

Obviously you’ve put a lot of thought into this but I can’t see why it is said that Pin “is correct” without reservation. Since post-leakpocalypse such mechanisms all shifted over to using closures for scopes instead of relying on mandatory drops.

Oh my goodness, VecDeque really?

That raises all kinds of interesting issues, like is that a bug with VecDeque, and if so would it be a breaking change to fix it?

So good catch. Pin is incorrect in the presence of panic unwinding. I wonder how much other code is? For example, code which uses closures to ensure clean-up code runs.

PS: Perhaps I should rename the topic to “Rust 1.9 caused etc.” since that’s when panic unwinding was introduced :slight_smile:

I did say “(AFAIK!)” because there could still be weird corner cases in std (like hey, panic unwinding and VecDeque). I think my main point still stands, which is that the contextual definition of correctness is brittle, and that this is an interesting example of it producing problems “in the wild”.

1 Like

Code which uses closures to perform cleanup should be using destructors of variables in scope at the time the closure is called to perform the cleanup - this way even if the closure panics, the cleanup still runs.

AFAICT, the problem with Pin is that it relies on memory not being freed unless the destructor was run, and this cannot be relied on: in general, you can only rely on destructors running if you are the one calling them (implicitly) and that’s why since the leakpocolypse, people have used functions-taking-closures to perform cleanup (since the destructor is being called from their own code’s unwinding path in that case).

1 Like

Since the leakapocalypse, the only destructors that we guarantee that will always be run are the destructors of stack variables (e.g. of guard closures).

We have never tried to guarantee anything of the kind “address-exposed data can’t be mem-freed without its destructor being run”, as VecDeque shows (IMO yes that’s a bug, but both behaviors are safe, and we didn’t specify it the wrong way so we can change it).

2 Likes

@arielb1 brings up an interesting point, which is that the API evolution RFC (rfcs/text/1105-api-evolution.md at master · rust-lang/rfcs · GitHub) doesn't say much about behavioural changes, but does say:

This RFC does not attempt to provide a comprehensive policy on behavioral changes, which would be extremely difficult. In general, APIs are expected to provide explicit contracts for their behavior via documentation, and behavior that is not part of this contract is permitted to change in minor revisions.

This has an impact on the contextual defn of correctness, as the VecDeque example shows: what if C[P] has UB for reasons that were not part of its contract? Ditto if C[P] has no UB.

@RalfJung: this might be a way to tie the contextual definition of correctness back to rely-guarantee reasoning, if the contracts are themselves stated using rely-guarantee.

1 Like

If it’s a problem that VecDeque doesn’t continue to call destructors of other elements if there is a panic in one of them’s drop, which data structure does that “correctly” in that sense? Apart from Vec? Does HashMap, BTreeMap? I kind of doubt it.

I guess this is an issue of what the implicit contract for Drop is. Is it expected to drop all of its owned data? Even in the presence of drop code panic? Is it allowed to panic? Is it allowed to abort? Is there any contract at all?

1 Like

This is a great point. However, rely-guarantee is (AFAIK) terminology from concurrency research and describes modular (thread-per-thread) verification of concurrent interactions. I'm not actually aware of any research for type system invariants that provide "wiggle room". Somehow, this would have to make a difference between the types that are produced by an unsafely implemented library (where it has to show the stronger invariant) and types that are consumed (where it can only rely on the weaker invariant). But given this is all higher-order and we can both produce and consume closures, I will only believe this really works once I see it worked out. :slight_smile:

So, while I agree with the goal, I actually have no idea how to formally do that. I know I should write this down as a future research project though :smiley:

It's not just brittle. It's anti-modular: Two libraries can be correct independently but break in combination. Just imagine VecDeque not being in libstd; then VecDeque and Pin form such a pair.

So, I'd say the contextual definition is plain broken.

Could you elaborate? I don't follow. This doesn't help with the anti-modularity, right?

I agree with what has been said above that ever sine the leakpocalypse, the contract is something like "drop is called on a best-effort basis; you can't ever rely on it for safety" (EDIT: except if it's all your code. You can rely on your own stack objects getting dropped). But probably this has not been documented clearly enough.

Pinning is unsafe in the presence of arena-allocated Rc data anyway, which is something we want to allow one day:

struct Cyclic<A, T> {
    cycle: RefCell<Option<Rc<Cyclic<T, A>, A>>,
    data: T
}

fn foo(arena: &ArenaAllocator) {
    let rc1 = arena <- Cyclic {
        cycle: Default::default(),
        data: Pin::new(),
    };
    
    // make a cycle    
    *rc1.cycle.borrow_mut() = Some(rc1.clone());
    rc1.data.pin();

    // no destructors are run, but the arena (and its containing Rc) is eventually free-ed.
}

Arena-allocated data doesn’t get its destructor called? That’s annoying. It would be nice if there were a way to call the destructors for data that was marked may_dangle or some similar marker for drop code that is safe in the presence of cycles. But that’s a different topic!

Yes, more research is needed!

We might need some kind of Kleisli semantics future world trick, that is you don’t just show rely/guarantee for the current R and G, but for all future R’ and G’. This is the usual trick for getting monotonicity in the presence of contravariance (in particular for higher-order functions).

For contextual correctness, I was wondering if there’s a way to interpret the defn “for all safe C . C[P] has no UB” as something like “for all safe C . for all implementations I . (I satisfies appropriate contracts) implies (C[P] linked against I has no UB)”. There is lots of handwaving here, but this might be a way to recover a defn of correctness based on invariants (assuming that’s how the contracts for I are defined).

guess in the example

  • the counter in Rc never reaches 0 because of the reference loop
  • so drop is never called on the Pin instance
  • yet the memory it occupies is free-ed when ArenaAllocator itself is free-ed