Rust 1.20 caused pinning to become incorrect


#1

This week was an interesting one for me in the realm of “what counts as safe uses of unsafe code”.

The kick-off was this bug report: “ManuallyDrop causes pinning to be unsound?” https://github.com/asajeffrey/josephine/issues/52.

A stripped-down example is at https://play.rust-lang.org/?gist=607e2dfbd51f4062b9dc93d149815695&version=nightly. The idea is that there’s a type Pin<'a, T>, with a method pin(&'a self) -> &'a T whose safety relies on the invariant “after calling pin.pin(), if the memory backing the pin is ever reclaimed, then the pin’s destructor must have been run”.

This invariant was maintained by Rust until #[allow(unions_with_drop_fields)] was added, and used by ManuallyDrop https://doc.rust-lang.org/src/core/mem.rs.html#949.

Now, this code is certainly incorrect (in the sense of https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html) with respect to Rust 1.20. The definition of “incorrect” is a contextual one: P is incorrect if there is a safe context C such that C[P] generates UB. The interesting thing is that (AFAIK!) this program was correct prior to Rust 1.20, because the context that I know of that produces UB uses ManuallyDrop (mem::forget isn’t enough).

The interesting thing is that this came up in the wild, we already knew that theoretically additions to std could cause programs to become incorrect, but this is an instance of it actually happening.


#2

This is a very interesting issue indeed! There’s more discussion about the involvement of #[may_dangle] here, and about unions and ManuallyDrop here.

In particular, there’s an example of unsafe code (that did not come up in the wild) that is also “correct” according to the contextual rule quoted above (and remains so with Rust 1.20), but is incompatible with Pin – effectively, this is a (horrible) pre-1.20 implementation of ManuallyDrop without unions.


#3

Thanks! One thing that’s interesting about your example is that, while it is correct, it’s not obvious whether or not it satisfies the rules for may_dangle. I thiiiiiiink it does, but it depends on how you interpret the language (in particular whether discarding data counts as a “use”) :confused:


#4

Stylo is beyond wild. I remembered reading your comment that this is how the Gecko bindings in Stylo encode C++ unions.

#[repr(C)]
#[derive(Debug)]
pub struct URLValueData_RustOrGeckoString {
    pub mString: root::__BindgenUnionField<::nsstring::nsStringRepr>,
    pub mRustString: root::__BindgenUnionField<
        ::gecko_bindings::structs::ServoRawOffsetArc<root::RustString>,
    >,
    pub bindgen_union_field: [u64; 2usize],
}
#[repr(C)]
pub struct __BindgenUnionField<T>(::std::marker::PhantomData<T>);
impl<T> __BindgenUnionField<T> {
    #[inline]
    pub fn new() -> Self {
        __BindgenUnionField(::std::marker::PhantomData)
    }
    #[inline]
    pub unsafe fn as_ref(&self) -> &T {
        ::std::mem::transmute(self)
    }
    #[inline]
    pub unsafe fn as_mut(&mut self) -> &mut T {
        ::std::mem::transmute(self)
    }
}
impl<T> ::std::default::Default for __BindgenUnionField<T> {
    #[inline]
    fn default() -> Self {
        Self::new()
    }
}
impl<T> ::std::clone::Clone for __BindgenUnionField<T> {
    #[inline]
    fn clone(&self) -> Self {
        Self::new()
    }
}
impl<T> ::std::marker::Copy for __BindgenUnionField<T> {}
impl<T> ::std::fmt::Debug for __BindgenUnionField<T> {
    fn fmt(&self, fmt: &mut ::std::fmt::Formatter) -> ::std::fmt::Result {
        fmt.write_str("__BindgenUnionField")
    }
}
impl<T> ::std::hash::Hash for __BindgenUnionField<T> {
    fn hash<H: ::std::hash::Hasher>(&self, _state: &mut H) {}
}
impl<T> ::std::cmp::PartialEq for __BindgenUnionField<T> {
    fn eq(&self, _other: &__BindgenUnionField<T>) -> bool {
        true
    }
}
impl<T> ::std::cmp::Eq for __BindgenUnionField<T> {}

#5

My example does not have a Drop implementation. It doesn’t use #[may_dangle]. How can it possibly violate the #[may_dangle] rules?


#6

I was including the Drop defn from https://github.com/rust-lang/rust/issues/32836#issuecomment-362556443


#7

I wrote a comment https://github.com/rust-lang/rust/issues/32836#issuecomment-362714494 that would have been better placed here…

I don’t think anyone’s in love with the contextual defn of “correctness”, primarily because it’s brittle wrt the observational power of contexts. I’d be a lot happier with a semantic defn in terms of invariants.

The only thing I’d ask for is please can we give ourselves some wiggle room and define two invariants for rely-guarantee reasoning (code can rely on R and should guarantee G, where G implies R). That way there’s some room for strengthening R and weakening G. If we just have one invariant (i.e. R=G) we’re stuck never being able to change them!


#8

I’m not sure I understand why the drop impl for String is called here? I’m assuming that Drops work like destructors in C++, though, which is: the only reason a destructor gets called on a member object is because the destructor of the containing object was called. Does Rust have a non-recursive implementation that skips over types without explicit Drop impl’s rather than generating Drop impl’s which do recursive calls?


#9

In fact, something similar does exist in the wild - almost 3 years old, ~400,000 downloads:

https://crates.io/crates/nodrop


#10

I don’t think NoDrop is incompatible with Pin, since the implmentation of Drop for NoDrop<T> (https://docs.rs/nodrop/0.1.12/src/nodrop/lib.rs.html#81) is:

    impl<T> Drop for NoDrop<T> {
        fn drop(&mut self) { ... }
    }

and doesn’t use may_dangle or anything else to disable the drop checker. So this doesn’t cause problems: the only reason why ManuallyDrop<T> does is because of the way it treats drop checking.


#11

Oh, you’re right. That seems like an accident, though… like, NoDrop definitely satisfies the condition for #[may_dangle] to be permissible (namely, the destructor doesn’t do anything weird with borrowed T values), so it could be added, and I wouldn’t consider adding it to be a fundamental change in the assumptions the crate’s making. But that’s going back somewhat into the realm of the hypothetical.


#12

I found another way to break Pin, using only the standard library and safe code, though it only works with unwinding enabled:

https://play.rust-lang.org/?gist=86b8a068bb072fba4aad47ddbf3e5ab8&version=nightly

…But you could make the case that it’s just a bug in the standard library.

Basically, VecDeque's Drop impl (which has may_dangle set) drops the two halves of its ring buffer separately:

            ptr::drop_in_place(front);
            ptr::drop_in_place(back);

If the drop of front panics, back never gets dropped, but the backing buffer still gets deallocated.

(This doesn’t work with structures or built-in arrays, because the compiler-generated drop glue will always try to drop all fields/elements, even if one drop call panics. And it doesn’t work for Vec because its Drop impl relies on the built-in [T] drop glue.)


#13

#[may_dangle] is not stable yet, so NoDrop can’t use it. NoDrop is effectively deprecated, now that ManuallyDrop exists.

Obviously you’ve put a lot of thought into this but I can’t see why it is said that Pin “is correct” without reservation. Since post-leakpocalypse such mechanisms all shifted over to using closures for scopes instead of relying on mandatory drops.


#14

Oh my goodness, VecDeque really?

That raises all kinds of interesting issues, like is that a bug with VecDeque, and if so would it be a breaking change to fix it?

So good catch. Pin is incorrect in the presence of panic unwinding. I wonder how much other code is? For example, code which uses closures to ensure clean-up code runs.


#15

PS: Perhaps I should rename the topic to “Rust 1.9 caused etc.” since that’s when panic unwinding was introduced :slight_smile:


#16

I did say “(AFAIK!)” because there could still be weird corner cases in std (like hey, panic unwinding and VecDeque). I think my main point still stands, which is that the contextual definition of correctness is brittle, and that this is an interesting example of it producing problems “in the wild”.


#17

Code which uses closures to perform cleanup should be using destructors of variables in scope at the time the closure is called to perform the cleanup - this way even if the closure panics, the cleanup still runs.

AFAICT, the problem with Pin is that it relies on memory not being freed unless the destructor was run, and this cannot be relied on: in general, you can only rely on destructors running if you are the one calling them (implicitly) and that’s why since the leakpocolypse, people have used functions-taking-closures to perform cleanup (since the destructor is being called from their own code’s unwinding path in that case).


#18

Since the leakapocalypse, the only destructors that we guarantee that will always be run are the destructors of stack variables (e.g. of guard closures).

We have never tried to guarantee anything of the kind “address-exposed data can’t be mem-freed without its destructor being run”, as VecDeque shows (IMO yes that’s a bug, but both behaviors are safe, and we didn’t specify it the wrong way so we can change it).


#19

@arielb1 brings up an interesting point, which is that the API evolution RFC (https://github.com/rust-lang/rfcs/blob/master/text/1105-api-evolution.md) doesn’t say much about behavioural changes, but does say:

This RFC does not attempt to provide a comprehensive policy on behavioral changes, which would be extremely difficult. In general, APIs are expected to provide explicit contracts for their behavior via documentation, and behavior that is not part of this contract is permitted to change in minor revisions.

This has an impact on the contextual defn of correctness, as the VecDeque example shows: what if C[P] has UB for reasons that were not part of its contract? Ditto if C[P] has no UB.

@RalfJung: this might be a way to tie the contextual definition of correctness back to rely-guarantee reasoning, if the contracts are themselves stated using rely-guarantee.


#20

If it’s a problem that VecDeque doesn’t continue to call destructors of other elements if there is a panic in one of them’s drop, which data structure does that “correctly” in that sense? Apart from Vec? Does HashMap, BTreeMap? I kind of doubt it.