Destructuring Droppable structs

kornel · June 23, 2024, 3:03pm

Could this be an opt-in feature that gives other Drop types ability to move out of like the Box?

matthieum · June 23, 2024, 3:55pm

That's an interesting question. I had not been considering the parallel with DerefMove. I'm not sure to be honest.

CAD97 · June 23, 2024, 4:09pm

(Ab)using “DerefMove” would be an interesting way to handle moving out of impl Drop types, for sure. If we get a DerefMove before destructurable Drop, I expect we'll see some libraries use it for such, similar to how we see some usage of Deref[Mut] for emulating field aliasing and/or read-only fields today.

My own ptr_union::UnionN already uses an explicit unpack for destructure, although there it's due to structural necessity.

serde_json::Value is perhaps a perfect case study for wanting this. It's fundamentally just the enum and the defaulted drop glue is sufficient; it wants a custom impl Drop not out of necessity but because that allows it to avoid the recursive drop.

example

impl Drop for Value {
    fn drop(&mut self) {
        // todo: improve alloc utilization
        let mut q = vec![mem::take(self)];
        while let Some(v) = q.pop() {
            match v {
                Value::Array(a) => q.extend(a),
                Value::Object(o) => q.extend(o.into_values()),
                _ => continue,
            };
        }
    }
}

Value wouldn't be very happy with a DerefMove based destructure, and would probably stick to just the default drop glue rather than make the common usage worse for a conditionally improved drop impl.

Based on this, I currently believe the ideal behavior would probably be, for any impl Drop type:

If all structure is visible and the type is not marked #[non_exhaustive], allow destructuring the value, but not moving from fields individually, with a lint warning that custom drop glue is being bypassed. As noted in the OP, this is for all practical purposes not a new capability, as the destructure site could instead swap out each field.
If any structure is not visible, continue to forbid destructuring. Partially public structure can continue to use swap based workarounds if desired, and this avoids problematic questions w.rt. mixed public and private fields receiving needed fixups in impl Drop.
Allow the impl Drop block to be annotated with an attribute (bikeshed: #[diagnostic::ignorable_drop]) that makes it so destructuring the type doesn't cause the aforementioned lint. (Maybe: also impact if the type mem::needs_drop if all drop glue is ignorable.)

Jules-Bertholet · June 23, 2024, 5:28pm

Any type relying on Drop forbidding moves out of public fields for soundness is already on thin ice, due to conflict with replace_with (which RustBelt proves sound).

matthieum · June 23, 2024, 5:40pm

Any type relying on Drop forbidding moves out of public fields for soundness is already on thin ice, due to conflict with replace_with (which RustBelt proves sound).

Indeed, and there have been mentions for at least one of the crates that combining it with replace_with was unsound. What is less clear is whether the crate's technique itself is unsound, or the combination somehow unearths a conflict in the theory underpinning Rust's soundness. The former is most likely, but its creator believes his crate sound and nobody so far has proved it was not.

Jules-Bertholet · June 23, 2024, 5:43pm

There is no such theory, at least not a complete one officially blessed by the Rust lang team. RustBelt is the closest thing we have, and under that model, replace-with is sound and partial-borrow is unsound.

withoutboats · June 24, 2024, 10:00am

(NOT A CONTRIBUTION)

I agree that there needs to be a transition story for users who currently rely on this behavior for correctness, but I don't agree that the abstractly correct behavior would be for destructability with a destructor to be "opt in." I'd rather see a cross-edition change with an opt-out that can be added to types with the current behavior by rustfix. The syntax for controlling in which modules a type can be destructured is privacy, having a destructor should really have nothing to do with it.

Rust's privacy system is unfortunately also baroque and strange. You highlight that you can't have private variants of enums for example: the correct code if you want an enum that can't be destructured outside of this module is to wrap it in a newtype struct for public consumption. This sucks but its how it works. There are also confusing things like that a tuple struct with any private fields can't be destructured but a record struct with any public fields can be (I had to test this to find out; I surely don't have these rules memorized). Making the privacy rules less complicated is maybe also an appealing project, but seems orthogonal to fixing the rules here.

matthieum · June 24, 2024, 4:16pm

I disagree, and I think API designers will too.

SemVer is yet another reason to pick an opt-in behavior here:

Adding destructuring is backward compatible.
Removing destructuring is NOT backward compatible.

So what happens if an API designer forgot to add the attribute (or whatever):

Opt-in: add the attribute, make a patch/minor release, move on.
Opt-out: add the attribute, make a major release, watch the ecosystem cry out in anguish and curse you, apologize and curse the Rust developers.

Opt-in is much more forgiving. Let's make it opt-in.

TIL...

I wouldn't say "correct".

I've used the enum-in-a-struct when the enum is an implementation detail, and that's a fine (if boilerplatey) trick.

It's not a suitable solution when exposing the enum is the whole point, though.

Jules-Bertholet · June 24, 2024, 7:50pm

Yes it can, if you ask nicely:

mod inner {
    pub struct Foo(pub bool, bool);
}

use inner::*;

fn foo(a: Foo) {
    let Foo { 0: _a, .. } = a;
}

(this makes 0 sense ofc)

Nemo157 · June 24, 2024, 8:29pm

It does make some sense if you consider the Foo(_, _) pattern as a separate item that inherits the least privilege of the fields, same as the separate fn Foo(_, _) function; while the Foo { .. } constructor and pattern behaves the same for all three struct syntaxes.

Jules-Bertholet · June 25, 2024, 10:59am

(From the post)

However, JSON is often parsed from untrusted input, and its very plausible to define pathological JSON which would cause the ordinary destructor to overflow its stack. For example, consider the JSON [[[]]]. Parsing this to a Value will result in an array containing a array containing an array which is empty. With the automatic destructor code that is generated for you, this will perform 3 nested calls to deallocate each array variant. A pathological JSON string would contain enough nested arrays to overflow the stack when destroying a value of that type, so serde_json’s Value type is meant to have a custom destructor which uses a form of tail call optimization to avoid having the stack space use grow with the depth of the JSON.

The problem is that if you implement this destructor on the Value type, now you can’t destructure it: but the whole point of Value is to destructure it using match ! David Tolnay worked around this issue by adding a custom Array type instead of using Vec , which has the tail call optimizing destructor, but this complicates the API for no good reason.

IIUC, doesn't using Array give you the benefit of preserving the stack overflow protection even after you move it out of the JsonValue?

withoutboats · June 25, 2024, 9:03pm

(NOT A CONTRIBUTION)

There's no difference because once you'd moved out of the original Value you'd have a Vec<Value> and each of those values in the vec would prevent stack overflow internally.

withoutboats · June 27, 2024, 12:00pm

(NOT A CONTRIBUTION)

If exposing the enum is the whole point, you want it to be destructurable. If you don't want it to be entirely destructable, you can use private fields of struct variants. It does not make sense to say you want an enum to be exposed but not destructurable: what you want to expose is the structure of the type (that it is an enum) which is exposed by destructuring a value of the type.

matthieum · June 27, 2024, 4:13pm

Yes, by reference (or mutable reference).

It does not automatically follow that I want the Drop implementation to be bypassed, which would happen when destructuring by value.

ryanavella · June 27, 2024, 4:44pm

That is not my reading of the RFC discussion. My interpretation is that if we extend the sound subset of Rust (i.e. the initial scope of RustBelt) with replace-with, that combination is itself sound and consistent. The author of partial-borrow claims the same is true of their library, which implies that both libraries provide inconsistent extensions of the sound subset (cf. Axiom of Choice), though nobody has provided a proof or counterexample.

I personally lean towards Ralf being correct here as the burden of proof lies with the author of partial-borrow, but I do not see a clear consensus reached in that discussion.

withoutboats · June 28, 2024, 9:59am

(NOT A CONTRIBUTION)

This strikes me as rather contrived: you want an enum that can be destructured by reference, but not by value, but you also want to pass ownership of that enum, rather than a reference, just not allow it to be destructured-by-value. Do you have any example of this scenario in practice?

The solution to this use case using privacy would be to pass ownership of a newtype struct around the enum that has methods or uses Deref to reveal the inner enum by reference only.

Again, the framing of "bypassing the destructor" just doesn't actually make any type theoretical sense: the destructor is what happens when you don't move the value; destructure-by-value is just a kind of move; if you want to control which modules can destructure-by-value the solution is privacy.

Jules-Bertholet · June 28, 2024, 1:20pm

From @RalfJung’s dissertation on RustBelt:

For this dissertation, we have developed RustBelt, the first formal (and machine-checked) model of Rust which can account for Rust’s extensible approach to safe systems programming and verify its soundness. Our soundness proof is extensible in the sense that, given a new unsafely implemented library and its type, RustBelt defines a proof obligation that is sufficient to show that any safe Rust program using this library is memory safe and thread safe. It is also modular in the sense that libraries can be verified independently and composed arbitrarily.

At the center of RustBelt is λ_Rust, a Rust core language, with its type system. […]

The key idea that makes the extensible soundness proof of RustBelt work is to define a semantic model of λ_Rust’s type system. This is hardly a new idea; in fact Milner’s original type soundness proof for an ML-style polymorphic λ-calculus followed the semantic approach, based on the idea of a logical relation. A logical relation defines what a type “means”— what terms it is inhabited by—through the observable behavior of a term rather than “what the syntactic typing rules allow”. In particular, for a function to be well-typed, only its input-output behavior is relevant; the body of the function can use unsafe features as long as it does not violate the type system guarantees.

[…]

With our semantic model in hand, the safety proof of λ_Rust divides into three parts:

Verify that the typing rules of λ_Rust are sound when interpreted semantically. For each typing rule we show a lemma establishing that the semantic interpretations of the premises imply the semantic interpreta- tion of the conclusion. This is called the fundamental theorem of logical relations.

Verify that, if a closed program is semantically well-typed according to the model, it is safe to execute—there will be no safety violations. This is called adequacy.

For any library that employs unsafe code internally, verify that its implementation satisfies the “safety contract” of its public interface. This safety contract is determined by the semantic interpretations of the types of the library’s public API. It establishes that all unsafe code has been properly “encapsulated” behind that API. In essence, the semantic interpretation of the interface yields a library-specific verification condition.

A Rust API is sound iff it is impossible to cause UB by combining it with arbitrary safe code, or unsafe code that respects the library's safety invariants. However, these safety invariants are separate from the library code itself. In fact, for a single Rust library, you could potentially have several distinct, incompatible models of these invariants, each of which is sound and adequate while permitting different sets of unsafe code.

RustBelt's λ_Rust model consists of the syntax, type system, and operational semantics of a subset of the Rust language, plus a specific semantic interpretation of Rust constructs (like &mut) that defines their safety invariants. replace-with obeys the rules of this model, partial-borrow does not.

Again, with replace-with, if you have a mutable reference then you can get an owned value.

ryanavella · June 28, 2024, 3:24pm

This is the part I'd like to see elaborated more. I struggle to see how this conclusion is a direct consequence of RustBelt's semantic interpretation, and I don't see it fully spelled out in the RFC discussions.

Jules-Bertholet · June 28, 2024, 4:51pm

I believe the Coq proof is here, though I don’t claim to understand it.

RalfJung · June 29, 2024, 10:33am

This is correct. Note, however, that this is a consequence of how I chose to set up the model. It is conceivable that there could be a model that makes partial-borrow sound and replace-with unsound (though I have not seen that worked out). Ultimately, every time we have a "soundness conflict" like this (two crates that are sound in isolation but not in composition), we will have to make a choice to bless one of the two crates -- which will then inform the formal model. Conversely, our choice can also be informed by the consequences it has for the formal model.

Alternatively, we could have an officially blessed formal model exactly spelling out the safety invariants associated with all types. The issue is that: (a) expressing these invariants requires advanced PL theory, making these discussions basically inaccessible to people who don't happen to have a PhD in PL theory -- I hope one day we'll find more accessible ways to make these invariants precise, but currently, the English-language description quickly becomes both verbose and vague. (b) Setting these invariants decides the fate of all future soundness conflicts -- which can be seen as a good thing, but the downside is that we don't even know yet what these conflicts might be so making such a far-reaching judgment call is somewhat scary.

That said, for this particular conflict, I think looking at the formal model makes it clear that replace-with is sound in the "obvious" model and partial-borrow requires some advanced contortions to have its soundness justified. So I personally think we should decide this one in favor of replace-with. Also see the discussion starting here. Basically, partial-borrow needs its Partial<T> type to be "immovable", or at least only movable in restricted ways, but RustBelt is built on the assumption that all Rust types are movable. The way to encode "references to data that must not be moved" is Pin.

Topic		Replies	Views
[Pre-RFC] Destructuring values that impl Drop language design	15	3090	October 3, 2019
pre-RFC: the Destruct trait	32	3045	March 25, 2019
pre-RFC: simple/safe pattern destruction of `Drop` types language design	6	1310	March 25, 2019
Drop bounds are useless, so let's change that language design	5	596	September 16, 2024
Fixing the `Drop` trait bound language design	14	3883	July 15, 2019

Destructuring Droppable structs

Related topics