Could this be an opt-in feature that gives other Drop
types ability to move out of like the Box
?
That's an interesting question. I had not been considering the parallel with DerefMove
. I'm not sure to be honest.
(Ab)using âDerefMove
â would be an interesting way to handle moving out of impl Drop
types, for sure. If we get a DerefMove
before destructurable Drop
, I expect we'll see some libraries use it for such, similar to how we see some usage of Deref
[Mut
] for emulating field aliasing and/or read-only fields today.
My own ptr_union::UnionN
already uses an explicit unpack for destructure, although there it's due to structural necessity.
serde_json::Value
is perhaps a perfect case study for wanting this. It's fundamentally just the enum
and the defaulted drop glue is sufficient; it wants a custom impl Drop
not out of necessity but because that allows it to avoid the recursive drop.
example
impl Drop for Value {
fn drop(&mut self) {
// todo: improve alloc utilization
let mut q = vec![mem::take(self)];
while let Some(v) = q.pop() {
match v {
Value::Array(a) => q.extend(a),
Value::Object(o) => q.extend(o.into_values()),
_ => continue,
};
}
}
}
Value
wouldn't be very happy with a DerefMove
based destructure, and would probably stick to just the default drop glue rather than make the common usage worse for a conditionally improved drop impl.
Based on this, I currently believe the ideal behavior would probably be, for any impl Drop
type:
- If all structure is visible and the type is not marked
#[non_exhaustive]
, allow destructuring the value, but not moving from fields individually, with a lint warning that custom drop glue is being bypassed. As noted in the OP, this is for all practical purposes not a new capability, as the destructure site could instead swap out each field. - If any structure is not visible, continue to forbid destructuring. Partially public structure can continue to use swap based workarounds if desired, and this avoids problematic questions w.rt. mixed public and private fields receiving needed fixups in
impl Drop
. - Allow the
impl Drop
block to be annotated with an attribute (bikeshed:#[diagnostic::ignorable_drop]
) that makes it so destructuring the type doesn't cause the aforementioned lint. (Maybe: also impact if the typemem::needs_drop
if all drop glue is ignorable.)
Any type relying on Drop
forbidding moves out of public fields for soundness is already on thin ice, due to conflict with replace_with
(which RustBelt proves sound).
Any type relying on
Drop
forbidding moves out of public fields for soundness is already on thin ice, due to conflict withreplace_with
(which RustBelt proves sound).
Indeed, and there have been mentions for at least one of the crates that combining it with replace_with
was unsound. What is less clear is whether the crate's technique itself is unsound, or the combination somehow unearths a conflict in the theory underpinning Rust's soundness. The former is most likely, but its creator believes his crate sound and nobody so far has proved it was not.
There is no such theory, at least not a complete one officially blessed by the Rust lang team. RustBelt is the closest thing we have, and under that model, replace-with
is sound and partial-borrow
is unsound.
(NOT A CONTRIBUTION)
I agree that there needs to be a transition story for users who currently rely on this behavior for correctness, but I don't agree that the abstractly correct behavior would be for destructability with a destructor to be "opt in." I'd rather see a cross-edition change with an opt-out that can be added to types with the current behavior by rustfix. The syntax for controlling in which modules a type can be destructured is privacy, having a destructor should really have nothing to do with it.
Rust's privacy system is unfortunately also baroque and strange. You highlight that you can't have private variants of enums for example: the correct code if you want an enum that can't be destructured outside of this module is to wrap it in a newtype struct for public consumption. This sucks but its how it works. There are also confusing things like that a tuple struct with any private fields can't be destructured but a record struct with any public fields can be (I had to test this to find out; I surely don't have these rules memorized). Making the privacy rules less complicated is maybe also an appealing project, but seems orthogonal to fixing the rules here.
I disagree, and I think API designers will too.
SemVer is yet another reason to pick an opt-in behavior here:
- Adding destructuring is backward compatible.
- Removing destructuring is NOT backward compatible.
So what happens if an API designer forgot to add the attribute (or whatever):
- Opt-in: add the attribute, make a patch/minor release, move on.
- Opt-out: add the attribute, make a major release, watch the ecosystem cry out in anguish and curse you, apologize and curse the Rust developers.
Opt-in is much more forgiving. Let's make it opt-in.
TIL...
I wouldn't say "correct".
I've used the enum-in-a-struct when the enum is an implementation detail, and that's a fine (if boilerplatey) trick.
It's not a suitable solution when exposing the enum is the whole point, though.
Yes it can, if you ask nicely:
mod inner {
pub struct Foo(pub bool, bool);
}
use inner::*;
fn foo(a: Foo) {
let Foo { 0: _a, .. } = a;
}
(this makes 0 sense ofc)
It does make some sense if you consider the Foo(_, _)
pattern as a separate item that inherits the least privilege of the fields, same as the separate fn Foo(_, _)
function; while the Foo { .. }
constructor and pattern behaves the same for all three struct syntaxes.
(From the post)
However, JSON is often parsed from untrusted input, and its very plausible to define pathological JSON which would cause the ordinary destructor to overflow its stack. For example, consider the JSON
[[[]]]
. Parsing this to aValue
will result in an array containing a array containing an array which is empty. With the automatic destructor code that is generated for you, this will perform 3 nested calls to deallocate each array variant. A pathological JSON string would contain enough nested arrays to overflow the stack when destroying a value of that type, so serde_jsonâs Value type is meant to have a custom destructor which uses a form of tail call optimization to avoid having the stack space use grow with the depth of the JSON.The problem is that if you implement this destructor on the
Value
type, now you canât destructure it: but the whole point ofValue
is to destructure it usingmatch
! David Tolnay worked around this issue by adding a customArray
type instead of usingVec
, which has the tail call optimizing destructor, but this complicates the API for no good reason.
IIUC, doesn't using Array
give you the benefit of preserving the stack overflow protection even after you move it out of the JsonValue
?
(NOT A CONTRIBUTION)
There's no difference because once you'd moved out of the original Value
you'd have a Vec<Value>
and each of those values in the vec would prevent stack overflow internally.
(NOT A CONTRIBUTION)
If exposing the enum is the whole point, you want it to be destructurable. If you don't want it to be entirely destructable, you can use private fields of struct variants. It does not make sense to say you want an enum to be exposed but not destructurable: what you want to expose is the structure of the type (that it is an enum) which is exposed by destructuring a value of the type.
Yes, by reference (or mutable reference).
It does not automatically follow that I want the Drop
implementation to be bypassed, which would happen when destructuring by value.
That is not my reading of the RFC discussion. My interpretation is that if we extend the sound subset of Rust (i.e. the initial scope of RustBelt) with replace-with
, that combination is itself sound and consistent. The author of partial-borrow
claims the same is true of their library, which implies that both libraries provide inconsistent extensions of the sound subset (cf. Axiom of Choice), though nobody has provided a proof or counterexample.
I personally lean towards Ralf being correct here as the burden of proof lies with the author of partial-borrow
, but I do not see a clear consensus reached in that discussion.
(NOT A CONTRIBUTION)
This strikes me as rather contrived: you want an enum that can be destructured by reference, but not by value, but you also want to pass ownership of that enum, rather than a reference, just not allow it to be destructured-by-value. Do you have any example of this scenario in practice?
The solution to this use case using privacy would be to pass ownership of a newtype struct around the enum that has methods or uses Deref to reveal the inner enum by reference only.
Again, the framing of "bypassing the destructor" just doesn't actually make any type theoretical sense: the destructor is what happens when you don't move the value; destructure-by-value is just a kind of move; if you want to control which modules can destructure-by-value the solution is privacy.
From @RalfJungâs dissertation on RustBelt:
For this dissertation, we have developed RustBelt, the first formal (and machine-checked) model of Rust which can account for Rustâs extensible approach to safe systems programming and verify its soundness. Our soundness proof is extensible in the sense that, given a new unsafely implemented library and its type, RustBelt defines a proof obligation that is sufficient to show that any safe Rust program using this library is memory safe and thread safe. It is also modular in the sense that libraries can be verified independently and composed arbitrarily.
At the center of RustBelt is λRust, a Rust core language, with its type system. [âŠ]
The key idea that makes the extensible soundness proof of RustBelt work is to define a semantic model of λRustâs type system. This is hardly a new idea; in fact Milnerâs original type soundness proof for an ML-style polymorphic λ-calculus followed the semantic approach, based on the idea of a logical relation. A logical relation defines what a type âmeansââ what terms it is inhabited byâthrough the observable behavior of a term rather than âwhat the syntactic typing rules allowâ. In particular, for a function to be well-typed, only its input-output behavior is relevant; the body of the function can use unsafe features as long as it does not violate the type system guarantees.
[âŠ]
With our semantic model in hand, the safety proof of λRust divides into three parts:
- Verify that the typing rules of λRust are sound when interpreted semantically. For each typing rule we show a lemma establishing that the semantic interpretations of the premises imply the semantic interpreta- tion of the conclusion. This is called the fundamental theorem of logical relations.
- Verify that, if a closed program is semantically well-typed according to the model, it is safe to executeâthere will be no safety violations. This is called adequacy.
- For any library that employs
unsafe
code internally, verify that its implementation satisfies the âsafety contractâ of its public interface. This safety contract is determined by the semantic interpretations of the types of the libraryâs public API. It establishes that allunsafe
code has been properly âencapsulatedâ behind that API. In essence, the semantic interpretation of the interface yields a library-specific verification condition.
A Rust API is sound iff it is impossible to cause UB by combining it with arbitrary safe code, or unsafe
code that respects the library's safety invariants. However, these safety invariants are separate from the library code itself. In fact, for a single Rust library, you could potentially have several distinct, incompatible models of these invariants, each of which is sound and adequate while permitting different sets of unsafe
code.
RustBelt's λRust model consists of the syntax, type system, and operational semantics of a subset of the Rust language, plus a specific semantic interpretation of Rust constructs (like &mut
) that defines their safety invariants. replace-with
obeys the rules of this model, partial-borrow
does not.
Again, with replace-with
, if you have a mutable reference then you can get an owned value.
This is the part I'd like to see elaborated more. I struggle to see how this conclusion is a direct consequence of RustBelt's semantic interpretation, and I don't see it fully spelled out in the RFC discussions.
I believe the Coq proof is here, though I donât claim to understand it.
This is correct. Note, however, that this is a consequence of how I chose to set up the model. It is conceivable that there could be a model that makes partial-borrow sound and replace-with unsound (though I have not seen that worked out). Ultimately, every time we have a "soundness conflict" like this (two crates that are sound in isolation but not in composition), we will have to make a choice to bless one of the two crates -- which will then inform the formal model. Conversely, our choice can also be informed by the consequences it has for the formal model.
Alternatively, we could have an officially blessed formal model exactly spelling out the safety invariants associated with all types. The issue is that: (a) expressing these invariants requires advanced PL theory, making these discussions basically inaccessible to people who don't happen to have a PhD in PL theory -- I hope one day we'll find more accessible ways to make these invariants precise, but currently, the English-language description quickly becomes both verbose and vague. (b) Setting these invariants decides the fate of all future soundness conflicts -- which can be seen as a good thing, but the downside is that we don't even know yet what these conflicts might be so making such a far-reaching judgment call is somewhat scary.
That said, for this particular conflict, I think looking at the formal model makes it clear that replace-with is sound in the "obvious" model and partial-borrow requires some advanced contortions to have its soundness justified. So I personally think we should decide this one in favor of replace-with. Also see the discussion starting here. Basically, partial-borrow needs its Partial<T>
type to be "immovable", or at least only movable in restricted ways, but RustBelt is built on the assumption that all Rust types are movable. The way to encode "references to data that must not be moved" is Pin
.