Struct Destruction Order

Hi, first post to the internals list, but this seemed more relevant here, as I understand what the current usage restrictions are, but I am interested in the internals, and whether it has to be this way. I have a vague idea of what I am talking about when it comes to type-systems and published a paper back in 2004 Strongly typed heterogeneous collections. Since then I have been working on various personal language projects including compositional type inference with principal typings and a pure prolog like logic language with sound unification, a complete search strategy, and hopefully complete inference (proper negation is still a work in progress).

I have come across a couple of cases where I want to return a struct which both owns some data and contains references to it. Something like this:

struct Test<'a> {
    v : Box<[i32]>,
    r : &'a i32
}

fn test<'a>() -> Test<'a> {
    Test {
        v : Box::new([1,2,3]),
        r : &v[2]
    }
}

I realise this is not possible currently, and the reason I have read is based on the uncertainty of destruction order. However the following works fine:

let w = Box::new([1,2,3]);
let r = &w[1];

Why not treat struct like a slice of stack, and enforce construction top to bottom and destruction bottom to top, or alternatively why not have the destructor take ownership of the struct then destruction order does not matter?

It’s not just destruction order. When you borrow v[2], you also borrow <the Test object>.v and <the Test object>, and while it’s borrowed it can’t be moved. This is necessary in general because if these paths aren’t recorded as being borrowed, it’s legal to:

  • borrow them mutably — this means mutable aliasing, which causes all sorts of problems
  • drop them — this obviously makes r a dangling pointer
  • move them — and then the new owner might take any of these actions in turn

In particular, the return type Test<'a> doesn’t include any information that it would be illegal to drop the v member (e.g. by swapping it out for another Box<[i32]>). So memory unsafety can occur long before Test is dropped.

And that doesn’t even get into the issue that &v.deref().index(1) (the desugared form of &v[1]) doesn’t become dangling when v is moved, but e.g. for a SmallVec, the equivalent expression does become dangling when v is moved.

(Edit: I previously gave a stupid example in the last paragraph, forgetting that v.len() would return the length by value and thus the pointer would be dangling only because it’s a temporary. See how tricky reasoning about pointer invalidation upon move is? :smiley: )

1 Like

So why is it legal to take the reference when the Box is on the stack?

The Box is borrowed and you can’t move it (or mutate it, or mutably borrow it, etc.). This is the same whether it’s directly in a local variable of in a field of a struct, e.g. this works (using a single integer to save some typing, it’s irrelevant to the issue):

struct Wrapper(Box<u8>);
let w = Wrapper(Box::new(0));
let r: &u8 = &*w;

But you can’t move or mutate w while r exists, neither in this snippet nor in the one you posted.

So why not apply that rule to the struct, you cannot move the box, because it is borrowed? A sequence of variable declarations reserves memory on the top of the stack, so I don’t see why you cannot treat a sequence of initialisations in a struct identically to a sequence of initialisations on the top-of-stack. They both represent contiguous regions of memory.

But you do move it when you return the struct! Furthermore, suppose we allowed the return. When the borrow checker looks at the calling function, it doesn’t know the contents of test, it just sees the signature (all analyses of this kind are local, deliberately). Thus:

let t = test();
// type of t.v: Box
// type of t.r: &'a i32 for some 'a (which one, is another though question)
// no borrows in sight!

How should the borrow checker know that t.v is borrowed? I’m sure with enough work a formalism can be found to encode this in the signature of test and the type Test, but with the current model of borrowing and lifetimes it’s not possible and I don’t think any obvious, simple extension can do that. Global inference is right out.

I don’t see why it would require global inference, it would just need local inference over the struct, possibly by tying the lifetime of the Box to that of the reference. Something like:

struct Test<'a> {
    v : Box<[i32] + 'a>,
    r : &'a i32
}

To drive home the point that returning is moving, consider this function which looks suspiciously similar:

struct Bar<'a> {
    x: i32,
    r: &'a i32,
}

fn make_bar() -> Bar<'a> {
    Bar { x: 1, r: &x }
}

Under the obvious compliation scheme (no RVO) this creates a temporary Bar on the stack, sets up &r to point at part of this temporary, and then copies that object to its actual destination, making r dangling.

So this issue mentioned above:

is not just an annoying technicality, it's a fundamental problem with returning anything that has borrows into itself.

You can convert it into a pass by mutable reference. Allocate the space for the struct in the callers stack frame, and then pass the uninitialised struct into the the function by mutable reference (this is Ada’s ‘out’ passing convention).

Also could we do this if we heap allocated the struct, so if a struct contains a self reference it is marked immovable and then has to be boxed to be returned (like closures)? A new kind of marker trait?

Edit, actually mechanism already exists for making things immovable it the ‘borrow’ so a struct that contains a self-borrow just has to be considered as borrowed by the borrow-checker (in other words an internal borrow counts the same as an external borrow). This would prevent movement.

Yes, but that mandates an implementation detail that doesn’t even exist yet, and it falls apart if you do such simple things as vec.push(test()) (admittedly, vectors will probably get placement syntax to sidestep the issue). More generally, it’s pretty useless if you can return the struct but not move it any further — usually, as in your starting example, the borrow is specifically designed such that you can move the struct as a whole without invalidating it. Expressing this requires distinctions like the one I alluded to with Box/SmallVec. There are of course potential ways to handle all this, but it’s quite a lot of complexity.

Well, the second part, where anything borrowed cannot be moved, counting an internal borrow to be the same as if an external borrow exists would seem to be sufficient, and does not require any wider changes, although it would restrict what you can do with such a struct, the things you can do would be safe.

Converting to pass by mutable reference can be done as an AST to AST rewrite in the compiler, and the allocation can be pushed up as many levels as necessary, although it does effectively mean changing the calling convention to include pre-allocated space, which may be something Rust does not want, to keep things compatible with ‘C’ calling conventions.

To come back to this point:

Well that's exactly what I meant with extensions that let you express that t.v is borrowed. I guess Box<[i32] + 'a> is supposed to mean that the contents of the Box are borrowed for the lifetime 'a? (This specific syntax is already in use and means something different, but that's not important.)

You can't express in current Rust that something is borrowed, there is no syntax for it. You create the borrows and the compiler computes the loans from that. That's partly because it was never deemed important. Another reason is that loans are relatively complicated to describe: they can concern only parts of a value (e.g. a single field of a single field), there are several kinds of borrows (more than & and &mut), and there might be more dimensions I'm not even aware of since I never really worked on this part of the compiler. To give you an idea of the framework we're talking about and into which proposed rules have to be integrated, librustc_borrowck's readme is a good start although it's probably pretty outdated by now.

But IMHO the most important issue is that the model is not complicated enough: e.g., there is no way to reason about "the contents of a collection" (e.g. you can't extend a vec while you borrow one of its elements, but that's because the methods you used to create the borrow create a borrow of the whole vec, and extending requires a mutable borrow). For example, before even tackling the problem this thread is about, I would expect a way to permit calling vec.len() while mutably borrowing vec[0].

There is also currently no way for the compiler to reason "okay this path of x is borrowed but it's still okay to move x because [...]". Furthermore, non-lexical borrowing will probably make the whole story even more complicated.

To reiterate, none of the things I'm saying are supposed to imply that a sound static analysis is impossible, but all the complications I'm listing here (plus my inuition gained from fighting the borrow checker) make me pretty sure that it would require several big extensions, possibly even a departure from core tenets of the current lifetime system.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.