Lifetime Rules Implementation

I'm currently working on a compiler (written in C++) that intends to implement C++, Rust, and a few other languages. I had a few questions about how lifetimes work with borrow checking, as I will be implementing that at some point. This could also serve as a basis for rigourously specifying the borrow checker. Note that in the below rules, the term "object" is used in the same manner as it is in the C++ Standard ISO 14882, and has the same definition as provided there.

These are the High Level Rules I have interpreted as being applicable:

  1. A type T is bounded by a lifetime 'a, iff it is a reference type with lifetime 'a, or it contains 1 or more members which are bounded by lifetime 'a
  2. A type which is not bounded by any lifetime 'a, as defined by rule 1, is said to be bounded by 'static.
  3. A type T satisfies lifetime 'a, if it is bounded by some lifetime 'b, such that either 'b is exactly the lifetime 'a, or 'b starts before 'a, and ends after 'a. Note -- The lifetime 'static starts before any lifetime other than 'static, and ends after all lifetimes other than 'static -- End Note
  4. There shall be no reference types to any type T, with lifetime 'a, except where T satisfies 'a
  5. An object created with the static specifier has lifetime 'static.
  6. An object created at block scope without the static specifier has a lifetime which starts immediately after its initializer, and ends not later than the end of the block scope it is declared it, before all object's for which the lifetime started before that object. Note -- Such an object has automatic lifetime -- End Note
  7. A type T has non-trivial destruction if that type implements the Drop trait, or any member of the type has non-trivial destruction. A type T does not have non-trivial destruction has trivial destruction. Note -- An instantiation of the type core::marker:PhantomData<T> has non-trivial destruction if and only if T has non-trivial destruction -- End Note
  8. The lifetime of an object with automatic lifetime ends at the last point the object is used or to the end of the lifetime the object is borrowed, whichever is longest.
  9. The lifetime of a temporary object ends after the full expression which creates it, unless the object is borrowed. The lifetime of a borrowed temporary object is the length of the lifetime it is borrowed for.
  10. The lifetime of any object ends when the object is moved. Note -- An object of a type T which implements the Copy Trait is never moved -- End Note
  11. A borrow without an explicit lifetime is for the length of the reference object it creates.
  12. A non-mutable borrow shall not be created for an object at block scope and automatic lifetime if there exists a mutable borrow of the same object for which the lifetime has started and not yet ended. Note -- An object created in a non-mutable binding can always be borrowed non-mutably -- End note
  13. A mutable borrow shall not be created for an object at block scope and automatic lifetime if there exists any other borrow of the same object for which the lifetime as started and not yet ended. Note -- This implies there can only be at most one mutable borrow of an object -- End Note
  14. A mutable reference may be dereferenced, and the result borrowed, which produces a new reference (mutable or non-mutable) to the same object with a smaller lifetime. The original reference may not be accessed for the lifetime of the new reference, except that if the new reference is to a field, the original reference may be accessed to borrow other fields of the same object. Note -- This implies that if a mutable reference is reborrowed as non-mutable, the object cannot be mutated through the mutable reference for the lifetime of the non-mutable reference -- End Note
  15. A non-mutable reference may be derefereced, and the result borrowed, which produces the same reference, which has a potentially smaller lifetime. Note -- This operation is identical to copying the reference -- End Note
  16. For the purposes of the above rules, the instantiation of the type core::marker::PhantomData<T> is considered to have a single field of type T, except that this field cannot be named and does not affect the size or alignment requirements of the type.
  17. The end of scope for an object with non-trivial destruction and automatic lifetime that was not moved is considered mutably borrow the object (ignoring the objects declared mutability) and invoke the destructor operation for that object. The lifetime of the borrow is exactly the time necessary to perform the destructor operation. If multiple objects with automatic lifetime have non-trivial destruction and have their scope end at the same time, the destructor operations are performed in the reverse order the objects are created. If the lifetime of any subobject has been ended, the destructor operation is only applied to subobjects for which the lifetime has not ended
  18. Subobjects (that is, fields or array elements of another object) may be borrowed independently of other subobjects. A borrow of any object borrows each subobject of that object. The lifetime of any subobject can be ended before any object containing that subobject, except where the type of either the subobject, or any containing object, implements the Drop trait. Lifetimes of such subobjects are only ended explicitly (by moving the subobject) Note - A complete object that contains subobjects for which the lifetime has ended can be called a partially-moved value - End Note.

Would the above 18 rules (that's a lot) be sufficient for soundness from a rust compiler?

3 Likes

The short answer is that depending on what is and isn't in scope of your question, this is either broadly correct (although I only skimmed) albeit likely riddled with subtle imprecision, or a gross underestimation of how much complexity Rust is abstracting away with these systems.

As for the long answer, well, basically check out everything linked in this thread: https://users.rust-lang.org/t/math-behind-borrow-checker/37079

7 Likes

The scope of the question is a technical specification of rust's borrowing rules from a syntax and semantic perspective as well as if all programs which create and use references according to those rules specified would not violate any of rust's runtime requirements for references.

Also, polonius needs to be directly mentioned here, as a somewhat more formal attempt at implementing the compile time borrow checker.

The actual runtime requirements are somewhat up in the air, with the most prominent (pre-)proposal for formal specification being Stacked Borrows.

Of course, the static model is simpler.

(If your goal is to just compile a trivial subset, just require everything be 'static. If your goal is just to compile all valid Rust code but not necessarily reject all invalid Rust code, you can take the mrustc approach and just skip borrow checking.)

1 Like

Then yeah, it's gonna be far more complex than a 15 bullet point list. If it was that simple such a list would be in the Reference by now.

Probably the most important question that's still not super clear is whether you're including unsafe Rust, because that would definitely require you to grapple with not just Polonius' rules but also everything in and touched upon by Stacked Borrows (or whatever model we eventually adopt) and pretty much everything else linked in the thread I linked earlier.

1 Like

I would not recommend trying to derive the rules from scratch; you should start from an existing proven model. Ralf Jung is working on the "Stacked Borrows" model, and the folks working on Polonius (linked above) are producing logic-programming implementations of such models (and more advanced models). Use one of those, so that you can exactly match the logic of the Rust compiler rather than producing something subtly different.

6 Likes

I'll take a look. An actual specification would be nicer, but that is a start.

I am currently not looking to deal with the runtime requirements (except if the static requirements will allow violations of the runtime requirements). I'm probably going to go with a much wider interpretation than stacked borrows, since the optimizations would share some code with C's restrict, as well as some other behaviour. The static specification is primarily what I need here.

For now, not anything that can't be done safely (You will notice there is nothing in those rules reguarding raw pointers or mutable objects with static lifetime). This would simply be the static checking rules.

The implementation would be necessarily more complex as it is partially checked by the merged ir-visitor, though some of it will be in rust-specific visitors (for example, the actual borrow checking would be in rust, but bounds on types would have ir support).

As above, I will be taking a look at polonius for static checking. Having a good technical specification for the static rules would be beneficial to provide. Defining these rules could help making a full rust specification, so rust can no longer be effectively limited to the standard compiler.

In addition to what was said above, you can also read about rustc's own implementation here: https://rustc-dev-guide.rust-lang.org/borrow_check.html

2 Likes

To answer concretely (some of) your questions: In 7, a PhantomData<T> is considered owning a T for all intents and purposes, including (or in particular) for drop checking. So a PhantomData<T> has indeed non-trivial destruction when T has non-trivial destruction.

In 6, and in others, note that since NLL (non-lexical lifetimes) lifetimes no longer end at "the end the block". Similarly in 3, it's correct with "fully contains", but that's not exactly "starts before and ends after".

At least in the standard rust-parlance, references can have lifetimes, and types cen used by them, but objects do not. This is far from to say this is an unmeaningful concept, but it's not described as such usually.

amosonn
June 21

To answer concretely (some of) your questions: In 7, a PhantomData<T> is considered owning a T for all intents and purposes, including (or in particular) for drop checking. So a PhantomData<T> has indeed non-trivial destruction when T has non-trivial destruction.

That is fair. I'll adjust those rules.

In 6, and in others, note that since NLL (non-lexical lifetimes) lifetimes no longer end at "the end the block". Similarly in 3, it's correct with "fully contains", but that's not exactly "starts before and ends after"

I considered NLL in writing 6. The rules I wrote state that the lifetime of an object ends not later than the block it contains. I would assume that scope guards would require that the objects lifetime ends at the end of the block to have the user-intended effect (unless moved from), so that's why I considered that an object with non-trivial destruction has a lifetime that ends at the end of a block, so that drop timings are consistent. If this is incorrect, I'd be happy to adjust it, but that would be suprising to me, at least with a C++ background, where destructors are evaluated at the end of scope. Also, fully contains implies starts before and ends after.

At least in the standard rust-parlance, references can have lifetimes, and types cen used by them, but objects do not. This is far from to say this is an unmeaningful concept, but it's not described as such usually.

I use similar language from the C and C++ standards, as rust does not have a set of normative descriptive terms. I use the C++ definition of object here which is:

  • Has a type
  • Has a size and alignment
  • Has a storage duration
  • Has a lifetime. It's slightly easier to reason about those rules then whatever the heck rust parlance uses, and it isn't fundamentally incorrect.
1 Like

PhantomData does not by itself have a destructor, it only affects dropck of any struct it is contained within.

core::mem::needs_drop() says PhantomData<T> does have trivial destruction: https://rust.godbolt.org/z/1zk03o

1 Like

Hmm, I guess you're both right. Of course PhantomData can't have a destructor, it doesn't have any data; but for dropck it pretends it does, that's part of it's usage. And I read the "non-trivial destruction" part of OP as refering mostly to lifetimes and such dropck related things, for which PhantomData<T> behaves like T.

But I agree, these are two different things. I guess that's another point against "summing it up in 15 points".

I didn't mean to imply that your reasoning was incorrect. Just to note that the term "lifetime" is usually used specifically for borrows, and is used to reason about when is it valid to borrow a value im/mutably, when is it valid to use a refernce, etc - all of which are checked in borrow-checking. Many of the things you are describing are related to drop-checking, which is a different thing with different rules (though hopefully compatible). The fact that they are different does make it sometimes worth to reason about them separately, and not together.

That is indeed my primary use for the term "non-trivial destruction" here, rather than talking about how it evaluates the destructor, it extends the lifetime of objects to the enclosing scope. I guess it makes a difference when you wrap a reference in a struct (whereas references have trivial destruction, structs containing (even phantom) references can have non-trivial destruction).

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.