Proposal about expired references

vincent163 · January 22, 2020, 1:00pm

EDIT: The thread's original title was "Some thoughts on self-referential structs & proposal about out-of-lifetime references". The part about self-referential structs turned out to be overly complex, and my original intent was only to share my thoughts on why it has to be complex, which has little value as a topic for discussion. I decided to focus on out-of-lifetime references instead.

The document is available further down the page under "Proposal about expired references". Please ignore the following link and any discussion related to self-referential structs.

~~The text is a bit long so I decided to keep it in a git repo at https://github.com/vincent-163/rust-notes.~~

I find Rust's ownership model very interesting. However, the model has some limitations, so I did some research and tried to work out, by pure logic, what a self-referential struct would look like. While self-referential structs turned out to be necessarily complex, the idea of out-of-lifetime references might be of practical interest.

Any comments or suggestions would be greatly appreciated.

RustyYato · January 22, 2020, 3:29pm

Overall this is nicely written, but it is way too complex while having limited impact, and it introduces a few breaking changes. The lifetime rules are intentionally simple, to allow for optimizations and to keep the system understandable.

Out-of-lifetime references would break many optimizations around references and would be a breaking change. This seems really complex, and I have a few questions and notes:

- how do you handle moving self-referential types?
- why do you need to name 'none?
- What is MaybeExist?
- if you can't access some fields of a type from behind a reference, it doesn't seem very useful
- &mut means a unique borrow, not a mutable borrow. See Rust: A unique perspective
- most importantly to me: how are you going to teach this?

A list of breaking changes, in no specific order:

- out-of-lifetime references
  - needing the 'self bound
- the # usage breaks the quote macro used in every proc macro

Contrary to intuition, f.ptr1 does not borrow anything; the only reason it can contain a reference is because Box::borrow_mut gave us one

So these lifetimes only work for references? That is extremely limiting.

vincent163 · January 22, 2020, 5:01pm

Thanks for the feedback.

how do you handle moving self referential types?

Existing mechanisms like Pin should be sufficient. I don't think it's possible to actually move them around in memory. Consider the following example:

```
struct Data {
    a: u64,
    b: &'self u64,
}
```

If Data is moved, the reference b has to be updated as well. A move in Rust is always a byte-by-byte copy, so there is no way to update a reference as part of moving the data. Storing offsets relative to the struct's address is possible but overly complicated.
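Current Rust cannot write &'self, so the sketch below uses a raw pointer as a stand-in for b (an assumption for illustration; is_self_consistent is a hypothetical helper). It shows that a move copies b verbatim: after the value is moved into a Box, b still holds the old stack address instead of pointing at the moved a.

```rust
// Stand-in for the proposed `struct Data { a: u64, b: &'self u64 }`:
// `b` is a raw pointer that is meant to point at this value's own `a`.
struct Data {
    a: u64,
    b: *const u64,
}

impl Data {
    // True while `b` still points at this value's own `a` field.
    // Only compares addresses; it never dereferences `b`.
    fn is_self_consistent(&self) -> bool {
        std::ptr::eq(self.b, &self.a)
    }
}

fn main() {
    let mut d = Data { a: 1, b: std::ptr::null() };
    d.b = &d.a; // point `b` at our own field
    println!("before move: {}", d.is_self_consistent());

    // Moving into a Box is a bytewise copy to the heap: `b` is copied as-is
    // and keeps the old stack address.
    let moved = Box::new(d);
    println!("after move: {}", moved.is_self_consistent());
}
```

Keeping b correct across moves would require either fixing it up after every move, which bytewise moves cannot do, or storing an offset instead of an address, which is the complication mentioned above.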

why do you need to name 'none ?
out of lifetime references needing the 'self bound

'none is named to contrast with 'self without breaking existing code. To avoid breaking existing code and to keep idiomatic Rust writable, all lifetime parameters should be considered 'self unless 'none is explicitly allowed.

```
struct Pair<'a, 'b>(&'a str, &'b str);
impl<'a, 'b> Pair<'a, 'b> where 'b: 'none { // 'b can be 'none while 'a is implicitly 'self
    fn pair_first(&self) -> &'a str { self.0 } // &self is implicitly 'self
}
```

&mut means a unique borrow, not a mutable borrow. See https://limpet.net/mbrubeck/2019/02/07/rust-a-unique-perspective.html

I agree, and that is my mental model, but the book uses those names, so I thought the terms "mutable borrow" and "immutable borrow" were more common than "unique borrow" and "shared borrow".

if you can't access some fields of a type from behind a reference, it doesn't seem very useful

If the reference is a direct argument in the function signature, then you're right: it's useless. The problem that out-of-lifetime references try to solve is a &'a reference deep inside a structure forcing the whole structure into lifetime 'a, even though other parts of the structure that are not bound by 'a may still be useful. That's where out-of-lifetime references shine. I included some examples in the text to show the idea.

What is MaybeExist?

It's a made-up type meaning that a field might or might not exist at a given place. The format is MaybeExist<?a, T>, where ?a is a boolean (true or false) and T is the type that could be stored in it. The special property of MaybeExist is that the boolean part can change during program execution: one can move out of a MaybeExist<true, T> into a variable of type T, after which its type becomes MaybeExist<false, T>. This is similar to how the borrow checker tracks whether a field has been moved out, except that it happens at the type level. It requires some intricate logic, since it changes the type of an object. For a variable in local scope it is similar to shadowing, but to really implement it in self-referential structs, the type has to be changed behind a reference, which is more complex.

The following code should demonstrate the behavior:

```
type MovablePair<?a, ?b, T> = (MaybeExist<?a, T>, MaybeExist<?b, T>);

fn work() {
    let mut pair: MovablePair<true, false, Box<usize>> = (Box::new(0), _);
    pair.1 = pair.0;
    // pair: MovablePair<false, true, Box<usize>> = (_, Box::new(0))
}
```
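The closest thing in current Rust is to track the "exists" flag at run time with Option<T> rather than in the type. The sketch below is only that run-time approximation, not the proposed type-level MaybeExist; move_first_to_second is a hypothetical helper. Option::take plays the role of MaybeExist<true, T> turning into MaybeExist<false, T>:

```rust
// Run-time stand-in for MovablePair<?a, ?b, T>: `Some` means the slot
// exists, `None` means it has been moved out.
type MovablePair<T> = (Option<T>, Option<T>);

// Moves the first element into the second slot; `take` leaves `None`
// behind in the first slot.
fn move_first_to_second<T>(pair: &mut MovablePair<T>) {
    pair.1 = pair.0.take();
}

fn main() {
    // Corresponds to MovablePair<true, false, Box<usize>>.
    let mut pair: MovablePair<Box<usize>> = (Some(Box::new(0)), None);
    move_first_to_second(&mut pair);
    // Now corresponds to MovablePair<false, true, Box<usize>>.
    println!("{:?}", pair);
}
```

The difference is that every access here has to check the Option at run time, whereas the proposal wants the compiler to know statically which slot is filled.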

the # usage breaks the quote macro used in every proc macro
most importantly to me: how are you going to teach this?

I don't expect these ideas to be implemented anytime soon, because right now they are too complex to have any practical value. The # is only used for demonstration purposes; if this is ever implemented, we would of course need to find another syntax. I have another idea, still to be written up, that creates self-referential structs from function breakpoints, and I believe it would be more practical and easier to learn.

So these lifetimes only work for references? That is extremely limiting.

I'm not sure I understood this correctly, but I'll try to answer. References take a lifetime because they cannot be used just anywhere; they can only be dereferenced within their lifetime. Primitives and structs do not have a lifetime because if you have a struct, you have all its fields, so its lifetime is always 'static. There might be cases where structs want a lifetime as well, perhaps some unsafe structs, though I couldn't think of an example. But such lifetimes are used to call the functions, not to allow the struct to exist, and you can always add any lifetime constraints to the &self parameter of any method.

RustyYato · January 22, 2020, 6:35pm

What I meant by this is, what if I had a type like std::slice::Iter<'a, T>. How would that fit into this model? Note that Iter does not contain any references (only raw pointers and a PhantomData)

vincent163 · January 23, 2020, 3:08am

You should be able to implement it in the usual way:

```
use std::marker::PhantomData;

struct Iter<'a, T> {
	slice: *const T,
	remaining: usize,
	_data: PhantomData<&'a T>,
}

fn make_iter<'a, T>(slice: &'a [T]) -> Iter<'a, T> {
	// as_ptr() rather than &slice[0], which would panic on an empty slice
	Iter { slice: slice.as_ptr(), remaining: slice.len(), _data: PhantomData }
}

impl<'a, T> Iterator<'a> for Iter<'a, T> {
	type Item = &'a T;
	// There is a problem with traits here. The Iterator trait's signature is:
	// fn next(&'self mut self) -> Option<Self::Item>
	// The lifetime 'self does not include 'a, so we cannot dereference `slice`
	// from the function in safe code, and we cannot ensure that a reference to `slice`
	// is valid in unsafe code.
	// Instead, we need to write the signature as:
	// fn next<'b>(&'b mut self) -> Option<Self::Item> where 'a: 'b
	// But this function does not compile because the signature is not equivalent
	// to that of the trait.
	// One solution is to add a new lifetime parameter to every trait that could have
	// a lifetime. This would be a breaking change, so we want to find ways to preserve
	// the current behavior.
	// Another solution is to make the `self` reference capture all lifetime parameters
	// by default. I suggest this one because it minimizes breakage.
	//
	// While researching this problem, I found that the above signature does not compile
	// in current Rust either!
	// Yet the two signatures are in fact equivalent, because currently the existence of
	// a reference implies that all references reachable from it are within their
	// lifetimes, so 'a: 'b is already implied.
	// This might be a sign that current Rust lacks explicitness about lifetimes.
	//
	// This implicitly adds a bound 'a: 'self if 'a: 'none is not specified here.
	fn next(&'a mut self) -> Option<Self::Item> {
		if self.remaining == 0 {
			None
		} else {
			// To properly dereference it, we need to ensure two things:
			// * The reference is within its lifetime. This is enforced by 'a: 'self, which is implicit.
			// * We give the reference the correct lifetime so it will be usable during that lifetime.
			//   Since the iterator was constructed from &'a [T], we know that the &T reference
			//   will be usable during 'a.
			let item: &'a T = unsafe { &*self.slice };
			// Advance to the next element (one past the end is allowed after the last one).
			self.slice = unsafe { self.slice.add(1) };
			self.remaining -= 1;
			Some(item)
		}
	}
}

// Proposed way of writing traits that potentially have a lifetime:
trait Iterator<'a> {
	type Item;
	// I think it would be better to make 'a an associated type, so there is no ambiguity
	// if Iterator is implemented for multiple different lifetimes.
	fn next(&'a mut self) -> Option<Self::Item>;
}
```

EDIT: After trying it out, I found that I had misunderstood Iterator. The above definition does not allow using an iterator in a while loop, since next borrows self mutably for all of 'a (&'a mut self), which allows only one call to next() during 'a. Instead, it should be written as:

```
trait Iterator<'a> {
	type Item;
	fn next<'b>(&'b mut self) -> Option<Self::Item> where 'a: 'b, 'b: 'self;
}
```

This is a bit confusing, but I would argue that this makes the logic clearer.
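For comparison, the same raw-pointer Iter can be written against the real std Iterator trait in current Rust. This is a minimal sketch (the field is renamed ptr for clarity): here nothing in the signature of next says that 'a is valid; it is the unsafe block, justified by PhantomData<&'a T> and the fact that the iterator was built from a &'a [T], that picks the &'a T lifetime.

```rust
use std::marker::PhantomData;

struct Iter<'a, T> {
    ptr: *const T,
    remaining: usize,
    // Ties the iterator to the borrow of the original slice.
    _data: PhantomData<&'a T>,
}

fn make_iter<'a, T>(slice: &'a [T]) -> Iter<'a, T> {
    Iter { ptr: slice.as_ptr(), remaining: slice.len(), _data: PhantomData }
}

impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        if self.remaining == 0 {
            return None;
        }
        // SAFETY: `ptr` points into a slice borrowed for 'a and
        // `remaining > 0`, so it is in bounds and the element lives for 'a.
        let item: &'a T = unsafe { &*self.ptr };
        // SAFETY: stepping one element forward stays within the slice or
        // lands exactly one past its end, which is allowed.
        self.ptr = unsafe { self.ptr.add(1) };
        self.remaining -= 1;
        Some(item)
    }
}

fn main() {
    let words = ["this", "is", "a", "sentence"];
    let collected: Vec<&str> = make_iter(&words).copied().collect();
    println!("{:?}", collected);
}
```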

vincent163 · January 23, 2020, 3:18am

The intent of the out-of-lifetime references proposal goes beyond the out-of-lifetime references themselves. By breaking the assumption that the existence of a reference implies its validity, I hope to make the logic around lifetimes clearer and more explicit, and to pave the way for other interesting ideas built on it as they come up. If anyone is interested in this, I'll try to write an RFC.

RustyYato · January 23, 2020, 3:56am

I think this comes in conflict with the desired validity requirements of references. cc @RalfJung

vincent163 · January 23, 2020, 1:09pm

I've just skimmed @RalfJung's awesome paper, Stacked Borrows. From what I have read, the paper proposes a method to prevent undefined behavior caused by invalid use of aliased pointers, even in unsafe code. But one thing seems strange about it: unsafe code is all about the low-level operations that form the basic building blocks of safe code, and it seems odd to check it, because it often wraps concepts outside Rust that cannot be checked at all. If there are ways to check that it is correct, why not improve safe Rust instead and turn unsafe code into safe code?

I have a strong feeling that my proposal about out-of-lifetime references aligns well with the problem that Stacked Borrows is trying to solve. Instead of "checking unsafe code", we can make this code safe! ~~Specifically, the following code snippet will be able to compile without unsafe:~~

```
let mut x = 1u64;
let y = &mut x;
let z = &mut x;
*y = 2;
```

EDIT: Stacked Borrows prohibits this kind of code as well (merely creating the pointer z counts as a use, and y's lifetime overlaps with z's). I'll use this piece of code instead:

```
let list_owned: Vec<String>; // assume we get this variable somewhere
// 'a starts
let list_ref: Vec<(&'a str, Metadata)> =
    list_owned.iter().map(|s| (s.as_str(), get_metadata(s))).collect();
// For some reason we don't want to drop list_ref right away. Maybe Metadata has a
// drop handler that takes too long to run, or we want just the metadata and not the
// strings. I don't know whether a specific use case exists, but the current behavior
// prevents us from writing this in safe code. As long as we don't dereference the
// strings outside 'a, we should actually be safe!
// 'a ends
drop(list_owned);
// At this point we are outside 'a. We can still use list_ref safely as long as we
// do not dereference a borrow with lifetime 'a outside 'a.
for (s, metadata) in list_ref {
    do_something_with(metadata); // safe!
    println!("{}", s); // error: dereferencing an out-of-lifetime reference
}
```
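In current safe Rust, the usual workaround is to split off the borrow-free part before the owner is dropped. A minimal sketch, assuming hypothetical stand-ins for the post's Metadata and get_metadata (here Metadata just records the string length) and a hypothetical helper metadata_outliving_strings:

```rust
// Hypothetical stand-ins for `Metadata` and `get_metadata` from the post.
#[derive(Debug, PartialEq)]
struct Metadata {
    len: usize,
}

fn get_metadata(s: &str) -> Metadata {
    Metadata { len: s.len() }
}

fn metadata_outliving_strings(list_owned: Vec<String>) -> Vec<Metadata> {
    let list_ref: Vec<(&str, Metadata)> =
        list_owned.iter().map(|s| (s.as_str(), get_metadata(s))).collect();
    // Today the `&str` halves must be thrown away before `list_owned` can be
    // dropped, so the metadata is moved into a new, borrow-free vector first.
    let metadata: Vec<Metadata> = list_ref.into_iter().map(|(_, m)| m).collect();
    drop(list_owned); // allowed: nothing borrows `list_owned` any more
    metadata
}

fn main() {
    let owned = vec!["this".to_string(), "is".to_string()];
    println!("{:?}", metadata_outliving_strings(owned));
}
```

This works, but it forces an extra pass and allocation just to convince the borrow checker, which is the cost the proposal is trying to avoid.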

The work on Stacked Borrows provides a formal and sound implementation of how lifetimes can be checked, while out-of-lifetime references allow programmers to understand and express, in safe code, the kind of code that Stacked Borrows is able to check. With out-of-lifetime references added to the language and checked with Stacked Borrows, I'm confident that this will remove a lot of unsafe code and insert "ob" into Rust.

Ixrec · January 23, 2020, 1:33pm

Close, but importantly wrong. It proposes a rigorous definition/specification for undefined behavior itself. Or at least, the "aliasing model" part of such a specification.

One of the arguments in favor of that proposed definition is that it's compatible with tool checking, which should help prevent a lot of UB in practice, but fundamentally it's about defining what UB is in the first place.

Because much of the unsafe code we're interested in cannot be checked in a 100% ironclad way. When I say "compatible with tool checking" above, I mean that you can run your test suite under miri with a special flag to validate that no UB-according-to-Stacked-Borrows takes place within any of the program executions in that test suite. That is amazing compared to other systems languages, but it's still not the kind of airtight soundness proof we get when safe Rust passes all the type/borrow/etc checkers (ignoring compiler soundness bugs for the moment). Even if we officially adopt Stacked Borrows, there's always the possibility that you just missed the test case that would trigger UB.

There is certainly unsafe code that we could theoretically replace with new safe Rust features, but those have to be evaluated on a case-by-case basis like any other feature proposal.

In particular, thinking of Stacked Borrows as an implementation of anything is simply incorrect. If we adopt Stacked Borrows as the official normative aliasing model for Rust, then it becomes the rules by which we judge the correctness of other code. Either you're proposing a different aliasing model (which is what I think you are doing), or your proposal is incompatible with Rust's aliasing model and therefore a complete non-starter that's impossible to implement without breaking or pessimizing lots of existing unsafe code (which I'm pretty sure is not your intent).

I haven't had the chance to actually read the paper you linked above, but anything called "out of lifetime references" certainly sounds like it would largely be about making a huge extension to the existing aliasing rules as informally understood by the community, as enforced by the compiler, and as formally specified in Stacked Borrows (or whatever replaces it). In that sense I guess it's not wrong to say it "aligns well", but it's so much deeper than that.

vincent163 · January 23, 2020, 3:35pm

Thanks for the detailed explanation on Stacked Borrows. That makes sense and corrects my understanding of that paper.

I just realized that Stacked Borrows is designed to be checked dynamically, catching errors at runtime. I made the big mistake of regarding it as a static checker! I then made the ungrounded assumption that the static version of Stacked Borrows, as I had wrongly understood it, is equivalent to current Rust plus out-of-lifetime references. I still hold that assumption (that the concepts in Stacked Borrows can be used to build a robust borrow checker, and that such a borrow checker equals existing Rust plus out-of-lifetime references), but I don't yet have adequate knowledge to understand how it would work out.

I believe out-of-lifetime references:

- Is an extension to current safe Rust. To avoid breaking existing code, we should try to make it opt-in as much as possible, so:
  - Lifetime parameters will need to explicitly specify 'none to allow lifetimes that are potentially 'none. Otherwise, lifetime parameters on functions are automatically 'self.
  - Another problem is that currently, a type that contains references lives no longer than any reference reachable from it, but with out-of-lifetime references, types can outlive the references they contain. This can break code that relies on this assumption, such as this unsafe code:
```
struct IntoIter<'a, T> {
    cur: *mut T,
    _data: PhantomData<&'a mut T>,
}
impl<'a, T> Iterator for IntoIter<'a, T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        // Unsafe! Even though PhantomData exists, we are no longer guaranteed to be within 'a.
        unsafe { Some(std::ptr::read(self.cur)) }
    }
}
```
  To avoid this, elided lifetimes should capture all lifetime parameters of the type, so &mut self is automatically &'b mut self where 'a: 'b, 'b: 'self.
- Is NOT an extension to Stacked Borrows. Rust code involving out-of-lifetime references should be guaranteed to pass Stacked Borrows without UB, as long as the compiler removes any instructions related to 'none references and never reborrows or dereferences them. In fact, out-of-lifetime references are designed precisely to allow the kind of safe code that is valid under Stacked Borrows and can be statically expressed and verified, but is rejected by the borrow checker. Again, this is ungrounded, since Stacked Borrows is far more rigorous than a newborn idea, but that's my intent.

vincent163 · January 24, 2020, 8:11am

Thanks for your interest. I've written a draft regarding what expired references are and how they are useful. I'll update the document and add more details as new discoveries are made.

The name was originally "Out-of-lifetime references", but after some thought I find the term "expired references" more intuitive. If anyone has a better name, feel free to suggest it.

Expired references

Motivation

The idea originally came up in my attempt to design self-referential structs. Self-referential structs turned out to be inevitably complex, but expired references remain an interesting topic to explore.

The intent is not just to enable specific use cases; it is also expected to make the logic around lifetimes explicit, simplify the implementation of the compiler, and remove corner cases like Sound Generic Drop and Member constraints.

Guide-level explanation

Currently, Rust ensures that every use of a reference is within its lifetime. It does so by checking a function as a whole and inferring each reference's lifetime from where it is used.

```
let mut x = 0usize; // 1
let y = &mut x; // 2
*y = 1; // 3
let z = &mut x; // 4: from here on, y can no longer be used
*z = 2; // 5
*y = 1; // 6: Rust's borrow checker will complain if this line is included
```

From the above code, the borrow checker concludes that:

- The lifetime of reference y begins at 2 and ends after 6
- The lifetime of reference z begins at 4 and ends at 5
- The lifetimes of y and z must not overlap, because they are mutable borrows of the same variable, x

It therefore concludes that the code is invalid and rejects it.

References can also be nested, for example inside vectors or struct fields. This is commonly used to create iterators, for example (elided lifetimes added for clarity):
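For contrast, the same snippet without line 6 is accepted by current Rust, because the last use of y (line 3) comes before z is created (line 4), so the two borrows never overlap. A minimal sketch, wrapping it in a hypothetical function final_value:

```rust
fn final_value() -> usize {
    let mut x = 0usize; // 1
    let y = &mut x;     // 2
    *y = 1;             // 3: last use of `y`, so its borrow ends here
    let z = &mut x;     // 4: no overlap with `y`
    *z = 2;             // 5: last use of `z`
    x                   // reading `x` is fine once both borrows have ended
}

fn main() {
    println!("{}", final_value());
}
```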

```
struct Iter<'a, T> {
    arr: &'a [T],
    index: usize,
}
impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a T;
    fn next<'b>(&'b mut self) -> Option<&'a T> {
        if self.index == self.arr.len() {
            None
        } else {
            // Borrowing through the shared `&'a [T]` yields a reference valid for 'a.
            let item: &'a T = &self.arr[self.index];
            self.index += 1;
            Some(item)
        }
    }
}
fn make_iter<'a, T>(arr: &'a [T]) -> Iter<'a, T> {
    Iter { arr, index: 0 }
}
fn main() {
    let arr = vec!["this", "is", "a", "sentence"];
    let iter = make_iter(&arr);
    for word in iter {
        println!("{}", word);
    }
}
```

A reference to arr is used to create an iterator, and the iterator's next function uses that reference to produce a reference to each element in turn.

How does the compiler ensure that the reference is valid in the next function? It does so by propagating reference lifetimes. When the iterator is created with make_iter, the function takes a lifetime parameter 'a and returns a value of type Iter<'a, T>. Since Iter<'a, T> contains a reference &'a [T], the compiler propagates that lifetime to Iter<'a, T> and ensures that any use of Iter<'a, T> appears only within its lifetime 'a. That is, for any reference &'b mut Iter<'a, T> or &'b Iter<'a, T>, the type checker enforces 'a: 'b.

When the borrow checker checks main and reaches the call to make_iter, it infers from the signature make_iter<'a, T>(arr: &'a [T]) -> Iter<'a, T> that:

- The lifetime 'a is the lifetime of the borrow of arr.
- The 'a in the returned Iter<'a, T> is that same lifetime.

By propagating lifetimes, the compiler ensures that Iter<'a, T> can only be used during 'a, that is, while arr exists. Hence, if we drop arr before the for loop:

```
fn main() {
    let arr = vec!["this", "is", "a", "sentence"];
    let iter = make_iter(&arr);
    drop(arr);
    for word in iter {
        println!("{}", word);
    }
}
```

The borrow checker complains, because the for loop uses iter, which cannot exist once arr no longer exists. This prevents a potential memory safety bug in which we would access freed memory through iter.

This is how Rust has handled lifetimes since its very beginning. It works well to prevent undefined behavior, and it has proven to be rigorous. However, it has some limitations.

Consider an accumulator that reads numbers through a &Cell<u64>, sums them up, and outputs the sum after the &Cell<u64> is no longer available:

```
use std::cell::Cell;
struct State {
    val: Cell<u64>,
}
impl State {
    fn new() -> State {
        State { val: Cell::new(0) }
    }
    fn observe(&self) -> u64 {
        self.val.get()
    }
    fn set_val(&self, v: u64) {
        self.val.set(v)
    }
    fn shutdown(self) {
    }
}
struct Accumulator {
    ptr: *const State,
    sum: u64,
}
impl Accumulator {
    fn new(ptr: &State) -> Accumulator {
        Accumulator { ptr, sum: 0 }
    }
    // Safety: the State passed to `new` must still be alive.
    unsafe fn add(&mut self) {
        let state = unsafe { &*self.ptr };
        self.sum += state.observe();
    }
    fn get_sum(&self) -> u64 {
        self.sum
    }
}
fn main() {
    let state = State::new();
    let mut acc = Accumulator::new(&state);
    // Safety: `state` is alive for all three calls to `add`.
    unsafe {
        state.set_val(3); acc.add();
        state.set_val(4); acc.add();
        state.set_val(5); acc.add();
    }
    state.shutdown();
    println!("{}", acc.get_sum()); // 12
}
```

Note that this can only be implemented with unsafe code. If we try to implement it in safe code, Accumulator needs to contain a reference to state, like this (main and State are exactly the same and omitted):

```
struct Accumulator<'a> {
    ptr: &'a State,
    sum: u64,
}

impl<'a> Accumulator<'a> {
    fn new(ptr: &'a State) -> Accumulator<'a> {
        Accumulator { ptr, sum: 0 }
    }
    fn add(&mut self) {
        self.sum += self.ptr.observe();
    }
    fn get_sum(&mut self) -> u64 {
        self.sum
    }
}
```

If we try to compile this, we will get this error:

```
error[E0505]: cannot move out of `state` because it is borrowed
  --> c.rs:47:5
   |
43 |     let mut acc = Accumulator::new(&state);
   |                                    ------ borrow of `state` occurs here
...
47 |     state.shutdown();
   |     ^^^^^ move out of `state` occurs here
48 |     println!("{}", acc.get_sum());
   |                    --- borrow later used here
```

Our code did not pass the borrow checker! Yet the code should be safe: even though state is dropped at line 47, acc.get_sum() does not use state. In fact, the unsafe variant complies with Stacked Borrows and passes Miri tests, so, as we'll explain later, most of the usual compiler optimizations still apply when calling get_sum.

The problem is that the compiler cannot tell that get_sum does not use the ptr reference. It only understands "lifetime propagation", as explained above. The Accumulator<'a> struct inherits the lifetimes of all its fields, one of which has lifetime 'a, so the lifetime of Accumulator<'a> is 'a. This ensures that as long as an Accumulator<'a> exists, all its fields can be accessed, so in our code:

```
    fn add(&mut self) {
        self.sum += self.ptr.observe();
    }
```

self.ptr is guaranteed to be valid, since &mut self exists. However, in the following code:

```
    fn get_sum(&mut self) -> u64 {
        self.sum
    }
```

self.ptr is guaranteed to be valid as well, even though we do not use it. This means that after state.shutdown(), when state no longer exists, we cannot access self.sum, because the other field, self.ptr, is holding us back! What if we want to tell the compiler, as well as users of the function, that get_sum works outside the lifetime 'a? That's where expired references come into play.
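For comparison, current safe Rust can only express this by ending the borrow before state is shut down, for example by consuming the accumulator and keeping just its borrow-free part. The sketch below repeats State and the safe Accumulator<'a> so it is self-contained; the consuming method finish and the function run are hypothetical additions, not part of the original code.

```rust
use std::cell::Cell;

struct State {
    val: Cell<u64>,
}

impl State {
    fn new() -> State { State { val: Cell::new(0) } }
    fn observe(&self) -> u64 { self.val.get() }
    fn set_val(&self, v: u64) { self.val.set(v) }
    fn shutdown(self) {}
}

struct Accumulator<'a> {
    ptr: &'a State,
    sum: u64,
}

impl<'a> Accumulator<'a> {
    fn new(ptr: &'a State) -> Accumulator<'a> { Accumulator { ptr, sum: 0 } }
    fn add(&mut self) { self.sum += self.ptr.observe(); }
    // Hypothetical: consumes the accumulator (ending its borrow of `State`)
    // and returns only the borrow-free part.
    fn finish(self) -> u64 { self.sum }
}

fn run() -> u64 {
    let state = State::new();
    let mut acc = Accumulator::new(&state);
    state.set_val(3); acc.add();
    state.set_val(4); acc.add();
    state.set_val(5); acc.add();
    let sum = acc.finish(); // last use of the borrow of `state`
    state.shutdown();       // allowed: nothing borrows `state` any more
    sum
}

fn main() {
    println!("{}", run());
}
```

The cost is that the sum has to be extracted before the shutdown; calling acc.get_sum() after state.shutdown() is exactly what the borrow checker still rejects.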

How expired references work

Expired references, or out-of-lifetime references, are references that cannot be used. This might seem weird, since references that cannot be used have no value of their own, but the goal is to prevent a single reference in a struct field from constraining the whole struct and making the other parts of the struct unusable outside that one reference's lifetime.

We achieve this by introducing two new special lifetime identifiers, 'self and 'none, in addition to 'static. 'none is the antonym of 'static: 'static means living forever, while 'none is used for references that are not alive at all. 'self has a context-specific meaning and differs from place to place. In this document, we define it in the context of functions: the reference lives for the duration of the function call.

We use the same syntax as current Rust to denote lifetime subtyping. 'a: 'b has the following equivalent meanings:

- 'a is assignable to 'b: given r: &'a T, we can write let x: &'b T = r;
- 'a is longer than 'b. It can be understood as 'a >= 'b

And thus:

- 'static is longer than any lifetime. For any 'a, 'static: 'a
- 'none is shorter than any lifetime. For any 'a, 'a: 'none
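The 'static side of these rules can already be seen in current Rust; 'none has no counterpart today. A minimal sketch with two hypothetical functions, shorten and weaken:

```rust
// 'static: 'a for any 'a, so a &'static str may be returned as a &'a str.
fn shorten<'a>(s: &'static str) -> &'a str {
    s
}

// 'long: 'short, so a &'long u32 is assignable to a &'short u32.
fn weaken<'short, 'long: 'short>(r: &'long u32) -> &'short u32 {
    r
}

fn main() {
    let n = 5u32;
    let r: &u32 = weaken(&n);
    println!("{} {}", shorten("hello"), r);
}
```

Note that the reverse direction (returning a &'short u32 as &'long u32) would be rejected, which is what "'a: 'b means 'a >= 'b" captures.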

For the purposes of explanation, we'll require all lifetimes to be written explicitly. That is, if a lifetime parameter is not constrained, it defaults to 'none, and any reference with that lifetime is not usable. So the following function does not work:

```
fn work<'a>(&'a mut self) {
    // 'a could be 'none, and thus the reference may not be valid at all!
}
```

Instead, it has to be written this way:

```
fn work<'a: 'self>(&'a mut self) {
    // 'a cannot be 'none because of 'a: 'self.
}
```

This is not the proposed syntax, since it would break nearly all existing code. We will later explain how the lifetime elision rules are modified so that expired references are an opt-in feature and existing code is not affected.

For the above example code to work, we need to break a rule that has been fundamental to the Rust compiler: that the existence of a reference implies the validity of the reference. The impact of this is a bit hard to grasp, but for now, think of it like this: if inserting a use of a reference at some point in the code would make the code fail the borrow checker, then the reference is considered "expired" at that point and is treated as if it were a raw pointer.

First of all, we need to express, in the function signature, the fact that get_sum does not use the reference &'a State. This has a different meaning for each side of the function:

- For the caller: to call acc.get_sum(), we do not need to be within 'a. The borrow checker needs to understand that the call to acc.get_sum() does not extend the lifetime 'a up to this point.
- For the function body: get_sum() must not dereference self.ptr, not even a phantom dereference. self.ptr may point to invalid memory at this point; it may contain invalid data or point to an unmapped page, so any attempt to dereference it can cause undefined behavior. Instead, it must be treated like a raw pointer or opaque data.

We express this fact using the 'none and 'self lifetime specifiers we've just defined. The Accumulator is implemented like this:

```
struct Accumulator<'a> {
    ptr: &'a State,
    sum: u64,
}

impl<'a> Accumulator<'a> {
    // note that the 'a: 'self bound is not necessary here
    fn new(ptr: &'a State) -> Accumulator<'a> {
        Accumulator { ptr, sum: 0 }
    }
    fn add(&'self mut self) where 'a: 'self {
        self.sum += self.ptr.observe();
    }
    fn get_sum(&'self self) -> u64 {
        self.sum
    }
}
```

We'll explain the function signatures add and get_sum with the new syntax.

fn add(&'self mut self) where 'a: 'self

This function borrows Accumulator for exactly the duration of the function body, hence the input argument is &'self mut self. However, in our model, it is no longer safe to dereference self.ptr directly: even though self is valid within the function body, self.ptr is no longer guaranteed to be valid at all (imagine self.ptr being a *const State at this point)! Instead, we have to state explicitly in the function signature that 'a is indeed valid during 'self, hence 'a: 'self.
fn get_sum(&'self self) -> u64

This function borrows Accumulator for exactly the duration of the function body, hence the input argument is &'self self. Since sum is not a reference, we can access it directly through a reference to Accumulator, and the only constraint we need is that the reference is valid during the function's execution. This is written out explicitly by the 'self lifetime.

Relationship with the current lifetime model

As we said in the first section, Rust currently expects a struct, slice, array, or vector to capture the lifetimes of all its references. That is, if struct S contains a field of type &'a T, either directly or indirectly through another struct, array, or vector, or through PhantomData, then the whole struct is bound by 'a (S: 'a). This means that when creating a reference &'b S or &'b mut S, the type checker adds a constraint 'a: 'b for every field with lifetime 'a. This is exactly the constraint that the proposal removes.

Rust also requires every generic parameter to be used within the struct body, so the set of lifetime parameters is the same as the set of lifetimes that appear in the fields.

That is, for the following struct:

```
use std::marker::PhantomData;

struct Example<'a, 'b, 'c, T> {
    a: &'a T,
    b: &'b mut T,
    c: PhantomData<&'c mut T>,
}
impl<'a, 'b, 'c, T> Example<'a, 'b, 'c, T> {
    fn work<'d>(&mut self, val: &'d mut usize) {}
}
```

The signature of fn work is equivalent to this in our new model:

```
fn work<'d>(&'self mut self, val: &'d mut usize) where 'd: 'self, 'a: 'self, 'b: 'self, 'c: 'self;
```

Basically, every lifetime parameter that appears in a reference inside a struct field, enum variant, tuple field, slice item, or PhantomData reachable from a function argument currently outlives 'self implicitly.
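This implicit bound already exists in current Rust, where it is usually called an implied bound. A small sketch (the function name inner is hypothetical): no where 'a: 'b is written, yet returning the inner &'a u32 as a &'b u32 type-checks, because the argument type &'b &'a u32 is only well-formed if 'a: 'b.

```rust
// No explicit `where 'a: 'b`: the bound is implied by the argument type
// `&'b &'a u32`, which can only exist if 'a outlives 'b.
fn inner<'a, 'b>(x: &'b &'a u32) -> &'b u32 {
    *x
}

fn main() {
    let value = 7u32;
    let r: &u32 = &value;
    println!("{}", inner(&r));
}
```

Under the proposal, this is the kind of bound that would have to be written out (or introduced by the modified elision rules) instead of following silently from the existence of the outer reference.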

Impact on trait lifetimes

This proposal is likely to change how traits work. Currently, the lifetime of &self or &mut self in every trait method automatically includes the lifetimes of all references reachable from self, even if those lifetimes are not declared as lifetime parameters of the trait. This is an implicit lifetime parameter that the trait actually uses but does not express. For impl Trait in function return values, this has resulted in Member constraints. The lifetime of dyn Trait has also been a special case: in Box<dyn Trait> it defaults to 'static, and it is specified using the + 'lifetime syntax, as shown in the following example:

struct Ref<'a>(&'a usize);
trait Trait {}
impl<'a> Trait for Ref<'a> {}

fn f<'a>(_val: Box<dyn Trait + 'a>) {}
// `Box<dyn Trait>` is shorthand for `Box<dyn Trait + 'static>`.
fn g(_val: Box<dyn Trait>) {}

fn main() {
    let num = 0;
    // OK: the trait object is allowed to borrow `num`.
    f(Box::new(Ref(&num)));
    // In this case, the lifetime of the `dyn` object defaults to `'static`, so it doesn't compile:
    // g(Box::new(Ref(&num))); // error: `num` does not live long enough
}

With this proposal implemented, such a lifetime would have to be specified explicitly as a trait parameter, rather than implicitly capturing everything reachable from self.

For example:

use std::marker::PhantomData;

struct Iter<'a, T> {
	slice: *const T,
	remaining: usize,
	_data: PhantomData<&'a T>,
}

fn make_iter<'a, T>(slice: &'a [T]) -> Iter<'a, T> {
	// `as_ptr` also works for an empty slice, where `&slice[0]` would panic.
	Iter { slice: slice.as_ptr(), remaining: slice.len(), _data: PhantomData }
}

impl<'a, T> Iterator<'a> for Iter<'a, T> {
	type Item = &'a T;
	// There is a problem with traits here. With the proposal implemented, the
	// Iterator trait's signature is:
	// fn next(&'self mut self) -> Option<Self::Item>
	// Before the implementation of the proposal, the lifetime 'x of the reference
	// &'x mut self implicitly captures all struct fields, that is, there is an implicit
	// lifetime constraint 'a: 'x if the struct contains references of lifetime 'a.
	// However, with the proposal implemented, the lifetime 'self does not include
	// 'a by default, so we cannot dereference `slice` from the function in safe
	// code, and we cannot ensure that a reference to `slice` is valid in unsafe code.
	// Instead, we need to write the signature as:
	// fn next(&'self mut self) -> Option<Self::Item> where 'a: 'self.
	// This means that we need to specify `'a` as a lifetime parameter of the trait!
	// This is not extra complexity caused by expired references, but rather
	// a sign that Rust lacks explicitness in the trait definition. The trait only works
	// within a certain lifetime, but currently, this "certain lifetime" is derived from
	// the concrete type rather than explicitly expressed on the trait signature.
	fn next(&'self mut self) -> Option<Self::Item> where 'a: 'self {
		if self.remaining == 0 {
			None
		} else {
			// To properly dereference it, we need to ensure two things:
			// * The reference is within its lifetime. This is enforced by the `where 'a: 'self` bound.
			// * We give the reference a correct lifetime so it will be usable during that lifetime.
			//   Since the iterator was constructed from &'a [T], we know that the &T reference
			//   will be usable during 'a.
			let item = unsafe { &(*self.slice) as &'a T };
			// Advance to the next element; otherwise every call would return the first one.
			self.slice = unsafe { self.slice.add(1) };
			self.remaining -= 1;
			Some(item)
		}
	}
}
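For comparison, here is a minimal sketch of the same iterator in today's Rust, using the standard `Iterator` trait and keeping the names `Iter` and `make_iter` from the example above. No `'a: 'self` bound is written: the lifetime `'a` is part of `Self`, and the implicit bound discussed earlier is what the proposal would make explicit.

```rust
use std::marker::PhantomData;

// Raw-pointer slice iterator. `PhantomData<&'a T>` ties the iterator to the
// borrow of the slice, which is what makes the `unsafe` reads below sound.
struct Iter<'a, T> {
    slice: *const T,
    remaining: usize,
    _data: PhantomData<&'a T>,
}

fn make_iter<'a, T>(slice: &'a [T]) -> Iter<'a, T> {
    // `as_ptr` is valid even for an empty slice.
    Iter { slice: slice.as_ptr(), remaining: slice.len(), _data: PhantomData }
}

impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        if self.remaining == 0 {
            return None;
        }
        // SAFETY: `slice` points at one of the `remaining` elements of a
        // slice that is borrowed for `'a`.
        let item = unsafe { &*self.slice };
        // SAFETY: advancing by one stays within, or one past the end of, the slice.
        self.slice = unsafe { self.slice.add(1) };
        self.remaining -= 1;
        Some(item)
    }
}
```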

To solve this problem, we add an implicit lifetime parameter to every trait whose methods may take a reference to self. Currently, all traits have a hidden generic parameter called Self as the first parameter; we add one more hidden lifetime parameter, called 'ref. In a trait's signature, for every function that takes a reference to self, such as &'a self or &'a mut self, the bound 'ref: 'a is added automatically (in addition to 'a: 'self). When implementing a trait, the 'ref parameter automatically captures all lifetime parameters of the implementing type, unless the user explicitly overrides it. Whether and how this lifetime parameter should be exposed to the user is left as an open question.

// Old way of writing traits:
trait Iterator {
	type Item;
	fn next(&mut self) -> Option<Self::Item>;
}

// The above desugars to this:
trait Iterator<'ref> {
	type Item;
	fn next(&'self mut self) -> Option<Self::Item> where 'ref: 'self;
}
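Today's Rust already lets a trait carry a lifetime parameter explicitly, which is roughly what the desugared Iterator<'ref> makes visible. The sketch below is only an analogy in current syntax, not the proposed 'ref mechanism; `RefSource`, `SliceSource` and `first_two` are hypothetical names.

```rust
// The lifetime `'r` of the returned references is part of the trait's
// signature, so code that only sees `impl RefSource<'r>` knows how long they live.
trait RefSource<'r> {
    fn next_ref(&mut self) -> Option<&'r u32>;
}

struct SliceSource<'r> {
    rest: &'r [u32],
}

impl<'r> RefSource<'r> for SliceSource<'r> {
    fn next_ref(&mut self) -> Option<&'r u32> {
        // Copy the `&'r [u32]` out of `self` so the result is not tied to `&mut self`.
        let rest: &'r [u32] = self.rest;
        let (first, tail) = rest.split_first()?;
        self.rest = tail;
        Some(first)
    }
}

// Generic code can rely on `'r` without knowing the concrete type: the first
// reference stays usable after the second call borrows `src` again.
fn first_two<'r>(src: &mut impl RefSource<'r>) -> (Option<&'r u32>, Option<&'r u32>) {
    let a = src.next_ref();
    let b = src.next_ref();
    (a, b)
}
```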

Avoid special cases like Sound Generic Drop

Adding a lifetime parameter to every trait makes the logic clearer and gives a simpler way to understand corner cases such as the drop check!

Consider the following program from the Rustonomicon's chapter on drop check:

struct Inspector<'a>(&'a u8);

impl<'a> Drop for Inspector<'a> {
    fn drop(&mut self) {
        println!("I was only {} days from retirement!", self.0);
    }
}

struct World<'a> {
    inspector: Option<Inspector<'a>>,
    days: Box<u8>,
}

fn main() {
    let mut world = World {
        inspector: None,
        days: Box::new(1),
    };
    world.inspector = Some(Inspector(&world.days));
    // Let's say `days` happens to get dropped first.
    // Then when Inspector is dropped, it will try to read free'd memory!
}

This example shows that a type implementing Drop behaves differently from one that does not: without the Drop impl on Inspector, the program compiles. To solve the problem, the language introduced a rule called Sound Generic Drop, stated as follows:

For a generic type to soundly implement drop, its generics arguments must strictly outlive it.

However, the notion of "strictly outlives" is confusing: every other lifetime bound 'a: 'b allows equality (analogous to 'a >= 'b), and there is no syntax for a strict bound such as 'a > 'b. Why exactly do we need such a concept, and how does implementing Drop break the safety guarantees so that it cannot be allowed here?

The reason becomes clear if we explicitly write out the call to drop:

fn main() {
    let mut world = World {
        inspector: None,
        days: Box::new(1),
    };
    world.inspector = Some(Inspector(&world.days));
    World::drop(&mut world); // doesn't pass the borrow checker!
}

The main function first immutably borrows world.days and then writes that reference into world.inspector. At this point, it is no longer possible to take a mutable reference to world, since world.days is still immutably borrowed by world.inspector: the world struct is borrowed by itself. Thus it cannot be mutably borrowed until that borrow is released, which is impossible because we have no mechanism to tell the compiler that we want to forget the reference stored in a specific struct field!
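For contrast, the same self-borrowing World is accepted by today's compiler once the Drop impl is removed, because no destructor can observe the stored reference. A minimal sketch; the function name `self_borrowing_world` is made up for illustration:

```rust
struct Inspector<'a>(&'a u8);

// Same shape as the example above, but `Inspector` has no `Drop` impl.
struct World<'a> {
    inspector: Option<Inspector<'a>>,
    days: Box<u8>,
}

fn self_borrowing_world() -> u8 {
    let mut world = World {
        inspector: None,
        days: Box::new(1),
    };
    // `world` now holds a shared borrow of one of its own fields.
    world.inspector = Some(Inspector(&world.days));
    // Reading through the self-borrow is fine: it is only a shared borrow.
    let seen = *world.inspector.as_ref().unwrap().0;
    seen
    // `world` is dropped here. Since no destructor uses `Inspector`'s
    // reference, drop check accepts the program.
}
```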

However, with the proposal implemented, it makes sense for the Drop trait to take a lifetime parameter, because 'a: 'self is no longer assumed in a function that takes &mut self. We no longer need a special dropck, because the distinction between a type without a drop handler and a type with a drop handler becomes clearer: the former behaves as if it implemented Drop<'none>, while the latter is Drop<'b> where 'b: <catches all lifetime parameters>.

struct Inspector<'a>(&'a u8);

trait Drop<'a> {
  fn drop(&'b mut self) where 'b: 'self, 'b: 'a;
}

impl<'a> Drop<'none> for Inspector<'a> {
    fn drop(&mut self) { // fn drop(&'b mut self) where 'b: 'self
        // Now we are able to tell Drop that we do not need access to self.0.
        // This means that we are able to implement a drop handler!
        println!("I cannot see anything!");
    }
}

struct World<'a> {
    inspector: Option<Inspector<'a>>,
    days: Box<u8>,
}

fn main() {
    let mut world = World {
        inspector: None,
        days: Box::new(1),
    };
    'a: {
        let days = &'a world.days;
        let inspector = &'a mut world.inspector;
        *inspector = Some(Inspector(days));
    }
    // In order to call `world`'s drop handler, we need a mutable reference to `world`.
    // This means that the lifetime of both `days` and `inspector` has to end here.
    // Therefore, `world`'s drop handler must be Drop<'none>, and so is `Inspector`'s
    // drop handler.
    call_drop(world);
}

Opt-in mechanism and migration notice

Implementing this naively would break existing code. Worse, unsafe code that assumes it can only be called within the lifetime of its struct fields (for example, lifetimes carried only by PhantomData) would still compile but silently become unsound. We need a way to opt in to this feature and to force authors of unsafe code to make lifetime parameters explicit.

In order to avoid breaking existing code, we apply the following rules by default:

All types continue to inherit the lifetimes of all their fields, except when a field or the entire type is marked with #[maybe_expire], in which case that lifetime is ignored when computing the lifetime of the whole type. In particular, if the entire type is marked with #[maybe_expire], its lifetime becomes 'static.
For functions, the constraint T: 'self is applied to every argument of type T that is not marked #[maybe_expire]. The mark may also be applied at the function level, impl level, module level or Cargo level.

For example, if in most cases our struct needs access to self.a: &'a T and self.b: &'b mut T but not to self.c:

struct Example<'a, 'b, 'c> {
    a: &'a T,
    b: &'b mut T,
    #[maybe_expire]
    c: PhantomData<&'c mut T>,
}
impl<'a, 'b, 'c> Example<'a, 'b, 'c> {
    fn work(&'self mut self) {
        // cannot use self.c but can use self.a and self.b
    }
    fn work_full(&'self mut self) where 'c: 'self {
        // we explicitly want to use self.c
    }
}

Note its interaction with struct lifetimes and hidden lifetime parameters in traits:

fn has_lifetime<'a, T>() where T: 'a {}

has_lifetime::<'a, Example<'a, 'b, 'c>>() // doesn't compile
has_lifetime::<'c, Example<'a, 'b, 'c>>() // doesn't compile
fn bar<'a, 'b, 'c, 'x>() where 'a: 'x, 'b: 'x { has_lifetime::<'x, Example<'a, 'b, 'c>>() } // compiles

trait Foo {
    fn work(&mut self);
}
// The implicit lifetime parameter is: 'x where 'a: 'x, 'b: 'x
impl<'a, 'b, 'c> Foo for Example<'a, 'b, 'c> {
    fn work(&'self mut self) {
        // cannot use self.c but can use self.a and self.b
    }
}
// We need to explicitly specify the lifetime parameter if
// we want to use a reference in a trait
impl<'a, 'b, 'c> Foo<'ref> for Example<'a, 'b, 'c> where 'c: 'ref {
    fn work(&'self mut self) { // implicit 'ref: 'self
        // can use self.c!
    }
}
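For comparison, the same check in today's Rust needs 'c: 'x as well, because a type only outlives 'x if every one of its lifetime parameters does. A minimal sketch; u32 stands in for T, #[maybe_expire] is omitted because it does not exist today, and has_lifetime returns bool only so the call can be asserted:

```rust
use std::marker::PhantomData;

#[allow(dead_code)]
struct Example<'a, 'b, 'c> {
    a: &'a u32,
    b: &'b mut u32,
    c: PhantomData<&'c mut u32>,
}

// Compiles only if `T: 'x` can be proven.
fn has_lifetime<'x, T: 'x>() -> bool {
    true
}

// Removing `'c: 'x` makes this fail today. Under the proposal, marking
// `c` with `#[maybe_expire]` would make that bound unnecessary.
fn bar<'a, 'b, 'c, 'x>() -> bool
where
    'a: 'x,
    'b: 'x,
    'c: 'x,
{
    has_lifetime::<'x, Example<'a, 'b, 'c>>()
}
```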

To prompt authors of unsafe code to make lifetimes explicit in function signatures, the compiler can emit warnings for PhantomData fields that are neither marked #[maybe_expire] nor 'static, and explain how to apply #[maybe_expire] and make the lifetimes explicit.

Lifetime parameters that are unused in the struct fields should no longer cause hard errors. Instead, the compiler should emit a warning if such a parameter is not used in any function signature either.

Compiler optimizations and compliance with Stacked Borrows

Note that all kinds of optimizations still work. For example, a compiler might store a &u64 as the u64 value itself. However, it must not read memory through a &'none u64, because it may point to freed memory or unmapped pages, causing segfaults. Apart from rejecting dereferences of &'none u64, a compiler implementation must also avoid out-of-thin-air (speculative) reads through such a reference and must otherwise ignore it completely. Since lifetime constraints are statically known, the compiler should be able to determine whether such reads are allowed. One way to achieve this is to treat such references as raw pointers: any operation that is safe on a raw pointer is also safe on a 'none reference. For example, &'none u64 can implement the Copy trait.
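The raw-pointer treatment can be previewed with today's raw pointers: copying, comparing and storing a dangling pointer are all safe operations, and only dereferencing it would be undefined behaviour. A minimal sketch, assuming a 'none reference would behave like this; the `Expired` wrapper and `make_expired` are hypothetical names:

```rust
// Stand-in for a `&'none u64`: an address that may no longer be valid.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Expired {
    ptr: *const u64,
}

fn make_expired() -> Expired {
    let value = Box::new(42u64);
    let expired = Expired { ptr: &*value as *const u64 };
    // `value` is freed when this function returns, so `expired.ptr` dangles.
    expired
}
```

Usage: the result can be copied and compared, but never dereferenced.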

Its compliance with Stacked Borrows is trivial. Stacked Borrows is strictly more permissive than the current borrow checker, and since it only tracks how references are used, forbidding every use of 'none references leaves the set of reference uses the same as in current Rust. Thus, Stacked Borrows should accept 'none references without complaint.

Outline of implementation

Start by reserving the special lifetime identifiers 'self, 'none and 'ref, and implement the #[maybe_expire] mark to say that the references in a field or a struct may be expired and therefore inaccessible. Add the constraint T: 'self for every function argument of type T that is not marked #[maybe_expire]. Change the compiler to account for potentially expired references: check the lifetime for every dereference operation, and make sure LLVM does not perform unwanted optimizations through such references. Look for any unsoundness caused by this change, such as places where the compiler assumes that every reference in a struct field is accessible.

Then add a new hidden lifetime parameter to every trait. When implementing a trait, automatically add the lifetime parameter 'a along with the constraint T: 'a for every struct field of type T that is not marked with #[maybe_expire]. Find a way to expose that parameter in both trait definitions and implementations, and work out how dyn Trait and impl Trait should handle it.

Finally, encourage the use of #[maybe_expire] and explicit lifetime parameters, and deprecate struct-level lifetimes. The special case for Sound Generic Drop can then be removed and checked directly by the borrow checker, and member constraints can be removed as well.

bjorn3 · January 24, 2020, 11:19am

'a: 'b can also mean that 'a and 'b are the same.

Something which may be relevant (I just skimmed your proposal): in let (a, b) = (vec![], vec![]);, a and b have the same lifetime despite not being dropped at the same time. This is why dropck exists (otherwise the variable dropped last could contain a reference to the one dropped first and read from it in Drop::drop).

vincent163 · January 24, 2020, 3:01pm

Hence the >= symbol. The constraint 'a == 'b is equivalent to 'a: 'b, 'b: 'a, just as a >= b && b >= a implies a == b.
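This equivalence can be checked in today's Rust: a function that swaps two references of different lifetimes needs both bounds, which together force 'a and 'b to be equal. A minimal sketch; `swap_refs` is a made-up name:

```rust
// Returning `y` as `&'a u32` needs `'b: 'a`; returning `x` as `&'b u32`
// needs `'a: 'b`. Together the two bounds make `'a` and `'b` equal.
fn swap_refs<'a, 'b>(x: &'a u32, y: &'b u32) -> (&'a u32, &'b u32)
where
    'a: 'b,
    'b: 'a,
{
    (y, x)
}
```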

That's an interesting case. I've just skimmed the dropck section of the Rustonomicon, and I found that the proposal might make the concept around Drop clearer.

Consider the following program from the Rustonomicon:

struct Inspector<'a>(&'a u8);

impl<'a> Drop for Inspector<'a> {
    fn drop(&mut self) {
        println!("I was only {} days from retirement!", self.0);
    }
}

struct World<'a> {
    inspector: Option<Inspector<'a>>,
    days: Box<u8>,
}

fn main() {
    let mut world = World {
        inspector: None,
        days: Box::new(1),
    };
    world.inspector = Some(Inspector(&world.days));
    // Let's say `days` happens to get dropped first.
    // Then when Inspector is dropped, it will try to read free'd memory!
}

Without a Drop impl the example compiles: the main function first immutably borrows world.days and then writes that reference into world.inspector, and nothing needs a mutable reference to world afterwards. With the Drop impl, dropping world requires a mutable reference to it, which is no longer possible since world.days is immutably borrowed by something else. The world struct is borrowed by itself, and it would have to be destructured into two separate variables before it could be dropped correctly.

However, with the proposal implemented, it makes sense for the Drop trait to take a lifetime parameter because it is no longer assumed 'a: 'self in a function that takes &mut self. We no longer need a special dropck, because the distinction between a type without a drop handler and a type with a drop handler is clearer; the former is Drop<'none> while the latter is Drop<'b> where 'b: <catches all lifetime parameters>.

struct Inspector<'a>(&'a u8);

trait Drop<'a> {
  fn drop(&'b mut self) where 'b: 'self, 'b: 'a;
}

impl<'a> Drop<'none> for Inspector<'a> {
    fn drop(&mut self) { // fn drop(&'b mut self) where 'b: 'self
        // Now we are able to tell Drop that we do not need access to self.0.
        // This means that we are able to implement a drop handler!
        println!("I cannot see anything!");
    }
}

struct World<'a> {
    inspector: Option<Inspector<'a>>,
    days: Box<u8>,
}

fn main() {
    let mut world = World {
        inspector: None,
        days: Box::new(1),
    };
    'a: {
        let days = &'a world.days;
        let inspector = &'a mut world.inspector;
        *inspector = Some(Inspector(days));
    }
    // In order to call `world`'s drop handler, we need a mutable reference to `world`.
    // This means that the lifetime of both `days` and `inspector` has to end here.
    // Therefore, `world`'s drop handler must be Drop<'none>, and so is `Inspector`'s
    // drop handler.
    call_drop(world);
}

bjorn3 · January 24, 2020, 4:48pm

A type without drop handler doesn't implement Drop at all. Making it implement Drop<'none> would affect coherence and other things in a non-compatible way.

vincent163 · January 25, 2020, 7:55am

Noting one more thing regarding how out-of-lifetime references affect traits.

In order to figure out how out-of-lifetime references interact with the borrow checker, I've been diving into rustc and its design documents. Here is one of them that is a bit confusing to understand: https://rust-lang.github.io/rustc-guide/borrow_check/region_inference/member_constraints.html

It basically states that if the return type is impl Trait<'a, 'b>, then the returned value may only capture 'a and 'b. This ensures that the lifetime of the return value is evident from the signature. Otherwise, if an unknown lifetime 'c were involved and the value could only be used within 'c, this could not be enforced when the caller only knows impl Trait<'a, 'b> and not the concrete type.
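A small illustration of this rule in today's Rust: the returned iterator borrows its argument, so the lifetime has to be named in the opaque return type for callers to know how long the value is usable. As far as I know, editions up to 2021 reject this function if `+ 'a` is omitted (the hidden type captures a lifetime that does not appear in its bounds), while the 2024 edition captures in-scope lifetimes automatically. `evens` is a made-up name:

```rust
// The hidden type (a `Filter` over `Copied<slice::Iter<'a, u32>>`) contains `'a`.
// `+ 'a` names that lifetime in the signature, so callers only see
// `impl Iterator<Item = u32> + 'a` and know it is usable while `'a` lasts.
fn evens<'a>(v: &'a [u32]) -> impl Iterator<Item = u32> + 'a {
    v.iter().copied().filter(|x| *x % 2 == 0)
}
```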

This exposes a lack of explicitness in nearly every Rust trait: every trait has an implicit lifetime parameter, namely the lifetime of the implementing type, which is the intersection of the lifetimes of all its fields and can only be known once the concrete type is known. That is, a trait relies on something defined outside its signature (in this case, the lifetime of the concrete type) to work.

With out-of-lifetime references, we will no longer need such a trick. For any trait Trait<'a, 'b>, the trait's functions can only use 'a, 'b, 'self and 'static in their signatures, and the implementor will only be able to use 'a and 'b references. This comes at the cost of potentially breaking some code, so we'll need some backward-compatible way to handle this.

Another interesting finding: while reading the borrow checker code, I found this definition: https://github.com/rust-lang/rust/blob/8647aa1a2ce279f8ec7cc5252d10b8cb9ea504eb/src/librustc_mir/borrow_check/universal_regions.rs#L41. From its description, it looks very similar to the 'self lifetime in the proposal. I believe this shows that the proposal exposes fundamental concepts rather than merely adding extra complexity.

RalfJung · January 28, 2020, 10:18pm

Without having read the full proposal in detail, let me just mention a piece of related work: one proposal that comes up fairly regularly is to allow annotating functions of a struct (like your get_sum) with the fields of the struct that they actually need. So e.g. instead of taking &mut self, it might take something like &mut self[sum], indicating that it only borrows the sum field, not the entire thing.

Unfortunately, I don't know which name is usually used for this feature, so I couldn't find an existing RFC. I've seen @nikomatsakis mention it, though.
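The &mut self[sum] syntax is hypothetical. In today's Rust, the usual workaround is to borrow fields separately inside one method body, or to write a helper that takes only the field it needs instead of &mut self. A minimal sketch with made-up names (`Stats`, `add_to_sum`, `recompute`):

```rust
struct Stats {
    values: Vec<u64>,
    sum: u64,
}

impl Stats {
    // Workaround for a partial borrow: take only the field that is needed.
    // If this took `&mut self`, the loop below would not compile.
    fn add_to_sum(sum: &mut u64, x: u64) {
        *sum += x;
    }

    fn recompute(&mut self) {
        self.sum = 0;
        // `self.values` is borrowed immutably while `self.sum` is borrowed
        // mutably; this is accepted because the two fields are disjoint.
        for &x in &self.values {
            Self::add_to_sum(&mut self.sum, x);
        }
    }
}
```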

Ixrec · January 28, 2020, 10:46pm

FWIW, I'm used to calling it things like "partial self borrows" or "borrow views" or "view structs" (the latter being what niko's blog post used).

vincent163 · January 29, 2020, 3:36am

Great point! While researching self-referential structs, I stumbled upon a similar problem*. Note that partial borrows are different from the expired references proposed here. ~~If I got a logically consistent idea of how partial borrows could be designed, I'll open another proposal.~~

*: It was called MaybeExist in the original post, but that post is messy and I don't recommend reading it.

newpavlov · January 29, 2020, 3:49am

Another name is "borrow regions".

mcy · January 29, 2020, 6:55pm

I've only lightly skimmed the proposal (can't fully context-switch right now to think about it carefully), but a small syntactic note: I sketched "dead references" a couple of years ago, mostly for fun, and picked '! as the syntax for the empty lifetime (there's no post for it, this was entirely in some language design chatroom). At least, I would avoid using a non-keyword for 'none, and '! is nicely consistent with ! being the bottom type. Admittedly, &'! is fairly sigiltastic, but perhaps that's what we get for not using never as our bottom type. =P
