Pre-RFC: 'self lifetime

binarycat · August 13, 2024, 6:01pm

this lifetime specifier would only be valid when used in the definition of structs, unions, and enums (possibly in the future it could also be valid for tuples). it can be used directly, in a reference, or as a generic lifetime parameter.

this lifetime is for data that lives exactly as long as the struct (usually self-borrowed data)

&'self references themselves would need to have different aliasing rules than regular references, namely that the underlying data can be modified while the reference exists. this is to avoid tracking partial borrows outside of an individual function.

however, whenever a 'self reference is moved/copied out of its struct, it is implicitly reborrowed into a reference with the actual lifetime of that specific struct, and then must follow normal borrow rules.

farnz · August 13, 2024, 6:14pm

This needs expansion to make the proposal interesting - how, exactly, are you going to change the semantics of &'self Foo and &'self mut Foo to make this work for your intended use cases? How do these changes interact with other uses of lifetimes inside data types? Are there any use cases that you rule out with these semantic changes?

SkiFire13 · August 13, 2024, 6:50pm

This would only work for trivial references. The moment you introduce an enum or indirection it would become unsound, and those are precisely the cases where 'self references would be useful.

binarycat · August 13, 2024, 11:47pm

show me an example of the unsoundness, and i'll bet there's a way to circumvent it, perhaps by introducing a new form of lifetime bound.

binarycat · August 13, 2024, 11:57pm

&'self references would have the same aliasing rules as raw pointers (or &UnsafeCell<T>); namely, the compiler can't cache their values in registers, and every access must be a full memory access

when they are reborrowed, they borrow the entire struct, so that the memory they point to can't be mutated, preventing the aliasing rules from being broken.
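For comparison, today's Rust already has one shared-reference type whose target may change while the reference exists: a reference to an interior-mutable cell. The sketch below uses `std::cell::Cell` (a safe wrapper around `UnsafeCell`) only as an analogy for the proposed `&'self` behaviour. The function name is made up for illustration, and nothing here implements the proposal itself.

```rust
use std::cell::Cell;

// Reads a value through a shared reference, mutates the underlying data
// while that reference is still alive, then reads again.
fn observe_then_mutate() -> (i32, i32) {
    let value = Cell::new(1);
    let alias: &Cell<i32> = &value; // shared reference to the cell

    let before = alias.get();
    value.set(2); // allowed: Cell opts out of the "no mutation while shared" rule
    let after = alias.get(); // must observe the new value

    (before, after)
}

fn main() {
    let (before, after) = observe_then_mutate();
    assert_eq!(before, 1);
    assert_eq!(after, 2);
}
```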

SkiFire13 · August 14, 2024, 6:04am

The burden of the soundness proof should be on the one proposing it, and a lack of counterexamples doesn't make a theorem true. But anyway.

Example 1:

struct SelfRef {
    a: i32,
    r: &'self i32
}

impl SelfRef {
    fn new() -> SelfRef {
        SelfRef {
            a: 1,
            r: &self.a // Is this syntax even valid according to your Pre-RFC?
        }
        // When SelfRef is returned it can be moved in memory, thus its `a` field
        // will change address, leaving `r` pointing to some random memory.
    }
}

Example 2:

struct SelfRef {
    v: Vec<i32>,
    r: &'self i32
}

impl SelfRef {
    fn new() -> SelfRef {
        SelfRef {
            v: vec![1],
            r: &self.v[0]
        }
    }
}

fn main() {
    let mut s = SelfRef::new();
    s.v = Vec::new(); // This deallocates the old Vec's memory and now r is invalid
    println!("{}", s.r); // Use after free
}

farnz · August 14, 2024, 7:25am

This also implies that the thing being pointed to cannot be moved, since a reference is semantically a pointer to the absolute location of the thing.

That means that the following code is not legal with the current operational semantics of Rust:

struct SelfRef {
    items: Vec<i32>,
    smallest: &'self i32,
}

impl SelfRef {
    pub fn new(items: Vec<i32>) -> SelfRef {
        // Panics on an empty `items`; error handling is omitted for brevity.
        let smallest_value = items.iter().copied().min().unwrap();
        let smallest_idx = items.iter().position(|item| *item == smallest_value).unwrap();
        let smallest = &items[smallest_idx];
        // Returning this is not valid under current Rust semantics,
        // since moving it invalidates the value of `smallest`.
        SelfRef {
            items,
            smallest,
        }
    }
}
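For contrast, here is a minimal sketch of how the same data can be expressed in current Rust by storing an index instead of a reference. This is a common workaround, not part of the proposal. The names `Smallest`, `smallest_idx`, and `smallest()` are made up for illustration.

```rust
// Stores the position of the smallest item rather than a reference to it,
// so the struct stays freely movable.
struct Smallest {
    items: Vec<i32>,
    smallest_idx: usize,
}

impl Smallest {
    // Returns None for an empty vector, since there is no smallest item.
    fn new(items: Vec<i32>) -> Option<Smallest> {
        let smallest_idx = items
            .iter()
            .enumerate()
            .min_by_key(|(_, value)| **value)
            .map(|(idx, _)| idx)?;
        Some(Smallest { items, smallest_idx })
    }

    // The reference is recreated on demand and borrows `self` as usual.
    fn smallest(&self) -> &i32 {
        &self.items[self.smallest_idx]
    }
}

fn main() {
    let s = Smallest::new(vec![3, 1, 2]).unwrap();
    let moved = s; // moving the struct is fine: no stored address to invalidate
    assert_eq!(*moved.smallest(), 1);
}
```

The cost is a bounds-checked lookup on each access, and the index can still go stale if `items` is modified without updating `smallest_idx`.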

Additionally, note that the aliasing rules for raw pointers are not yet fully defined, and the rules for &UnsafeCell<T> just say that the UnsafeCell itself cannot be aliased, but the rules for what you can do with the T part are still not yet fully defined.

So far, the major component of those rules is that a raw pointer or UnsafeCell cannot be assumed to not alias something else; this doesn't mean that every access must be a full memory access, but rather that the compiler cannot assume that a memory operation does not affect the pointed-to value.

However, in the absence of a store in this thread, or an Acquire or stronger atomic load in this thread, the compiler can still cache the value in a register, because in those two cases, it knows that the pointed-to value has not changed (within the rules of the language).

DragonDev1906 · August 14, 2024, 8:36am

I think a 'self lifetime might be possible, but only when multiple things are changed:

- Introduction of a relative reference type (for sake of argument: r&T) or change of normal references &T to (at compile time) not only contain a lifetime but also a memory region (start-to-end or the scope/area of a variable). Without this anything self-referential couldn't be moved (the status quo).
- Distinction between borrowed immovable (what we have now) and borrowed movable, the latter meaning that something has to be considered borrowed in terms of lifetimes but can be moved around in memory (this is probably a big box of issues though)
- Addition of finer grained borrowing: In SkiFire13's Example 2 (SelfRef with Vec<i32>) the data the Vec is pointing to has to be considered "borrowed immovable" as long as r exists, thus preventing any resizing and any modifications that could modify v or v[i]. SelfRef::v itself could still be movable but effectively can be seen as "borrowed movable", allowing the vec to be moved but not the underlying data.
- And finally, to make all of this work a 'self lifetime alone likely wouldn't be sufficient: You'd need the ability to specify the lifetime of individual fields 'self.v and (probably) the ability to distinguish between the field and the data this field is pointing to 'self.v.(*ptr) (or however that would work).

What does this mean?

- When r borrows something in the memory region of SelfRef directly, it would have to be a relative pointer (r&'self i32) with the memory region being the location + size of this SelfRef, thus making SelfRef "borrowed movable"
- Whenever converting between absolute and relative reference the base address (of the memory region it is pointing to) has to be added/removed, thus requiring knowledge of the reference's address in relation to it and the struct/variable representing this memory region (SelfRef) must be considered non-movable for the lifetime of that reference.
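The relative-to-absolute conversion described above can be approximated by hand in current Rust. The following is only a sketch under assumptions: `SelfRel`, `r_offset`, and `r()` are hypothetical names, the relative reference is emulated with a stored byte offset, and `unsafe` code does what the hypothetical r&T type would do automatically.

```rust
// Hypothetical stand-in for `r&'self i32`: instead of an absolute address,
// store the byte offset of the target from the start of the struct.
struct SelfRel {
    a: i32,
    r_offset: usize,
}

impl SelfRel {
    fn new(a: i32) -> SelfRel {
        let mut s = SelfRel { a, r_offset: 0 };
        // "Absolute to relative": subtract the base address of the struct.
        let base = &s as *const SelfRel as usize;
        let field = &s.a as *const i32 as usize;
        s.r_offset = field - base;
        s // moving `s` out does not change the offset
    }

    // "Relative to absolute": add the current base address of the struct.
    fn r(&self) -> &i32 {
        let base = self as *const SelfRel as *const u8;
        // SAFETY: `r_offset` is private and only ever holds the offset of
        // `a`, so the result points at a live, aligned i32 inside `*self`.
        unsafe { &*(base.add(self.r_offset) as *const i32) }
    }
}

fn main() {
    let s = SelfRel::new(1);
    let moved = Box::new(s); // the struct now lives at a different address
    assert_eq!(*moved.r(), 1);
}
```

Because `r()` borrows `self`, the struct cannot be moved while the returned absolute reference is alive, which matches the "non-movable for the lifetime of that reference" requirement.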

For the examples mentioned so far (yes, those probably don't cover all situations):

struct SelfRef {
    a: i32,
    r: r&'self i32
}

impl SelfRef {
    fn new() -> SelfRef {
        SelfRef {
            a: 1,
            r: r&self.a // If this is allowed
        }
        // Due to the use of a reference relative to `SelfRef`
        // (forgive me being sloppy and using the lifetime for that,
        // too) `SelfRef::a` is considered "borrowed immutable movable",
        // allowing this function to move SelfRef in memory without `r`
        // pointing to some random memory.
        // 
        // The same argument goes for farnz' example.
    }
}

struct SelfRef {
    v: Vec<i32>,
    r: &'self i32 // Note that this is not relative, as it points to the data of v
}

impl SelfRef {
    fn new() -> SelfRef {
        SelfRef {
            v: vec![1],
            r: &self.v[0]
        }
    }
}

fn main() {
    let mut s = SelfRef::new();
    // s.v = Vec::new(); // This is invalid because s.v is "borrowed immutably movable".
    // One thing to consider though is how swap should be handled.
    // If the underlying data is not dropped but outlives `s`, `s.r`
    // would still point to a valid memory location, but it's no longer
    // inside of `SelfRef`, so its lifetime wouldn't be `'self`.
    println!("{}", s.r);
}

I'm probably missing something important (besides a lot of added complexity for the borrow checker).

SkiFire13 · August 14, 2024, 10:47am

What is the use case for this though? I can't think of a case where you would want to store a reference in a struct that can only point to another field.

DragonDev1906 · August 14, 2024, 11:23am

Me neither, as most situations where something like this could be relevant end up being behind a pointer. Perhaps when using dynamically sized types (which already isn't often):

struct A {
    pos: r&'self u8,
    data: [u8],
}

One place where it could be relevant is if the memory region isn't the struct itself but a larger section of memory this struct is in (though then the lifetime 'self isn't of much use anymore). For example a list/vec or when using an arena allocator. Granted, the latter doesn't benefit much from the ability to move the entire thing and you probably wouldn't want to do it to the former (except for when the vec has to get resized).
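In current Rust, the list/vec and arena cases are usually handled with indices, which behave like references relative to the container. A minimal sketch, with `Arena`, `Node`, and `next_value` as made-up names for illustration:

```rust
// Nodes refer to each other by index into one Vec, i.e. by position
// relative to the arena rather than by absolute address.
struct Node {
    value: u8,
    next: Option<usize>,
}

struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    fn new() -> Arena {
        Arena { nodes: Vec::new() }
    }

    // Adds a node and returns its index, which acts as its "relative reference".
    fn push(&mut self, value: u8, next: Option<usize>) -> usize {
        self.nodes.push(Node { value, next });
        self.nodes.len() - 1
    }

    // Follows the `next` link of the node at `idx`, if it has one.
    fn next_value(&self, idx: usize) -> Option<u8> {
        let next = self.nodes[idx].next?;
        Some(self.nodes[next].value)
    }
}

fn main() {
    let mut arena = Arena::new();
    let first = arena.push(1, None);
    let second = arena.push(2, Some(first));
    // Growing the Vec may reallocate and move every node in memory.
    for i in 0..100u8 {
        arena.push(i, None);
    }
    let moved = arena; // moving the arena itself is also fine
    assert_eq!(moved.next_value(second), Some(1));
    assert_eq!(moved.next_value(first), None);
}
```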

kornel · August 14, 2024, 1:31pm

I recommend reading this article, which explores some (non)solutions to this problem that have been rejected:

https://without.boats/blog/pin/

binarycat · August 15, 2024, 10:08pm

I am proposing an idea in order to get feedback. Expecting me to somehow have a formal proof at this point is unreasonable.

Aras14HD · August 18, 2024, 6:38pm

It is very hard to make things immovable (like this) in Rust. Take this example:

struct SelfRef {
  a: usize,
  b: &'self usize,
}
fn moving() {
  let mut x = SelfRef {
    a: 10,
    b: &self.a,
  };
  let mut y = SelfRef {
    a: 10,
    b: &self.a,
  };
  std::mem::swap(&mut x, &mut y); // y.b now points to x.a
  drop(x); // x dropped
  println!("{}", y.b); // access of dropped x.a
}

I can't think of any way to prevent this while staying useful and backwards compatible (so no Move).
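The effect of the swap can be observed in current Rust by emulating the self-reference with a raw pointer and comparing addresses, without ever dereferencing the dangling pointer. This is a sketch under assumptions: `SelfPtr` and `swapped_pointer_targets` are hypothetical names, and `*const usize` stands in for `&'self usize`.

```rust
// `b` is a raw-pointer stand-in for the proposed `&'self usize` field.
struct SelfPtr {
    a: usize,
    b: *const usize,
}

// Returns (y.b points at x.a, y.b points at y.a) after swapping x and y.
fn swapped_pointer_targets() -> (bool, bool) {
    let mut x = SelfPtr { a: 10, b: std::ptr::null() };
    let mut y = SelfPtr { a: 20, b: std::ptr::null() };
    // Make each struct point at its own `a` field.
    x.b = &x.a as *const usize;
    y.b = &y.a as *const usize;

    // A plain bitwise swap: the stored addresses travel with the data.
    std::mem::swap(&mut x, &mut y);

    // Only addresses are compared; the pointers are never dereferenced.
    (std::ptr::eq(y.b, &x.a), std::ptr::eq(y.b, &y.a))
}

fn main() {
    let (points_at_x, points_at_y) = swapped_pointer_targets();
    assert!(points_at_x); // y.b now refers into x
    assert!(!points_at_y); // and no longer into y itself
}
```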

P.S.: One way to make something like these possible, would be to add a Borrow Type, which is the origin for a unique lifetime and invalidated by any access including moves. (self plug for my gist)
