Self references (yet again)

atagunov · January 28, 2021, 8:37pm

Has this been discussed? Imagine:

A Rust edition boundary made it illegal to share a name between a lifetime and a let binding, etc.:
```
// now illegal
fn foo<'a>(a : &'a .. )
```

The next (or the same) edition boundary then assigned a new meaning to a name shared between a lifetime and a let binding, etc.:

```
struct S1 {
  p : Vec<u8>,
  // lifetime extends to such point until which the compiler can trivially
  // prove that p has not been "consumed" or modified in any way
  ref : &'p u8
}

struct S2 {
  p : Vec<u8>
}

// OK
fn foo(s2 : S2) -> S1 {
  let ref : &'s2.p = ...
  S1{ p : s2.p, ref }
}
```

If the compiler "loses sight of" p, or p is modified, all &'p references end their lifetime:

```
fn bar(s1 : S1) {
  baz(s1.p); // move out of s1.p
  ... *s1.ref ... // compilation error, s1.ref's lifetime has ended
}
```

All cases where the compiler cannot trivially prove safety are forbidden.

Example 1: passing &mut S1 around is okay. Modifying s1.p via &mut S1 is not okay. Using &mut *s1.p is invalid too. Indeed, a fn up the call stack may later use s1.ref, which may have become invalid.

Example 2: re-assigning all of *s1 where s1 : &mut S1 is okay since it preserves the validity of s1.ref.

Libraries that find these restrictions too tight are welcome to opt-out via unsafe.

scottmcm · January 28, 2021, 8:50pm

You might take a look through the tracking issue for RFC 2115, In-band lifetime bindings (rust-lang/rust issue #44524), and the corresponding RFC thread -- I vaguely remember some conversations about automatically creating lifetimes named for the corresponding parameters.

josh · January 29, 2021, 1:00am

The primary issue with self-references isn't the naming of the relevant lifetimes, it's the compiler support for handling self-references. That compiler support requires some substantial improvements to lifetime tracking, which are being worked on. Once those improvements are in place, we can start figuring out what the syntax for self-references looks like.

Aloso · January 29, 2021, 2:42am

What's the plan for handling moves of self-referential structs? For example, in this case:

```
struct Graph<const Size: usize> {
    a: [i32; Size],
    b: Vec<(&'a i32, &'a i32)>,
}
```

When the struct is moved, the references might need to be updated, but this is nontrivial when the references are in a structure such as Vec.

josh · January 29, 2021, 6:15am

Most likely, by defining self-references using relative pointers, such that the compiled code for dereferencing a self-reference will use it as an offset from the location of the pointer.
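
As a rough illustration of the idea in today's Rust, the sketch below replaces each reference with a position relative to the owning struct, so nothing needs updating when the struct moves. This is only a safe approximation that uses array indices, not the byte-offset relative pointers described above; the `edge` and `moved_edge` names are made up for this example.

```rust
/// Hypothetical stand-in for the `Graph` example: instead of storing
/// `&'a i32`, each edge stores positions relative to the owning struct.
struct Graph<const SIZE: usize> {
    a: [i32; SIZE],
    // Two indices into `a` per edge, rather than two references.
    b: Vec<(usize, usize)>,
}

impl<const SIZE: usize> Graph<SIZE> {
    /// Resolve edge `i` into real references borrowed from `self`.
    /// The lookup is relative to wherever `self` currently lives.
    fn edge(&self, i: usize) -> (&i32, &i32) {
        let (from, to) = self.b[i];
        (&self.a[from], &self.a[to])
    }
}

/// Build a graph, move it to a new address, then resolve an edge.
fn moved_edge() -> (i32, i32) {
    let g = Graph { a: [10, 20, 30], b: vec![(0, 2)] };
    // Moving into a Box relocates the inline array `a`.
    let boxed = Box::new(g);
    let (x, y) = boxed.edge(0);
    (*x, *y)
}

fn main() {
    println!("{:?}", moved_edge());
}
```

The cost is that every access goes through `&self`, which is roughly what a compiler-supported relative pointer would have to do implicitly.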

H2CO3 · January 29, 2021, 9:22am

As to the concrete syntax, I think it's a terrible idea to mix names from different namespaces, and even more so to introduce a tacit dependency between two different entities based solely on the fact that they share part of an identifier. That's completely counter-intuitive compared to how identifiers currently work, and even if it weren't a breaking change, it would not be acceptable from a (re-)learning point of view.

kornel · January 29, 2021, 10:01am

The syntax lacks a way to distinguish between a reference to data owned by the same struct but allocated elsewhere at a stable(ish) address, and an actual self-reference into the same object:

```
struct Indirect {
   p: Vec<u8>, ref: &u8
}
```

and

```
struct Direct {
   p: [u8; 10], ref: &u8
}
```

This makes a huge difference if the object is moved. It's safe to move a self-referential object that points into a Vec (the heap), but not one that points to a u8 stored in-line in the same struct. An offset-based reference could be implemented for in-line data (a bare field, array, ArrayVec), but a relative offset wouldn't work for the Vec case (the offset between the Vec's heap buffer and the struct would change when the struct moves).
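
The difference can be observed in current Rust by comparing data addresses before and after a move. The sketch below is only an illustration: the `ref` fields are left out (`ref` is a keyword, and the reference types above are hypothetical), and `addresses_survive_move` is a name invented for this example.

```rust
struct Indirect {
    p: Vec<u8>, // data lives in a separate heap buffer
}

struct Direct {
    p: [u8; 10], // data lives in-line, inside the struct itself
}

/// Returns (heap data address unchanged, in-line data address unchanged)
/// after moving both structs from the stack into a Box.
fn addresses_survive_move() -> (bool, bool) {
    let indirect = Indirect { p: vec![1, 2, 3] };
    let direct = Direct { p: [0; 10] };

    // Record where the data lives while the structs are on the stack.
    let heap_before = indirect.p.as_ptr() as usize;
    let inline_before = direct.p.as_ptr() as usize;

    // Move both structs to a new location.
    let indirect = Box::new(indirect);
    let direct = Box::new(direct);

    let heap_after = indirect.p.as_ptr() as usize;
    let inline_after = direct.p.as_ptr() as usize;

    (heap_before == heap_after, inline_before == inline_after)
}

fn main() {
    let (heap_same, inline_same) = addresses_survive_move();
    println!("heap data stayed put: {heap_same}");
    println!("in-line data stayed put: {inline_same}");
}
```

Moving the Vec only copies its pointer, length, and capacity, so the buffer stays where it was, while the in-line array is copied along with the struct.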

atagunov · January 29, 2021, 1:50pm

I had a feeling last night I was missing something.

Quite right. The plan might still have some value. But it supports references into the heap only.

Breakage is solvable by (two?) editions. Ease of (re-)learning is subjective.

dodomorandi · January 29, 2021, 5:20pm

Let's hypothesize some syntax for specifying a self-referential type. Would it be possible to make a self-referential mutable borrow?

I am asking because the soundness issue with generators is still open, and I am wondering whether the current Stacked Borrows model could work in the general case. For instance, with an unwieldy and totally inappropriate syntax:

```
struct A {
    a: u8,
    b: &selfref mut u8,
}

let mut a = A { a: 0, b: &selfref mut A::a };
```

I can obviously take &mut a, &mut a.a and &mut a.b. But what about the following?

```
let indirect_a: &mut u8 = &mut *a.b;
```

Is this sound?

And what about the following:

```
let A { a: a_refmut, b: b_refmut } = &mut a;
```

In this case we are creating two mutable references pointing to the same u8, breaking aliasing rules.
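
For contrast, here is a small sketch of why this destructuring is fine for ordinary structs today: the two fields are disjoint, so the two mutable references can never alias. `Plain` and `bump_both` are names made up for this example; a self-referential `b` would break exactly the disjointness this relies on.

```rust
struct Plain {
    a: u8,
    b: u8,
}

/// Destructure `&mut Plain` into two simultaneous mutable references
/// and write through both of them.
fn bump_both() -> (u8, u8) {
    let mut s = Plain { a: 0, b: 10 };

    // Matching through `&mut s` binds each field as `&mut u8`.
    // The borrow checker accepts this because `a` and `b` do not overlap.
    let Plain { a: a_refmut, b: b_refmut } = &mut s;
    *a_refmut += 1;
    *b_refmut += 1;

    // Both borrows have ended here, so `s` can be read again.
    (s.a, s.b)
}

fn main() {
    println!("{:?}", bump_both());
}
```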

Maybe it's just me, but I think that it is not only more than syntax (as @josh already said), but also more than compiler implementation. Like many other rustaceans, I would obviously like the possibility of creating self-referential types in a completely safe way, but if the instance is somehow special (if you cannot perform destructuring, it's surely a strange struct), then I am not sure about the real usefulness of the feature.

atagunov · January 29, 2021, 5:24pm

Apologies for a bad topic name. My interest was in lifetimes that last as long as a specific heap allocation. Which could start in a callee and end in the caller if so desired.

@dodomorandi: I guess the compiler would error on any use of a reference that it cannot prove safe. Identifying a useful subset of all possible code sequences that the compiler can easily prove safe will be an interesting task.

dodomorandi · January 29, 2021, 5:51pm

Surely it's the compiler that should trigger an error if something is not allowed. My point, however, is that the compiler just implements a valid set of rules that make safe Rust sound. If you take the simple destructuring case I made before, it is far from trivial to imagine an additional set of rules that keeps current Rust valid and, at the same time, allows destructuring self-referential types with mutable self-borrows soundly. Only at that point is it possible to talk about the problems with the implementation in the compiler, which can be as hard as creating the right set of rules (see NLL, Polonius, Chalk...).

My humble opinion is that deciding on a syntax is (warning: hyperbole ahead) just an implementation detail. Proposing a working, sound model for self-referential types that is coherent with the current Stacked Borrows model is the hard part.

ibraheemdev · January 29, 2021, 8:11pm

Would Polonius be a step forward towards the compiler being able to handle self-references?

CAD97 · January 29, 2021, 11:00pm

Polonius is "just" another implementation of the borrow rules as currently formulated. It does so in a way that is much easier to extend, but it doesn't fundamentally change what we say is sound.

panstromek · January 30, 2021, 8:48am

It's worth noting that Niko specifically mentions this case in his latest talk about Polonius as one possible case where it could help in the future.

system · April 30, 2021, 8:48am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
