Self references (yet again)

Has this been discussed? Imagine:

  • a Rust edition boundary makes it illegal to share a name between a lifetime and a let binding, parameter, etc.:

    // now illegal
    fn foo<'a>(a : &'a .. )
    
  • the next (or the same) edition boundary assigns a new meaning to a name shared between a lifetime and a let binding, parameter, etc.:

    struct S1 {
      p : Vec<u8>,
      // the lifetime lasts for as long as the compiler can trivially
      // prove that p has not been "consumed" or modified in any way
      ref : &'p u8
    }
    
    struct S2 {
      p : Vec<u8>
    }
    
    // OK
    fn foo(s2 : S2) -> S1 {
      let ref : &'s2.p = ...
      S1{ p : s2.p, ref }
    }
    
  • if the compiler "loses sight of" p, or p is modified, all &'p references end their lifetime:

    fn bar(s1 : S1) {
      baz(s1.p); // move out of s1.p
      ... *s1.ref ... // compilation error, s1.ref's lifetime has ended
    }

All cases where the compiler cannot trivially prove safety are forbidden.

Example 1: passing &mut S2 around is okay. Modifying s2.p via &mut S2 is not okay. Using &mut *s2.p is invalid too. Indeed a fn up the call stack may later use s2.ref which has possibly become invalid.

Example 2: re-assigning all of *s2 where s2 : &mut S2 is okay since it preserves validity of s2.ref.

Libraries that find these restrictions too tight are welcome to opt out via unsafe.
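
For comparison, here is a minimal sketch of how a heap-only self-reference in the spirit of S1 can be approximated with unsafe in today's Rust. The names `HeapRef`, `new`, `get` and `bytes` are made up for illustration, and the sketch is only sound under the stated assumption that `p` is never mutated or replaced after construction:

```rust
// Hypothetical sketch: `HeapRef` and its methods are illustrative names,
// not an existing API.
struct HeapRef {
    p: Vec<u8>,
    // Points into the heap buffer owned by `p`. Assumed valid for as long
    // as `p` is neither mutated nor dropped.
    r: *const u8,
}

impl HeapRef {
    /// Returns `None` if `idx` is out of bounds.
    fn new(p: Vec<u8>, idx: usize) -> Option<Self> {
        // Take the address of the element, then let the borrow end
        // before `p` is moved into the struct.
        let r: *const u8 = p.get(idx)?;
        Some(HeapRef { p, r })
    }

    /// The returned reference is tied to the borrow of `self`.
    fn get(&self) -> &u8 {
        // SAFETY (assumption of this sketch): `p` is private and never
        // mutated after construction, so its heap buffer is neither
        // reallocated nor freed while `self` is alive. Moving `HeapRef`
        // moves only the Vec header, not the heap buffer.
        unsafe { &*self.r }
    }

    /// Read-only view of the owned bytes.
    fn bytes(&self) -> &[u8] {
        &self.p
    }
}
```

The proposal above would, in effect, let the compiler check the "never mutated" invariant that this sketch has to promise by hand.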

You might take a look through the tracking issue for RFC 2115, "In-band lifetime bindings" (rust-lang/rust issue #44524), and the corresponding RFC thread. I vaguely remember some conversations about automatically creating lifetimes named for the corresponding parameters.

The primary issue with self-references isn't the naming of the relevant lifetimes, it's the compiler support for handling self-references. That compiler support requires some substantial improvements to lifetime tracking, which are being worked on. Once those improvements are in place, we can start figuring out what the syntax for self-references looks like.

What's the plan for handling moves of self-referential structs? For example, in this case:

struct Graph<const Size: usize> {
    a: [i32; Size],
    b: Vec<(&'a i32, &'a i32)>,
}

When the struct is moved, the references might need to be updated, but this is nontrivial when the references are in a structure such as Vec.

Most likely, by defining self-references using relative pointers, such that the compiled code for dereferencing a self-reference will use it as an offset from the location of the pointer.
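
As a rough illustration of the idea in today's safe Rust, the "relative pointer" can be emulated by storing an offset and resolving it against the struct's current location on every access. This is only a sketch: `Inline`, `new` and `target` are made-up names, and a real compiler-supported feature might look quite different:

```rust
// Hypothetical sketch: `Inline` and its methods are illustrative names.
struct Inline {
    data: [u8; 4],
    // Offset (in elements) from the start of `data` to the referenced
    // element, standing in for a compiler-generated relative pointer.
    offset: usize,
}

impl Inline {
    /// Returns `None` if `index` does not point inside `data`.
    fn new(data: [u8; 4], index: usize) -> Option<Self> {
        if index < data.len() {
            Some(Inline { data, offset: index })
        } else {
            None
        }
    }

    /// Resolves the offset against wherever `self` currently lives,
    /// so the result stays correct after the struct has been moved.
    fn target(&self) -> &u8 {
        &self.data[self.offset]
    }
}
```

Because nothing absolute is stored, moving an `Inline` (for example into a `Box`) needs no fix-up step.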

As to the concrete syntax, I think it's a terrible idea to mix names from different namespaces, and even more so to introduce a tacit dependency between two different entities based solely on the fact that they share part of an identifier. That's completely counter-intuitive compared to how identifiers currently work, and even if it weren't a breaking change, it would not be acceptable from a (re-)learning point of view.

The syntax lacks a way to distinguish between a reference to data that is owned by the same struct but allocated elsewhere at a stable(ish) address, and an actual self-reference into the same object:

struct Indirect {
   p: Vec<u8>, ref: &u8
}

and

struct Direct {
   p: [u8; 10], ref: &u8
}

This makes a huge difference if the object is moved. It's safe to move a self-referential object whose reference points into a Vec (the heap), but not one whose reference points to a u8 stored in-line in the same struct. Conversely, an offset-based reference could be implemented for in-line data (a bare field, array, ArrayVec), but a relative offset wouldn't work for the Vec case (the offset between the Vec's heap buffer and the struct would change when the struct moves).
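
The difference can be observed in today's Rust by comparing data addresses before and after a move. This is a small sketch with made-up struct and function names; the move is forced by boxing the struct, and the in-line addresses differ in practice because the data goes from the stack to the heap:

```rust
// Hypothetical sketch: the names below are illustrative only.
struct IndirectData {
    p: Vec<u8>,
}

struct DirectData {
    p: [u8; 10],
}

/// Returns the address of the first byte before and after moving the struct.
fn indirect_addrs() -> (usize, usize) {
    let s = IndirectData { p: vec![0u8; 10] };
    let before = s.p.as_ptr() as usize;
    let moved = Box::new(s); // moves the struct (the Vec header) to the heap
    let after = moved.p.as_ptr() as usize;
    (before, after)
}

/// Same measurement for in-line data.
fn direct_addrs() -> (usize, usize) {
    let s = DirectData { p: [0u8; 10] };
    let before = s.p.as_ptr() as usize;
    let moved = Box::new(s); // moves the array itself to the heap
    let after = moved.p.as_ptr() as usize;
    (before, after)
}
```

For `IndirectData` the two addresses are equal, since only the Vec header moves and the heap buffer stays put. For `DirectData` they differ, which is exactly why a plain reference into in-line data cannot survive a move.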

I had a feeling last night I was missing something :)

Quite right. The plan might still have some value. But it supports references into the heap only.

Breakage is solvable by (two?) editions. Ease of (re-)learning is subjective.

Let's hypothesize some syntax for specifying a self-referential type. Would it be possible to make a self-referential mutable borrow?

I am asking because the soundness issue with generators is still open, and I am wondering whether the current Stacked Borrows model could work in the general case. For instance, with an unwieldy and totally inappropriate syntax:

struct A {
    a: u8,
    b: &selfref mut u8,
}

let mut a = A { a: 0, b: &selfref mut A::a };

I can obviously take &mut a, &mut a.a and &mut a.b. But what about the following:

let indirect_a: &mut u8 = &mut *a.b;

Is this sound?

And what about the following:

let A { a: a_refmut, b: b_refmut } = &mut a;

In this case we are creating two mutable references pointing to the same u8, breaking aliasing rules.
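
For contrast, here is a minimal sketch of the same destructuring on an ordinary struct in today's Rust (`Plain` and `bump_both` are made-up names). It is accepted precisely because the borrow checker can assume that the two fields never overlap:

```rust
// Hypothetical sketch: `Plain` and `bump_both` are illustrative names.
struct Plain {
    a: u8,
    b: u8,
}

/// Mutates both fields through two simultaneously live `&mut u8`.
fn bump_both(s: &mut Plain) -> (u8, u8) {
    // Destructuring through `&mut Plain` yields one `&mut u8` per field.
    // This is sound because `a` and `b` are guaranteed to be disjoint.
    let Plain { a: a_refmut, b: b_refmut } = s;
    *a_refmut += 1;
    *b_refmut += 10;
    (*a_refmut, *b_refmut)
}
```

With a self-referential field, that disjointness assumption no longer holds, so a sound model would need either to forbid this pattern or to give the second binding a different meaning.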

Maybe it's just me, but I think this is about more than syntax (as @josh already said), and also about more than compiler implementation. Like many other Rustaceans, I would obviously like to be able to create self-referential types in a completely safe way, but if such an instance is somehow special (if you cannot destructure it, it's surely a strange struct), then I am not sure about the real usefulness of the feature.

Apologies for a bad topic name. My interest was in lifetimes that last as long as a specific heap allocation. Which could start in a callee and end in the caller if so desired.

@dodomorandi: I guess the compiler would error on any use of a reference that it cannot prove safe. Identifying a useful subset of all the code sequences that the compiler can easily prove safe will be an interesting task.

Surely it's the compiler that should raise an error if something is not allowed. My point, however, is that the compiler just implements a set of rules that make safe Rust sound. If you take the simple destructuring case I gave before, it is far from trivial to imagine an additional set of rules that keeps current Rust valid and, at the same time, allows destructuring self-mutably-referential types soundly. Only at that point is it possible to talk about the problems with the implementation in the compiler, which can be as hard as creating the right set of rules (see NLL, Polonius, Chalk...).

My humble opinion is that deciding on a syntax is (warning: hyperbole ahead) just an implementation detail. Proposing a working, sound model for self-referential types that is coherent with the current Stacked Borrows model is the hard part.

Would Polonius be a step towards the compiler being able to handle self-references?

Polonius is "just" another implementation of the borrow rules as currently formulated. It does so in a way that is much easier to extend, but it doesn't fundamentally change what we say is sound.

That said, Niko specifically mentions this case in his latest talk about Polonius as one possible case where it could help in the future.