Self-referencing structs


#1

In Rust there are currently two ways to make sure that an object A doesn’t outlive an object B:

  • Add a lifetime parameter to A and store a reference to B in it: struct A<'a> { parent: &'a B }.
  • Store a Rc<B> or an Arc<B> in A: struct A { parent: Arc<B> }.

The first method is obviously the best: it is not only faster, but also doesn’t force the user to use an Rc or an Arc.

However, this first method has a big drawback: you can’t store the A and the B in the same struct, making the A object unusable in many situations.

This hasn’t really been a problem so far because objects in the standard library which depend on other objects (like A and B above) are only temporary objects. Iterators, MutexGuards, Ref, etc. are all temporaries and don’t last long. But if you want a child object that lasts longer but still shouldn’t outlive its parent, you will very quickly face this issue.

Unfortunately this will probably lead library writers to use the second method instead of the first one, which leads to lower performances, even though there shouldn’t be any technical limitation.

Technically

Technically, when you store an object in a Box, in an Rc or in an Arc, the object will stay at the same memory location even if you move its containing struct.

So there is no reason why you shouldn’t be able to write this as long as you don’t modify the object and as long as you destroy child before container:

struct Foo {
    container: Box<A>,
    child: &'lifetime? A,
}

If you never modify a Foo object, the A object contained in it will always stay at the same memory location, and the child member can safely point to the A.

You can even write this if you also ensure that Foo is never moved:

struct Foo {
    container: A,
    child: &'lifetime? A,
}

There are some limitations to writing this kind of code:

  • The container member of the struct must never be modified. You must not be able to write neither c.container.mut_something() nor c.container = ...
  • If Foo can be moved, then the container must be a Box, an Arc, an Rc, or any other object whose content doesn’t change memory location when Foo is moved.
  • child must be destroyed before container.

Is there a nice syntax?

The problem is that I didn’t come up with any real idea about how to make this work.

Here are some ideas that I think should help in making this possible:

  • Add a const modifier that can be applied to struct fields. A const field can never be modified after initialization.

  • Add a special 'self lifetime that can only be used within structures. For example: struct C { container: const Box<B>, child: &'self A } or struct C { container: const Box<B>, child: A<'self> }.

  • Add a new opt-out trait named Move which indicate objects that can be moved. It is similar to Send but much more restrictive.

  • Specify clearly in the language the order in which the elements of a struct are created and destroyed. The order of creation should be the order of declaration, and the order of destruction the inverse (just like in C++). This is to make sure that the referencer is destroyed before the referenced.

  • During the initialization of a struct, each expression can access the fields that are constructed before.

But then I’m stuck. What is 'self exactly? How would rust now what can be assigned to a &'self? What is the relations between const and Box/Arc/Rc with 'self?

My experiment

A quick note to say that I have been experimenting with a syntax extension that allows one to write self-referencing structs: https://github.com/tomaka/rust-self-ref

I’m currently facing some issues related to hygiene (during initialization I can’t manage to reference at the same time the structure itself and the surroundings), but apart from this the extension works.

A code that looks like this:

#[self_referencing]
struct MyStruct {
    a: Vec<int>,
    b: &'self_ref int
}

Will be turned into:

struct MyStruct {
    a: Box<Vec<int>>,
    b: Option<&'static int>
}

And this code:

self_ref_new!(MyStruct {
    a: vec![0, 1, 2, 3],
    b: self_ref._get_a().as_slice().get(0).unwrap()
});

Is turned into:

{
    let self_ref = MyStruct {
        a: box vec![0, 1, 2, 3],
        b: None
    };
    
    self_ref.b = Some(self_ref._get_a().as_slice().get(0).unwrap());
    self_ref
}

This is obviously very hacky, but is a small proof-of-concept that it should be possible.

Why open this topic?

I’d like to raise awareness that this lack of self-referencing structs will probably become a real issue in the future, when higher-level APIs are created.

I don’t have a solution myself, but my opinion (coming from a C++ background) is that self-referencing structs and associated items are the only two majors features that are lacking in rust right now.


#2

I ran into this when playing around with Piston.


#3

As far as I can tell, the only way to do this safely is with Rc and friends.

struct MyStruct {
   x: Rc<int>,
   storage: Vec<Rc<int>>
}

If you want to get mutable references to the data you will have to use an abstraction like RefCell, which will check that you are not violating Rust’s invariants dynamically:

struct MyStruct {
    x: Rc<RefCell<int>>,
    storage: Vec<Rc<RefCell<int>>>
}

The proposed 'self lifetime for contained pointers only upholds one aspect of Rust’s rules: & and &mut references cannot be NULL. However, it does not allow for upholding the other invariants relating to borrowing and mutability.

If you need a struct which actually has to refer to itself (not into itself) then you can do something similar:

struct MyStruct {
   val: int
   self: Option<Rc<RefCell<MyStruct>>>
}

impl MyStruct {
    fn new(val: int) -> Rc<RefCell<MyStruct>> {
        let mut this = Rc::new(RefCell::new(MyStruct { val: val, self: None }));
        this.borrow_mut().self = Some(self.clone());
        this
    }
}