On safe backlinks

A classic problem in Rust is data structures with backlinks. A tree where nodes have backlinks to their owner is tough. Rc/Weak works, but means extra run-time work and the possibility of a runtime panic. Ownership via "handles" which are keys to a HashMap or indices into a vector are really manual pointer manipulation. That's where we are now.

What's often wanted is single ownership with safe backlinks checked at compile time. Is that even possible? I think it is for the immutable case, but I'm not sure it could work for mutable.

So, a proposal for discussion:

A tree, where each struct has a Vec of child objects, is standard Rust. That works fine. Now, suppose we had language support for backlinks, like this:

#[derive(Owner)]  // new feature
///   Node of a tree
struct Node {
    name: String,
    children: Vec<Node>
}

impl Node {
    /// Print the ownership chain back to the root
    pub fn print_ownership_chain(&self) {
        println!("Node name: {}", self.name);
        // get_owner is generated by the derive above.
        while let Some(parent) = self.get_owner() {
            parent.print_ownership_chain();
        }
    }
}

So, the idea is that #[derive(Owner)] does this:

The structure gets a hidden field which contains

 Option<BackReference<Node>>

and the contents of that field are automatically maintained to point back at the owner, if it's another Node, or None, otherwise. When ownership transfers, that field must change. The compiler knows when an object is changing ownership. So that's an implementation problem, but it's not inherently unsafe.

Then there's get_owner(), generated by the derive. Signature:

fn get_owner(&Self) -> Option<&Self>

Deciding, at compile time, when it's allowed to call get_owner is the hard part. It's safe when no one has mutable access to the parent node. Can that be determined at compile time? Is that within the capabilities of the borrow checker model?

This is doing at compile time what people do at run time with Rc and Weak and .upgrade(). So it ought to be possible within the ownership model.

Is this theoretically possible? (And could it be extended to get_owner_mut(), returning a mutable reference?)

This is not possible to be as seamless as you'd hope, and Rust — indirectly — has strongly committed to keeping it that way.

The problem here is that mem::swap(&mut a, &mut b) exists. If you expose mutable references to any two nodes, you can swap their content.

swap performs moves, and moves are defined to be just plain memcpy, without any extra code running. There may be unsafe code out there that relies on this, so this is very unlikely to change in Rust. Any attempt at making it optional works very poorly with generics and/or requires entire ecosystem to opt-in to allowing new behavior, so that's unlikely to happen either.

4 Likes

Hm. Maybe if Swap were a default trait, like Send and Sync? So everything is "Swap", unless that trait has been explicitly removed. mem::swap would then require that its arguments have the Swap trait. What code would have to be aware of that?

Note that "pinning", for async, has much the same problem.

Yeah, that's the:

1 Like

There's already trait Unpin. That is required for move and swap. A struct without trait Unpin would be safe for back references. Would that work?

Update: no. "Unpin has no consequence at all for non-pinned data. In particular, mem::replace happily moves !Unpin data (it works for any &mut T, not just when T: Unpin)."

(NOT A CONTRIBUTION)

It works if the parent is wrapped in a Pin. This is how you could make this work: require the parent to be pinned before creating a child, which will have a back reference to the parent.

This has some undesirable consequences:

  • You can't move the owner - you would have to have some function for move, which updates the back references.
  • You can't safely get a mutable reference to a node - this is exactly how pin works, it restricts access to &mut so you can't call men::swap. You can use pin-project-lite or similar to easily and safely "project" through the pin and get mutable references to the data fields of the node, though.
  • You have a bunch of method signatures with like self: Pin<&mut Self> instead of just &mut self and other syntactic burdens of using Pin.

The most similar code to this that I know of is in the tokio synchronization primitives, which use intrusive linked lists to contain the tasks waiting on events, knowing that the nodes in the list are pinned in place. I don't know of anyone using it for back references like you describe, but it could be made to work.

You also need to worry about UB around mutability and aliasing when using back references like this.

2 Likes

If safe backlinks existed, how would a Drop impl for a tree look like? You'd have to pick some node destruction order, breadth-first, depth-first, postorder, whatever. In any case, once a node is destroyed you'd have dangling references to it, either forward links or backlinks.

So there seems to be two possibilities. Either your backlinks aren't directly dereferenceable, in which case you should call a method like .upgrade() and handle the possibility of a stale link (which is exactly the case with Rc today, and you don't like it). Or the compiler should be able to invalidate backlinks at compile time, without relying on any extra data, such as drop flags. I have honestly no idea how the latter option could work, it seems impossible.

(NOT A CONTRIBUTION)

You just need to drop the children before drop the parent, for example by clearing the vector in the parent's destructor.

EDIT: Since the back references are going to need to be raw pointers, you can just make sure their destructors don't dereference them and that's also fine.

OK, there's been progress. It looks like back-linked structs have to be pinned, which is a reasonable constraint.That means no back reference need ever be updated because the owner moved.

How would Drop work? Dropping of the owner causes drop of the owned struct, of course. The owner can't be borrowed via get_owner() when the owner is dropped. So how to check that at compile time?

If there's no call to get_owner() that returned a reference to an owner anywhere in the call chain, it looks like it's safe to drop the owner. That's a sufficient, but not necessary, condition, one which can be checked statically. It might be too strict. It means that if you're manipulating a tree, you have to be doing it from the root. You can't go down one branch, go back up, go down another branch, and drop something. Is there a use case for that?

(Goal here is to answer whether this is possible. Whether it is desirable comes later.)

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.