Self-referential structures, without "outer" pinning

mcy · November 4, 2019, 7:38pm

I remember a couple years ago there was a bit of noise about user-defined self-referential structs. Discourse helpfully suggests this topic, for example. Rental does a pretty good job of solving my problems, but it feels like overkill a lot of the time.

What's the state of the art on this, anyway? I can't count the number of times I build structs that look like

struct MyCoolGraph {
  // Can't pin on the "stack"... I think Centril's 
  // Freeze<T> is probably more correct?
  arena: Pin<Box<Arena<Node<'arena>>>>,
  root: Node<'arena>,
}

I know there has been a lot of arguing about 'self, whatever that means, in the past, but I'm particularly interested in the case where I have some arena-like type that generates otherwise unbound &mut Ts from an immutable source, and storing those in the same struct that owns the arena.

Mind, this isn't the only way to achieve this: id_arena is pretty amazing, and rental is actually fairly reasonable! But this is one of the few things that's completely painless and mostly-safe to do in C++, but painful and unintuitive enough in Rust that I'd like to see a more reasonable story.

RustyYato · November 5, 2019, 4:03am

I normally create the arena in the same place as the graph and just store a reference to the arena inside the graph.

struct MyCoolGraph<'arena> {
  arena: &'arena Arena<Node<'arena>>,
  root: Node<'arena>,
}

It's cleaner and clearly conveys the ownership semantics.

Also, you don't need the Pin<Box<_>>, because the Arena guarantees that all of the items are pinned in memory by giving you a reference to it for the same lifetime as the arena.

mcy · November 5, 2019, 4:12am

That's also what I usually do, but it means something very different. I want the graph to own its arena and be self-contained, so that the arena is an implementation detail; there's value for the arena lifetime to escape the data structure, since there's nothing useful to bind it to.

Also, you don't need the Pin<Box<_>> , because the Arena guarantees that all of the items are pinned in memory by giving you a reference to it for the same lifetime as the arena.

Yes, I do, for what I wrote to be sound, because 'arena needs to refer to the lifetime of that Arena value. If it were possible to swap that Arena for another Arena value, I could immediately drop the old Arena on the ground and create a bunch of dangling references. It's this kind of subtlety that rental deals with for you, but which I want a way to express in the core language; what I want could also be expressed with a theoretical Freeze<T> that provides only shared access to a wrapped T.

RustyYato · November 5, 2019, 4:31am

No, because if we had self-referential types, you would be borrowing the field arena in the field root, so you couldn't swap out arena. Just like you can't swap out

let mut arena = Arena::new();
let root: Node = make_root(&arena);
std::mem::replace(&mut arena, Arena::new()); // error[E0502]: cannot borrow `arena` as mutable because it is also borrowed as immutable
drop(root);

Any function other than the constructor function would have to assume that arena is borrowed by root (in order to prevent the issue you described), and in the constructor function the normal borrow checker will have your back.

Will this work? playground. Note: this does require interior mutability, but no Pin<_>.

Note: if you want to get rid of the interior mutability requirement, then you will have to use unsafe code, and raw pointers/NonNull to erase the lifetime. In that case, Pin<_> doesn't get you anything, because Pin<_> is meant to be a contract across module/api boundaries. If you control all of the unsafe bits (which you will have to in this case), then Pin<_> can't help you. It just add noise with all of the get_unchecked_mut and such.

Note: with self-referential types, the playground example I showed above would be almost the same. Just that we may not have to name 'arena explicitly. We would also not have to use interior mutability, because the borrow would be at the granularity of the fields, so we would know that root isn't borrowed, but arena is. Otherwise it would be the same.

mcy · November 5, 2019, 5:13am

Fine, that's another way of staying what I said, only I'm not assuming any semantics for a self-borrow, since that was the whole question I was asking in the first place.

You example is neat and not something I had thought of, but unfortunately now shifts the issue to insert taking &'arena self, which is the kind of thing that, in my experience, results in hard-to-resolve lifetime coherence problems later on, especially since Graph is invariant in 'arena. Graph still has a lifetime parameter, which is the whole problem I'm trying to avoid, anyway. In my experience, around this point I give up and just play fast and loose with unsafe as if I was writing C++, which is not really a place I'd like to be in.

RustyYato · November 5, 2019, 5:18am

Ok, it looks like I misunderstood

Very true, although in this case, it should be manageable because we are only using a shared reference, but that invariance might be a problem.

Just out of curiocity, why do you want Graph to own Arena? It doesn't seem to be worth the effort. Could you use a global arena, or would that also be bad?

mcy · November 5, 2019, 2:05pm

Encapsulation, mostly? I write a lot of graph-based code where I want to provide the caller with an API that doesn't depend on being implemented with an arena, e.g., a Box implementation strategy is not visible to the caller. Lifetimes are easier to manage when each big graph owns all of its memory at the top level, instead of depending on a global arena, which might lead to waste as graphs are created and destroyed.

I am somewhat disappointed at how much arenas in Rust seem left by the wayside, in general. I think as std::alloc becomes more stable I will be less sad, but this particular sadness feels more fundamental. A related example, which I've encountered in the real world in memory-constrained C++, is something like

struct Host {
  host_and_port: String,
  host: Cow<'host_and_port, str>,
}

Here, I want to express that, in most cases, host_and_port contains the host as a substring, but sometimes does not, for the purposes of saving memory from duplicating the allocated string.

Ixrec · November 5, 2019, 2:16pm

I think I missed something important: Isn't it fundamentally impossible to hide/encapsulate a self-referential struct from clients of your API without Boxing it? (I assume having Pins or unsafe in your public APIs doesn't count as "hiding") Even in C++, they'd have to be told not to move it around to avoid UB, right?

mcy · November 5, 2019, 2:33pm

Correct. A tacit implication here is that the only types we're self-referencing:

Contain indirection.
Cannot have that indirection modified without an &mut to them.

CAD97 · November 5, 2019, 4:08pm

The way Rowan handles this is that we have an immutable persistent arc-tree (GreenNode), and the tree API is exposed as roughly

struct SyntaxNode {
    root: Arc<GreenNode>,
    node: ptr::NonNull<GreenNode>,
    parent: ptr::NonNull<GreenNode>,
}

The safety invariant is that the pointers must be children owned by the root node. Accessors are then implemented that borrow from the pointers with the correct lifetime and allow walking of the tree.

This is basically exactly the rental technique (owned, stable deref pointer, fields point into it), just more explicitly. We could even use &'static GreenNode if we were more willing to lie to the borrow checker; it'd just be more error prone to use.

Ultimately, Pin doesn't really change the way you construct concrete versions of the "self field stable deref" borrow. You just have to make very sure not to allow invalidating your safety invariants. Once you've done so, use pointers to your data or 'static and just be sure not to expose references with lifetimes that outlive a borrow of your main object.

mcy · November 5, 2019, 4:26pm

This sounds like a lot of work, though, especially when you're declaring a lot of these graphs. I don't like writing unsafe code, and this is a big source of unsafe code for me.

I see two issues that seem kind of painful to resolve:

All of our examples depend on a container of type A which we:
- Hold as immutable, i.e. borrowing it mutably is forbidden.
- Contains "stable indirection". The meaning of this seems really, really hard to pin down, since it encompasses arenas and Box-like pointers, but not containers like Vec or HashMap, which are unstable (but potentially Vec<Box<T>>).
You can't avoid "stable indirection" without re-unleashing the bikeshed of ?Move, since if the self-borrowed value does not have stable indirection, we need to make the structure immovable (think: GC stack roots).

I spent a whole five minutes failing to express "stable indirection" as an unsafe trait and failed, so I think even the version without immoveable structs is a nightmare to formalize.

CAD97 · November 5, 2019, 10:26pm

https://crates.io/crates/stable_deref_trait ?

I think the only solution really is the rental pattern, and that's either done unsafely or with rental.

system · February 3, 2020, 10:30pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Minimal support for self-referential structures?	3	1512	March 2, 2022
New core type suggestion - StructP compiler	6	904	December 16, 2022
Proposal about expired references language design	32	3253	April 30, 2020
Improving self-referential structs language design	83	23667	March 25, 2019
Blog post: Indirect ownership, shallow borrow and self-referential data structures language design	6	1588	July 27, 2022

Self-referential structures, without "outer" pinning

Related topics