Improving self-referential structs

jpernst · February 18, 2017, 1:07am

I’m the author of a crate called rental, designed to enable the declaration of self-borrowing structs. I’ve tried as hard as I could to make it sound and usable. I believe it’s sound, but I’m still not 100% sure of that. Even if it is, though, the API is still painfully awkward and limited.

This pattern seems to emerge often enough that some crates have taken to offering RCed alternatives to their lifetime API in an attempt to circumvent this issue. I feel like that’s unfortunate and misses one of the key benefits of rust. As it stands, without self-borrows, internal references that should be an implementation detail will leak their lifetime params out to the API consumer, and their consumer, and so on.

After a brief discussion on IRC, it came to my attention that this appears to be an issue in implementing generators as well. I’m curious what the current thinking is on that, and if a general purpose solution has been conceived. I’m open to doing a significant amount of legwork to make this possible in a crate, but any language support at all could be very helpful. However, I lack the PLT background to offer any kind of comprehensive solution.

My first instinct would be to allow existential lifetimes to exist outside of closures, which would significantly improve rental’s API, but perhaps there’s an even better way. I’d simply like to start a discussion about this since I’ve found it to be a vexing point of friction in API design and implementation.

comex · February 18, 2017, 2:56am

See also: immovable types RFC. That would probably be a prerequisite for native support.

withoutboats · February 18, 2017, 3:02am

There’s some pretty clear usecase for this in tokio, in which the stack of a service’s future requires a number of reference counted pointers to make it work, but could be internal references into one heap allocation. I agree that immovable types are an important step toward this & I’ve been meaning to try to figure out exactly what needs to be done to make this work.

jpernst · February 18, 2017, 5:25am

It should also be possible to make a movable self-referential type as well, though. Both rental and owning_ref achieve this with a trait that marks container types as having a fixed, consistent Deref target. This allows the outer struct to be moved without invalidating inner self-references. Basically, anything that can allow the consumer of a struct to not have to be concerned with, or even aware of, internal self-referential implementation details, instead being able to treat it as an opaque value like any other.

withoutboats · February 18, 2017, 5:44am

Yes, the idea is definitely that you somehow allocate the immovable type into the heap and then move the pointer to it around.

jpernst · February 18, 2017, 6:01am

Ah, ok, that makes a lot of sense actually. Subscribed to that RFC thread and will be following it with interest.

yigal100 · February 18, 2017, 6:19am

This should be orthogonal to immovable types, otherwise we may prevent future use cases. For example, if the user wants to combine a moving/compacting GC with self referential types.

I’d prefer instead to implement this with offsets. As for syntax, we already have one special lifetime &'static T, we can extend this with another one for this case:

struct MyType {
    f1: i32,
    f2: &'self i32 // #1
}
let x = MyType { f1: 42, f2: &f1 }

at #1 we declare that f2 is a self-borrow of an i32, implemented as an offset inside the MyType struct.

jpernst · February 18, 2017, 6:34am

One concern I see with this is that having just one special 'self lifetime actually isn’t enough. You can have arbitrarily many levels of self-referentiality, and each “layer” adds a new, distinct lifetime. Such as:

struct A {
    a_val: i32
}

impl Drop for A { ... }

struct B<'a> {
    a_ref: &'a A,
    b_val: i32,
}

impl<'a> Drop for B<'a> { ... }

struct C<'a, 'b> {
    a_ref: &'a A,
    b_ref: &'b B,
    c_val: i32,
}

impl<'a, 'b> Drop for C<'a, 'b> { ... }

struct SelfRef {
    a: A,
    b: B<'self>,
    c: C<'self, 'self>, /// ???
}

Here there are two distinct “self” lifetimes at play. The one used to denote the lifetime of A, and another for the lifetime of B. They can’t be identical because of dropck. Either the compiler would have to infer this somehow, or we would need to be able to declare multiple special “self” lifetimes.

Even rental can’t handle this yet, and I’ve already had requests to somehow deal with it, since my other lib frequently results in multi-layer borrows like this that one would reasonably want to be able to store in a single struct.

jpernst · February 18, 2017, 7:17am

It occurs to me that what I’m basically trying to create is a stack frame inside of a struct. Perhaps there’s some way this could play into the generators proposal to facilitate some kind of “portable” stack frame. Rust’s entire lifetime system is rooted in the notion of stack frames and scoping, so if there were some way to generalize the notion of a frame, that could work as well.

yigal100 · February 18, 2017, 8:13am

How can there be two self lifetimes? 'self refers to the containing type, the lifetime of SelfRef and the above example should actually be valid.

regarding dropck, there’s an RFC to specify drop order. That means that self references must comply with that order. I don’t see why we’d need a more refined system that allows to specify lifetimes of specific fields inside the struct since the struct is an atomic unit. Otherwise, we’d “gain” the ability to have structs with uninitialized fields which is a separate can of worms.

jpernst · February 18, 2017, 8:41am

If all the field lifetimes are identical, then they can freely reference eachother in any way, including circularly. Or a mutable reference in B could be set to point to a field in C, even though C must be dropped first. Once struct fields can reference eachother, their relative lifetimes become relevant. Right now, Drop is effectively atomic since no struct field can be aware of any other, so you can consider them all to have the same lifetime, but once they can reference eachother, the “strictly outlives” semantics required for Drop impls becomes meaningful again.

yigal100 · February 18, 2017, 8:58am

That’s inaccurate. Rust has a specific drop order for struct fields and therefore the language can specify in accordance that self borrows must also obey the same order. That means, that if B tries to reference C that would be a compile-time error (“cannot self-borrow field that is declared after this field”).

So no, lifetimes are not identical, they’re just being controlled by the compiler, not the user.

Ixrec · February 18, 2017, 4:27pm

What about “self-referential” types where the self-reference is to owned heap memory rather than something contained directly in the struct?

struct PreParsedString {
  str: String,
  tokens: Vec<&str>
}

Not that I personally need this, but when I see people ask for self-referential structs “that work just fine in C++” it’s usually something like this, and iiuc this is why using offsets instead of actual pointers doesn’t quite solve the problem. I suppose we could distinguish between &'self which is an offset and &'selfheap which is a real pointer, but I don’t know how to #[derive(Clone)] in such a way that &'selfheap pointers would be guaranteed to work out correctly.

yigal100 · February 18, 2017, 4:34pm

Why can’t that example be represented as:

struct PreParsedString<'a> {
    str: &'a String,
    tokens: Vec<&'a str>
}

or some variation thereof? iow, the struct has references to both the original string and its tokens.

Storyyeller · February 18, 2017, 4:47pm

yigal100:

Why can't that example be represented as:
struct PreParsedString<'a> {
    str: &'a String,
    tokens: Vec<&'a str>
}
or some variation thereof? iow, the struct has references to both the original string and its tokens.

Because that still requires the String to be stored in a fixed location on the stack somewhere. The goal is to be able to pass it around arbitrarily, but right now you need crates like owning_ref or rental to do this.

Anyway, I think the "portable stack frames" idea has merit. It is often necessary to transform recursive code into iterative code in order to avoid running out of stack space. This involves simulating the stack on the heap (using a Vec<Box<Frame>> or similar). However, Rust's borrow system requires borrowed data to have lifetimes tied to values that live on the actual stack, meaning there is no safe way to do this.

beamspease · March 28, 2017, 5:12am

Just wanted to chime in here that this limitation has been a massive pain for me using the Alex Crichton’s ssh2 library. I’d ideally like to have a Connector struct that my Object struct takes ownership of, that uses a connect() function. However…

I couldn’t get things to work using Options, because the references received by pattern-matching are transient.
I can’t take ownership of it otherwise, because of the inability to create self-referential structs.
I struggled somewhat with std::rc::Rc to try and follow his suggestion, but was ultimately unsuccessful.

Eventually I was able to work out a careful balance by only passing a reference to the Connector struct, doing the handshake in Connector::new(), and creating the Sftp etc objects in the connect() function. Then I needed to make the connection optional and found myself hassling with lifetimes again because the handshake in new() requires a mutable reference to the Session, moving it to the connect function requires me to upgrade the connect function to a mutable reference, and this in turn resulted in conflicts because of multiple mutable borrows or some other reason I don’t fully understand yet.

I’m not sure if this intended to force good design practices, but it’s been a real hassle that seems to make providing a simple API impossible without using reference-counting, unsafe, or some obscure workaround. I don’t want to force someone to have to create multiple structs and adhere to a specific initialization order every time they use my Object.

I like the idea of being able to specify references to fields on a struct as it’s being created (with default 'self lifetimes unless otherwise specified), and either have the compiler automatically determine the order, or require the developer to specify them in an order such that the prerequisite fields are created first.

jpernst · March 28, 2017, 5:25am

I'm currently in the process of redesigning rental, and it will actually work basically as you describe. You'll be able to define a struct with any number of fields, each of which implicitly creates a lifetime of the same name, and can borrow from previous fields arbitrarily deep. I'm feeling good about the design and hopefully it will be ready soon.

beamspease · March 28, 2017, 5:41am

I strongly feel this should be a language feature. I can't think of a technical reason this shouldn't be doable from the POV of what's going on with the stack, although someone mentioned what sounds like a compiler limitation earlier in the thread. This is something I'd definitely put in the ergonomic bin. If creating a self-reference were as easy as adding '&' and the compiler adding an implicit 'self lifetime, I'd probably be able to figure it out in the first couple tries.

Having to hunt around for a library, and a tutorial on how to use that library, etc. just for this kind of thing detracts from the usefulness of the language because you get completely sidetracked from what you were doing. I appreciate the work, but this seems like one of those things that seems super-easy to someone super-experienced, but which is a major stumbling block for someone just getting started.

jpernst · March 28, 2017, 6:00am

Oh believe me, I totally agree. Rental exists as a stop gap until a hopeful eventual future time where it’s no longer necessary. In the meantime I’m trying very hard to make the new version feel as “natural” as possible, and will do a write up about it once it’s usable. Unfortunately 'self won’t work since it just doesn’t provide enough information to infer the lifetime relationships, but instead each field has its own lifetime that later fields can use.

nikomatsakis · March 30, 2017, 11:20am

Is your in-progress work for rental on a branch or some place I can take a look? As part of the canvas unsafe code effort I’d like to take a look at how the internals work.

Topic		Replies	Views
Proposal about expired references language design	32	3256	April 30, 2020
Blog post: Indirect ownership, shallow borrow and self-referential data structures language design	6	1591	July 27, 2022
Minimal support for self-referential structures?	3	1513	March 2, 2022
Self-borrowing data would allow erganomic generativity language design	5	530	November 18, 2024
Idea: on movable self-referential structs language design	10	1929	March 25, 2019

Improving self-referential structs

Related topics