Improving self-referential structs


#1

I’m the author of a crate called rental, designed to enable the declaration of self-borrowing structs. I’ve tried as hard as I could to make it sound and usable. I believe it’s sound, but I’m still not 100% sure of that. Even if it is, though, the API is still painfully awkward and limited.

This pattern seems to emerge often enough that some crates have taken to offering RCed alternatives to their lifetime API in an attempt to circumvent this issue. I feel like that’s unfortunate and misses one of the key benefits of rust. As it stands, without self-borrows, internal references that should be an implementation detail will leak their lifetime params out to the API consumer, and their consumer, and so on.

After a brief discussion on IRC, it came to my attention that this appears to be an issue in implementing generators as well. I’m curious what the current thinking is on that, and if a general purpose solution has been conceived. I’m open to doing a significant amount of legwork to make this possible in a crate, but any language support at all could be very helpful. However, I lack the PLT background to offer any kind of comprehensive solution.

My first instinct would be to allow existential lifetimes to exist outside of closures, which would significantly improve rental’s API, but perhaps there’s an even better way. I’d simply like to start a discussion about this since I’ve found it to be a vexing point of friction in API design and implementation.


#2

See also: immovable types RFC. That would probably be a prerequisite for native support.


#3

There’s some pretty clear usecase for this in tokio, in which the stack of a service’s future requires a number of reference counted pointers to make it work, but could be internal references into one heap allocation. I agree that immovable types are an important step toward this & I’ve been meaning to try to figure out exactly what needs to be done to make this work.


#4

It should also be possible to make a movable self-referential type as well, though. Both rental and owning_ref achieve this with a trait that marks container types as having a fixed, consistent Deref target. This allows the outer struct to be moved without invalidating inner self-references. Basically, anything that can allow the consumer of a struct to not have to be concerned with, or even aware of, internal self-referential implementation details, instead being able to treat it as an opaque value like any other.


#5

Yes, the idea is definitely that you somehow allocate the immovable type into the heap and then move the pointer to it around.


#6

Ah, ok, that makes a lot of sense actually. Subscribed to that RFC thread and will be following it with interest.


#7

This should be orthogonal to immovable types, otherwise we may prevent future use cases. For example, if the user wants to combine a moving/compacting GC with self referential types.

I’d prefer instead to implement this with offsets. As for syntax, we already have one special lifetime &'static T, we can extend this with another one for this case:

struct MyType {
    f1: i32,
    f2: &'self i32 // #1
}
let x = MyType { f1: 42, f2: &f1 }

at #1 we declare that f2 is a self-borrow of an i32, implemented as an offset inside the MyType struct.


#8

One concern I see with this is that having just one special 'self lifetime actually isn’t enough. You can have arbitrarily many levels of self-referentiality, and each “layer” adds a new, distinct lifetime. Such as:

struct A {
    a_val: i32
}

impl Drop for A { ... }

struct B<'a> {
    a_ref: &'a A,
    b_val: i32,
}

impl<'a> Drop for B<'a> { ... }

struct C<'a, 'b> {
    a_ref: &'a A,
    b_ref: &'b B,
    c_val: i32,
}

impl<'a, 'b> Drop for C<'a, 'b> { ... }

struct SelfRef {
    a: A,
    b: B<'self>,
    c: C<'self, 'self>, /// ???
}

Here there are two distinct “self” lifetimes at play. The one used to denote the lifetime of A, and another for the lifetime of B. They can’t be identical because of dropck. Either the compiler would have to infer this somehow, or we would need to be able to declare multiple special “self” lifetimes.

Even rental can’t handle this yet, and I’ve already had requests to somehow deal with it, since my other lib frequently results in multi-layer borrows like this that one would reasonably want to be able to store in a single struct.


#9

It occurs to me that what I’m basically trying to create is a stack frame inside of a struct. Perhaps there’s some way this could play into the generators proposal to facilitate some kind of “portable” stack frame. Rust’s entire lifetime system is rooted in the notion of stack frames and scoping, so if there were some way to generalize the notion of a frame, that could work as well.


#10

How can there be two self lifetimes? 'self refers to the containing type, the lifetime of SelfRef and the above example should actually be valid.

regarding dropck, there’s an RFC to specify drop order. That means that self references must comply with that order. I don’t see why we’d need a more refined system that allows to specify lifetimes of specific fields inside the struct since the struct is an atomic unit. Otherwise, we’d “gain” the ability to have structs with uninitialized fields which is a separate can of worms.


#11

If all the field lifetimes are identical, then they can freely reference eachother in any way, including circularly. Or a mutable reference in B could be set to point to a field in C, even though C must be dropped first. Once struct fields can reference eachother, their relative lifetimes become relevant. Right now, Drop is effectively atomic since no struct field can be aware of any other, so you can consider them all to have the same lifetime, but once they can reference eachother, the “strictly outlives” semantics required for Drop impls becomes meaningful again.


#12

That’s inaccurate. Rust has a specific drop order for struct fields and therefore the language can specify in accordance that self borrows must also obey the same order. That means, that if B tries to reference C that would be a compile-time error (“cannot self-borrow field that is declared after this field”).

So no, lifetimes are not identical, they’re just being controlled by the compiler, not the user.


#13

What about “self-referential” types where the self-reference is to owned heap memory rather than something contained directly in the struct?

struct PreParsedString {
  str: String,
  tokens: Vec<&str>
}

Not that I personally need this, but when I see people ask for self-referential structs “that work just fine in C++” it’s usually something like this, and iiuc this is why using offsets instead of actual pointers doesn’t quite solve the problem. I suppose we could distinguish between &'self which is an offset and &'selfheap which is a real pointer, but I don’t know how to #[derive(Clone)] in such a way that &'selfheap pointers would be guaranteed to work out correctly.


#14

Why can’t that example be represented as:

struct PreParsedString<'a> {
    str: &'a String,
    tokens: Vec<&'a str>
}

or some variation thereof? iow, the struct has references to both the original string and its tokens.


#15

Because that still requires the String to be stored in a fixed location on the stack somewhere. The goal is to be able to pass it around arbitrarily, but right now you need crates like owning_ref or rental to do this.

Anyway, I think the “portable stack frames” idea has merit. It is often necessary to transform recursive code into iterative code in order to avoid running out of stack space. This involves simulating the stack on the heap (using a Vec<Box<Frame>> or similar). However, Rust’s borrow system requires borrowed data to have lifetimes tied to values that live on the actual stack, meaning there is no safe way to do this.


#16

Just wanted to chime in here that this limitation has been a massive pain for me using the Alex Crichton’s ssh2 library. I’d ideally like to have a Connector struct that my Object struct takes ownership of, that uses a connect() function. However…

  1. I couldn’t get things to work using Options, because the references received by pattern-matching are transient.
  2. I can’t take ownership of it otherwise, because of the inability to create self-referential structs.
  3. I struggled somewhat with std::rc::Rc to try and follow his suggestion, but was ultimately unsuccessful.

Eventually I was able to work out a careful balance by only passing a reference to the Connector struct, doing the handshake in Connector::new(), and creating the Sftp etc objects in the connect() function. Then I needed to make the connection optional and found myself hassling with lifetimes again because the handshake in new() requires a mutable reference to the Session, moving it to the connect function requires me to upgrade the connect function to a mutable reference, and this in turn resulted in conflicts because of multiple mutable borrows or some other reason I don’t fully understand yet.

I’m not sure if this intended to force good design practices, but it’s been a real hassle that seems to make providing a simple API impossible without using reference-counting, unsafe, or some obscure workaround. I don’t want to force someone to have to create multiple structs and adhere to a specific initialization order every time they use my Object.

I like the idea of being able to specify references to fields on a struct as it’s being created (with default 'self lifetimes unless otherwise specified), and either have the compiler automatically determine the order, or require the developer to specify them in an order such that the prerequisite fields are created first.


#17

I’m currently in the process of redesigning rental, and it will actually work basically as you describe. You’ll be able to define a struct with any number of fields, each of which implicitly creates a lifetime of the same name, and can borrow from previous fields arbitrarily deep. I’m feeling good about the design and hopefully it will be ready soon.


#18

I strongly feel this should be a language feature. I can’t think of a technical reason this shouldn’t be doable from the POV of what’s going on with the stack, although someone mentioned what sounds like a compiler limitation earlier in the thread. This is something I’d definitely put in the ergonomic bin. If creating a self-reference were as easy as adding ‘&’ and the compiler adding an implicit 'self lifetime, I’d probably be able to figure it out in the first couple tries.

Having to hunt around for a library, and a tutorial on how to use that library, etc. just for this kind of thing detracts from the usefulness of the language because you get completely sidetracked from what you were doing. I appreciate the work, but this seems like one of those things that seems super-easy to someone super-experienced, but which is a major stumbling block for someone just getting started.


#19

Oh believe me, I totally agree. Rental exists as a stop gap until a hopeful eventual future time where it’s no longer necessary. In the meantime I’m trying very hard to make the new version feel as “natural” as possible, and will do a write up about it once it’s usable. Unfortunately 'self won’t work since it just doesn’t provide enough information to infer the lifetime relationships, but instead each field has its own lifetime that later fields can use.


#20

Is your in-progress work for rental on a branch or some place I can take a look? As part of the canvas unsafe code effort I’d like to take a look at how the internals work.