Improving self-referential structs

I don’t have a formal theory background - can you help me understand what a generative existential is?

I ran into similar issues trying to create a graph structure, and my solution, which can currently only be implemented by bypassing a lot of Rust’s safety checks, was potential borrows. Essentially you’d have a RefCell-like structure holding a pointer to the data and a lifetime that doesn’t represent a borrow but can be turned into one, at which point the regular checks are done at runtime. This is essentially the “fat lifetime” I thought I saw mentioned, but it doesn’t address foreign libraries like the ones @spease is trying to use.
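Rust's standard library has no "potential borrow" type, but the runtime half of the idea can be approximated in safe code. The sketch below is a swapped-in technique, not the design described above: `PotentialRef` is a hypothetical name, and it uses a `Weak<RefCell<T>>` as a non-owning link that is only turned into a real borrow on demand, with both liveness and aliasing checked at runtime. Unlike the proposal, it requires the target to live in an `Rc`.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical "potential borrow": holds no borrow until `with` is called.
struct PotentialRef<T> {
    target: Weak<RefCell<T>>,
}

impl<T> PotentialRef<T> {
    fn new(owner: &Rc<RefCell<T>>) -> Self {
        PotentialRef { target: Rc::downgrade(owner) }
    }

    // Turn the potential borrow into a real one. Both checks happen at runtime:
    // the target must still be alive, and it must not be mutably borrowed.
    fn with<R>(&self, f: impl FnOnce(&T) -> R) -> Option<R> {
        let strong = self.target.upgrade()?;
        let guard = strong.try_borrow().ok()?;
        Some(f(&guard))
    }
}
```

A failed check yields `None` instead of a compile error, which is the price of moving the check to runtime.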

I’d like to see more ideas about ways to improve C interop without resorting to unsafe code. Issues like this, where you need to wrap everything in Box or Rc, make me think I might as well use C#, and when everything needs tedious unsafe workarounds, I might as well use C++. As it is, trying to use the Windows API in Rust makes C++ feel like a scripting language. I think it should and could be the other way around, given some thought.

I’m not a type theory expert either, so I can’t offer a comprehensive explanation that I’m sure is error-free. Instead, I’ll just phrase it as “what rental currently does, except not in a closure”. Rental replaces the self-referential lifetimes with lifetimes known only to the implementation, which means the compiler can’t assume anything about how long they’re valid for. This prevents data with an incorrect lifetime from being smuggled into or out of the struct, since the compiler assumes that no other lifetime is a valid match for it. What we’d need is language support for this kind of lifetime to exist outside of a closure, which is currently the only place the language can express this concept.
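The "lifetime known only to the implementation" trick can be shown in isolation. The sketch below uses the branded (generative) lifetime pattern: a higher-ranked closure receives a value tagged with a fresh, invariant lifetime `'id` that no outside lifetime can match. All names (`Brand`, `Vault`, `Key`, `with_vault`) are hypothetical, and this illustrates the general pattern rather than rental's actual implementation.

```rust
use std::marker::PhantomData;

// Invariant lifetime "brand": `fn(&'id ()) -> &'id ()` stops the compiler
// from shrinking or growing 'id to match any other lifetime.
#[derive(Clone, Copy)]
struct Brand<'id>(PhantomData<fn(&'id ()) -> &'id ()>);

struct Vault<'id> {
    data: Vec<u32>,
    brand: Brand<'id>,
}

#[derive(Clone, Copy)]
struct Key<'id> {
    index: usize,
    _brand: Brand<'id>,
}

// The closure must work for *every* 'id, so inside it 'id is a fresh,
// unnameable lifetime. `R` is chosen outside the binder, so nothing
// carrying 'id can escape through the return value.
fn with_vault<R>(data: Vec<u32>, f: impl for<'id> FnOnce(Vault<'id>) -> R) -> R {
    f(Vault { data, brand: Brand(PhantomData) })
}

impl<'id> Vault<'id> {
    // Keys are only minted for valid positions.
    fn key(&self, index: usize) -> Option<Key<'id>> {
        (index < self.data.len()).then(|| Key { index, _brand: self.brand })
    }

    // A key branded with this vault's 'id cannot come from another vault.
    fn get(&self, key: Key<'id>) -> u32 {
        self.data[key.index]
    }
}
```

Nesting two calls, e.g. `with_vault(a, |va| with_vault(b, |vb| va.get(vb.key(0).unwrap())))`, should be rejected at compile time, because the two brands are distinct universally quantified lifetimes. The limitation is the one described above: the branded value only exists inside the closure.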

Many months, I'm afraid.

The good news is that generators basically require some form of this to be ergonomic, which is why work on immovable types is making progress.


I’ll also add that I personally never encounter a need for this, and I never feel constrained in the kinds of APIs I can express by its absence. The only exception is that I want async IO to be ergonomic, which requires self-referential generators for async functions to desugar into.
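To make the async point concrete: a borrow that stays live across an `.await` gets stored in the compiler-generated future right next to the data it points into. The sketch below uses only `std`; `YieldOnce`, `NoopWaker`, and `block_on` are hypothetical helpers written just to drive one future, not a real executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A future that suspends exactly once before completing.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// `data` is an inline array stored in the future's state, and `first` points
// into it across the `.await`, so the state machine refers to itself. Futures
// from async fns are !Unpin for this reason and must be pinned to be polled.
async fn borrow_across_await() -> u32 {
    let data = [1u32, 2, 3];
    let first = &data[0];
    YieldOnce(false).await;
    *first + data.len() as u32
}

// Waker that does nothing; `block_on` simply polls in a loop.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut); // pinning keeps the self-reference valid
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```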

My impression of the feedback in this thread and others is that people who run into this wall are either operating on an API from another language environment which makes heavy use of long living but unowned intermediate values (e.g. OpenGL, Windows API) or are implementing patterns that would encourage that (e.g. object-oriented GUI patterns).

But I personally do not experience something like “I want to store a Vec and an Iter<'a> in the same struct” when I’m just writing Rust, free from the constraints of a non-Rust API. Maybe I just have blinders because of how the language has influenced me.
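For the pure-Rust "Vec plus `Iter<'a>`" case, a common workaround (not something the post prescribes) is to store an index instead of a borrowing iterator. A minimal sketch with a hypothetical `Cursor` type:

```rust
// Owns its items and remembers a position, instead of holding a
// `std::slice::Iter<'a, T>` that borrows from a sibling field.
struct Cursor<T> {
    items: Vec<T>,
    pos: usize,
}

impl<T> Cursor<T> {
    fn new(items: Vec<T>) -> Self {
        Cursor { items, pos: 0 }
    }

    // Returns the next item by reference; the borrow lasts only as long as
    // this call's `&mut self` borrow, so nothing self-referential is stored.
    fn next_ref(&mut self) -> Option<&T> {
        let item = self.items.get(self.pos)?;
        self.pos += 1;
        Some(item)
    }
}
```

The pattern works when the borrowed thing can be reconstructed from an index; it does not carry over to foreign APIs whose handles genuinely borrow from another object.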


@withoutboats I’m sure that pure Rust APIs are not a problem. Do you think that it should be easy to consume C APIs though? I mean, I’m genuinely curious because I do find it tedious to work with them right now. Rust is very opinionated as a language, and that’s not a bad thing - learning it has done a lot for my understanding of concurrency in other languages. But there are some cases where that model just doesn’t quite fit as it is now, and while it’s possible to do (almost?) everything C can do, it’s not necessarily easy or intuitive, or the easy way is not as performant.

If I need to find a regularly maintained API wrapper or write one myself, I’m not going to do that. If it’s hard to consume a C API, I’m not going to do that either. And that’s not a bad thing. I still like and appreciate Rust for what it is good at either way. If this is a goal though, then I can plan on eventually replacing C and C++ with Rust, which I’d like to be able to do. If it’s just a matter of time and finding ways to merge the two approaches, that’s cool, I hope I’ll be able to help further that at some point.

I suppose it’s fair to say that I’m in the weeds with this feature, since basically everything I’ve used Rust for, both as a hobby and in production, has involved a significant amount of FFI and adapting foreign object models into Rust. This is literally the first major problem I hit when learning Rust 4 years ago, and it continues to hound me. It’s why I invested months of time in the total yak-shave called rental, and my inability to produce an ergonomically usable solution even after all that time has burnt me out on it to the point that I see everything through this lens now. Even so, the experience of working on rental has been profoundly valuable in my growth as a developer, so I still think it was worth it.

I do get the distinct impression that the Rust zeitgeist in general doesn’t really experience this issue, so I’ll concede that I’m likely in the minority on this one. Perhaps I’m just not quite on the Rust wavelength.


I too had never faced a real need to solve this problem until a couple of weeks ago; it was always possible to code around it before.

I think I’ve now hit a case where rental is essential, but I would be delighted to be proven wrong. My case involves Rust -> JS FFI, but the need for rental arises on the Rust side.

So, I have an owned object, file, which stores/owns a bunch of AST nodes. Then I have an analysis object, Analysis<'f>, which stores references into the file; 'f is the lifetime of the underlying file. Analysis can be thought of as a giant hash map from the file’s nodes to some cached information. Analysis has a bunch of methods to query various bits of information about the file, so on the Rust side most of the functions take an &Analysis<'f> argument (a shared reference, because Analysis uses interior mutability for its caches).

Now, I want to export some of the analysis functions to JavaScript. In fact, I want JS to own a pair of file and analysis. So, I have a wrapper which bundles a rental-wrapped pair into an Arc. Is there any way to hand both the file and the analysis to JS without rental in this case?
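A minimal sketch of the shape described above, with hypothetical stand-ins for the file and node types and a toy query (node text length) in place of the real analysis. It shows why queries only need `&Analysis<'f>`, and, in a comment, the bundled struct that cannot be written in safe Rust today.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical stand-ins for the real AST types.
struct Node {
    text: String,
}

struct File {
    nodes: Vec<Node>,
}

// Borrows the file for 'f and caches per-node results behind a RefCell,
// so every query can take `&self`.
struct Analysis<'f> {
    file: &'f File,
    cache: RefCell<HashMap<usize, usize>>,
}

impl<'f> Analysis<'f> {
    fn new(file: &'f File) -> Self {
        Analysis { file, cache: RefCell::new(HashMap::new()) }
    }

    // Toy query: length of a node's text, computed once and then cached.
    fn node_len(&self, id: usize) -> Option<usize> {
        if let Some(&len) = self.cache.borrow().get(&id) {
            return Some(len);
        }
        let len = self.file.nodes.get(id)?.text.len();
        self.cache.borrow_mut().insert(id, len);
        Some(len)
    }

    fn cached_entries(&self) -> usize {
        self.cache.borrow().len()
    }
}

// Not expressible in safe Rust: the lifetime would have to name a sibling field.
// struct Bundle { file: File, analysis: Analysis<'file> }
```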

PS: @jpernst, your work on rental is super awesome ❣️


I do agree that this is important, and I have what I think amounts to a pure-Rust use case.

In my use case, I have the raw bytes of a network protocol response, and I would like to store a parsed representation (basically several levels of nested enums) alongside those raw bytes.

I ended up just using unsafe with a transmuted lifetime because I didn’t want to pull in more complexity than needed, but I’ll probably re-evaluate rental at some point.
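The parsed-response case can be sketched without nom. The protocol (`OK <payload>` / `ERR <code>`) and the `Response` and `parse` names are hypothetical; the point is only that a zero-copy parser returns a value carrying the input's lifetime, so it cannot be stored next to the `Vec<u8>` it borrows from.

```rust
// Hypothetical parsed form; real protocols would nest further enums here.
#[derive(Debug, PartialEq)]
enum Response<'a> {
    Ok { payload: &'a [u8] },
    Err { code: &'a [u8] },
}

// Zero-copy parse: the result points into `raw`, so it carries raw's lifetime.
fn parse(raw: &[u8]) -> Option<Response<'_>> {
    if let Some(rest) = raw.strip_prefix(b"OK ") {
        Some(Response::Ok { payload: rest })
    } else if let Some(rest) = raw.strip_prefix(b"ERR ") {
        Some(Response::Err { code: rest })
    } else {
        None
    }
}

// What the use case wants, but cannot be written in safe Rust:
// struct OwnedResponse { raw: Vec<u8>, parsed: Response<'raw> }
```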


Thanks for posting about your use cases, I think this kind of thing is valuable to discuss and it’s always interesting reading about them. I regret that I haven’t kept a more detailed log of the use cases I’ve seen or that people have asked me about on IRC and such, since that would probably help motivate this feature more. However, I will say that for every case of someone using rental, there are several more of someone who was interested in what rental provided, but felt it was too heavy, complicated, or intimidating. So in that sense I believe the reverse-dependency list on rental is deceptively short, consisting only of those crates where the need was sufficiently high to justify the burden of using rental at all.


My understanding of the bytes types is that it’s cheap to split them up, because they use a shared reference count among subslices. Why didn’t you do that? Do you have benchmarks showing that incrementing that refcount would be too expensive?

It’s not about cloning the bytes; it’s about passing around the parsed result, which points into the bytes. If I want to keep a refcount-owning bytes reference together with the parsed stuff and return them to a library caller, how do you propose I do that without self-references?


I’m saying that since you’re using the bytes crate, which is designed to allow shared ownership among multiple slices into the same buffer, you shouldn’t need to store borrowing references at all. The subslices could just be fully owning Bytes values.

If you ask “how can I have borrowing references into data owned in the same struct” yes of course you need self references, but that’s tautological. Nothing in your use case requires that you design your type that way.

He’s using nom. If it parses out a slice you now have a reference. How are you going to own the raw Bytes and the parsed references pointing into it in the same struct? The fact that you can clone the bytes doesn’t help because you need to keep the bytes instance that you parsed from alive if you want the references to stay valid. In other words, you’re anchored to a specific instance of it and no amount of cloning will help.


I think my comments aren’t being understood. The bytes crate was designed as a solution to exactly the kind of problem that leads people toward self-referential structs: you want to parse data into logical pieces but keep it in the same contiguous block of memory. It enables this using a single reference count shared among all the handles pointing into slices of that backing buffer, so you can fairly cheaply get an owned Bytes which points to a subsection of the backing memory.

Thus, instead of having a BytesMut and a struct with pointers into the BytesMut which have fake lifetimes and unsafe code, the user could (if they can afford incrementing a counter a couple times) use a Bytes and a bunch of Bytes types inside the parsed Response struct.
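To illustrate the suggestion without depending on the `bytes` API, here is a std-only stand-in built on `Arc<[u8]>`. `SharedBytes`, `Response`, and `parse` are hypothetical names; `bytes::Bytes` is more sophisticated, but the idea is the same: every sub-view is an owned value sharing one reference count, so the parsed struct needs no lifetime parameter.

```rust
use std::ops::Range;
use std::sync::Arc;

// One shared buffer plus the range this view covers.
struct SharedBytes {
    buf: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedBytes {
    fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        SharedBytes { buf: Arc::from(data), range: 0..len }
    }

    // Owned sub-view: bumps the refcount instead of borrowing.
    // `r` is relative to this view; panics if it is out of range.
    fn slice(&self, r: Range<usize>) -> SharedBytes {
        assert!(r.start <= r.end && r.end <= self.range.end - self.range.start);
        let start = self.range.start;
        SharedBytes { buf: Arc::clone(&self.buf), range: start + r.start..start + r.end }
    }

    fn as_slice(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }
}

// The parsed result owns its pieces, so no lifetime parameter is needed.
struct Response {
    status: SharedBytes,
    body: SharedBytes,
}

// Hypothetical format: "<status> <body>".
fn parse(raw: &SharedBytes) -> Option<Response> {
    let bytes = raw.as_slice();
    let space = bytes.iter().position(|&b| b == b' ')?;
    Some(Response { status: raw.slice(0..space), body: raw.slice(space + 1..bytes.len()) })
}
```

With `Arc`, each `slice` call costs one atomic increment and each dropped view one atomic decrement.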

My only point is that I don’t feel the bytes crate is being leveraged in this example, and that’s what I would do instead of manufacturing fake 'statics (which is veering into the range of “potentially UB”).

Maybe nom does not enable you to use these APIs on Bytes? I don’t know much about nom.


EDIT: Some of these responses seem like they think I’m suggesting you should be doing Rc<Bytes> or something? I’m saying that Bytes is a type which uses reference counting internally and presents an API designed to solve this problem by not having &'a references at all.

I think we understood exactly what you were suggesting, but considered its inevitable conclusion, that nom would need to be forked or provide a complementary non-borrowing API, to be impractical in the general case. The entire issue here is exactly that when this problem is encountered, you’re dependent on your upstream to fix it and provide a different API, as you’re unable to encapsulate it yourself. This burdens any crate author with providing complementary borrowing and non-borrowing APIs, which in many cases is simply impractical. In Alto I chose to abandon borrowing and provide only a refcounted API because the surface was so large and providing both was more effort than it was worth. Bifurcating the ecosystem along this axis is not an ideal outcome.


I might be mistaken but IIRC Bytes::clone is far from trivial and does some atomic operations in some cases (using an internal Arc). I’m not convinced that cloning it to get sub-segments in cases like, say, line-rate(ish) network packet processing is going to be unnoticeable in CPU terms.

I also don’t recall what exactly it does on drop - obviously it decrements the refcount but what other bookkeeping does it do? I’ll go look it up later. But, with lots of clones created for a slice-heavy parser I can see a drop storm kicking in when the parsed object is dropped.

I can’t really speak to nom authoritatively but I think what you’re suggesting boils down to the entire ecosystem knowing about Bytes and working with it. So any lib that does zero-copy parsing would need to also (in addition to working with borrowed data) understand how to, instead of creating slices, clone and keep owned Bytes.

Compared to builtin slices, I suspect some (maybe substantial) ergonomics and performance will be lost. I don’t know what the alternative language support would entail but I suspect it would be more performant and ergonomic.


Both of you raise valid points about why self-referential structs may be preferable in the abstract (e.g. bifurcated APIs, possibly significant performance differences). But I wouldn’t want users taking out credit on a feature which today is extremely hypothetical. Safe self-referential structs are not a near term feature, whereas crates like bytes exist today.


That’s fair. All I really wanted in the near term is acknowledgement that the use cases are real and that it’s a problem worthy of a solution in keeping with Rust’s zero-cost abstractions spirit. Any specifics beyond that can attend to themselves in time. I’m willing to maintain rental for the foreseeable future until a better solution emerges.


I can’t really speak to nom authoritatively but I think what you’re suggesting boils down to the entire ecosystem knowing about Bytes and working with it. So any lib that does zero-copy parsing would need to also (in addition to working with borrowed data) understand how to, instead of creating slices, clone and keep owned Bytes.

I agree. I think, as things work today, parsing libraries should know about and be compatible with the bytes crate. It should be possible to make libraries generic over the Buf/BufMut traits, which would allow them to accept and return types that can contain either a reference to a buffer or an owned Bytes/BytesMut.

I don't know, maybe. I'd like to see what that would look like. Say the parser today returns a &str. With the bytes crate, it would instead return a Buf covering the same subslice that the &str references. But what you probably want it to return is something implementing AsRef<str> (or more generally, AsRef<[u8]>), in which case the Bytes type itself would not be surfaced in the return types. But I've not thought this through fully. It seems like some abstraction over owned vs borrowed underlying types could be made.
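One way such an abstraction might look, as a sketch under stated assumptions: `ByteSource`, `Shared`, and `split_status` are hypothetical and are not the `bytes` crate's `Buf` trait. A single parser is written once and returns borrowed `&[u8]` views for slice input, or owned, refcounted views for `Arc`-backed input, both exposed through `AsRef<[u8]>`.

```rust
use std::ops::Range;
use std::sync::Arc;

// Hypothetical abstraction over "borrowed" and "owned, refcounted" input.
trait ByteSource {
    type View: AsRef<[u8]>;
    fn bytes(&self) -> &[u8];
    fn view(&self, r: Range<usize>) -> Self::View;
}

// Borrowed flavour: views are plain subslices tied to the input's lifetime.
impl<'a> ByteSource for &'a [u8] {
    type View = &'a [u8];
    fn bytes(&self) -> &[u8] {
        *self
    }
    fn view(&self, r: Range<usize>) -> &'a [u8] {
        let whole: &'a [u8] = *self;
        &whole[r]
    }
}

// Owned flavour: every view shares one Arc'd buffer.
struct Shared {
    buf: Arc<[u8]>,
    range: Range<usize>,
}

impl AsRef<[u8]> for Shared {
    fn as_ref(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }
}

impl ByteSource for Shared {
    type View = Shared;
    fn bytes(&self) -> &[u8] {
        self.as_ref()
    }
    fn view(&self, r: Range<usize>) -> Shared {
        let start = self.range.start;
        Shared { buf: Arc::clone(&self.buf), range: start + r.start..start + r.end }
    }
}

// Written once, works for both flavours: split "<status> <rest>" at the first space.
fn split_status<S: ByteSource>(src: &S) -> Option<(S::View, S::View)> {
    let bytes = src.bytes();
    let space = bytes.iter().position(|&b| b == b' ')?;
    Some((src.view(0..space), src.view(space + 1..bytes.len())))
}
```

Callers holding `Shared` views can drop the original buffer handle and keep the parsed pieces, at the cost of one refcount bump per view.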

The performance concerns still stand, of course. But I take @withoutboats's point that it's moot to compare this performance vs support in the language, particularly since we don't even have a design for the language support (let alone the implementation and its performance implications).

But I'm with @jpernst - I'd like folks to acknowledge that this is a real problem (fortunately not too widespread overall but unfortunately a difficult one), and to continue brainstorming of how it might be solved at the language level. I'm hopeful that we don't have to resort to a library solution for what's otherwise a language shortcoming.
