Internal references as a separate type

As I understand it, the issue with something like:

struct Foo {
    bar: i32,
    bar_ref: &i32 // points to bar in the same instance
}

is that bar_ref is invalidated when a Foo is moved. If bar_ref were a relative pointer, this would not happen: instead of containing the absolute memory address, it would contain "-4", meaning "back up 4 bytes from the location of the reference itself to find what it points to." You can sort of fudge this today: if, say, bar is a slice, you make bar_ref an index rather than a reference. But this still requires you to do the indexing manually, and it only works for slices.
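To make the idea concrete, here is a rough sketch of how a relative pointer can be emulated by hand in today's Rust. The names `bar_off` and `bar_ref()` are made up for illustration and are not a proposed API. The offset is computed once at construction and resolved against the struct's own address on every access. Note that on a typical 64-bit target, padding makes the stored offset -8 rather than -4.

```rust
// Hypothetical illustration; `bar_off` and `bar_ref` are made-up names.
#[repr(C)] // fix the field order so the layout is predictable
struct Foo {
    bar: i32,
    // Byte distance from `bar_off` itself to `bar` (negative: `bar` comes first).
    bar_off: isize,
}

impl Foo {
    fn new(bar: i32) -> Foo {
        let mut foo = Foo { bar, bar_off: 0 };
        let bar_addr = &foo.bar as *const i32 as isize;
        let off_addr = &foo.bar_off as *const isize as isize;
        foo.bar_off = bar_addr - off_addr;
        foo // returning moves `foo`; the relative offset stays valid
    }

    fn bar_ref(&self) -> &i32 {
        let base = self as *const Foo as *const u8;
        // Position of the `bar_off` field inside `self`, in bytes.
        let field_pos = ((&self.bar_off as *const isize as usize) - (base as usize)) as isize;
        // SAFETY: `new` is the only constructor in this sketch, so
        // `field_pos + bar_off` is the in-bounds, aligned offset of `bar`.
        unsafe { &*(base.offset(field_pos + self.bar_off) as *const i32) }
    }
}
```

After `let moved = Box::new(Foo::new(7));`, calling `moved.bar_ref()` still yields a reference to `moved.bar`, because nothing stored in the struct depends on its absolute address. The cost is the unsafe accessor that has to be written by hand.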

Obviously a lot of things would need to be bikeshedded (how do you declare this separate kind of ref type, how would you call Foo's constructor?), but I'm wondering whether this is a sound approach. Has this come up before?

I was thinking maybe it could be a special lifetime:

struct Foo {
    bar: i32,
    bar_ref: &'internal i32 // points to bar in the same instance
}

I think I can post some uses for such a feature. But is there anything that you cannot do without this?

This sort of offset-based self reference idea is very well-trodden ground and the only simple answer is "this only works in some of the simplest cases, and those cases have plenty of other adequate solutions."

For longer answers, I believe the "Offset-based solutions" section of https://without.boats/blog/async-ii-narrowing-the-scope/ is a good start.


I think there's far more than just bikeshedding here, since one would need to figure out how to borrowck it.

For example, if that's a new type, what does Vec<&'internal i32> mean?

struct Foo {
    bar: i32,
    bar_ref: RelativeCell(-4),
}

edit: It's not as easy as I thought. RelativeCell can't implement Deref, because the existence of an "independent" &mut foo.bar would cause aliasing.

If the cell could be moved out of Foo, that'd be game over. So this creates a new kind of type that is unmoveable and can't be copied, except when it's part of another moveable/copyable type?

So I think you could only safely get such reference with:

foo.bar_ref.get(&foo)

and the borrow checker wouldn't understand the mutable equivalent.
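A rough sketch of what such a `get(&foo)` accessor could look like as a library type. `RelativeCell<O, T>` here is hypothetical, and this version deliberately swaps the design: it stores the offset from the start of the owner rather than from the cell itself, which is an assumption made to keep the example short.

```rust
use std::marker::PhantomData;

// Hypothetical type: a byte offset from the start of an owner `O` to a `T` inside it.
struct RelativeCell<O, T> {
    offset: usize,
    _marker: PhantomData<fn(O) -> T>,
}

impl<O, T> RelativeCell<O, T> {
    /// Safety contract: `offset` must be the offset of a `T` field
    /// inside every `O` that is later passed to `get`.
    unsafe fn new(offset: usize) -> Self {
        RelativeCell { offset, _marker: PhantomData }
    }

    // The returned borrow is tied to `owner`, so the borrow checker
    // treats it much like `&owner.field`.
    fn get<'a>(&self, owner: &'a O) -> &'a T {
        let base = owner as *const O as *const u8;
        // SAFETY: guaranteed by the contract of `new`.
        unsafe { &*(base.add(self.offset) as *const T) }
    }
}

struct Foo {
    bar: i32,
    bar_ref: RelativeCell<Foo, i32>,
}

impl Foo {
    fn new(bar: i32) -> Foo {
        // Placeholder offset, overwritten below before `foo` is returned.
        let mut foo = Foo { bar, bar_ref: unsafe { RelativeCell::new(0) } };
        let offset = (&foo.bar as *const i32 as usize) - (&foo as *const Foo as usize);
        // SAFETY: `offset` was just measured on a real `Foo`.
        foo.bar_ref = unsafe { RelativeCell::new(offset) };
        foo
    }
}
```

With this, `foo.bar_ref.get(&foo)` compiles and keeps working after `foo` is moved. A mutable counterpart called as `foo.bar_ref.get_mut(&mut foo)` would be rejected, since `foo.bar_ref` is borrowed immutably while `foo` is borrowed mutably.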


…which seems very surprising and probably presents its own weird problems and inconsistencies.

This is a nice idea. We could even support moving bar or bar_ref out of the struct, adjusting the offset if necessary.

This example is a problem, though:

struct Foo {
    bar1: i32,
    bar2: i32,
    bar_ref: &'internal i32,
}

How does the compiler know which field is borrowed by bar_ref? The compiler has to know this in order to move the fields.

@Aloso I think you would just be prohibited from moving out of fields of any type T so long as there exists at least one internal ref to type T. So the Foo as a whole could be moved, but not bar1 or bar2 individually.

Correct, but presumably there would be a mechanism to convert the internal ref to an absolute ref with the lifetime of the containing struct. This preserves moving of the struct always being a dumb memcpy, it only prevents you from moving the field out. If we were ever comfortable with a not-just-memcpy for one type, you could make moves of internal refs become absolute refs, but that doesn't seem as rusty as requiring an explicit conversion somewhere.

@lxrec I have trouble following some of the points in that article because I'm still ramping up my Rust knowledge. In principle I understand the concern about monomorphization, because you would definitely have to generate different code for internal references, and I can appreciate that the current design of the compiler might make this difficult if it currently assumes that lifetimes can always be erased before code generation (although that just seems like an argument for using a distinct type rather than just a distinct lifetime?). But I don't understand the point about Iter being parameterized by either an internal reference or a non-internal reference. Why would an iterator need to be concerned with both? Aren't iterators always external to the data structure, so they would always use regular references? Or is the concern something like this?:

struct Foo {
    x: Vec<i32>,
    y: Iterator<&'internal, Item=i32> // points into x
}

Okay, time to get blunt then.

People very, very often make Rust language suggestions that bottom out at "let's have a magical 'dowhatimean lifetime" which are completely unimplementable because lifetimes aren't actually magic, and thus the suggestion has a giant unfillable hole in it that the proposer doesn't realize is a hole. In fact, the reason I don't have direct answers to the questions in your last post is because you're (unknowingly, of course) asking me to fill in the holes in your own proposal, while I'm pretty sure they can't soundly be filled (without resorting to obvious non-starters like autoboxing).

What lifetimes actually are is nothing more than a promise to the compiler that some values outlive other values, and the compiler is required to check that the lifetime relationships in a function signature or type definition match all implementations and use sites thereof. Whatever 'internal or 'self or the other suggestions are trying to say simply doesn't fit into this framework (y clearly does not "outlive" Foo or x, for example), and would have to involve brand new rules unlike anything that exists in the Rust language today, and nobody has ever proposed actual rules that would make these implementable features.
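As a small illustration of the "outlives" promise (the names `Holder` and `read` are invented for this sketch), here is a lifetime relationship the compiler can check at both the definition and the use site:

```rust
// Hypothetical names, for illustration only.
struct Holder<'a> {
    // Promise: the referenced i32 outlives this `Holder`.
    value: &'a i32,
}

// The signature promises that the result is valid for `'a`,
// not merely for as long as `h` itself is around.
fn read<'a>(h: &Holder<'a>) -> &'a i32 {
    h.value
}
```

A caller can drop the `Holder` and keep using the result of `read`, because `'a` refers to something that outlives the holder. A field that points at a sibling field has no such "outliving" value to name, which is the gap described above.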

You could argue Pin and move constructors (which I think is what you're talking about here) are kinda close to being such a proposal, but the very fact that those involve no extensions to the lifetime system kinda proves the point that they are or would be separate mechanisms.

Now, that doesn't mean that nothing along these lines is remotely possible. Even the blog post I linked earlier flat out said that offset types are obviously a thing we could do. The cost-benefit analysis for them just doesn't make a lot of sense when you understand what they would and wouldn't actually achieve.


By the way, would this proposal solve any problems that aren't already solved by the rental crate?


One possible use for this (not that it's the only or even the best way to do this) would be to pass C++ std::string by value. I doubt any offset reference will be powerful enough to do what people want outside of a few niche cases, though.

Hmm, offset pointers might not be a bad idea, depending on use cases.

@lxrec I think you may have misunderstood my last post. I'm not asking you to fill in any design holes, I'm trying to get clarification of what the link you gave me is saying in its discussion of offset ptrs. The Iterator example they give seems like a possible minimal example to demonstrate what a can of worms this is, but as it is I can't tell what they're saying.

rental is no longer maintained. And while it's fairly brilliant in how it accomplishes as much as it does using only existing language features, the limitations of those features do force it to be, well, kind of ugly and underpowered. A real language feature would be much nicer.


I wouldn't go that far. To create a self-referencing type using Pin today, you have a choice between creating an async fn, which avoids novel lifetime features by virtue of being implemented as compiler magic, or using unsafe code, which avoids novel lifetime features by being unsafe.
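For reference, the unsafe route looks roughly like the following sketch, modeled on the pattern shown in the std::pin documentation. The struct and field names are illustrative only.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr::NonNull;

struct Foo {
    bar: i32,
    bar_ptr: NonNull<i32>, // absolute pointer to `bar` in the same instance
    _pin: PhantomPinned,   // opts out of `Unpin`, so a pinned Foo cannot be moved safely
}

impl Foo {
    fn new(bar: i32) -> Pin<Box<Foo>> {
        let foo = Foo { bar, bar_ptr: NonNull::dangling(), _pin: PhantomPinned };
        let mut boxed = Box::pin(foo);
        let ptr = NonNull::from(&boxed.bar);
        // SAFETY: we only overwrite a field; the value is never moved out of the box.
        unsafe {
            let mut_ref: Pin<&mut Foo> = Pin::as_mut(&mut boxed);
            Pin::get_unchecked_mut(mut_ref).bar_ptr = ptr;
        }
        boxed
    }

    fn bar_ref(&self) -> &i32 {
        // SAFETY: `new` is the only constructor in this sketch, and it pins
        // the value on the heap before pointing `bar_ptr` at `bar`.
        unsafe { self.bar_ptr.as_ref() }
    }
}
```

Moving the `Pin<Box<Foo>>` handle only moves the box pointer, so `bar_ptr` stays valid. No lifetime appears anywhere in the self-reference; its soundness rests entirely on the unsafe blocks.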

I've thought about this for a while and I think there are two semi-orthogonal features needed for safe, general self-references.

One is immovable types, which sort of exists today in the form of Pin (but could be made more powerful as a builtin feature). This one is separate from the lifetime system.

The other is existential lifetimes. Existential lifetimes would basically simulate what rental or owning_ref already accomplish using HRTBs on Fn traits, just without having to put your code inside a closure. By themselves, they would enable the same things that rental and owning_ref do: "self-referencing" types where the buffer being referenced is behind an owning pointer, so the address stays stable even if the containing type is moved. In other words, things like this (pseudocode): struct A { buffer: Box<i32>, reference: &'buffer i32 }. While this is definitely not as straightforward as it sounds, I think it is possible and probably would fundamentally look like an extension to the lifetime system.
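The "address stays stable" part can be checked in entirely safe code. This small sketch (the function name is made up) only compares addresses and does not attempt the self-reference itself:

```rust
struct A {
    buffer: Box<i32>,
}

// Returns the address of the boxed i32 before and after moving its owner.
fn buffer_addr_across_move(value: i32) -> (usize, usize) {
    let a = A { buffer: Box::new(value) };
    let before = &*a.buffer as *const i32 as usize;
    let moved = vec![a]; // moves `a`, and the Box pointer inside it, into the Vec
    let after = &*moved[0].buffer as *const i32 as usize;
    (before, after)
}
```

Both addresses are equal, since moving `A` copies the pointer held by the `Box` and not the heap allocation it points to. What the language lacks is a way to name a lifetime for a `reference` field tied to that allocation.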

Combining the two features would let you have a struct where one field points directly to another field of the same struct.

I suppose you might be able to add relative pointers on top of this scheme, but it's probably not worth it when they aren't enough for most real use cases.

Anyway, this is all off topic because the original post is about relative pointers. So I'm not attempting to prove here that existential lifetimes are viable, only making an unsubstantiated assertion. I just don't agree that the mechanisms needed to enable self-references would be entirely separate from the lifetime system.


I don't really agree with this summary. In my understanding, a new offset-based type that coerces to a reference is probably pretty feasible. We've never really explored it in any thorough way.

What doesn't work is trying to rewrite references to offsets based on lifetime inference determining that the reference is self-referential. That inference and rewrite is infeasible for us right now. However, Niko has mentioned that Polonius's analysis may make it more feasible some day in the future.

But a new separate type, let's say @T, which has no lifetime and is just a relative address into the same type, seems much more within reach. Further investigation would be needed. (Including investigating whether it would carry its weight in terms of user complexity!)


Would there be a practical difference between storing an offset (as suggested in this thread) and storing a reference that gets recalculated whenever the struct is moved?

In my mind, the latter approach performs theoretically better if the number of struct moves times the number of self-referential fields in the struct is less than the number of self-referential field accesses. Practically, I don't know whether the extra addition involved in calculating the address has any measurable impact on performance.

If not, the only difference might be in implementing such a feature, where one might be easier to do than the other, but my knowledge of the internals of Rust is nonexistent.

  1. The stored reference would be pointer-sized (a usize); the offset could be much shorter. This could be significant if the offset fit in a u8 or u16 while the usize was 128 bits wide.

  2. A physical move of the object in memory or a clone would just memcpy the offset, whereas extra code would need to be emitted immediately after the memcpy to relocate the reference before the relocated object was available for use.

  3. The reference could be used as is, whereas relocation code would need to be emitted each time the offset needed to be converted (de-relativized?) to a reference, and the inverse offset-computation code would need to be emitted each time an object self-reference needed to be converted to and stored as an offset.
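The size argument in point 1 can be sketched with two hypothetical layouts. The struct names are invented, and `#[repr(C)]` is assumed so that the compiler does not reorder fields:

```rust
use std::mem::size_of;

// Self-reference stored as a one-byte offset.
#[allow(dead_code)]
#[repr(C)]
struct WithOffset {
    bar: i32,
    bar_off: i8,
}

// Self-reference stored as an absolute, pointer-sized address.
#[allow(dead_code)]
#[repr(C)]
struct WithPointer {
    bar: i32,
    bar_ptr: *const i32,
}

// Returns (size with offset, size with pointer) in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<WithOffset>(), size_of::<WithPointer>())
}
```

On a typical 64-bit target this gives 8 bytes versus 16, because the pointer forces 8-byte alignment. How much that matters depends on how many such structs exist and how the surrounding fields pack.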


It would basically just have to be an offset, as Rust (safe and unsafe) code is written and optimized for the fact that moves are memcpys.
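A small safe sketch of what "moves are memcpys" means for an absolute self-address. The struct and function names are made up, and the address is kept as a plain `usize` so that nothing is ever dereferenced:

```rust
struct Foo {
    bar: i32,
    bar_addr: usize, // absolute address of `bar`, recorded at construction
}

// Returns (address recorded before the move,
//          recorded address as seen after the move,
//          actual address of `bar` after the move).
fn addresses_across_move(value: i32) -> (usize, usize, usize) {
    let mut foo = Foo { bar: value, bar_addr: 0 };
    foo.bar_addr = &foo.bar as *const i32 as usize;
    let recorded = foo.bar_addr;
    let boxed = Box::new(foo); // the move is a bitwise copy; no fix-up code runs
    (recorded, boxed.bar_addr, &boxed.bar as *const i32 as usize)
}
```

The first two values are always equal, because the move copies the recorded address bit for bit. The third is where `bar` actually lives afterwards, which will generally be a different address, so an absolute pointer would need relocation code that Rust moves never run.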
