Internal references as a separate type

@lxrec I think you may have misunderstood my last post. I'm not asking you to fill in any design holes, I'm trying to get clarification of what the link you gave me is saying in its discussion of offset ptrs. The Iterator example they give seems like a possible minimal example to demonstrate what a can of worms this is, but as it is I can't tell what they're saying.

rental is no longer maintained. And while it's fairly brilliant in how it accomplishes as much as it does using only existing language features, the limitations of those features do force it to be, well, kind of ugly and underpowered. A real language feature would be much nicer.

I wouldn't go that far. To create a self-referencing type using Pin today, you have a choice between creating an async fn, which avoids novel lifetime features by virtue of being implemented as compiler magic, or using unsafe code, which avoids novel lifetime features by being unsafe.

I've thought about this for a while and I think there are two semi-orthogonal features needed for safe, general self-references.

One is immovable types, which sort of exists today in the form of Pin (but could be made more powerful as a builtin feature). This one is separate from the lifetime system.

The other is existential lifetimes. Existential lifetimes would basically simulate what rental or owning_ref already accomplish using HRTBs on Fn traits, just without having to put your code inside a closure. By themselves, they would enable the same things that rental and owning_ref do: "self-referencing" types where the buffer being referenced is behind an owning pointer, so the address stays stable even if the containing type is moved. In other words, things like this (pseudocode): struct A { buffer: Box<i32>, reference: &'buffer i32 }. While this is definitely not as straightforward as it sounds, I think it is possible and probably would fundamentally look like an extension to the lifetime system.
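
The stability property that the pseudocode above relies on can be checked in today's Rust. This is only a minimal sketch, with `Owner`, `heap_addr` and `addr_survives_move` as hypothetical names chosen for illustration: it shows that moving the owning struct leaves the boxed value at the same heap address, and it does not attempt to express the `&'buffer` lifetime itself.

```rust
/// Hypothetical owner of a heap-allocated buffer.
struct Owner {
    buffer: Box<i32>,
}

/// Address of the i32 on the heap, not of the `Owner` itself.
fn heap_addr(owner: &Owner) -> usize {
    &*owner.buffer as *const i32 as usize
}

/// Returns true if the boxed value keeps its address across a move.
fn addr_survives_move() -> bool {
    let a = Owner { buffer: Box::new(7) };
    let before = heap_addr(&a);
    let b = a; // moves `Owner`; only the Box's pointer is copied
    let after = heap_addr(&b);
    before == after
}
```

Because only the pointer inside the `Box` is copied, a reference into the buffer would stay valid across the move; what is missing today is a safe way to name that reference's lifetime in the struct definition.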

Combining the two features would let you have a struct where one field points directly to another field of the same struct.

I suppose you might be able to add relative pointers on top of this scheme, but it's probably not worth it when they aren't enough for most real use cases.

Anyway, this is all off topic because the original post is about relative pointers. So I'm not attempting to prove here that existential lifetimes are viable, only making an unsubstantiated assertion. :slight_smile: I just don't agree that the mechanisms needed to enable self-references would be entirely separate from the lifetime system.

I don't really agree with this summary. In my understanding, a new offset-based type that coerces to a reference is probably pretty feasible. We've never really explored it in any thorough way.

What doesn't work is trying to rewrite references to offsets based on lifetime inference determining that the reference is self-referential. That inference and rewrite is infeasible for us right now. However, Niko has mentioned that polonius's analysis may make it more feasible some day in the future.

But a new, separate type (let's say @T), which has no lifetime and is just a relative address into the same type, seems much more within reach. Further investigation would be needed, including whether it would carry its weight in terms of user complexity!

Would there be a practical difference between storing an offset (as suggested in this thread) vs. storing a reference, that gets recalculated whenever the struct is moved?

In my mind, the latter approach performs better in theory if the number of struct moves times the number of self-referential fields in the struct is less than the number of self-referential field accesses. In practice, I don't know whether the extra addition needed to compute the address from an offset has any measurable impact on performance.

If not, the only difference might be when it comes to implementing such a feature and one might be easier to do than the other, but my knowledge about the internals of Rust is nonexistent.

  1. The stored reference would be a usize; the offset could be much shorter. This could be significant if the offset was u8 or u16 and the usize was u128.

  2. A physical move of the object in memory or a clone would just memcpy the offset, whereas extra code would need to be emitted immediately after the memcpy to relocate the reference before the relocated object was available for use.

  3. The reference could be used as is, whereas relocation code would need to be emitted each time the offset needed to be converted (de-relativized?) to a reference, and the inverse offset-computation code would need to be emitted each time an object self-reference needed to be converted to and stored as an offset.
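
The three points above can be sketched in current safe Rust, under the assumption that the "offset" is simply an index into an array field. `Packet`, `cursor` and the helper functions are hypothetical names, not part of any proposal in this thread.

```rust
use std::mem::size_of;

/// Hypothetical struct whose "self-reference" is stored as a small offset.
struct Packet {
    bytes: [u8; 8],
    cursor: u8, // offset into `bytes`: one byte instead of a full pointer (point 1)
}

impl Packet {
    /// Converts the stored offset to a real reference on every access (point 3).
    fn current(&self) -> &u8 {
        &self.bytes[usize::from(self.cursor)]
    }

    /// Stores a new offset; panics if it does not fall inside `bytes`.
    fn point_at(&mut self, index: u8) {
        assert!(usize::from(index) < self.bytes.len());
        self.cursor = index;
    }
}

/// Compares the size of the offset with the size of a reference.
fn offset_is_smaller_than_reference() -> bool {
    size_of::<u8>() < size_of::<&u8>()
}

/// Reads through the offset after the struct has been moved.
fn value_after_move() -> u8 {
    let mut p = Packet { bytes: [10, 11, 12, 13, 14, 15, 16, 17], cursor: 0 };
    p.point_at(3);
    let moved = p; // plain bitwise move, no fix-up code needed (point 2)
    *moved.current()
}
```

The cost named in point 3 shows up as the index computation (and here a bounds check) inside `current`, paid on each access rather than on each move.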

It would basically just have to be an offset, as Rust (safe and unsafe) code is written and optimized for the fact that moves are memcpys.

I don't see why the past has to dictate the future. I've worked on and modified compilers since the mid-1960s. Adding the code I describe in 2) in my above post is less impactful to code generation than the alternative of adding the code I describe in 3) above. However, alternative 2) could have greater impact on memory footprint and on performance, depending on the way the code is written.

Rust promises that moves are just memcpy. This is pretty fundamental; I don't think it can be changed.

Doing something else would either break compatibility with Rust 1.x (in a subtle way that can cause bugs in unsafe code), or require an explicit opt-in like T: MoveWithFixup, but that would be cumbersome to use and incompatible with existing code.

Breaking the assumption that memcpy is sufficient to move a value would break existing libraries that use unsafe code internally that relies on this assumption, violating Rust's commitment to stability.

We can't rely on the compiler emitting any sort of code to fix up objects after relocation, because libraries can already perform moves in unsafe or FFI code, in ways that are opaque to the compiler.

(This isn't dictated by "the past," but by code that exists now, in the present, and will continue to exist in the future.)

"Past dictating the future" is an unnecessarily strong and negative way of putting it. The thing is, a lot of code (unsafe code, as well as "just" safe code with related logical invariants) already relies on moves being bitwise copies, so willy-nilly allowing this to break would have a massive impact.

I don't think we should take the "if you don't support this, you are holding back progress" viewpoint, as the benefits are not at all that clear.

Replying to at least the above three posts:

If memcpy is guaranteed to produce a bit-identical value, then no form of modification can occur when cloning or moving an object in memory. End of that discussion.

I personally have never seen a need for internal references as a type—in theory getters and setters should be sufficient to generate a reference to a subpart of an object, and to compute and store the offset to that subpart relative to the starting address of the object. Safe Rust is somewhat deficient in the latter aspect, as threads in IRLO about computing offsets into an object have shown. IMO that's where the effort is needed, not in somehow storing "internal" references to an object within the same object.

With respect to 'lifetimes, clearly the getter produces a reference with the same 'lifetime as the base object. The setter computes and stores an offset, to which a 'lifetime per se does not apply.
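
A minimal sketch of that getter/setter pattern, assuming a hypothetical `Pair` type with two `u32` fields. The setters compute and store a byte offset using only safe address arithmetic; the getter needs a small `unsafe` block to turn the offset back into a reference, which illustrates the gap in safe Rust mentioned above.

```rust
/// Hypothetical type that remembers one of its own fields as a byte offset.
struct Pair {
    a: u32,
    b: u32,
    selected_offset: usize, // byte offset of the selected field from the start of `Pair`
}

impl Pair {
    fn new(a: u32, b: u32) -> Self {
        let mut pair = Pair { a, b, selected_offset: 0 };
        pair.select_a(); // establish the invariant before handing the value out
        pair
    }

    /// Byte offset of `field` from the start of `self` (safe address arithmetic).
    fn offset_of(&self, field: &u32) -> usize {
        (field as *const u32 as usize) - (self as *const Pair as usize)
    }

    /// Setters: store an offset, which has no lifetime attached.
    fn select_a(&mut self) {
        self.selected_offset = self.offset_of(&self.a);
    }

    fn select_b(&mut self) {
        self.selected_offset = self.offset_of(&self.b);
    }

    /// Getter: the returned reference borrows from `self`, so it has the
    /// same lifetime as the borrow of the base object.
    fn selected(&self) -> &u32 {
        // SAFETY: `selected_offset` is only ever written by `select_a` and
        // `select_b`, so it is always the offset of a `u32` field of `self`.
        unsafe { &*(self as *const Pair as *const u8).add(self.selected_offset).cast::<u32>() }
    }
}
```

Since only the offset is stored, the value can be moved freely and `selected` still yields the right field afterwards.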

clone is different from moving and is not guaranteed to be memcpy for non-Copy types. But yeah, no form of modification can occur when moving an object, and I agree it would be a bad idea to try to change that.

But that doesn't mean we can't have types with internal references. We already do, in the form of async fns. The trick is to prohibit moving altogether, which is currently done in a hacky way with Pin.

This may be an ignorant question, but I have long wondered if reference constraints in space (where an object resides) have ever been contemplated, in addition to lifetimes (?). Spitballing to illustrate what I mean, using @ to denote those constraints:

struct Foo {
    data1: Vec<i32>,
    data2: Vec<i32>,
    refd: & @self Vec<i32>,         // reference to something within the struct
    iter: Iter<'self, @self, i32>,  // location parameter in addition to lifetime
    outside: & @!self Vec<i32>,     // negative constraint: reference to outside Vec
}

fn alloca(size: usize) -> & @stack [u8] { ... }

unsafe fn transmute_ref<@loc, 'lt, T, U>(r: & @loc 'lt T) -> & @loc 'lt U { ... }

fn swap<@x, @y, T>(x: &mut @x T,  y: &mut @y T) where @x != @y { ... }

IOW, allowing location constraints to be passed along like lifetimes are currently. Useful pre-defined location constraints other than @self could be @heap, @stack, @tls, ...

This is of course just some vague idea, and I'm not sure if those could transitively be proven in a general case - or whether it'd be all that useful to begin with.

I don't understand why a relative pointer would prevent moving the containing structure, unchanged, by memcpy. I understand that moving something out of the struct would be an issue, but why wouldn't that be the same as with any other shared or exclusive reference, in that you can't move something that has references to it?

If a relative pointer is stored as an offset and materialized on demand - then there's no issue. If the relative pointer is stored as an actual pointer, then moving the object would invalidate the pointer.
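
The second case can be observed without dereferencing anything. In this sketch (`SelfPtr` and `stored_pointer_goes_stale` are hypothetical names), a struct stores the absolute address of its own field as a raw pointer; after the struct is moved to the heap, the stored address no longer matches the field's new location.

```rust
/// Hypothetical struct holding the absolute address of its own field.
struct SelfPtr {
    value: u32,
    ptr: *const u32, // absolute address of `value` at the time it was stored
}

/// Returns true if the stored address is stale after moving the struct.
fn stored_pointer_goes_stale() -> bool {
    let mut s = SelfPtr { value: 1, ptr: std::ptr::null() };
    s.ptr = &s.value; // points into the stack slot of `s`
    let boxed = Box::new(s); // bitwise move of `s` to the heap
    // The pointer is only compared, never dereferenced, so no unsafe is needed.
    boxed.ptr != &boxed.value as *const u32
}
```

The move copied the old address verbatim, which is exactly why a pointer stored this way would need either a fix-up step or a guarantee that the struct never moves.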

Aside: a variation of the idea of space constraints exists in the Chapel language. Chapel is an HPC language that lets you write a program that executes across multiple locales (e.g. the nodes of a cluster), and it has a similar syntax for specifying the location of an expression. I doubt that it fits within Rust's scope, though.

https://chapel-lang.org/docs/primers/locales.html for more details

I really don't think so. A "Relative" pointer would be +/- relative offset from the location in memory of the pointer. Moving the structure that contains the "Relative" pointer would not change the +/- value.

Isn't this basically just C++'s pointer-to-member at that point? I wrote a proposal for it a while ago (which I can't find). It seems like we might as well just add "reference-to-field" and build this "self-reference" feature as sugar on top of that.
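
Rust has no pointer-to-member today, but the idea can be approximated with a function pointer that projects a field out of a borrowed struct. This is only a rough sketch with hypothetical names (`Window`, `width_of`, `height_of`, `active_dimension`), not the "reference-to-field" feature suggested above.

```rust
/// Hypothetical struct that records which of its own fields is "active"
/// as a projection function rather than as an address.
struct Window {
    width: u32,
    height: u32,
    active: fn(&Window) -> &u32,
}

/// Field projections; each one plays the role of a pointer-to-member.
fn width_of(w: &Window) -> &u32 {
    &w.width
}

fn height_of(w: &Window) -> &u32 {
    &w.height
}

impl Window {
    /// Applies the stored projection to `self`; the result borrows from `self`.
    fn active_dimension(&self) -> &u32 {
        (self.active)(self)
    }
}
```

The projection is independent of where the `Window` lives, so the struct can be moved by memcpy and `active_dimension` still resolves to the right field.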

(Now I'm left wondering if there's a reasonable way to define "reference to enum field" at all...)

I don't see anywhere in this post or in the follow-up comments where it is made clear what problem this would solve. I think it would be more useful to start with a problem statement (with a code example) that sets out a clear goal. Then show, with a code example, how this proposal could solve that problem. Also explain why no current feature can properly solve it. Only then can there be a sensible discussion of implementation.
