Internal references as a separate type

It would basically just have to be an offset, as Rust (safe and unsafe) code is written and optimized for the fact that moves are memcpys.

3 Likes

I don't see why the past has to dictate the future. I've worked on and modified compilers since the mid-1960s. Adding the code I describe in 2) in my above post is less impactful to code generation than the alternative of adding the code I describe in 3) above. However, alternative 2) could have greater impact on memory footprint and on performance, depending on the way the code is written.

1 Like

Rust promises that moves are just memcpy. This is pretty fundamental, I don't think it can be changed.

Doing something else would either break compatibility with Rust 1.x (in a subtle way that can cause bugs in unsafe code), or require an explicit opt-in like T: MoveWithFixup, but that would be cumbersome to use and incompatible with existing code.

4 Likes

Breaking the assumption that memcpy is sufficient to move a value would break existing libraries that use unsafe code internally that relies on this assumption, violating Rust's commitment to stability.

We can't rely on the compiler emitting any sort of code to fix up objects after relocation, because libraries can already perform moves in unsafe or FFI code, in ways that are opaque to the compiler.

(This isn't dictated by "the past," but by code that exists now, in the present, and will continue to exist in the future.)

11 Likes

"Past dictating the future" is an unnecessarily strong and negative way of putting it. The thing is, a lot of code (unsafe as well as "just" safe code with related logical invariants) already relies on moves being bitwise copies, so just nilly-willy allowing this to break would have a massive impact.

I don't think we should take the "if you don't support this, you are holding back progress" viewpoint, as the benefits are not at all that clear.

1 Like

Replying to at least the above three posts:

If memcpy is guaranteed to produce a bit-identical value then no form of modification can occur on clone or moving an object in memory. End of that discussion.

I personally have never seen a need for internal references as a type—in theory getters and setters should be sufficient to generate a reference to a subpart of an object, and to compute and store the offset to that subpart relative to the starting address of the object. Safe Rust is somewhat deficient in the latter aspect, as threads in IRLO about computing offsets into an object have shown. IMO that's where the effort is needed, not in somehow storing "internal" references to an object within the same object.

With respect to 'lifetimes, clearly the getter produces a reference with the same 'lifetime as the base object. The setter computes and stores an offset, to which a 'lifetime per se does not apply.

3 Likes

clone is different from moving and is not guaranteed to be memcpy for non-Copy types. But yeah, no form of modification can occur when moving an object, and I agree it would be a bad idea to try to change that.

But that doesn't mean we can't have types with internal references. We already do, in the form of async fns. The trick is to prohibit moving altogether, which is currently done in a hacky way with Pin.

4 Likes

This may be an ignorant question, but I have long wondered if reference constraints in space (where an object resides) have ever been contemplated, in addition to lifetimes (?). Spitballing to illustrate what I mean, using @ to denote those constraints:

struct Foo {
    data1: Vec<i32>,
    data2: Vec<i32>,
    refd: & @self Vec<i32>,         // reference to something within the struct
    iter: Iter<'self, @self, i32>,  // location parameter in addition to lifetime
    outside: & @!self Vec<i32>,     // negative constraint: reference to outside Vec
}

fn alloca(size: usize) -> & @stack [u8] { ... }

unsafe fn transmute_ref<@loc, 'lt, T, U>(r: & @loc 'lt T) -> & @loc 'lt U { ... }

fn swap<@x, @y, T>(x: &mut @x T,  y: &mut @y T) where @x != @y { ... }

IOW, allowing location constraints to be passed along like lifetimes are currently. Useful pre-defined location constraints other than @self could be @heap, @stack, @tls, ...

This is of course just some vague idea, and I'm not sure if those could transitively be proven in a general case - or whether it'd be all that useful to begin with.

I'm not understanding why a relative pointer would prevent moving, without changing, the containing structure (by mem-coopy). I understand moving out of the struct being an issue, but why wouldn't that be the same as any other shared or exclusive reference in that you can't move something that has references?

If a relative pointer is stored as an offset and materialized on demand - then there's no issue. If the relative pointer is stored as an actual pointer, then moving the object would invalidate the pointer.

Aside: a variation of the idea to have space constraints exists in Chapel language. Chapel is an HPC language that allows to write a program that is executed over multiple localities (e.g. a cluster) and it has a similar syntax to specify the location of an expression. Though I doubt that it fits within Rust's scope.

https://chapel-lang.org/docs/primers/locales.html for more details

1 Like

I really don't think so. A "Relative" pointer would be +/- relative offset from the location in memory of the pointer. Moving the structure that contains the "Relative" pointer would not change the +/- value.

1 Like

Isn't this basically just C++'s pointer-to-member at that point? I wrote a proposal for it a while ago (which I can't find). It seems like we might as well just add "reference-to-field" and build this "self-reference" feature as sugar on top of that.

(Now I'm left wondering if there's a reasonable way to define "reference to enum field" at all...)

I don't see anywhere in this post or in the follow-up comments where it is made clear what problem this would solve. I think it might be more useful to start with a problem statement (with a code example) that states a clear goal of what is trying to be achieved. Then show how this proposal could solve that problem with a code example. Also, explain why no current feature can properly solve this problem. Only then can there be a sensible discussion on implementation.

1 Like

Not sure.. but let me try:

struct Node {..}

struct Edge<'a> {
    start : &'a Node,
    end : &'a Node,
}

struct Graph {
    nodes : Vec<Node>,
    edges : Vec<Edge<'nodes>>,
}

Or

struct Node<'a> {
    prev : Option<&'a Node<'a>>,
    next : Option<&'a Node<'a>>,
}

struct List {
    nodes : Vec<Node<'nodes>>,
}

@comex, are any of these the existential lifetimes you referred to? Couldn't find much talk about them..

P.S. above structures could be easily created when one knows the number of nodes in advance; otherwise they seem a bit painful to construct or extend..

Yeah... the current way I would do that is by storing the indices but that is basically runtime pointers and are annoying.

1 Like

If List is a Vec of Nodes, why would you create a linked list of references rather than simply storing List-array indices in the prev and next fields of each Node?

The only time i've really thought about existential lifetimes is within the context of a generalization of streaming iterator where you can iterate over objects in batches.

Where streaming iterator the values lifetime is limited to the next iteration. In theory at least with existential lifetimes you could have a lifetime which is less than the iterators lifetime, but greater than the Iterator::Item lifetime. Presumably this requires an additional function to get at one of these marker objects, before a sequence of calls to next.

My GAT foo isn't strong enough to have attempted implement it, but it's the only thing that has caused me to think about existential lifetimes before.

I don't see any difference in the example given and the use of indexes into the vector of nodes. I can't see any benefit of this proposal from the example given.

It wouldn't. There's some crosstalk going on here. Basically, @Tom-Phinney suggested that in addition to relative pointers, another viable option was having real pointers which the compiler would relocate on move. This was shot down by others; in Tom's reply, he accepted it wasn't an option, then went on directly to dismissing the need to support internal references at all. I (perhaps overeagerly) interpreted that juxtaposition as implying that relocation-on-move was required to support internal references, so I responded pointing out the alternative of disallowing moves.

Anyway, relative pointers are a valid option that preserve the ability to move; they're just extremely limited, because both the pointer and the pointee must be directly within the same buffer, without any indirection on either side. You can't have e.g. a Vec of relative pointers, or a relative pointer into a Vec (unless the pointer is in the same Vec). That's why I think other approaches to self-references are more promising.

1 Like