Reading up on pointer provenance got me thinking about non-usize sized pointers and relative pointers - again. This time not with the goal of having self referential structs [1] but just to allow improving the memory layout. Rust already does something like this in the background at compile time: Every pointer exists of an origin (original) pointer to the allocated area and an offset into it. At compile time these are combined to a single address which is stored in memory.
But what if we (for a different pointer-like type) don't do this at compile time? When allocating something we get an address from the allocator/OS, which is combined with the offset 0 [2] to form the address that is stored in-memory. This is the case for Vec<T>
, [T]
and most other collections. We'd end up with the original pointer ("Anchor") stored as an address/pointer in-memory and the offset (as a usize or smaller) stored separately ("offset"/"index"). This is already done for arrays, exposing the offset as a "usize" [3], which can be stored in a smaller integer size (e.g. u32) to improve the memory layout. Additionally collections like Vec<T>
can be resized and are thus moved, making the use of normal *T
and &T
even less practical.
Therefore many suggest storing references to entries of a large collection as indicies into the array, which is unfortunate because you loose a bunch of guarantees done by the borrow checker. Having a @T
offset into a collection/allocation with pointer-like semantics + being bound to the allocation may improve that situation.
An way to offset into multiple arrays at the same time (i.e. not being bound to a single array/allocation) could improve the usability of this further, until then it would still be possible by casting Offset<Size,T>
to Size
and using normal index-based addressing, thus opting out of the guarantees Offset
has.
fn move<T>(value: T) -> T {
value
}
fn main() {
let mut data: vec![0,1,2,3,4,5,6,7,8,9]; // Lifetime: 'data
// Storing references
let a: &u8 = data[5]; // What we have now, borrows data: 'data
dbg!(a);
// Storing indicies
let a: usize = 5; // not bound to data or its lifetime
dbg!(data[a]);
// Storing offsets
// Stored like the index, but contains borrow checking like a
// reference, thus this cannot outlive `data`. For now a reference
// to the entire Vec, but could be done using references to
// individual items if supported (different topic).
// I'm not sure if the `[i]` syntax would work with this,
// hence I'm using a separte function.
// We likely need this `apply_offset` because `a` alone (at runtime)
// isn't sufficient to get to the real address.
let a: @u8 = data.offset(5); // Short-hand for `Offset<usize, u8>`
dbg!(data.apply_offset(a));
// Storing offsets in smaller types to improve size in memory.
let a: Offset<u16, u8> = data.offset(5);
dbg!(data.apply_offset(a));
// Due to being relative to the address of data you can even resize
// the vector, while offsets exist, but this is something you may not
// want, especially reducing the size while larger offsets exist.
data.resize(100, 0);
dbg!(data.apply_offset(a));
}
The example above could be done in a crate, but unless I'm mistaken that implementation wouldn't be able to ensure that the offset goes into the correct collection and thus wouldn't provide much benefit over using normal indicies. You could index into another array that still has a valid lifetime. Additionally there probably would be issues with the lifetime when resizing, which may not exist when this feature is supported by the std lib and the compiler itself.
The ability to resize does of course mean that an Offset
may be out-of-bounds and using it would return an Error or panic. Avoiding this may require per-item lifetimes or similar, but the same issue already exists for index-based references. This could perhaps be solved by having a "strong" and a "weak" variant, with weak allowing the allocation size to shrink while any "strong" offset allows moving and increasing the allocated size but not decreasing it. This would almost certainly not be possible in a crate, as it requires a different kind of borrow (or two).
Don’t get me wrong, we could probably have offset-based types. You could imagine an
@T
(no lifetime necessary) being an offset to aT
inside the containing struct. But this doesn’t do anything to support generators, which use regular code with references and then compiles it a self-referential struct, and needs to support arbitrary reference-based code.
As an aside: This may even allow/support self referential structs (if the lifetimes work out) in the future, as contrary to async and generators the code using this is aware of the existence of these.
What are your thoughts? Would this be useful to have, does it even make sense, do I have a wrong mental model of pointer provenance, should something like this be in the language, std or a crate (if possible)?