One of the mainstream approaches to the problem of self-referential structs is the introduction of immovable types. I would like to propose a slightly different but compatible approach. (It has probably been proposed before, so I would be grateful for links to any previous related discussions.) For additional context, read this thread.
We'll start by introducing a `Move` trait, which is probably quite similar to a C++ move constructor:
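trait Move: Sized {
    // `offset` is assumed to be the distance in bytes from the old location to the new one
    unsafe fn _move(&mut self, offset: isize);
}
default impl<T: Sized> Move for T {
    unsafe fn _move(&mut self, offset: isize) {
        let src = self as *const Self;
        let dst = (src as *const u8).offset(offset) as *mut Self;
        // the count is in units of `Self`, so 1 copies `size_of::<Self>()` bytes
        ptr::copy_nonoverlapping(src, dst, 1);
    }
}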
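Since `default impl` needs unstable specialization, here is a minimal sketch of the same bitwise relocation as a free function that compiles on stable Rust. The names `relocate_bitwise` and `demo_relocate` are hypothetical, and the offset is assumed to be in bytes.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

/// Hypothetical stand-in for the proposed default `_move`: a bitwise copy
/// of one `T` to the address `offset` bytes away from `src`.
///
/// Safety: `src` must point to a valid `T`; the destination must lie in the
/// same allocation, be aligned for `T`, and must not overlap the source.
unsafe fn relocate_bitwise<T>(src: *mut T, offset: isize) -> *mut T {
    unsafe {
        let dst = (src as *mut u8).offset(offset) as *mut T;
        // The count is in units of `T`, so 1 copies `size_of::<T>()` bytes.
        ptr::copy_nonoverlapping(src as *const T, dst, 1);
        dst
    }
}

/// Moves a `u64` from slot 0 to slot 1 of a two-slot buffer and returns
/// the value read back from the new location.
fn demo_relocate() -> u64 {
    let mut slots: [MaybeUninit<u64>; 2] = [MaybeUninit::uninit(), MaybeUninit::uninit()];
    let base = slots.as_mut_ptr() as *mut u64;
    unsafe {
        base.write(0xDEAD_BEEF);
        let dst = relocate_bitwise(base, size_of::<u64>() as isize);
        dst.read()
    }
}
```

Calling `demo_relocate()` should give back the value that was written to the first slot, now read from the second one.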
Now (with some compiler magic) we will be able to write the following code:
struct Foo {
    array: [u8; 16],
    item: &'array u8,
}
impl Move for Foo {
    unsafe fn _move(&mut self, offset: isize) {
        // copy everything to `dst`
        // apply `offset` to the reference in `item` field to keep it valid
    }
}
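The two commented steps can be tried out today with a raw pointer in place of `&'array u8`, which is not valid syntax in current Rust. This is only a sketch under that assumption; `relocate_foo` and `demo_foo` are hypothetical names, and the offset is again taken to be in bytes.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

/// Stand-in for the proposed `Foo`: `item` is a raw pointer into `array`.
struct Foo {
    array: [u8; 16],
    item: *const u8,
}

/// Hypothetical equivalent of a hand-written `_move`: copy the bytes, then
/// shift the interior pointer by the same byte offset.
///
/// Safety: `src` must point to an initialized `Foo` whose `item` points into
/// its own `array`; `src + offset` must be aligned, non-overlapping storage
/// for a `Foo` inside the same allocation.
unsafe fn relocate_foo(src: *mut Foo, offset: isize) -> *mut Foo {
    unsafe {
        let dst = (src as *mut u8).offset(offset) as *mut Foo;
        ptr::copy_nonoverlapping(src as *const Foo, dst, 1);
        // The copied pointer still targets the old `array`; fix it up.
        (*dst).item = (*dst).item.offset(offset);
        dst
    }
}

/// Builds a `Foo` in slot 0, relocates it to slot 1, and reports whether
/// `item` now points at `array[3]` of the new copy, plus the value it reads.
fn demo_foo() -> (bool, u8) {
    let mut slots: [MaybeUninit<Foo>; 2] = [MaybeUninit::uninit(), MaybeUninit::uninit()];
    let first = slots.as_mut_ptr() as *mut Foo;
    let mut array = [0u8; 16];
    array[3] = 42;
    unsafe {
        first.write(Foo { array, item: ptr::null() });
        // Make the struct self-referential: point `item` at its own `array[3]`.
        (*first).item = (ptr::addr_of!((*first).array) as *const u8).add(3);
        let moved = relocate_foo(first, size_of::<Foo>() as isize);
        let expected = (ptr::addr_of!((*moved).array) as *const u8).add(3);
        ((*moved).item == expected, *(*moved).item)
    }
}
```

Without the fix-up line, `item` in the new copy would keep pointing into the old slot, which is exactly the dangling-reference problem the proposal tries to address.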
We use the `'array` lifetime to tell the compiler that the `array` field cannot be accessed through safe code. Unless the struct re-implements the `Move` trait, the compiler will forbid the use of field-named lifetimes. This way the struct author takes full responsibility for maintaining the validity of all internal references. For a heap-allocated buffer, the `Move` implementation will contain only `copy_nonoverlapping`, as there is no need to change the reference in the field.
An additional nice point is that the `Move` trait will allow code to reliably zero out memory after a move, which is especially important for cryptographic applications:
struct SecretKey([u8; 64]);
impl Move for SecretKey {
    unsafe fn _move(&mut self, offset: isize) {
        // copy data to `dst`
        // zero out `src` using `ptr::write_volatile`
    }
}
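A rough sketch of the copy-then-wipe step, written as a free function because the `Move` trait itself does not exist. `relocate_and_wipe` and `demo_wipe` are hypothetical names. The volatile writes keep the compiler from eliding the zeroing, but this sketch makes no claim about copies that may live in registers or elsewhere on the stack.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

struct SecretKey([u8; 64]);

/// Copies the key to `dst`, then overwrites the old location with zeros.
///
/// Safety: `src` must point to an initialized key; `dst` must be valid,
/// aligned storage for a key that does not overlap `src`.
unsafe fn relocate_and_wipe(src: *mut SecretKey, dst: *mut SecretKey) {
    unsafe {
        ptr::copy_nonoverlapping(src as *const SecretKey, dst, 1);
        let bytes = src as *mut u8;
        for i in 0..size_of::<SecretKey>() {
            // Volatile writes are not removed as dead stores.
            ptr::write_volatile(bytes.add(i), 0);
        }
    }
    // Keep the compiler from reordering later accesses before the wipe.
    compiler_fence(Ordering::SeqCst);
}

/// Returns (old location is all zeros, new location holds the key bytes).
fn demo_wipe() -> (bool, bool) {
    let mut old = SecretKey([0xAB; 64]);
    let mut new = MaybeUninit::<SecretKey>::uninit();
    unsafe {
        relocate_and_wipe(&mut old, new.as_mut_ptr());
    }
    // `new` was fully initialized by the copy above.
    let new = unsafe { new.assume_init() };
    (
        old.0.iter().all(|&b| b == 0),
        new.0.iter().all(|&b| b == 0xAB),
    )
}
```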
But there is still a problem (mentioned, for example, by @withoutboats here): our struct can contain other types parameterized over lifetimes, e.g. `Iter<'a, u8>`, and they can nest arbitrarily deep. Because it's a foreign type, we cannot maintain its internal references in our implementation of `Move`. Thus we need some interface that lets us keep the references inside it valid. For this we can introduce an `OffsetRefs` trait (the name is temporary):
trait OffsetRefs {
    const N: usize;
    unsafe fn offset(&mut self, offsets: [isize; Self::N]);
}
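As far as I know, `[isize; Self::N]` in a trait method signature is not accepted by stable Rust, so the sketch below moves `N` into a const generic parameter instead. That is a substitution made for illustration, not part of the proposal. `RawIter` is a hypothetical slice-iterator-like type, and the method is named `offset_refs` here only to avoid confusion with the pointer method `offset`.

```rust
/// Variant of the proposed trait with `N` as a const generic parameter.
trait OffsetRefs<const N: usize> {
    /// Shifts every internal pointer of group `i` by `offsets[i]` bytes.
    ///
    /// Safety: after shifting, each pointer must still point into live data
    /// of the same allocation it was derived from.
    unsafe fn offset_refs(&mut self, offsets: [isize; N]);
}

/// Hypothetical cursor over bytes: two raw pointers into a single buffer,
/// which together correspond to one lifetime, so `N` is 1.
struct RawIter {
    cur: *const u8,
    end: *const u8,
}

impl OffsetRefs<1> for RawIter {
    unsafe fn offset_refs(&mut self, offsets: [isize; 1]) {
        unsafe {
            self.cur = self.cur.offset(offsets[0]);
            self.end = self.end.offset(offsets[0]);
        }
    }
}

/// Shifts a cursor over `buf[0..4]` by 4 bytes so that it covers `buf[4..8]`,
/// then returns the first byte it sees and the number of remaining bytes.
fn demo_offset_refs() -> (u8, usize) {
    let buf: [u8; 8] = [10, 11, 12, 13, 14, 15, 16, 17];
    let base = buf.as_ptr();
    unsafe {
        let mut it = RawIter { cur: base, end: base.add(4) };
        it.offset_refs([4]);
        (*it.cur, it.end.offset_from(it.cur) as usize)
    }
}
```

Note that the type holds two pointers but only one offset is needed, which matches the remark that `N` usually follows the number of lifetimes rather than the number of pointers.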
Here `N` is the number of internal references that the type keeps, which in most cases will equal the number of lifetimes it is parameterized over. So if `Iter<'a, T>` implements this trait, we'll be able to write:
struct Bar {
    array: [u8; 16],
    iter: Iter<'array, u8>,
}
impl Move for Bar {
    unsafe fn _move(&mut self, offset: isize) {
        // copy data to `dst`
        // call `_move` on fields if necessary
        // after move call `(*dst).iter.offset([offset])`
    }
}
struct Baz<'b> {
    a: &'b [u8],
    array: [u8; 16],
    field: G<'array, 'b>,
}
impl<'b> Move for Baz<'b> {
    unsafe fn _move(&mut self, offset: isize) {
        // copy data to `dst` with separate
        // call `_move` on fields if necessary
        // after move call `(*dst).field.offset([offset, 0])`
    }
}
impl<'b> OffsetRefs for Baz<'b> {
    const N: usize = 1;
    unsafe fn offset(&mut self, offsets: [isize; 1]) {
        // shift `self.a` to `offsets[0]`
        // call `self.field.offset([0, offsets[0]])`
    }
}
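To make the `[offset, 0]` call concrete, here is a sketch of the `Baz` case with raw pointers standing in for the two lifetimes. Everything in it is an assumption made for illustration: the const-generic form of `OffsetRefs`, the layout of `G`, and the helper names `relocate_baz` and `demo_baz`. The `a` field and the `OffsetRefs` impl for `Baz` itself are left out to keep it short.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

trait OffsetRefs<const N: usize> {
    /// Safety: the shifted pointers must still point into live data.
    unsafe fn offset_refs(&mut self, offsets: [isize; N]);
}

/// Hypothetical stand-in for `G<'array, 'b>`: one raw pointer per lifetime.
struct G {
    inner: *const u8,    // plays the role of `'array`
    external: *const u8, // plays the role of `'b`
}

impl OffsetRefs<2> for G {
    unsafe fn offset_refs(&mut self, offsets: [isize; 2]) {
        unsafe {
            self.inner = self.inner.offset(offsets[0]);
            self.external = self.external.offset(offsets[1]);
        }
    }
}

struct Baz {
    array: [u8; 16],
    field: G,
}

/// Safety: `src + offset` must be aligned, non-overlapping storage for a
/// `Baz` inside the same allocation as `src`.
unsafe fn relocate_baz(src: *mut Baz, offset: isize) -> *mut Baz {
    unsafe {
        let dst = (src as *mut u8).offset(offset) as *mut Baz;
        ptr::copy_nonoverlapping(src as *const Baz, dst, 1);
        // Only the pointer into `array` moves; the external one stays put.
        let field = &mut (*dst).field;
        field.offset_refs([offset, 0]);
        dst
    }
}

/// Returns (byte read via `inner`, byte read via `external`,
/// whether `external` kept its address) after relocating to the second slot.
fn demo_baz() -> (u8, u8, bool) {
    static EXTERNAL: [u8; 4] = [1, 2, 3, 4];
    let mut slots: [MaybeUninit<Baz>; 2] = [MaybeUninit::uninit(), MaybeUninit::uninit()];
    let first = slots.as_mut_ptr() as *mut Baz;
    let mut array = [0u8; 16];
    array[5] = 99;
    unsafe {
        let field = G { inner: ptr::null(), external: EXTERNAL.as_ptr() };
        first.write(Baz { array, field });
        (*first).field.inner = (ptr::addr_of!((*first).array) as *const u8).add(5);
        let moved = relocate_baz(first, size_of::<Baz>() as isize);
        let external_unchanged = (*moved).field.external == EXTERNAL.as_ptr();
        (*(*moved).field.inner, *(*moved).field.external, external_unchanged)
    }
}
```

The zero in `[offset, 0]` is what lets a single interface serve both kinds of references: those that travel with the struct and those that point at data which did not move.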
Hopefully, in most cases it will be possible to auto-derive the `Move` implementation, so most authors will not have to implement this trait manually.
The big drawback of this proposal is that moves can result in arbitrary code execution; on the other hand, it gives the programmer full control over how data is moved, which can be quite useful in some cases. Of course, manual implementation of `Move` should be highly discouraged and used only when absolutely necessary. Another drawback is that the `OffsetRefs` trait will have to be implemented for types parameterized over lifetimes that refer to non-heap-allocated fields; otherwise it will not be possible to use them in a self-referential context without introducing immovable types.
Unresolved questions:
- Are there any corner cases or subtle cases that cannot be soundly implemented with this approach?
- To what extent can `Move` and `OffsetRefs` be auto-derived? Is it possible to rely only on auto-derivation and forbid manual implementation altogether?
- What should construction of self-referential structs look like?