One of the mainstream approaches to the self-referential structs problem is the introduction of immovable types. I would like to propose a slightly different but compatible approach. (It has probably been proposed by someone already, so I will be grateful for links to any previous related discussions.) For additional context, read this thread.
We'll start by introducing a Move trait, which is probably quite similar to a C++ move constructor:
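use std::ptr;

trait Move: Sized {
    // `offset` is assumed to be the distance in bytes from the old location to the new one
    unsafe fn _move(&mut self, offset: isize);
}

default impl<T: Sized> Move for T {
    unsafe fn _move(&mut self, offset: isize) {
        let src = self as *const Self;
        // step in bytes; `src.offset(offset)` would step in units of `size_of::<Self>()`
        let dst = (src as *const u8).offset(offset) as *mut Self;
        // the count is in units of `Self`, so copy exactly one value
        ptr::copy_nonoverlapping(src, dst, 1);
    }
}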
Now (with some compiler magic) we will be able to write the following code:
struct Foo {
    array: [u8; 16],
    item: &'array u8,
}

impl Move for Foo {
    unsafe fn _move(&mut self, offset: isize) {
        // copy everything to `dst`
        // apply `offset` to the reference in `item` field to keep it valid
    }
}
We use the 'array lifetime to tell the compiler that the array field cannot be accessed through safe code. Unless the type re-implements the Move trait, the compiler will forbid the use of lifetimes named after fields. This way the struct author takes full responsibility for keeping all internal references valid. For a heap-allocated buffer, the Move implementation will contain only copy_nonoverlapping, as there is no need to change the reference in the field.
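To make the `Foo` case more concrete, here is a rough sketch of what the body of `_move` could do, written for today's stable Rust. It is only an approximation under stated assumptions: a raw pointer stands in for `&'array u8` (the named field lifetime does not exist yet), and `move_foo` and `demo_move_foo` are hypothetical free functions, not part of the proposal.

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Stand-in for the proposed `Foo`: a raw pointer replaces `&'array u8`.
struct Foo {
    array: [u8; 16],
    item: *const u8,
}

// Hypothetical counterpart of `Move::_move`, written as a free function.
// Safety: `src` must point to a valid `Foo` whose `item` points into its own
// `array`; `dst` must be valid for writes and must not overlap `src`.
unsafe fn move_foo(src: *const Foo, dst: *mut Foo) {
    unsafe {
        // Bitwise copy of the whole struct (a count of one `Foo`).
        ptr::copy_nonoverlapping(src, dst, 1);
        // Position of the pointee relative to the start of the struct.
        let within = (*src).item as isize - src as isize;
        // Same address as "old `item` plus (dst - src)", but derived from
        // `dst`, so the new pointer belongs to the new location.
        (*dst).item = (dst as *const u8).offset(within);
    }
}

// Builds a `Foo` whose `item` points at `array[3]`, moves it, and reads
// through the adjusted pointer.
fn demo_move_foo() -> u8 {
    let mut src = Foo { array: [0; 16], item: ptr::null() };
    src.array[3] = 42;
    src.item = ptr::addr_of!(src.array[3]);

    let mut dst = MaybeUninit::<Foo>::uninit();
    let dst_ptr = dst.as_mut_ptr();
    unsafe {
        move_foo(&src, dst_ptr);
        // Overwrite the old location: the moved `item` must not depend on it.
        src.array[3] = 0;
        *(*dst_ptr).item
    }
}
```

The sketch recomputes the pointer from `dst` rather than adding the offset to the old pointer directly. Both give the same address; deriving it from `dst` just keeps the demo clear of pointer provenance questions.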
An additional nice point is that the Move trait will allow code to reliably zero out memory after a move, which is especially important for cryptographic applications:
struct SecretKey([u8; 64]);

impl Move for SecretKey {
    unsafe fn _move(&mut self, offset: isize) {
        // copy data to `dst`
        // zero out `src` using `ptr::write_volatile`
    }
}
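As an illustration of the two comments above, here is a minimal sketch in stable Rust. `move_secret` and `demo_move_secret` are hypothetical helper names, and the move is spelled as a free function over raw pointers because the `Move` trait itself does not exist.

```rust
use std::mem::MaybeUninit;
use std::ptr;

struct SecretKey([u8; 64]);

// Hypothetical counterpart of `Move::_move` for `SecretKey`.
// Safety: `src` must point to a valid key, `dst` must be valid for writes,
// and the two locations must not overlap.
unsafe fn move_secret(src: *mut SecretKey, dst: *mut SecretKey) {
    unsafe {
        // Copy the key to its new location (a count of one `SecretKey`).
        ptr::copy_nonoverlapping(src, dst, 1);
        // Volatile store: the optimizer may not drop it, even though the
        // old location is never read again by the program.
        ptr::write_volatile(src, SecretKey([0u8; 64]));
    }
}

// Moves a key filled with 7s and reports (new copy intact, old copy wiped).
fn demo_move_secret() -> (bool, bool) {
    let mut src = SecretKey([7u8; 64]);
    let mut dst = MaybeUninit::<SecretKey>::uninit();
    let dst_ptr = dst.as_mut_ptr();
    unsafe {
        move_secret(&mut src, dst_ptr);
        let moved: &SecretKey = &*dst_ptr;
        let copied = moved.0.iter().all(|&b| b == 7);
        let wiped = src.0.iter().all(|&b| b == 0);
        (copied, wiped)
    }
}
```

Note that this only wipes the old location itself. Temporary copies the compiler may have placed in registers or spilled elsewhere are not covered by such a hook.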
But there is still a problem (mentioned, for example, by @withoutboats here): our struct can contain other types parameterized over lifetimes, e.g. Iter<'a, u8>, and they can nest arbitrarily deep. Because such a type is foreign, we cannot maintain its internal references in our implementation of Move. Thus we need some interface that lets us keep the references inside it valid. For this we can introduce an OffsetRefs trait (the name is temporary):
trait OffsetRefs {
    const N: usize;
    unsafe fn offset(&mut self, offsets: [isize; Self::N]);
}
N here is the number of internal references that the type keeps, and in most cases it will equal the number of lifetimes the type is parameterized over. So if Iter<'a, T> implements this trait, we'll be able to write:
struct Bar {
    array: [u8; 16],
    iter: Iter<'array, u8>,
}

impl Move for Bar {
    unsafe fn _move(&mut self, offset: isize) {
        // copy data to `dst`
        // call `_move` on fields if necessary
        // after move call `(*dst).iter.offset([offset])`
    }
}

struct Baz<'b> {
    a: &'b [u8],
    array: [u8; 16],
    field: G<'array, 'b>,
}

impl<'b> Move for Baz<'b> {
    unsafe fn _move(&mut self, offset: isize) {
        // copy data to `dst` with separate
        // call `_move` on fields if necessary
        // after move call `(*dst).field.offset([offset, 0])`
    }
}

impl<'b> OffsetRefs for Baz<'b> {
    const N: usize = 1;
    unsafe fn offset(&mut self, offsets: [isize; 1]) {
        // shift `self.a` to `offsets[0]`
        // call `self.field.offset([0, offsets[0]])`
    }
}
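For readers who want to experiment, the `Bar` case can be approximated in stable Rust roughly as follows. Everything here is an assumption made for the sketch: `N` becomes a const generic parameter (stable Rust rejects `[isize; Self::N]`), the method is renamed `offset_refs`, `Cursor` is a hypothetical stand-in for `Iter<'array, u8>`, and `move_bar` is a free function instead of a `Move` impl. The demo keeps both locations inside one allocation so that shifting a raw pointer from one to the other stays within bounds.

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Stable-Rust variant of `OffsetRefs` with `N` as a const generic parameter.
trait OffsetRefs<const N: usize> {
    // Safety: every shifted pointer must stay inside its original allocation.
    unsafe fn offset_refs(&mut self, offsets: [isize; N]);
}

// Hypothetical stand-in for `Iter<'array, u8>`: one pointer into a buffer.
struct Cursor {
    ptr: *const u8,
}

impl OffsetRefs<1> for Cursor {
    unsafe fn offset_refs(&mut self, offsets: [isize; 1]) {
        // Shift the single internal pointer by the given number of bytes.
        self.ptr = unsafe { self.ptr.offset(offsets[0]) };
    }
}

struct Bar {
    array: [u8; 16],
    iter: Cursor,
}

// Hypothetical counterpart of `Move::_move` for `Bar`.
// Safety: `src` must hold a valid `Bar`; `dst` must be writable, must not
// overlap `src`, and must live in the same allocation as `src`.
unsafe fn move_bar(src: *mut Bar, dst: *mut Bar) {
    unsafe {
        let offset = dst as isize - src as isize;
        ptr::copy_nonoverlapping(src, dst, 1);
        // After the copy, fix up the references held by the field.
        let moved = &mut *dst;
        moved.iter.offset_refs([offset]);
    }
}

// Moves a `Bar` between two adjacent slots and reads through the cursor.
fn demo_move_bar() -> u8 {
    let mut slots: [MaybeUninit<Bar>; 2] =
        [MaybeUninit::uninit(), MaybeUninit::uninit()];
    let base = slots.as_mut_ptr() as *mut Bar;
    unsafe {
        let src = base;
        let dst = base.add(1);
        let mut array = [0u8; 16];
        array[5] = 9;
        src.write(Bar { array, iter: Cursor { ptr: ptr::null() } });
        // Point the cursor at `array[5]` inside the first slot.
        (*src).iter.ptr = (ptr::addr_of!((*src).array) as *const u8).add(5);
        move_bar(src, dst);
        // Wipe the old slot: the moved cursor must not depend on it.
        ptr::write_bytes(src, 0, 1);
        *(*dst).iter.ptr
    }
}
```

A type with two lifetimes, like `G<'array, 'b>` above, would implement `OffsetRefs<2>` in this encoding and receive `[offset, 0]`.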
Hopefully, in most cases it will be possible to auto-derive the Move implementation, so most authors will not have to implement this trait manually.
The big drawback of this proposal is that moves can result in arbitrary code execution. On the other hand, it gives the programmer full control over how data is moved, which can be quite useful in some cases. Of course, manual implementation of Move should be strongly discouraged and used only when absolutely necessary. Another drawback is that OffsetRefs will have to be implemented for types that are parameterized over lifetimes and may borrow from non-heap-allocated fields; otherwise it will not be possible to use them in a self-referential context without introducing immovable types.
Unresolved questions:
- Are there any corner cases or subtle cases that cannot be soundly implemented with this approach?
- To what extent can Move and OffsetRefs be auto-derived? Is it possible to rely only on auto-derivation and forbid manual implementation altogether?
- What should construction of self-referential structs look like?