Improving self-referential structs


#61

We run into this same issue with gimli where we want to do zero-copy parsing of many DWARF things, but then run into self-referential issues when we want to bundle the raw mmap of the debug info with its parsed form (say to stick them in a filename -> debug-info hashmap together).

We ended up creating a Reader trait that is similar to byte's Buf and BufMut traits. One can implement Reader with reference counting behind the scenes, but we just provide an implementation for slices0. It works OK, but now the whole code base is generic over a Reader and it feels a lot more complicated than it did with raw references.

cc @philipc


With cpp_demangle, I saw this issue coming before I started writing any code and there is a single nice place to encapsulate all generic ownership (where as with gimli there is no clear single place that would be the thing that owns the data or not, it would need to be ref-counted).

I ended up making demangled symbols generic over some T: AsRef<[u8]> and then the internal parsed AST items store indices into the input rather than slices or anything like that:

/// A mangled symbol that has been parsed into an AST.
///
/// This is generic over some storage type `T` which can be either owned or
/// borrowed.
pub struct Symbol<T: AsRef<[u8]> {
    raw: T,

    // These structures store indices pointing into `raw.as_ref()` rather than
    // slices or reference-counted views..
    substitutions: subs::SubstitutionTable,
    parsed: ast::MangledName,
}

/// A `Symbol` which owns the underlying storage for the mangled name.
pub type OwnedSymbol = Symbol<Vec<u8>>;

/// A `Symbol` which borrows the underlying storage for the mangled name.
pub type BorrowedSymbol<'a> = Symbol<&'a [u8]>;

So, I’d like to conclude that:

  1. Yes, this is totally a real problem, and not just in generators and async IO domains.

  2. There are ways to work around this issue. Nevertheless, they are relatively painful for library authors, and require more effort than otherwise.

  3. If a library’s author doesn’t go through that effort, then library consumers are hosed if they need owned versions.


0 Well slice and endianity.


#62

Could there be some rules of thumb obtained from the experience of working with graph-like data structures? I’d expect them to be something like this:

  • If the whole thing is immutable and Send/Sync, access nodes with statically checked borrowing. Access to the parent chain could be passed into recursive processing calls (visitor pattern or otherwise) with something like struct<'a> ParentPath(Vec<&'a Node>).
  • If mutable, but not Sync, implement internal links with Rc/Weak and provide a RefCell-like node mutator API.
  • Mutable and Sync: it’d be nice to learn why such a design should be chosen in preference to combining the other two with conversion traits/methods between them.

#63

Or even without dynamic allocation, though that would make multi-level access cumbersome:

struct ParentChain<'a> {
    node: &'a Node,
    parent: Option<&'a ParentChain<'a>>
};

#64

I’m not sure the mutability or immutability of the structure has much of an impact on the self-referentiality problem, or at least that hasn’t been the case in my experience with rental so far. By a vast majority, most uses of rental use the shared-borrowing flavor.

If an API returns a type to you that is parameterized by lifetime in any way, then your possible uses of that type are constrained in the manner described above. Just using Rc or Arc in all cases, or being generic over some kind of vocabulary trait that you need to define and agree upon, is the only way to be sure that your library can be adapted to the widest range of use cases.

On a more meta ecosystem level, I’m just very sceptical that, even if a convention did exist that would alleviate this problem, that it would achieve universal buy in from all crates, particularly when that workaround requires significantly more effort on the part of the author. And when the failure mode is that your crate is possibly rendered unusable, I believe the cost is too high to just leave it to the ecosystem to work around this issue. However, I acknowledge that this is a matter of opinion.


#65

Hey, I am really new into Rust so I am still catching up with its language design. But I’ve got some ideas regarding this topic. These ideas are inspired by C++'s move constructor semantics.

Here is basic pseudocode which shows the new ideas.

struct Data {
	a: u64,
	b: u64,
	c: u64
}

struct A {
	a: Data,
	b: &'ref(a) u64 // New syntax here: ref-lifetime (1)
}

impl A {
	fn new() -> Self {
		A {
			a: Data { a: 1, b: 2, c: 3 },
			b: &.a.a // New syntax here: self-init-ref (2)
		}
	}
}

// A new built-in trait for custom move semantics after default move semantics
// (copying fields with no ref-lifetimes)
trait RefMove {

	// This method is called after default move semantics (memcpy of fields with no ref-lifetimes)
	// has occured. The state of the object is still incomplete prior `ref_move`, this has the consequence
	// that `ref_move` *has* to initialize all `ref-lifetime` fields.
	// Note: A new syntax could be introduced to show the incompleteness of an object initialization.
	fn ref_move(&mut self);
}

// Implement custom move semantics for A
impl RefMove for A {
	fn ref_move(&mut self) {
		// This method *must* complete the initialization of `self`

		// self.b // this produces a compiler error as `b` has not been initialized yet

		self.b = &self.a.a; // The initialization of self is not complete
	}
}

So, what we have here are several new things, two syntax additions and a special magic-compiler trait.

The first syntax addition (1) is called ref-lifetime and generates a new lifetime which is constrained by its operand.

The second syntax addition (2) is called self-init-ref (I am not really good with names) and its syntax is only valid in a struct initialisation expression. With it you can access previously initialised fields with a dot before its field-name.

Example:

A { a: 1, b: .a } In this example b would be initialized with the already initialized a field.

The last addition would be the RefMove trait (See the above pseudocode).


Immovable types and self-referencing structs
#66

This would probably have to be an unsafe trait.


#67

In case anyone here missed it: