We run into this same issue with gimli where we want to do zero-copy parsing of many DWARF things, but then run into self-referential issues when we want to bundle the raw mmap of the debug info with its parsed form (say to stick them in a filename -> debug-info hashmap together).
We ended up creating a Reader trait that is similar to byte's Buf and BufMut traits. One can implement Reader with reference counting behind the scenes, but we just provide an implementation for slices0. It works OK, but now the whole code base is generic over a Reader and it feels a lot more complicated than it did with raw references.
With cpp_demangle, I saw this issue coming before I started writing any code and there is a single nice place to encapsulate all generic ownership (where as with gimli there is no clear single place that would be the thing that owns the data or not, it would need to be ref-counted).
/// A mangled symbol that has been parsed into an AST.
///
/// This is generic over some storage type `T` which can be either owned or
/// borrowed.
pub struct Symbol<T: AsRef<[u8]> {
raw: T,
// These structures store indices pointing into `raw.as_ref()` rather than
// slices or reference-counted views..
substitutions: subs::SubstitutionTable,
parsed: ast::MangledName,
}
/// A `Symbol` which owns the underlying storage for the mangled name.
pub type OwnedSymbol = Symbol<Vec<u8>>;
/// A `Symbol` which borrows the underlying storage for the mangled name.
pub type BorrowedSymbol<'a> = Symbol<&'a [u8]>;
So, Iād like to conclude that:
Yes, this is totally a real problem, and not just in generators and async IO domains.
There are ways to work around this issue. Nevertheless, they are relatively painful for library authors, and require more effort than otherwise.
If a libraryās author doesnāt go through that effort, then library consumers are hosed if they need owned versions.
Could there be some rules of thumb obtained from the experience of working with graph-like data structures? Iād expect them to be something like this:
If the whole thing is immutable and Send/Sync, access nodes with statically checked borrowing. Access to the parent chain could be passed into recursive processing calls (visitor pattern or otherwise) with something like struct<'a> ParentPath(Vec<&'a Node>).
If mutable, but not Sync, implement internal links with Rc/Weak and provide a RefCell-like node mutator API.
Mutable and Sync: itād be nice to learn why such a design should be chosen in preference to combining the other two with conversion traits/methods between them.
Iām not sure the mutability or immutability of the structure has much of an impact on the self-referentiality problem, or at least that hasnāt been the case in my experience with rental so far. By a vast majority, most uses of rental use the shared-borrowing flavor.
If an API returns a type to you that is parameterized by lifetime in any way, then your possible uses of that type are constrained in the manner described above. Just using Rc or Arc in all cases, or being generic over some kind of vocabulary trait that you need to define and agree upon, is the only way to be sure that your library can be adapted to the widest range of use cases.
On a more meta ecosystem level, Iām just very sceptical that, even if a convention did exist that would alleviate this problem, that it would achieve universal buy in from all crates, particularly when that workaround requires significantly more effort on the part of the author. And when the failure mode is that your crate is possibly rendered unusable, I believe the cost is too high to just leave it to the ecosystem to work around this issue. However, I acknowledge that this is a matter of opinion.
You want a setup() method for your integration test. From checking several popular github project the idiom is to return a Struct e.g TestEnv from such a method owning the precondition instances. This is just not possible to do in one struct, if one of those preconditions has a reference.
Concrete example. Lets say you are implementing a rest client to a http api with reqwest crate. Lets call it CatsClient. reqwest is managing its own connection pool and you are supposed to init it once and pass a refernce to the Cats Client
reqwest::Client can be instantiated with default headers and perhaps soon also with a base_url.
For testing each of the CatsClient endpoints, you would like to have some default reqwest::Client and also a CatsClient instantiated with some default values for convenience.
Address of let http and http: http is different, so the reference in cats_client points to an old invalid location. The address also changes every time you move the cats_client, and Rust by design wants to move only using memcpy without having to fix internal addresses of objects.
Even if that was fixed (e.g by using Box::new(http)), it would be unsafe, because Rust would allow assignment to testenv.http = new_object that destroys the old object, making reference in CatsClient invalid again.
You have to wrap the HTTP client in Rc<Client> to have it in two places.
Not sure if thatās suitable for you, but you can ādisentangleā the owned values from the references, like this: https://play.rust-lang.org/?version=stable&mode=debug&edition=2015&gist=3f59786cc2a59f68e440742866573c70. Thereās a struct IntermediateEnv (sorry, bad at naming here), that holds the owned values and is constructed first, and then TestEnv can just hold references to it. It doesnāt seem totally usefull to me, but might just be enough to get your testing done.
struct MyType {
f1: i32,
f2: ^i32 // A weak reference
}
let mut x = MyType { f1: 42, f2: ^f1 };
let f = x.f2;
let y = ^f.clone(); //^f promotes f to either &i32 or &mut i32
//Explicitly promote f to &mut i32, only type check when x is mut
let mut v:&mut i32 = ^f;
//f is still valid as it is still "week", and this is a "copy", not a "move"
let g = f;
//error: immutable borrow whilst mutable borrow in scope
let u: &i32 = ^f;
//error: send a weak reference to a function alone, without the full object
std::mem::drop(f);
//OK, we send both the object and the weak reference
(|v,_| std::mem::drop(v))(x,f);
Reasons behind:
When initialize self-referential structs, we donāt want the usual borrow rules apply - this gives us maximal flexibility to create the struct.
When using those self references however, we need to apply the borrow rules to ensure safety.
So this design is to separate these requirements - by introducing a new reference type - the āweakā reference
A weak reference can be promoted to usual references and enforce borrow rules at the promotion site
A week reference can only be passed to another function as arguments with its owner.
where the ast stores slices that are relatively costly to compute, so Iād like to either compute them lazily or eagerly but only once (thus the āstore the String and add get_ast methodā will not work).
A quick and easy way out would be to replace the &strs with (usize, usize)s, and just index into source with them on demand.
Alternatively you could try the rental crate mentioned at the start of this threadāAFAIK itās still the state of the art in actual self-referential structs like this.
Or, you might try separating the String and the AST into separate structs and just letting the consumer of the API deal with it.
This is kind of what I'm doing now, but I must say that it feels quirky to force people to do:
let sources = vec![];
let bundle = L10nBundle::new();
for path in paths {
sources.push(read_file(path));
}
for source in sources {
let res = FluentResource::from_string(source);
bundle.add_resource(res);
}
bundle should not be storing the Strings because in practice many bundles are constructed out of overlapping FluentResource structs, so it's better to have a cache of parsed FluentResource structs and make bundle production cheap.
But then I have this beautiful struct that is exactly what I think I need to feed it a String and get the AST, except that it doesn't work in Rust. Aggghhh
It should, yes. Unfortunately, adding a lifetime param to your struct as above does not ensure that. The lifetime parameter isnāt deduced from any of the parameters of the from_string method, so the compiler has no idea what lifetime to use, and has to more or less conjure one from thin air. This is called an āunboundedā lifetime and is extremely dangerous when used with unsafe code, since the compiler has no idea where the lifetime comes from and thus canāt ensure that itās used correctly. 'static is a slightly better choice in that it wonāt arbitrarily bind to anything, but itās still a āfalseā lifetime, and can lead to UB if the API around it isnāt carefully controlled.
Dealing with all of this mess is what rental is for, so if you want to create such a struct, I strongly recommend using rental, as it will expose a safe API for the struct and you can be assured that the data will not be misused.
You can just as easily write a wrapper around the Vec<(usize, usize)> version to get &strs out, and the only performance impact is the cost of adding the index offset to the address of the string. I doubt its worth the considerable safety complexity to get rid of that addition.
(That is to say, if you store it as a vector of index/len pairs, &self.resource[tuple.0][...tuple.1] is literally just the string addr added to the first member of the tuple, and the optimizer can probably figure that out. Even if it can't, that's not a very expensive bit of code. I'd want real benchmarks before I start pretending something has a 'static lifetime)
I mean, this is a variant lifetime, so 'static is not safer - its equally unsafe. But the code with a new lifetime parameter is definitely not safer than using 'static, and it just puts a lifetime in the API unnecessarily.