Summary
The goal here is to allow a reference to a collection of owning smart-pointer-like types to be transmuted to a reference to a collection of borrows in constant time.
This consists of:
- An unsafe marker trait (
RefTransmute<T>) which indicates when a smart-pointer-like Ptr<T> can be transmuted to &T or to &[T].
- Implement this for
Box<T>. Modify the implementation of other types such as Arc<T>, Rc<T>, Vec<T> and String to allow this marker to be implemented for them.
- Implement
Borrow for container types such as Vec, HashMap, BTreeMap, HashSet, LinkedList, etc where the contained types implement RefTransmute, to allow constant time transmute from an owning structure to a borrowing structure
Rationale
In languages such a C or C++, it’s common for a collection to consist of pointers to the collected data (be it hash keys, values, set entries, list members, etc). The collection may own that data or be borrowing it - but either way, a user of the collection can borrow data from it without knowing or caring about the distinction (so long as it careful with lifetimes). This is possible because raw pointers don’t make the distinction between borrowing and ownership.
In Rust, the distinction between &Vec<Box<u32>> and &[&u32] is very apparent by their types. This means that a function which accepts a &[&u32] cannot be called directly with a &Vec<Box<u32>> or &[Box<u32>] - it would require an O(n) process to generate a Vec<&u32>, and return a &[&u32] slice from it.
However, the underlying representations of &[Box<u32>] and &[&u32] are the same - they’re all just pointers to u32. It is therefore safe to transmute from one to the other, so long as the lifetimes are properly preserved.
This applies to more complex structures than a Vec - a &HashMap<&u32, Box<Foo>> can be transmuted to &HashMap<&u32, &Foo>, for example.
In fact it can be extended to transmute vectors to slices, or String to str. Using the Borrow trait, it allows a lot scope for zero-cost impedance matching of types across interface boundaries.
RefTransmute Trait
pub unsafe trait RefTransmute<T> {}
This marker trait indicates which types can be transmuted into reference form. Specifically, when RefTransmute<T> is implemented for a pointer-like type Ptr<T>, then it means an instance of Ptr<T> can be directly transmuted to &'ptr T, where 'ptr is tied to the lifetime of the Ptr<T> instance.
As an extension, it can also be used to map from a vector-like type to the representation of a slice - Vec<T> to &[T], for example.
It is only a marker and has no methods - it simply says the transmute is possible, but does not implement it.
It is trivially implementable for references:
unsafe impl<'a, T> RefTransmute<T> for &'a T {}
(Question: Does this actually get the lifetimes right?)
RefTransmute for std types.
Box<T>
This is the simplest case: Box<T> already satisfies the condition for RefTransmute<T>, and so requires no further changes:
unsafe impl<T> RefTransmute<T> for Box<T> {}
(It defacto true that the representation of Box<T> and &T are the same now; this adds a hard constraint that it remain true.)
Arc<T> / Rc<T>
Arc<T> simply contains Shared<ArcInner<T>>. Shared is simply a wrapper around a raw pointer, so Arc overall has the same representation as a pointer. Unfortunately, the pointer is to ArcInner<T>, which does not point directly to the payload T - in fact it must be the last element as it may be a DST.
We can solve this if instead of pointing at the start of ArcInner, we always point Arc at the address of the payload ArcInner.data, and access the weak and strong counters via negative offsets with raw pointer arithmetic.
Doing this allows Arc<T> to implement RefTransmute<T>.
Rc<T> can be handled using the same technique.
Vec<T>
If the layout of Vec<T> can be constrained so that the prefix of the vector is the same representation of [T] (ie, pointer and number of valid elements), then simple AsRef can be implemented as a no-op transmute. As far as I can gather, the current layout is pointer, capacity, len, so it would need to be rearranged. This is awkward because of an inner type RawVec.
This would allow - say - &HashSet<Vec<u32>> to be transmuted to &HashSet<&[u32]>.
If T itself is a RefTransmute smart pointer, then a vector of smart pointers can be transmuted to a slice of references.
String
Similar to Vec<T>, if the memory layout of String is constrained to be the same as str (ie, [u8]), then a String can be transmuted to &str. This would allow &HashSet<String> to be transmuted to &HashSet<&str>.
Implement Borrow
Once we have a useful pool of types which implement RefTransmute, we can make use of them by implementing Borrow for the collection types.
This allows a function a function to declare:
fn wantset<H: Borrow<HashSet<&str>>(hash: &H) { ... }
to be called with
let h: HashSet<String> = makeset();
wantset(&h);
Normal collections such as maps, sets, lists will have the form:
impl<PK, PV, K, V> Borrow<HashMap<&K, &V>> for HashMap<PK, PV>
where PK: RefTransmute<K>,
PV: RefTransmute<V>,
{
fn borrow(&self) -> &HashMap<&K, &V> {
unsafe {
mem::transmute::<&HashMap<PK, PV>, &HashMap<&K, &V>>(self)
}
}
}
Rust types which have explicit borrowed counterparts will have multiple implementations:
impl<PT> Borrow<Vec<&T>> for Vec<PT> where PT: RefTransmute<T> {
// ...
}
impl<PT> Borrow<[&T]> for Vec<PT> where PT: RefTransmute<T> {
// ...
}
Open Issues
- Split
RefTransmute<T> into RefTransmute<T> and SliceTransmute<T>?
- Is it possible to meaningfully implement
RefTransmute for non-std types, esp for slices? Or does it necessarily require dependence on implementation details of slice?
- The
Borrow implementation seems to always conflict with a default one - can this be solved with specialization?
- Is
Borrow the best trait to use for this?
- Are there any cases where a collection could behave differently with a ref form vs an owned form?
- Are there any interactions with
Any or other library/language features?
- What other constraints are there on the layout of
Vec<T> and String (or other types) which would prevent the layouts proposed here?
- What haven’t I thought of?