[Pre-RFC] constant time conversion of owning collections to borrowing collections


#1

Summary

The goal here is to allow a reference to a collection of owning smart-pointer-like types to be transmuted to a reference to a collection of borrows in constant time.

This consists of:

  1. An unsafe marker trait (RefTransmute<T>) which indicates when a smart-pointer-like Ptr<T> can be transmuted to &T or to &[T].
  2. Implement this for Box<T>. Modify the implementation of other types such as Arc<T>, Rc<T>, Vec<T> and String to allow this marker to be implemented for them.
  3. Implement Borrow for container types such as Vec, HashMap, BTreeMap, HashSet, LinkedList, etc where the contained types implement RefTransmute, to allow constant time transmute from an owning structure to a borrowing structure

Rationale

In languages such a C or C++, it’s common for a collection to consist of pointers to the collected data (be it hash keys, values, set entries, list members, etc). The collection may own that data or be borrowing it - but either way, a user of the collection can borrow data from it without knowing or caring about the distinction (so long as it careful with lifetimes). This is possible because raw pointers don’t make the distinction between borrowing and ownership.

In Rust, the distinction between &Vec<Box<u32>> and &[&u32] is very apparent by their types. This means that a function which accepts a &[&u32] cannot be called directly with a &Vec<Box<u32>> or &[Box<u32>] - it would require an O(n) process to generate a Vec<&u32>, and return a &[&u32] slice from it.

However, the underlying representations of &[Box<u32>] and &[&u32] are the same - they’re all just pointers to u32. It is therefore safe to transmute from one to the other, so long as the lifetimes are properly preserved.

This applies to more complex structures than a Vec - a &HashMap<&u32, Box<Foo>> can be transmuted to &HashMap<&u32, &Foo>, for example.

In fact it can be extended to transmute vectors to slices, or String to str. Using the Borrow trait, it allows a lot scope for zero-cost impedance matching of types across interface boundaries.

RefTransmute Trait

pub unsafe trait RefTransmute<T> {}

This marker trait indicates which types can be transmuted into reference form. Specifically, when RefTransmute<T> is implemented for a pointer-like type Ptr<T>, then it means an instance of Ptr<T> can be directly transmuted to &'ptr T, where 'ptr is tied to the lifetime of the Ptr<T> instance.

As an extension, it can also be used to map from a vector-like type to the representation of a slice - Vec<T> to &[T], for example.

It is only a marker and has no methods - it simply says the transmute is possible, but does not implement it.

It is trivially implementable for references:

unsafe impl<'a, T> RefTransmute<T> for &'a T {}

(Question: Does this actually get the lifetimes right?)

RefTransmute for std types.

Box<T>

This is the simplest case: Box<T> already satisfies the condition for RefTransmute<T>, and so requires no further changes:

unsafe impl<T> RefTransmute<T> for Box<T> {}

(It defacto true that the representation of Box<T> and &T are the same now; this adds a hard constraint that it remain true.)

Arc<T> / Rc<T>

Arc<T> simply contains Shared<ArcInner<T>>. Shared is simply a wrapper around a raw pointer, so Arc overall has the same representation as a pointer. Unfortunately, the pointer is to ArcInner<T>, which does not point directly to the payload T - in fact it must be the last element as it may be a DST.

We can solve this if instead of pointing at the start of ArcInner, we always point Arc at the address of the payload ArcInner.data, and access the weak and strong counters via negative offsets with raw pointer arithmetic.

Doing this allows Arc<T> to implement RefTransmute<T>.

Rc<T> can be handled using the same technique.

Vec<T>

If the layout of Vec<T> can be constrained so that the prefix of the vector is the same representation of [T] (ie, pointer and number of valid elements), then simple AsRef can be implemented as a no-op transmute. As far as I can gather, the current layout is pointer, capacity, len, so it would need to be rearranged. This is awkward because of an inner type RawVec.

This would allow - say - &HashSet<Vec<u32>> to be transmuted to &HashSet<&[u32]>.

If T itself is a RefTransmute smart pointer, then a vector of smart pointers can be transmuted to a slice of references.

String

Similar to Vec<T>, if the memory layout of String is constrained to be the same as str (ie, [u8]), then a String can be transmuted to &str. This would allow &HashSet<String> to be transmuted to &HashSet<&str>.

Implement Borrow

Once we have a useful pool of types which implement RefTransmute, we can make use of them by implementing Borrow for the collection types.

This allows a function a function to declare:

fn wantset<H: Borrow<HashSet<&str>>(hash: &H) { ... }

to be called with

    let h: HashSet<String> = makeset();
    wantset(&h);

Normal collections such as maps, sets, lists will have the form:

impl<PK, PV, K, V> Borrow<HashMap<&K, &V>> for HashMap<PK, PV>
    where PK: RefTransmute<K>,
          PV: RefTransmute<V>,
{
    fn borrow(&self) -> &HashMap<&K, &V> {
        unsafe {
            mem::transmute::<&HashMap<PK, PV>, &HashMap<&K, &V>>(self)
        }
    }
}

Rust types which have explicit borrowed counterparts will have multiple implementations:

impl<PT> Borrow<Vec<&T>> for Vec<PT> where PT: RefTransmute<T> {
// ...
}

impl<PT> Borrow<[&T]> for Vec<PT> where PT: RefTransmute<T> {
// ...
}

Open Issues

  1. Split RefTransmute<T> into RefTransmute<T> and SliceTransmute<T>?
  2. Is it possible to meaningfully implement RefTransmute for non-std types, esp for slices? Or does it necessarily require dependence on implementation details of slice?
  3. The Borrow implementation seems to always conflict with a default one - can this be solved with specialization?
  4. Is Borrow the best trait to use for this?
  5. Are there any cases where a collection could behave differently with a ref form vs an owned form?
  6. Are there any interactions with Any or other library/language features?
  7. What other constraints are there on the layout of Vec<T> and String (or other types) which would prevent the layouts proposed here?
  8. What haven’t I thought of?

#2

Can you elaborate on how that (and the String variant) works? I don’t see any way to make it work: While Vec<T> may be changed to have a &[T]-like prefix, its size is necessarily different, and the size affects the memory layout of the container. This is more apparent with something like Vec<Vec<T>> vs Vec<&[T]>, but hash tables (and indeed any container I can think of) also need to know how large their elements are.


#3

Hm, I see. You’re saying that when you pass &HashSet<&[u32]> to a function, then it will monomorphize with that slice type rather than the real underlying type, and so will generate bad code for the actual representation. The collection is immutable so we’re not worried about updates being wrong, but if the implementation uses any kind of array where the stride depends on the full element size, it will traverse wrongly. But an implementation that boxes every element separately will be fine.

That suggests that rather than using plain &[T], we need a type that has the same in-memory size as the original container and implements AsRef<[T]> - but that’s a pretty good description of the original smart pointer type, which makes all this moot.

Hm, so that makes the slice aspect of this very dubious, but the plain pointer-ref aspect should still be OK. It does suggest that splitting RefTransmute and SliceTransmute into separate traits is the right choice, so that collections which can support transmuted slices (LinkedList?) can do so, independent of transmuted pointers.


#4

Is there a reason for pursuing this over a more general Coercible trait (postponed PR)?


#5

Thanks for the pointer. I did a search before writing this up, but I guess I didn’t get the right keywords.

Hm, it looks like it does cover the same ground (“mass borrowing”) - I think it is a superset of this proposal.


#6

The common way of doing this is to pass an iterator and if necessary use map() on the iterator to turn the contents into nested iterators, turn Box into &T, etc.

That works with current Rust and also works with any collection regardless of memory layout, while still generating optimal code due to monomorphization.

For HashMap, you can pass an Fn(K) -> Option instead of or in addition to an iterator if fast lookup is needed (likewise Fn(K) -> bool for HashSet).


#7

I’d like to see the Coercible trait or something like it. Another use case is when you want to convert a vector of a newtype wrapper back to the original type.