Collection Traits, Take 2

Before we finalize everything forever, I figured we should quickly reflect on what we want to collection traits to look like. So here’s a quick sketch of the minimal API I think we should provide. Some notable points:

  • Everything is associated types; nothing should plausibly be a collection of more than one type at once.
  • Very little inherits from anything else. This might be unergonomic (wrangling that foo is a Stack and Seq with the same Elem type). Not sure if we should mandate that e.g. all Deques are Sequences. We currently explicitly don’t provide indexing into DLists. Although I don’t really agree with that, I “get” why. Also it probably makes sense to mandate that Sequences and Maps have appropriate Index impls?
  • I use a notion of an “inherent” impl which is an idea that has been kicking around for some time. Basically, RingBuf would inherently expose the Deque interface even if the trait is unavailable, but not Stack or Queue. In this way we nudge you to picking a particular “view” of the collection and sticking with it. Deques are only Deques unless you explicitly ask for the other interfaces.
  • These traits have no support for “immutable” collections. They assume collections can be mutated. I wasn’t sure what the best way to handle immutable collections would be without including an Immutable and Mutable version of every single trait. Perhaps an associated Item (IsMutable=true?) and some where clauses could handle this well?
  • There traits can’t really be expressed correctly because of iterator types and something something HKT.
  • Mandating basic iter/iter_mut/into_iter support is going to be tricky because of weird cases like Bitv which you can’t get references into. Just having the Iterable traits on the side might be the right call here?
  • Stack plays a bit funky with Vec because it would have last and peek due to its slice-ness. Although maybe all Seq’s should have first and last?
trait Stack {
    type Elem;
    
    fn push(&mut self, elem: Elem);
    fn pop(&mut self) -> Option<Elem>;
    fn peek(&self) -> Option<&Elem>;
}
trait Queue {
    type Elem;

    fn enqueue(&mut self, elem: Elem);
    fn dequeue(&mut self) -> Option<Elem>;
    fn peek(&self) -> Option<&Elem>;
}
trait Deque {
    type Elem;

    fn push_back(&mut self, elem: Elem);
    fn push_front(&mut self, elem: Elem);
    fn pop_back(&mut self) -> Option<Elem>;
    fn pop_front(&mut self) -> Option<Elem>;
    fn front(&self) -> Option<&Elem>;
    fn back(&self) -> Option<&Elem>;
}
trait Seq {
    type Elem;

    fn insert(&mut self, index: usize, elem: Elem);
    fn remove(&mut self, index: usize) -> elem: Elem;
    fn get(&self, index: usize) -> Option<&Elem>;
    fn get_mut(&mut self, index: usize) -> Option<&mut Elem>;
}
trait Map {
    type Key;
    type Val;

    fn insert(&mut self, key: Key, val: Val) -> Option<Val>;
    fn remove<Q>(&mut self, key: &Q) -> Option<Val> 
        where Key: Borrow<Q>; 
    fn get_mut<Q>(&mut self, key: &Q) -> Option<&mut Val> 
        where Key: Borrow<Q>;
    fn get<Q>(&self, key: &Q) -> Option<&Val> 
        where Key: Borrow<Q>;
    fn contains_key<Q>(&self, key: &Q) -> bool 
        where Key: Borrow<Q> {
        self.get(key).is_some()
    }
}
trait SortedMap: Map {  
    type Range;
    type RangeMut;

    fn range<Q, R>(&self, Bound<Q>, Bound<R>) -> Range
        where Key: Borrow<Q> + Borrow<R>;

    fn range_mut<Q, R>(&mut self, Bound<Q>, Bound<R>) -> RangeMut
        where Key: Borrow<Q> + Borrow<R>;
}
trait Set {
    type Elem;

    type Union;
    type Intersection;
    type Difference;
    type SymmetricDifference;

    fn insert(&mut self, elem: Elem) -> bool;
    fn remove<Q>(&mut self, elem: &Q) -> bool
        where Key: Borrow<Q>;
    fn contains<Q>(&mut self, elem: &Q) -> bool
        where Key: Borrow<Q>;
    // all the set-theoretic ops
}
trait SortedSet: Set {
    type Range;
    type RangeMut;

    fn range<Q, R>(&self, Bound<Q>, Bound<R>) -> Range
        where Elem: Borrow<Q> + Borrow<R>;

    fn range_mut<Q, R>(&mut self, Bound<Q>, Bound<R>) -> RangeMut
        where Elem: Borrow<Q> + Borrow<R>;
}
trait PriorityQueue {
    type Elem;

    fn push(&mut self, elem: Elem);
    fn pop(&mut self) -> Option<Elem>;
    fn peek(&self) -> Option<&Elem>;
}
inherent impl Stack for Vec;
inherent impl Seq for Vec;

inherent impl Deque for RingBuf;
impl Stack for RingBuf;
impl Queue for RingBuf;
inherent impl Seq for RingBuf;

inherent impl Deque for DList;
impl Stack for DList;
impl Queue for DList;
inherent impl Seq for DList;

inherent impl Stack for Bitv;
inherent impl Seq for Bitv;

inherent impl Map for HashMap;

inherent impl Map for BTreeMap;
inherent impl SortedMap for BTreeMap;

inherent impl Map for VecMap;
inherent impl SortedMap for VecMap;

inherent impl Set for HashSet;

inherent impl Set for BTreeSet;
inherent impl SortedSet for BTreeSet;

inherent impl Set for BitvSet;
inherent impl SortedSet for BitvSet;

inherent impl PriorityQueue for BinaryHeap;
2 Likes

What’s the argument for not splitting the traits on mutability like the old ones were (e.g. Map and MutableMap)? Some types like the ones in the PHF library are totally immutable and it seems kind of weird to have to either not implement Map for it or have insert and remove that panic.

Java doesn’t have a mutable/immutable split and it causes a ton of unfortunate API designs like Guava’s ImmutableMap implementing Map and throwing in all mutation methods, or taking the other route and defining new interfaces that don’t interoperate with the rest of the Java ecosystem.

The situation’s a bit better in Rust since we have more control over what can be mutated when, but I think making the Mutable/Immutable distinction is still important.

3 Likes

I’m not sure that the bounds on the key types will work. For example, HashMap requires Key: Borrow<Q>, Q: Hash, right?

If maps had the entry API in their trait, that’d be really helpful.

The only argument against splitting them is ergonomics of the common case. But yeah, supporting immutable collections is definitely important. Easy enough to split them up.

Ah, yeah, definitely!

Ack, I'm not sure how to handle that with any present or future planned languages features.

I haven’t missed the collection traits ever since we lost them. I don’t want Rust to add this just for some kind of OOP cozy feeling of being a bit like Java.

It seems especially risky to introduce API methods this way that don’t exist today – there is little evidence that they are useful at all.

1 Like

Should we rethink our strategy for key equivalence then? The current setup never really solved one of the issues from the old Equiv system anyway: we’re still just hoping that Q hashes the same as K does.

The ideal setup in my head seems like having the K and Q types “downcast” to unify to something:

pub trait Downcast<T: ?Sized> {
    fn downcast(&self) -> &T;
}

impl<T> Downcast<T> for T {
    fn downcast(&self) -> &T { self }
}

impl Downcast<str> for String {
    fn downcast(&self) -> &str -> { &**self }
}

...

pub trait Map<K, V, Q> where K: Downcast<Q> {
    fn get<U>(&self, k: U) -> Option<&V> where U: Downcast<Q>;

pub struct HashMap<K, V, Q=K>{ ... }

impl Map for HashMap<K, V, Q> where K: Downcast<Q>, Q: Hash+Eq {
    ...
}

It does have the downside of adding an extra parameter to all of the data structures, which is a huge bummer.

@blake I’m confused, what does this have to do with an “OOP cozy feeling”? This is exactly the use case for traits - to extract common functionality to an interface so that you can write algorithms generic over any map or set. All of these methods already exist today. These traits are just going to extract the common API we’ve already defined in the collections RFC.

BTW, These traits should probably not be added until HKT, since I think it’s pretty crucial that they define the relevant iterator methods.

1 Like

I think the only thing we can require for map (and set) lookups is the key type itself. Omitting, for the moment, iteration, entries, mutability, and a collection supertrait, I propose something like:

pub trait Map: MapLookup<Self::K> {
    type K;
    type V;
    fn insert(&mut self, key: Self::K, value: Self::V) -> Option<Self::V>;
}
pub trait MapLookup<Q>: Map {
    fn get(&self, key: Q) -> Option<&Self::V>;
    fn get_mut(&mut self, key: Q) -> Option<&mut Self::V>;
    fn remove(&mut self, key: Q) -> Option<Self::V>;
    fn contains_key(&self, key: Q) -> bool { self.get(key).is_some() }
}

Then maps can parameterize over multiple keys according to whatever requirements they have:

impl<K, V, S, H> Map for HashMap<K, V, S>
    where K: Eq + Hash<H>, S: HashState<Hasher=H>, H: Hasher<Output=u64> {
    type K = K;
    type V = V;
    ...
}
impl<'a, K, V, S, H, Q> MapLookup<&'a Q> for HashMap<K, V, S>
    where K: Eq + Hash<H> + Borrow<Q>, S: HashState<Hasher=H>,
          H: Hasher<Output=u64>, Q: Eq + Hash<H> { ... }

This could be combined with sfackler’s suggestion of a downcast-like mechanism to ensure hashing consistency for HashMap, but lets things remain flexible for things like comparator-based maps.

Oh, I forgot to mention: this is a really cool idea!

I’d like to have entry API for other collections as well, e.g. Deque or so. As @Gankro correctly pointed out, in my use case the instance

match ringbuf.front().map(|obj| obj.cond()).unwrap_or(false) {
    return Some(ringbuf.pop_front().unwrap());
}

LLVM managed to optimize out the .unwrap() call. However, in other cases, where lifetimes are involved, an entry API is quite useful, e.g.:

let obj = ringbuf.front(); // `Option` handling omitted
[...]
if obj.cond() {
    return Some(ringbuf.pop_front().unwrap());
}

This (correctly) results in a lifetime error. With an entry API for RingBuf, this would look like:

let obj = ringbuf.front(); // Error handling omitted, assume this is a `OccupiedEntry`.
[...]
if obj.get().cond() {
    return Some(obj.remove());
}

(which looks nicer to me) and gets around the lifetime error.

I believe non-lexical borrows would be sufficient for your usecase. It would see that obj is not used after you enter the if body, and allow you to mutate the buffer.

I think.

“Can I get my keys back?” – this reddit thread gives a valid use case for getting the keys back out of the collection.

(Figured I’d post this here as this could influence these collection traits too.)

Based on my fever-dreams of hyper-overloaded indexing, here’s a different structuring of the traits (now with mutable variants):

trait Collection {
  fn len(&self) -> usize;
  fn is_empty(&self) -> bool { ... }
}
trait Get<Index> {
    type Output;
    fn get(&self, i: Index) -> Output;
}
trait GetMut<Index> {
    type Output;
    fn get(&self, i: Index) -> Output;
}
trait Insert<Index, Val> {
    type Output;
    fn insert(&self, i: Index, v: Val) -> Output;
}
trait Remove<Index> {
    type Output;
    fn remove(&self, i: Index, v: Val) -> Output;
}
trait Stack: 
    Collection 
{
    type Elem;
    fn push(&mut self, elem: Elem);
    fn pop(&mut self) -> Option<Elem>;
    fn peek(&self) -> Option<&Elem>;
}
trait Queue: 
    Collection 
{
    type Elem;
    fn push(&mut self, elem: Elem);
    fn pop(&mut self) -> Option<Elem>;
    fn peek(&self) -> Option<&Elem>;
}
// PriorityQueue same as Stack and Queue
trait Deque:
    Collection 
{
    type Elem;
    fn push_back(&mut self, elem: Elem);
    fn push_front(&mut self, elem: Elem);
    fn pop_back(&mut self) -> Option<Elem>;
    fn pop_front(&mut self) -> Option<Elem>;
    fn front(&self) -> Option<&Elem>;
    fn back(&self) -> Option<&Elem>;
}
trait Sequence: 
    Collection
    Get<usize, Option<&Elem>>.
    Index<usize, Elem>, 
{
    type Elem;
}
trait MutableSequence: 
    Sequence, 
    GetMut<usize, Option<&mut Elem>>,
    IndexMut<usize, Val>
    Insert<usize, Elem, ()>,
    Remove<usize, Option<Elem>>, 
    IndexMut<usize, Elem>,
{}
trait Map:
    Collection
    Get<&Key, Option<&Val>>, 
    Index<&Key, Val>,
{
    type Key;
    type Val;
    
    fn contains_key(&self, &Key) { ... }
}
trait MutableMap:
    Map,
    GetMut<&Key, Option<&mut Val>>,
    IndexMut<&Key, Val>>,
    Insert<Key, Val, Option<Val>>,
    Remove<&Key, Option<Val>>,
{}
trait SortedMap: 
    Map 
{}
trait SortedMutableMap: 
    SortedMap, 
    MutableMap,
{}
// Set mirrors Map structure accordingly

Notes:

  • I’m hand-waving HKT and self-lifetime stuff here, assuming it’s solved when we get to this point.
  • I’m handwaving that you can refer to associated type positionally in generics for convenience
  • Stack, Queue, and Deque have no Mutable variant because I don’t think that’s useful.
  • Better composition! \o/
  • Insert is a bit awkward because the semantics of Sequence insertion and Map insertion are different (sequence shifts, Map replaces). Still, I want a trait so that you can overload it to insert e.g. an IntoIterator at an index for sequences. Maybe there should be two traits. Or maybe maps shouldn’t use the trait (I don’t think they can use overloaded insert quite as much?)
  • I have, perhaps dubiously, given Stack and Queue identical method names. This may be a problem if you have a concrete Deque in scope and both Stack and Queue imported. Otherwise I think it’s a nice consistency bonus.
  • I’m not sure what to guarantee iteration-wise in these traits. In particular Bitv and BitvSet can’t return anything by-reference (modulo the static TRUE: bool = true trick). It’s possible for e.g. a Trie to have similar behaviour for its keys. Especially if we go forward with a Integer-like-key trait.

Sorry for necroposting, but what’s with this? I find it very nice to have a common traits for Maps, Seqs etc. I guess, this aren’t going to be implemented, since it’s past 1.0, but can you comment on the fortune of the feature, as it’s not clear from the last posts here?

The electic crate has some experimental collections traits, but the feature as a whole is somewhat blocked on higher kinded type support (see http://apasel422.github.io/eclectic/eclectic/#a-note-on-trait-objects).

1 Like

I’ve been looking at being able to reform the std collection tests as being trait tests so that anyone implementing Set could run them as ‘free’ tests over their implementation to have some comfort that their implementation is conventional.

Obviously this depends on the existence of aforementioned traits. Will look into the above crate - thanks!

1 Like

I’d like to know the worst case allowed algorithmic complexity of the operations (time and space) exposed by these traits, so that if I want to insert an element in each sequence of a Vec<Box<Seq>> I have at least enough information to reason about what this actually means.

3 Likes

@gnzlbg as these are traits there hopefully will be several alternative implementations probably with radically different performance trade-offs. If we can do trait tests then we should also be able to do trait benches which could highlight the different performance characteristics of the various std and crate implementations.

Long term I think it would be great to see collection implementations that could switch their implementation once past certain thresholds (E.g. C#'s hybrid dictionary) or as we’re generally after zero cost abstractions we use runtime / test performance data to select ideal collection implementations for the workload (Maybe with a Collections.lock file like a Cargo.lock file).

Anyway for now am playing with trait tests ( https://github.com/gilescope/iunit ) and seeing how far we can push the concept.