Collection Traits, Take 2


#10

I think the only thing we can require for map (and set) lookups is the key type itself. Omitting, for the moment, iteration, entries, mutability, and a collection supertrait, I propose something like:

pub trait Map: MapLookup<Self::K> {
    type K;
    type V;
    fn insert(&mut self, key: Self::K, value: Self::V) -> Option<Self::V>;
}
pub trait MapLookup<Q>: Map {
    fn get(&self, key: Q) -> Option<&Self::V>;
    fn get_mut(&mut self, key: Q) -> Option<&mut Self::V>;
    fn remove(&mut self, key: Q) -> Option<Self::V>;
    fn contains_key(&self, key: Q) -> bool { self.get(key).is_some() }
}

Then maps can parameterize over multiple keys according to whatever requirements they have:

impl<K, V, S, H> Map for HashMap<K, V, S>
    where K: Eq + Hash<H>, S: HashState<Hasher=H>, H: Hasher<Output=u64> {
    type K = K;
    type V = V;
    ...
}
impl<'a, K, V, S, H, Q> MapLookup<&'a Q> for HashMap<K, V, S>
    where K: Eq + Hash<H> + Borrow<Q>, S: HashState<Hasher=H>,
          H: Hasher<Output=u64>, Q: Eq + Hash<H> { ... }

This could be combined with sfackler’s suggestion of a downcast-like mechanism to ensure hashing consistency for HashMap, but lets things remain flexible for things like comparator-based maps.


#11

Oh, I forgot to mention: this is a really cool idea!


#12

I’d like to have entry API for other collections as well, e.g. Deque or so. As @Gankro correctly pointed out, in my use case the instance

match ringbuf.front().map(|obj| obj.cond()).unwrap_or(false) {
    return Some(ringbuf.pop_front().unwrap());
}

LLVM managed to optimize out the .unwrap() call. However, in other cases, where lifetimes are involved, an entry API is quite useful, e.g.:

let obj = ringbuf.front(); // `Option` handling omitted
[...]
if obj.cond() {
    return Some(ringbuf.pop_front().unwrap());
}

This (correctly) results in a lifetime error. With an entry API for RingBuf, this would look like:

let obj = ringbuf.front(); // Error handling omitted, assume this is a `OccupiedEntry`.
[...]
if obj.get().cond() {
    return Some(obj.remove());
}

(which looks nicer to me) and gets around the lifetime error.


#13

I believe non-lexical borrows would be sufficient for your usecase. It would see that obj is not used after you enter the if body, and allow you to mutate the buffer.

I think.


#14

“Can I get my keys back?” – this reddit thread gives a valid use case for getting the keys back out of the collection.

(Figured I’d post this here as this could influence these collection traits too.)


#15

Based on my fever-dreams of hyper-overloaded indexing, here’s a different structuring of the traits (now with mutable variants):

trait Collection {
  fn len(&self) -> usize;
  fn is_empty(&self) -> bool { ... }
}
trait Get<Index> {
    type Output;
    fn get(&self, i: Index) -> Output;
}
trait GetMut<Index> {
    type Output;
    fn get(&self, i: Index) -> Output;
}
trait Insert<Index, Val> {
    type Output;
    fn insert(&self, i: Index, v: Val) -> Output;
}
trait Remove<Index> {
    type Output;
    fn remove(&self, i: Index, v: Val) -> Output;
}
trait Stack: 
    Collection 
{
    type Elem;
    fn push(&mut self, elem: Elem);
    fn pop(&mut self) -> Option<Elem>;
    fn peek(&self) -> Option<&Elem>;
}
trait Queue: 
    Collection 
{
    type Elem;
    fn push(&mut self, elem: Elem);
    fn pop(&mut self) -> Option<Elem>;
    fn peek(&self) -> Option<&Elem>;
}
// PriorityQueue same as Stack and Queue
trait Deque:
    Collection 
{
    type Elem;
    fn push_back(&mut self, elem: Elem);
    fn push_front(&mut self, elem: Elem);
    fn pop_back(&mut self) -> Option<Elem>;
    fn pop_front(&mut self) -> Option<Elem>;
    fn front(&self) -> Option<&Elem>;
    fn back(&self) -> Option<&Elem>;
}
trait Sequence: 
    Collection
    Get<usize, Option<&Elem>>.
    Index<usize, Elem>, 
{
    type Elem;
}
trait MutableSequence: 
    Sequence, 
    GetMut<usize, Option<&mut Elem>>,
    IndexMut<usize, Val>
    Insert<usize, Elem, ()>,
    Remove<usize, Option<Elem>>, 
    IndexMut<usize, Elem>,
{}
trait Map:
    Collection
    Get<&Key, Option<&Val>>, 
    Index<&Key, Val>,
{
    type Key;
    type Val;
    
    fn contains_key(&self, &Key) { ... }
}
trait MutableMap:
    Map,
    GetMut<&Key, Option<&mut Val>>,
    IndexMut<&Key, Val>>,
    Insert<Key, Val, Option<Val>>,
    Remove<&Key, Option<Val>>,
{}
trait SortedMap: 
    Map 
{}
trait SortedMutableMap: 
    SortedMap, 
    MutableMap,
{}
// Set mirrors Map structure accordingly

Notes:

  • I’m hand-waving HKT and self-lifetime stuff here, assuming it’s solved when we get to this point.
  • I’m handwaving that you can refer to associated type positionally in generics for convenience
  • Stack, Queue, and Deque have no Mutable variant because I don’t think that’s useful.
  • Better composition! \o/
  • Insert is a bit awkward because the semantics of Sequence insertion and Map insertion are different (sequence shifts, Map replaces). Still, I want a trait so that you can overload it to insert e.g. an IntoIterator at an index for sequences. Maybe there should be two traits. Or maybe maps shouldn’t use the trait (I don’t think they can use overloaded insert quite as much?)
  • I have, perhaps dubiously, given Stack and Queue identical method names. This may be a problem if you have a concrete Deque in scope and both Stack and Queue imported. Otherwise I think it’s a nice consistency bonus.
  • I’m not sure what to guarantee iteration-wise in these traits. In particular Bitv and BitvSet can’t return anything by-reference (modulo the static TRUE: bool = true trick). It’s possible for e.g. a Trie to have similar behaviour for its keys. Especially if we go forward with a Integer-like-key trait.

#16

Sorry for necroposting, but what’s with this? I find it very nice to have a common traits for Maps, Seqs etc. I guess, this aren’t going to be implemented, since it’s past 1.0, but can you comment on the fortune of the feature, as it’s not clear from the last posts here?


#17

The electic crate has some experimental collections traits, but the feature as a whole is somewhat blocked on higher kinded type support (see http://apasel422.github.io/eclectic/eclectic/#a-note-on-trait-objects).


#18

I’ve been looking at being able to reform the std collection tests as being trait tests so that anyone implementing Set could run them as ‘free’ tests over their implementation to have some comfort that their implementation is conventional.

Obviously this depends on the existence of aforementioned traits. Will look into the above crate - thanks!


#19

I’d like to know the worst case allowed algorithmic complexity of the operations (time and space) exposed by these traits, so that if I want to insert an element in each sequence of a Vec<Box<Seq>> I have at least enough information to reason about what this actually means.


#20

@gnzlbg as these are traits there hopefully will be several alternative implementations probably with radically different performance trade-offs. If we can do trait tests then we should also be able to do trait benches which could highlight the different performance characteristics of the various std and crate implementations.

Long term I think it would be great to see collection implementations that could switch their implementation once past certain thresholds (E.g. C#'s hybrid dictionary) or as we’re generally after zero cost abstractions we use runtime / test performance data to select ideal collection implementations for the workload (Maybe with a Collections.lock file like a Cargo.lock file).

Anyway for now am playing with trait tests ( https://github.com/gilescope/iunit ) and seeing how far we can push the concept.


#21

This doesn’t help you if you are writing generic code and take any type that implements any of these traits.

If we can do trait tests then we should also be able to do trait benches which could highlight the different performance characteristics of the various std and crate implementations.

Worst-case algorithmic doesn’t necessarily correlate with performance. I want to know that if my application works in a reasonable time for N inputs, that using these traits does not mean that it can blow up if suddenly it gets 2*N or 3*N inputs. I want to know this independently of how efficient my implementation is.

It will also limit which collections can implement which traits. For example, is it valid to implement Map for a sorted vector with O(N) insert? If a function in an API returns -> impl Map I really doubt that people will expect the Map to have O(N) insert. Most people will expect it to have O(1) or O(logN) worst-case complexity, and there is a pretty big difference between them. For example, inserting in a loop into a reasonably-implemented HashMap has ~O(N) worst-case complexity, but doing so into a sorted vector has O(N^2) worst-case complexity. This is pretty much the difference between whether inserting a million elements happens almost instantaneously or takes forever.


#22

Is it possible to record such constraints in the type system as marker traits? Maybe trait InsertOrderN : InsertOrder1, and then do + InsertOrderN on the where clause.


#23

@Gankro did a pretty good job in this comment: Collection Traits, Take 2 where basically the complexities that the different operations should have are clear, and some operations are separated into their own traits. For example, Insert is its own trait, and Map requires it, while Seq has its own Insert method which is different because it takes an index. However, Seq could also implement Insert with the same complexity as Map, because without an index that would just mean “insert somewhere”, and Seqs can typically insert somewhere in O(1): Vec and Deque can insert at the end in amortized O(1), and any List can insert at the front in O(1). If the insert trait would specify that Insert::insert has at worst O(logN) complexity, that trait can be implemented for a lot of collections, but for example it cannot be implemented for a sorted vector, and thus a sorted vector cannot implement Map, which kind of would match my expectations - the proper way to use a sorted vector when inserting in a loop is to bulk insert at the end + sort + unique which is O(NlogN) and the API is necessarily going to clash with that of collections that can just insert in O(1) or O(logN) and thus can just insert the elements one by one in a loop.

So in my opinion, just adding docs to @Gankro’s proposal specifying a worst case complexity that implementations have to ensure would be alright. If some implementation violates that, well, then its a bad implementation, but no unsafety can result from that.

Enforcing this at the type-level is necessarily going to complicate @Gankro’s proposal. One thing we could do is, instead of having just one Insert trait, having many traits for different complexity classes:

mod worst_1 { 
    // ^ replace worst_1 with a suitable name,
    // I am bad at naming
    trait Insert;
}
mod worst_logN { trait Insert; }
mod worst_N { trait Insert; }

and then building the “collection” traits on top of these, for example:

// So Map Insert has at least O(logN) worst case compleixt
trait Map: worst_logN::Insert; 
trait Seq: worst_1::Insert;

Finally we could twist this further such that a type that implements worst_logN::Insert automatically also implements worst_N::Insert, so that if you have code that requires worst_N::Insert you can pass it Maps and Seqs without issues. Blanket impls could be one way to do this, but I always felt that designs with many blanket impls turn out brittle.

Personally, I don’t know if going this far is worth it. In particular, this lets you do non-sensical things like implementing Insert for a collection multiple times for the different complexity guarantees. Maybe adding blanket impls could make this an error, but specialization would probably make this possible again.

I like @Gankro’s proposal a lot. It is simple, and adding some comments saying which complexity implementors should uphold when implementing the traits would be enough for me. If overspecification is a concern, we don’t really have to make the bounds too tight either, I’d be fine with not using O(1) anywhere and making O(logN)the tightest complexity class that we specify. That would already allow many traits to be implemented for both sequences and associative collections and honestly the difference between O(1) and O(logN) is really blurry in practice, to the point that I at least don’t care that much about it (O(1) => 1 but O(log1000000) => 6 which is really close to 1 in the range of inputs [0, 1000000]).


#24

I believe this is the right choice. As I argued here – though perhaps not as clearly as I could’ve – I think that persistent collections in Rust should implement the same interfaces as everything else. You basically have:

#[derive(Clone)]
struct PersistentVec<E> {
    data: Arc<Handle<E>>
}

and then push is:

fn push(&mut self, element: E) {
    let new_data = self.data.add(element);
    self.data = new_data;
}

Note that it may make sense to have fn add(self, element: E) -> Self (I’ve wanted that so many times), but it would make sense to have that for both collections. (I would make it fn(self) and not fn(&self) – you can always do foo.clone().add(...) if you want that, and it allows for more efficiency in the case of a persistent collection that is consumed the only handle).


#25

I like the idea of documenting the expected performance characteristics of things that impl the traits, but I don’t like the idea of making it mandatory. Maybe your crate is accidentally quadratic if some user uses a non compliant map structure, but that should be their choice. For example in things I have been working on (cargo) we clone some data structures way more often than we make them. I could see it being an advantage to use a data structure that is accidentally quadratic on setup if it also had O(1) clone.

In general if something is memory safe then rust makes the correct choice eazy and other choices possible.


#26

Another approach to document complexity is using associative types. First of all, define a set of complexities:

pub trait Complexity {}

// Could additionally implement `min_complexity`
mod max_complexity {
    pub trait Constant: super::Complexity {}
    pub trait LogN: super::Complexity {}
    pub trait N: super::Complexity {}
    pub trait NLogN: super::Complexity {}
    pub trait NSquare: super::Complexity {}
    pub trait PowerN: super::Complexity {}
    pub trait NFactorial: super::Complexity {}

    impl<T> NFactorial for T where T: PowerN {}
    impl<T> PowerN for T where T: NSquare {}
    impl<T> NSquare for T where T: NLogN {}
    impl<T> NLogN for T where T: N {}
    impl<T> N for T where T: LogN {}
    impl<T> LogN for T where T: Constant {}
}

mod complexity {
    macro_rules! impl_complexity {
        ($complexity:tt) => {
            pub struct $complexity;
            impl Complexity for $complexity {}
            impl min_complexity::$complexity for $complexity {}
            impl max_complexity::$complexity for $complexity {}
        };
    }
    
    impl_complexity!(Constant);
    impl_complexity!(LogN);
    impl_complexity!(N);
    impl_complexity!(NLogN);
    impl_complexity!(NSquare);
    impl_complexity!(PowerN);
    impl_complexity!(NFactorial);
}

With this, the collection trait would look similar to this:

trait Insert<I>: Collection {
    type Complexity: Complexity;

    fn insert(&mut self, index: I, element: Self::Elem);
}

If we now implement this e.g. for Vec:

impl<T> Insert<usize> for Vec<T> {
    type Complexity = complexity::N;

    fn insert(&mut self, index: usize, element: T) {
        Vec::insert(self, index, element)
    }
}

Now, for every operation, we can require a maximum complexity:

fn requires_n_square<C, T>(collection: &mut C, element: T)
where
    C: Insert<T>,
    C::Complexity: max_complexity::NSquare,
{
    // ...
}

fn requires_constant<C, T>(collection: &mut C, element: T)
where
    C: Insert<T>,
    C::Complexity: max_complexity::Constant,
{
    // ...
}

Now, let’s call them:

let mut vec = vec![1, 2, 3];
requires_n_square(&mut vec, 0);
// requires_constant(&mut vec, 0);
// ^~~ the trait `max_complexity::Constant` is not implemented for `complexity::N`

This has some major advantages:

  • it’s intuitive to use
  • the error message is unique and understandable
  • it’s optional

See it on the playground.


#27

Would it be possible to add a mutable priority queue trait? I’m really fond of python’s heapdict which is a mutable priority queue and is quite useful in those applications where you need to mutate the priority of an item.


#28

For the SortedMap trait, is the sort always forced to be on the key, or can it be on the value instead? If so, how is this specified?


#29

I don’t see, why this is &self instead of &mut self. Could you elaborate on this? Also: Why returning Output instead of a reference?