Collection Traits, Take 2

This doesn't help you if you are writing generic code that accepts any type implementing one of these traits.

If we can do trait tests, then we should also be able to do trait benches, which could highlight the different performance characteristics of the various std and crate implementations.

Worst-case algorithmic complexity doesn't necessarily correlate with real-world performance. I want to know that if my application runs in reasonable time for N inputs, using these traits does not mean it can blow up if it suddenly gets 2*N or 3*N inputs. I want to know this independently of how efficient my implementation is.

It will also limit which collections can implement which traits. For example, is it valid to implement Map for a sorted vector with O(N) insert? If a function in an API returns -> impl Map, I really doubt that people will expect the Map to have O(N) insert. Most people will expect O(1) or O(logN) worst-case complexity, and there is a pretty big difference between those and O(N). For example, inserting N elements in a loop into a reasonably-implemented HashMap has ~O(N) worst-case complexity, but doing so into a sorted vector has O(N^2) worst-case complexity. That is pretty much the difference between inserting a million elements almost instantaneously and it taking forever.
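To make the contrast concrete, here is a minimal sketch of the two insertion patterns, using only plain std types (none of the proposed traits):

use std::collections::HashMap;

fn main() {
    // Scale `n` up to a million to reproduce the gap described
    // above; the second loop is the "takes forever" case.
    let n = 100_000;

    // ~O(N) total: each HashMap insert is amortized O(1).
    let mut map = HashMap::new();
    for i in 0..n {
        map.insert(i, i);
    }

    // O(N^2) total: keeping a Vec sorted on every insert can shift
    // up to N elements per insertion. Inserting in descending order
    // always hits index 0, the worst case.
    let mut sorted: Vec<i32> = Vec::new();
    for i in (0..n).rev() {
        let pos = sorted.binary_search(&i).unwrap_err();
        sorted.insert(pos, i);
    }
}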


Is it possible to record such constraints in the type system as marker traits? Maybe trait InsertOrderN: InsertOrder1, and then do + InsertOrderN in the where clause.
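A minimal sketch of how that could look (names invented here; note the supertrait has to point from the tighter bound to the looser one, so that every O(1) implementor also counts as satisfying an O(N) bound):

/// Marker: insert is at worst O(N).
trait InsertOrderN {}

/// Marker: insert is at worst O(1). Any O(1) insert trivially
/// satisfies the O(N) bound, hence the supertrait.
trait InsertOrder1: InsertOrderN {}

// Generic code opts in via an extra bound in the where clause:
fn fill<C>(_collection: &mut C)
where
    C: InsertOrder1, // only accepts constant-time inserters
{
    // ...
}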


@Gankra did a pretty good job in this comment: Collection Traits, Take 2 - #15 by Gankra. There, the complexities that the different operations should have are basically clear, and some operations are separated into their own traits. For example, Insert is its own trait, and Map requires it, while Seq has its own insert method, which is different because it takes an index.

However, Seq could also implement Insert with the same complexity as Map, because without an index it would just mean "insert somewhere", and Seqs can typically insert somewhere in O(1): Vec and Deque can insert at the end in amortized O(1), and any List can insert at the front in O(1).

If the Insert trait specified that Insert::insert has at worst O(logN) complexity, it could be implemented for a lot of collections, but not for a sorted vector, and thus a sorted vector could not implement Map. That would match my expectations: the proper way to fill a sorted vector in a loop is to bulk-insert at the end, then sort, then deduplicate, which is O(NlogN) overall, and that API is necessarily going to clash with the API of collections that can insert in O(1) or O(logN) and can therefore insert elements one by one in a loop (see the sketch below).
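For reference, a minimal sketch of that bulk pattern for a sorted vector, using only plain std Vec:

fn bulk_insert_sorted<T: Ord>(sorted: &mut Vec<T>, new_items: Vec<T>) {
    // Append everything at the end: amortized O(M) for M new items.
    sorted.extend(new_items);
    // Restore order: O(NlogN).
    sorted.sort();
    // Drop duplicates, matching set/map semantics: O(N).
    sorted.dedup();
}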

So, in my opinion, just adding docs to @Gankra's proposal specifying a worst-case complexity that implementations have to ensure would be alright. If some implementation violates that, well, then it's a bad implementation, but no unsafety can result from that.

Enforcing this at the type level is necessarily going to complicate @Gankra's proposal. One thing we could do is, instead of having just one Insert trait, have many traits for different complexity classes:

mod worst_1 {
    // ^ replace worst_1 with a suitable name,
    // I am bad at naming
    pub trait Insert {}
}
mod worst_logN { pub trait Insert {} }
mod worst_N { pub trait Insert {} }

and then build the "collection" traits on top of these, for example:

// So Map's insert must be at worst O(logN)
trait Map: worst_logN::Insert {}
trait Seq: worst_1::Insert {}

Finally, we could twist this further so that a type that implements worst_logN::Insert automatically also implements worst_N::Insert; then code that requires worst_N::Insert can be passed Maps and Seqs without issues. Blanket impls could be one way to do this, but I have always felt that designs with many blanket impls turn out brittle.
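For illustration, the blanket impls would look something like this, building on the modules above:

// Anything with an O(1) worst-case insert trivially satisfies the
// O(logN) bound, and anything O(logN) satisfies the O(N) bound.
impl<T: worst_1::Insert> worst_logN::Insert for T {}
impl<T: worst_logN::Insert> worst_N::Insert for T {}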

Personally, I don't know if going this far is worth it. In particular, this lets you do nonsensical things like implementing Insert for a collection multiple times with different complexity guarantees. Adding blanket impls could make that an error, but specialization would probably make it possible again.

I like @Gankra's proposal a lot. It is simple, and adding some comments saying which complexity implementors should uphold when implementing the traits would be enough for me. If overspecification is a concern, we don't really have to make the bounds too tight either; I'd be fine with not using O(1) anywhere and making O(logN) the tightest complexity class that we specify. That would already allow many traits to be implemented for both sequences and associative collections, and honestly the difference between O(1) and O(logN) is really blurry in practice, to the point that I at least don't care that much about it (O(1) => 1, but log(1000000) => 6 in base 10, or about 20 in base 2, which is still really close to 1 over the range of inputs [0, 1000000]).


I believe this is the right choice. As I argued here -- though perhaps not as clearly as I could've -- I think that persistent collections in Rust should implement the same interfaces as everything else. You basically have:

#[derive(Clone)]
struct PersistentVec<E> {
    data: Arc<Handle<E>>
}

and then push is:

fn push(&mut self, element: E) {
    // `add` presumably builds a new handle that shares structure
    // with the old one, leaving other clones of this Vec untouched.
    let new_data = self.data.add(element);
    self.data = new_data;
}

Note that it may make sense to have fn add(self, element: E) -> Self (I've wanted that so many times), but it would make sense to have that for both kinds of collections. (I would make it fn(self) and not fn(&self) -- you can always do foo.clone().add(...) if you want that, and it allows for more efficiency in the case of a persistent collection that is consumed while holding the only handle.)
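To illustrate why the consuming form can be cheaper, here is a sketch assuming Handle<E>: Clone and a hypothetical push_in_place method on Handle:

impl<E> PersistentVec<E>
where
    Handle<E>: Clone,
{
    fn add(mut self, element: E) -> Self {
        // If this is the only handle, Arc::make_mut mutates the data
        // in place; otherwise it clones the shared node first. A
        // fn(&self) version would always have to pay for the copy.
        Arc::make_mut(&mut self.data).push_in_place(element); // hypothetical
        self
    }
}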


I like the idea of documenting the expected performance characteristics of things that impl the traits, but I don't like the idea of making it mandatory. Maybe your crate is accidentally quadratic if some user uses a non-compliant map structure, but that should be their choice. For example, in things I have been working on (cargo), we clone some data structures far more often than we create them. I could see it being an advantage to use a data structure that is accidentally quadratic on setup if it also had O(1) clone.

In general, if something is memory-safe, then Rust makes the correct choice easy and other choices possible.
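For instance, that trade-off is easy to get with plain std types; this sketch has O(1) clone no matter how expensive the map was to build, since Arc::clone only bumps a reference count:

use std::collections::HashMap;
use std::sync::Arc;

// Cloning `SharedMap` copies one pointer and increments a
// reference count; the underlying map is shared, not copied.
#[derive(Clone)]
struct SharedMap {
    inner: Arc<HashMap<String, u64>>,
}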


Another approach to documenting complexity is to use associated types. First, define a set of complexities:

pub trait Complexity {}

// Could additionally implement `min_complexity`
mod max_complexity {
    pub trait Constant: super::Complexity {}
    pub trait LogN: super::Complexity {}
    pub trait N: super::Complexity {}
    pub trait NLogN: super::Complexity {}
    pub trait NSquare: super::Complexity {}
    pub trait PowerN: super::Complexity {}
    pub trait NFactorial: super::Complexity {}

    impl<T> NFactorial for T where T: PowerN {}
    impl<T> PowerN for T where T: NSquare {}
    impl<T> NSquare for T where T: NLogN {}
    impl<T> NLogN for T where T: N {}
    impl<T> N for T where T: LogN {}
    impl<T> LogN for T where T: Constant {}
}

mod complexity {
    use super::{max_complexity, Complexity};

    macro_rules! impl_complexity {
        ($complexity:tt) => {
            pub struct $complexity;
            impl Complexity for $complexity {}
            impl max_complexity::$complexity for $complexity {}
        };
    }

    impl_complexity!(Constant);
    impl_complexity!(LogN);
    impl_complexity!(N);
    impl_complexity!(NLogN);
    impl_complexity!(NSquare);
    impl_complexity!(PowerN);
    impl_complexity!(NFactorial);
}

With this, the collection trait would look similar to this:

trait Insert<I>: Collection {
    type Complexity: Complexity;

    fn insert(&mut self, index: I, element: Self::Elem);
}

If we now implement this e.g. for Vec:

impl<T> Insert<usize> for Vec<T> {
    type Complexity = complexity::N;

    fn insert(&mut self, index: usize, element: T) {
        Vec::insert(self, index, element)
    }
}

Now, for every operation, we can require a maximum complexity:

fn requires_n_square<C, T>(collection: &mut C, index: T)
where
    C: Insert<T>,
    C::Complexity: max_complexity::NSquare,
{
    // ...
}

fn requires_constant<C, T>(collection: &mut C, index: T)
where
    C: Insert<T>,
    C::Complexity: max_complexity::Constant,
{
    // ...
}

Now, let’s call them:

let mut vec = vec![1, 2, 3];
requires_n_square(&mut vec, 0);
// requires_constant(&mut vec, 0);
// ^~~ the trait `max_complexity::Constant` is not implemented for `complexity::N`

This has some major advantages:

  • it’s intuitive to use
  • the error message is unique and understandable
  • it’s optional

See it on the playground.


Would it be possible to add a mutable priority queue trait? I'm really fond of Python's heapdict, which is a mutable priority queue and is quite useful in applications where you need to mutate the priority of an item.
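A rough sketch of what such a trait could look like, roughly mirroring heapdict's operations; all names here are invented for illustration, not part of the proposal:

trait MutPriorityQueue {
    type Key;
    type Priority: Ord;

    /// Insert a key, or update its priority if already present.
    fn push(&mut self, key: Self::Key, priority: Self::Priority);

    /// Remove and return the entry with the smallest priority.
    fn pop_min(&mut self) -> Option<(Self::Key, Self::Priority)>;

    /// Change the priority of an existing key; this is the operation
    /// a plain binary heap cannot do efficiently.
    fn change_priority(&mut self, key: &Self::Key, priority: Self::Priority) -> bool;
}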

For the SortedMap trait, is the sort always forced to be on the key, or can it be on the value instead? If so, how is this specified?

I don't see why this is &self instead of &mut self. Could you elaborate on this? Also: why return Output instead of a reference?
