Libcollections traits: a standard interface for all collections

Edit: I’ve started a proposal document that everyone is free to comment on, and submit changes to:

##main file is here ##history is here

#Original Post: It has come to my attention that some of the Rust collections do not expose their primary interface through any trait.

Offenders as of this post include:

  • PriorityQueue
  • EnumSet
  • BTree
  • LruCache

After raising the issue on #rust-internals, I was informed this is partially intentional while everything is figured out. I’m starting this thread as a place to discuss thoughts and concerns about traits provided by libcollections.

Why Traits?

I think it’s important to first establish why exactly we want standard collection traits, so we’re all on the same page about goals and priorities.

Ease of use

If a structure implements a standard trait, then it’s usually much easier to pick up and use than one with a random proprietary one. This holds true for both explicit traits and general conventions between similar but distinct traits.

Easy Replacement

Collections are a rare example of a perfectly replaceable structure. Most code, at least at the pure implementation level, shouldn’t care if the stack it is using is backed by a linked list or an array. Consequently, it should be easy to replace one with the other when design or performance issues demand it.

Testing and Guarantees

One frustrating detail of the current libcollection codebase, and consequently any third party data structure codebase, is that there’s a lot of duplication of testing efforts. Every implementer of the Map interface does basically the same thing from an external perspective, and the fact that most of the tests are basically the same reflects this.

Having good traits with well defined interactions allows us to write common tests once and reuse them elsewhere. Exporting these tests would also be a great benefit to third-party developers as it reduces their development times, and provides them with an official means to verify that their designs conform to the contracts of the traits they implement.

The Current Traits

  • Collection
  • Deque
  • Map
  • MutableMap
  • Set
  • MutableSet

There’s also currently an RFC to add MultiSet traits, and a request up for MultiMap as well.

The ones that exist are pretty minimal, and what they provide I find to be pretty unobjectionable. They cleanly boil down the essence of these collections well. However, the most obvious problem with the current trait situation is simply an absence of traits.

Off the top of my head, the following traits are probably a good idea to have in one form or another:

  • Queue
  • Stack
  • PriorityQueue
  • List (random access by index)
  • Sorted Variants (methods like find_with are cropping up in a few places)

Future of Traits, and Issues

More than anything, I’d like to see better documentation of how distinct traits should interact. For instance, Collections that implement Default should probably return an empty instance as their Default. Sets shouldn’t grow their len() if you insert the same element twice, but Deques should. These are obvious things to everyone, but making them explicit might be worthwhile. Particularly in the form of centralized tests.

Centralized benchmarks would also be handy for comparing performance characteristics of implementations.

On PriorityQueues

Priority Queues are an interesting problem. The basic interface is fairly easy. It’s just a queue. However the various extensions and usecases for them complicates this. For one, people often don’t have a struct that directly implements Ord, or it if does, it might not be the ordering they want (the most common case is that you simply want the reversed ordering).

  • Should users be expected to wrap their structures in something that implements the right Ord? Or can they provide a custom “comparator” in the style of Java and C#?
  • Should Priority Queues take a key and value, or just a value?

There is also the issue of common Priority Queue extensions. The venerable Dijkstra’s Algorithm generally uses a Priority Queue that supports the decrease key operation. For many structures, this operation is a total non-starter. For instance an unmodified binary heap requires a linear search to perform this operation. Many other heaps are similar. Off the top of my head, there are two ways to provide decrease key: have an internal HashMap from Keys to internal references, or return a token upon insertion to be used in calls to decrease-key. These tokens would have to have some heap allocated shared state with the collection to be safe and stay up to date. These approaches would have very different results on any trait that tried to capture this operation. Thoughts?

On Adaptive Structures and Mutability

Rust’s mutability system introduces some unique challenges for interface designers. On one hand, it makes sense that an operation like “contains” should work even if you don’t have mutable access to the structure. On the other hand, adaptive structures like splay trees have to mutate on search in order to be correct. There are several solutions to this:

  • Adaptive structures cannot implement the traits (dumb)
  • Adaptive structures get their own traits (dumb?)
  • More _mut combinatoric explosions (dumb?)
  • Adaptive structures use internal mutability to satisfy these constraints

I also have some problems with mutability as it currently seems to necessitate the duplication of the (relatively) complex logic of collections. One suggestion I’ve presented elsewhere has been to provide a mechanism for generic mutability, in the same vein as generic lifetimes.

On Granularity and Coupling

Are there guidelines for when to break operations into different traits? Should traits be strictly orthogonal (i.e. a Set isn’t required to follow the implicit len() contract I discussed before), or should their be contracts between related ones? When should traits depend on others? For instance a splay tree can logically implement MutableMap except for the Map portions, but it can’t implement MutableMap because of them.

Higher-Kinded Types

Obligatory reference to HKT being relevant to design issues.

Conclusion

Anyway, this is just the stuff that’s at the forefront of my mind on the issue. I’m sure there’s much more to discuss, and that I’ve probably gotten some things about Rust completely wrong (as always). Please provide any thoughts you have on the matter! Let’s make a great standard interface for Rust Collections!

9 Likes

Are the Collection and Mutable traits in final form? If we compare Java’s Collection interface to Collection + Mutable in Rust, it seems that we are missing some operations. Since Collection in Rust is supposed to “represent the abstract idea of a container.”, it seems that you should always be able to check whether the container contains a given element. Also, for Mutable collections, you should always be able to insert something into the container and remove something from the container. I would think Collection and Mutable would look like this:

pub trait Collection<T> {
    fn size(&self) -> uint;

    fn is_empty(&self) -> bool { ... }

    fn contains(&self, value: &T) -> bool;
}

(I also changed len to size, since it doesn’t make sense to me to talk about the “length” of an arbitrary collection)

pub trait Mutable<T>: Collection<T> {
    fn clear(&mut self);

    fn insert(&mut self, value: T);

    fn remove(&mut self, value: &T);
}

I would also expect that we would be able to check whether a Collection<T> contains_all the elements inside any other Collection<T>, and that for a Mutable<T> you can insert_all the elements from another Collection<T> and remove_all the elements that are contained another Collection<T>, but that doesn’t seem possible currently.

I think this is unfortunately not realistic, since we may want different collections to return different values upon insertion or removal. But could these operations be present in some other Trait so they can be standardized across collections?

1 Like

Java Collection is a bit weird because it’s only direct-element storage. Maps aren’t Collections in Java, but they are in Rust (for now). I prefer the Rust perspective on the matter at the moment, though there’s definitely something special about key-value stores compared to value stores.

I’m unclear on the value of a perfectly generic insert/remove/contains, especially since remove and contains will be horribly inefficient for many structures (priority queues and list are a notable example). Do we want to force structures to support “bad” methods? I guess “search + delete, I don’t care about the performance consequences” is a fairly common pattern… hmm…

insert_all is sort of covered by Extendable, although it doesn’t seem like there’s a generic way to demand a move_iter. Perfectly generic remove_all has similar problems to remove. Though for some reason it make more sense to me (Subtractable?).

It would make more sense for every Collection to support a larger amount of methods, with the expectation that not all of these methods would be appropriate for every collection implementer, but methods that would be possible for every collection to support.

Can we assemble a list of concrete collections that probably should be in the collections crate? It may be easier to design the interfaces once we have a better idea of everything that needs to be supported.

Can some forum admin make a wiki post here so we can brainstorm this? @cmr?

@nightpool what is the benefit of wiki post, exactly?

Mainly? It seemed like a good way to test out the feature. Wiki posts are editable by any logged in user (above trust level 0?) We can throw revisions back and forth too.

Ok, it’s worth trying!

Java collections library is not an example of design worth copying. It is weird and inconsistent; that Map is not a Collection was already mentioned, and that’s only one of its problems. It also does not distinguish between mutable and immutable collections.

I think that Scala collection library is much better; it has clean structure and separation between different kinds of collections. Even if exact hierarchy may be not feasible for Rust to have, it is a nice source of ideas.

2 Likes

##Brainstorming:

Collections for libcollections

  • HashSet, HashMap
  • TreeSet, TreeMap
  • BitSet
  • EnumSet (is this just a BitSet on the inside?)
  • Vector
  • Ring Buffer
  • BinaryHeap, PairingHeap
  • some kind of Stack? there’s nothing like this in libcollections right now.
  • Fixed size arrays, which are a language item

This is just a random grab bag of things out of the current libcollections, and the Java and Scala libraries. Feel free to add things!

You're probably right, it was a gut reaction I had late at night. However I don't think it needs to be coupled to the entirely unobjectionable len and clear operations, that can be sanely and efficiently implemented by just about anything. Should we maybe break up the standard collection interface into lots of tiny bits, and then have some super-nice aggregate Collection trait in the same vein as Num? It's sort of the exact opposite of the Java architecture. Instead of every structure being a collection, only the most robust structures are collections. One can then be viewed as a Stack, Collection, or Stack + Collection.

This goes back to my uncertainty on how granular we want the traits to be.

Do we want both a DList (appropriate for a Deque), and SList (appropriate for a Queue or Stack)?

I would also nominate PriorityQueue to be renamed as BinaryHeap so PriorityQueue can be the trait it implements.

1 Like

I’m thinking you’re right. Java takes a very set-like view of what a collection is, and I definitely don’t think it’s productive to discuss whether a Map philosophically is or is not a collection. Scala’s collections library seems to be an excellent place to start. Even better would be to survey various collection libraries, making a list of all the structures we might want in libcollections, and then trying to organize them with traits. Are you aware of any other good examples to look at? Boost and the C++ standard library seem to be worth looking at. Also, Google’s Guava library for Java.

No, I’m a Java/Scala programmer, so I’m not well aware about libraries in other languages. Apart from these I worked with C/C++ (a long time ago and not for long), Delphi (even earlier than with C), Go, Haskell, D, JS and now Rust, and unfortunately I hadn’t seen more convenient and well-thought collections library than the one in Scala.

Guava is not really a collections library; sure, it does contain some very useful implementations of standard Java collection interfaces (I used them all the time in Java projects), and it adds a couple of new interfaces, e.g. multimap/multiset, bi-map, table, but still it is just an extension of standard Java collections library.

Feel free to add things to my post everyone! Its a wiki, so every logged in user should be able to edit it. We'll use that one as a baseline for implementations, then we'll brainstorm traits, ok?

Add some of those to the wiki post, ok? I took a survey from my memory of the Java collections library and what rust had, but guava has a ton of useful collections/collection implementations.

Hmm, just noticed that the PriorityQueue in std is a max-heap. Is that what we want? It means insert-all, pop-all outputs in reverse-sorted order (at least wrt the std sort function). That seems counter-intuitive to me. I’d think you’d want the “natural” ordering if nothing else has been specified, no?

If we had a PriorityQueue trait and then MinHeap and MaxHeap be impls of this then that wouldn’t be a problem.

But in general max heaps are useful for e.g. job scheduling. I don’t see a good reason to prefer one over the other by default.

Maybe we could have a Sorted trait that PriorityQueues and SortedMap/SortedSets provide? It could have three static constructor functions:

  • from_fn
  • natural_ord
  • rev_ord

Where from_fn takes a generic comparator for the type of interest, and natural/rev_ord require that the type implements Ord? Default I guess could pick natural_ord?

I’ve been working on hacking out nicer set of traits for libcollections, at least as a starting point, but I’ve gotten kind of stuck on iterator-based interfaces. It seems natural to enforce that some kinds of collections should provide some kinds of iterators, so that given a generic instance, you can request an iterator. A lot of default methods could also be provided for plugging together collections if we had this. But it seems impossible to write a good trait for this, at least when the iterator is by value. It seems doable using trait objects, but I’m guessing people wouldn’t be too happy with the structure having to maintain an internal collection of all iterators it has, or heap-allocating the iterators.

Does anyone have any thoughts on this?