Edit: I’ve started a proposal document that everyone is free to comment on, and submit changes to:
##main file is here ##history is here
#Original Post: It has come to my attention that some of the Rust collections do not expose their primary interface through any trait.
Offenders as of this post include:
- PriorityQueue
- EnumSet
- BTree
- LruCache
After raising the issue on #rust-internals, I was informed this is partially intentional while everything is figured out. I’m starting this thread as a place to discuss thoughts and concerns about traits provided by libcollections.
Why Traits?
I think it’s important to first establish why exactly we want standard collection traits, so we’re all on the same page about goals and priorities.
Ease of use
If a structure implements a standard trait, then it’s usually much easier to pick up and use than one with a random proprietary one. This holds true for both explicit traits and general conventions between similar but distinct traits.
Easy Replacement
Collections are a rare example of a perfectly replaceable structure. Most code, at least at the pure implementation level, shouldn’t care if the stack it is using is backed by a linked list or an array. Consequently, it should be easy to replace one with the other when design or performance issues demand it.
Testing and Guarantees
One frustrating detail of the current libcollection codebase, and consequently any third party data structure codebase, is that there’s a lot of duplication of testing efforts. Every implementer of the Map interface does basically the same thing from an external perspective, and the fact that most of the tests are basically the same reflects this.
Having good traits with well defined interactions allows us to write common tests once and reuse them elsewhere. Exporting these tests would also be a great benefit to third-party developers as it reduces their development times, and provides them with an official means to verify that their designs conform to the contracts of the traits they implement.
The Current Traits
- Collection
- Deque
- Map
- MutableMap
- Set
- MutableSet
There’s also currently an RFC to add MultiSet traits, and a request up for MultiMap as well.
The ones that exist are pretty minimal, and what they provide I find to be pretty unobjectionable. They cleanly boil down the essence of these collections well. However, the most obvious problem with the current trait situation is simply an absence of traits.
Off the top of my head, the following traits are probably a good idea to have in one form or another:
- Queue
- Stack
- PriorityQueue
- List (random access by index)
- Sorted Variants (methods like find_with are cropping up in a few places)
Future of Traits, and Issues
More than anything, I’d like to see better documentation of how distinct traits should interact. For instance, Collections that implement Default should probably return an empty instance as their Default. Sets shouldn’t grow their len() if you insert the same element twice, but Deques should. These are obvious things to everyone, but making them explicit might be worthwhile. Particularly in the form of centralized tests.
Centralized benchmarks would also be handy for comparing performance characteristics of implementations.
On PriorityQueues
Priority Queues are an interesting problem. The basic interface is fairly easy. It’s just a queue. However the various extensions and usecases for them complicates this. For one, people often don’t have a struct that directly implements Ord, or it if does, it might not be the ordering they want (the most common case is that you simply want the reversed ordering).
- Should users be expected to wrap their structures in something that implements the right Ord? Or can they provide a custom “comparator” in the style of Java and C#?
- Should Priority Queues take a key and value, or just a value?
There is also the issue of common Priority Queue extensions. The venerable Dijkstra’s Algorithm generally uses a Priority Queue that supports the decrease key operation. For many structures, this operation is a total non-starter. For instance an unmodified binary heap requires a linear search to perform this operation. Many other heaps are similar. Off the top of my head, there are two ways to provide decrease key: have an internal HashMap from Keys to internal references, or return a token upon insertion to be used in calls to decrease-key. These tokens would have to have some heap allocated shared state with the collection to be safe and stay up to date. These approaches would have very different results on any trait that tried to capture this operation. Thoughts?
On Adaptive Structures and Mutability
Rust’s mutability system introduces some unique challenges for interface designers. On one hand, it makes sense that an operation like “contains” should work even if you don’t have mutable access to the structure. On the other hand, adaptive structures like splay trees have to mutate on search in order to be correct. There are several solutions to this:
- Adaptive structures cannot implement the traits (dumb)
- Adaptive structures get their own traits (dumb?)
- More _mut combinatoric explosions (dumb?)
- Adaptive structures use internal mutability to satisfy these constraints
I also have some problems with mutability as it currently seems to necessitate the duplication of the (relatively) complex logic of collections. One suggestion I’ve presented elsewhere has been to provide a mechanism for generic mutability, in the same vein as generic lifetimes.
On Granularity and Coupling
Are there guidelines for when to break operations into different traits? Should traits be strictly orthogonal (i.e. a Set isn’t required to follow the implicit len() contract I discussed before), or should their be contracts between related ones? When should traits depend on others? For instance a splay tree can logically implement MutableMap except for the Map portions, but it can’t implement MutableMap because of them.
Higher-Kinded Types
Obligatory reference to HKT being relevant to design issues.
Conclusion
Anyway, this is just the stuff that’s at the forefront of my mind on the issue. I’m sure there’s much more to discuss, and that I’ve probably gotten some things about Rust completely wrong (as always). Please provide any thoughts you have on the matter! Let’s make a great standard interface for Rust Collections!