Traits for common data structures

That sounds quite specialized and not the sort of thing that std would contain. And as stated above a proposal like this should come with some sort of trait definition.

2 Likes

I agree that my ideas are in too early a stage to be a real proposal, and also that they're probably better suited to a third-party crate. I was simply offering a counterpoint to @H2CO3's assertion:

There exist other useful abstractions over large classes of collection types than simply Iterator.

2 Likes

There's Iterator, Extend and Clone, as well as the various From/Into traits for converting collection types. (These all basically assume a collection has no structure or that it is iterable.)

Could you give an example of another operation that is shared by large classes of collections?

The primary one that I can think of is iterating over a specified range of values, for a somewhat nebulous definition of "range".

I guess that what I'm really interested in, however, is a modular multi-index collection, where each index might have different storage types and query capabilities. To be generic over possible index stores, we'd need some common idea of "retrieve this subset", where the actual description of that subset is determined by the collection type.

So, maybe something like this (off the top of my head):

trait Queryable<'a, Q> {
    type Item: 'a;
    type Iter: Iterator<Item = Self::Item>;
    fn query(&'a self, which: Q)->Self::Iter;
}

impl Queryable<usize> for Vec<T> {...}
impl Queryable<impl RangeBounds<usize>> for Vec<T> {...}
impl Queryable<impl Borrow<K>> for HashMap<K,V> {...}
impl Queryable<impl Borrow<K>> for BTreeMap<K,V> {...}
impl Queryable<impl RangeBounds<K>> for BTreeMap<K,V> {...}

impl Queryable<SomeQueryType> for NicheGeospatialIndex<...> {}
// etc...
3 Likes

That sounds like a very cool crate idea.

1 Like

That's very similar to variations of Index that return owned outputs.

A while ago I wrote a library that used lots of HashMaps. To see if a BTreeMap would perform better, I used a type alias everywhere:

type Map<K, V> = HashMap<K, V>;

Then replacing it with a BTreeMap is very simple. However, this is only possible because HashMap and BTreeMap have such similar APIs. Also, it doesn't lend itself well to benchmarking, because you can't benchmark your code using HashMap and BTreeMap at once.

I think Map and Set traits would be useful (I could have written the Map trait myself, but I was too lazy). I'd also like Stack, Insert and Drain traits (for Vec, VecDeque, LinkedList, ArrayVec, SmallVec, etc.). I don't think they have to be in the standard library, as long as they're in a crate that is used by all the popular collections.

Most other common operations on collections are already covered by traits (Extend, FromIterator, IntoIterator, Default, Index, AsRef).

3 Likes

My hypothesis is that, in the context of Rust, there do not exist any broadly useful collection traits that cover a wide range of functionality. Just speaking from personal experience, I can't really think of any time in the past several years where I found myself wishing that such a thing existed.

However, I would love to see that hypothesis tested. Starting a crate that tries to define collection traits seems like a good idea to see whether it can get ecosystem buy-in.

15 Likes

-> List

-> ArrayList/Arrays

-> List + MemoryReservable.

I think exponential combinatorics can be reduced with the intersection of many traits.

Further, I find a simple super trait hierarchy of collections would be a nice addition to stdlib in combination with the min_specialization feature.

Yes, it's likely that there are some missing parts but you can't anyway pull out maximum performance with the occurrence of private modifiers in structures unless modifying them over transmutation.

The java collection interface hierarchy is decent prior art.

E.g. Collection would roughly correspond to IntoIterator + Extend + Eq + Hash + IntoParallelIterator + DrainFilterable + ExactSize.

Map types don't implement Collection directly but have a method entrySet which returns a view which does and operates on entries, so maps can be treated as collections too if needed.

Additionally there are subinterfaces for ordered Map types which allow for floor/ceil/next/previous operations more ergonomically than Rust's BTreeMap::range.

On the rust side there's an unstable implementation for linked_list::Cursor which might be generalizable to also allow bidirectional traversal and modification of BTreeMap/Set in O(1) steps rather than half-open range, clone-key, drop-range, modify, reacquire-range dance with O(log n) that the current API requires. Now that we're moviong closer to stable GATs it should be more ergonomic to create mutating views into collections.

3 Likes

eclectic (+ link to docs.rs/eclectic) was a crate for this, but it hasn't been touched in four years. So it's pretty old as prior work, and past the proof of concept stage.

I'd advise going for small crates if this is attempted again. It's an uphill battle to get others to depend on one's crate with interfaces - split it into one interface per crate (like, one for maps).

1 Like

Other prior art: Rust 0.11.0 had several traits for different types of collections: collections - Rust

These were removed in RFC 235 because:

  • Most important: the traits do not provide iterator methods like iter . It is not possible to do so in a clean way without higher-kinded types, as the RFC explains in more detail below.
  • The split between mutable and immutable traits is not well-motivated by any of the existing collections.
  • The methods defined in these traits are somewhat anemic compared to the suite of methods provided on the concrete collections that implement them.
4 Likes

Perhaps decent, but by no means very good. Many Java Collection implementations are notorious for having many methods that just throw UnsupportedOperationException, something that's (by necessity) explicitly permitted by Collection's contract, but definitely not something we want in Rust.

Another obvious model for data structures is, of course, C++'s Container framework. C++ concepts, until very recently, weren't actual entities reified in and enforced by the language, but just requirements described in English, and in practice many or most concrete data structures in C++ do not actually 100% faithfully fulfil those requirements. And now (since C++20) that we finally have reified concepts in the language, it is telling that containers are not actually conceptified as of now.

The difficulty of finding a covering set of traits for a large number of different types of data structures, that is both (nearly) minimal and maps to an intuitive set of categories, seems to be a difficult problem indeed due to the large variety of data structures in existence. The member function table of C++ containers (note all the grey cells in the table) demonstrates this quite well.

2 Likes

Sure, I wasn't suggesting to just adopt its approach whole-sale. It just shows that there is a large set of common API surface that can be extracted.

E.g. there's not much reason to use inheritance when we can instead define trait aliases combining several orthogonal traits for commonly used combinations. Collections which do not fit neatly into some predefined alias could still implement a subset of the traits.

1 Like

What existing problems have already been solved in Rust by using container traits? There must be some.

I know I've been responsible for some very wonky graph traits in petgraph, and they rely so much on iterators that they only exist in a quite limited form. Not a great place to look. To a degree they solve the problem of making algorithms generic across multiple representations, but it's a bit limited - example: only shared reference access through iterators. And it looks a lot like a bunch of different trait for different aspects, no single "Graph" trait.

I'd argue that we don't need a single "Graph" trait, just as we don't need a "Collection" trait. A bunch of different traits for different aspects is more flexible and useful.

1 Like

What are the basic features ppl need from collections?

  1. Something they can take key-value pairs out of?
  2. Something they can insert key-value pairs into?
  3. Something they can iterate?
  4. Others?
  5. Should Vec provide (1)? Should HashMap provide (3)?
  6. What if one wants both Vec and HashMap to have both (1) and (3), and iterate both as key-value pairs? (e.g. for compatibility with Lua)
  7. Should we have a collections WG?

Another reasonable source of inspiration could be Swift's Collection Protocol & its hierarchy.

I think the primary things Swift's Collection itself has are (the equivalents of) to_iter(&self) -> impl Iterator and Index<Self::Index>.
The fact that its indices are an associated type is important, because indexing is generally O(1) unless you are explicitly working with a Lazy collection, and e.g: Dictionary.Values (hash_map::Values) implements Collection.

The primary problems with adapting it would probably be due to Swift implicitly handling clones and lack of lifetimes, so it would probably require lifetime GATs or excessive cloning.


I particularly like the RangeReplaceableCollection protocol, it only adds the requirement of replaceSubrange(_:with:), but has default implementations for append, insert, remove, and their variants. Do note that I have read complaints on the Swift forums that people have wished that replaceSubrange(_:with:) returned the new range of indices.

1 Like

Actually, I think breaking up the collection traits like this is probably going to be the best available method. As @H2CO3 pointed out:

So instead of trying to define one trait to rule them all, define new small and orthogonal traits that are easily composed with one another. Look at alga as a guide; they've developed a moderately large set of traits that describe algebraic structures, as well as common compositions of traits. The result is that rust's type system can be leveraged to do a certain amount of compile-time checking of different types based on the traits that they implement. I can see something similar being done for an abstract_collections crate.

Note: don't reinvent the wheel here; if the standard library, or some other well-known crate already has the right trait in it, reuse and expose that crate's trait in your crate. That makes writing extension traits much easier.

3 Likes

Why Eq and Hash, why not Ord? How would you efficiently implement entrySet for HashMap or HashSet? The very reason they don't implement Ord and Hash is that they would need to be sorted in order to do anything deterministic/useful. How would ExactSize work with lazy and/or probabilistic collections such as Bloom filters or HyperLogLogs, that either produce elements on-demand or have only an approximate representation of their data? How would IntoParallelIterator work with collections that inherently can't be parallelized?

All of this seems to stem from a naïve, ad-hoc view of collections and data structures, with no real need and no real-world use cases backing it.

I'm a long-time Swift user and my personal experience with Swift's Collection hierarchy is that it's way more annoying than useful. I basically never needed, wanted, or could abstract over a Collection, and the way I have to fish out individual methods from the documentation because they are defined at different depths in the hierarchy is particularly unhelpful. Admittedly, this might partly be a documentation issue, but sometimes, it considerably slows down development, and I never felt it solved an actual problem.

3 Likes