Libcollections traits: a standard interface for all collections

Ultimately, we want to provide “associated types” as part of a trait. Something like:

trait Set<A> {
    type Items<'a>: Iterator<&'a A>;
    type MoveItems: Iterator<A>;
    fn iter(&self) -> Items;
    fn move_iter(self) -> MoveItems;
    ...
}

Then each set implementation could provide its own iterator types, and it would all be unboxed.

Unfortunately, trait objects are a problematic workaround not just because of allocation concerns, but also because you can’t call generic methods on them, which rule out a lot of useful Iterator functionality.

So, sadly, I think iterators on these traits will probably need to await a language feature like associated types (there are a couple of other possibilities.) I’m not sure when the feature is likely to happen, but we’re actively thinking about it.

Oh man, the notion of associated types is really cool! (RFC for those interested). I know I’ve wanted something like that tons of times before in other languages.

I think for now I’ll settle for for_each style methods, with the hope that associate types will save the day be 2.0.

1 Like

Okay so I’ve skimmed through a bunch of collection libraries, and here’s my first draft of a new set of traits.

I had to break up some things into separate trait modifiers (searchable, sorted) to capture all the possibilities. In particular, searchable is necessary to determine if an element is in a container, because e.g. that can’t be done on a perfectly generic Vec.

Purposefully excluded multiset/map for laziness/repetitiveness.

Actual names of things are completely arbitrary, some are extra-verbose just to get the idea across without having to write notes.

Probably riddled with genuine mistakes.

Set trait needs expanding: new implementors: EnumSet, BitFlags

fn intersection(&self, other: &Self) -> Self;
fn intersects(&self, other: &Self) -> bool;
fn union(&self, other: &Self) -> Self;
fn difference(&self, other: &Self) -> Self;
fn symmetric_difference(&self, other: &Self) -> Self;

I think SortedMap should be called SortMap because it’s not required to be sorted beforehand, but it’s “Sortable”.

That's an interesting perspective on the matter. I had intended SortedMap as "I internally maintain an ordering of the elements, and therefore it is relatively cheap to perform operations associated with ordering". So your idea is to e.g. allow HashMap to impl a very inefficient SortMap? Or are there some structures you know of that support fast order-based operations that don't maintain an internal sorting?

I thought the only requirement was for the values to be PartialOrd so that the map could be Sortable. But it makes more sense to implement it when the worst-case time of min/max is better than O(n).

A trie can support fast order-based algorithms.

Isn't a trie to trees what non-comparison sort is to comparison sort?

Anyway, I've taken in some notes from people, and corrected a bunch of errors, so:

Here's my v0.2 proposal

I've rustdoc'd almost everything to better explain why some things are what they are. Also gave some examples of relevant concrete collections to some types/methods. I also try to note places that are dubious or uncertain. Gave reasonable default implementations where possible. They might be wrong.

Docs become a bit less thorough as it goes on, as e.g. the Map interfaces are basically the same as the Container interfaces, but, you know, with keys and values. I also don't feel like I need to explain the "Stack" trait, because, seriously, it's a stack.

1 Like

I’ve moved development of my proposal to a branch on my repo, so that people can track changes, comment on lines, and submit changes/issues. I’ll also get bloody notifications if people do comment.

##main file is here ##history is here

(I’ve added some stuff since 0.2, as the history suggests)

Gankro, thanks for taking the reins on this - I was just about to suggest somebody do so.

A general comment: don’t get sucked in by the allure of categorizing things for the sake of categorization. We’ve over-traitified things in the past and then painfully reversed those decisions. How many of these traits are people going to actually use generically (that’s a rhetorical question)?

Comments on your v0.2 proposal:

  • Mutable is a very generic name. MutableCollection might be better.
  • This introduces the term ‘container’, committing to neither ‘collections’ nor ‘container’ but both.
  • Furthermore, it seems to use the terms interchangeably…
  • Option, Result, Cell, Rc, etc. are also ‘containers’ but are pretty unrelated to what’s going on here.
  • Unstructured vs. UnstructuredContainer.
  • Adding back internal iterators is not going to fly.
  • This seems to introduce many methods that don’t currently exist, so it’s not clear to me which parts of this proposal are unification and which parts are just new features.

I’d also recommend Scala as the best thing to imitate. Relatedly, even though we don’t have too many persistent/immutable collections at the moment I’d like to see traits for them, or at least the names designed in away so they can be added later.

Thanks for the feedback! I’m definitely not focusing too hard on names right now, since I consider them less of a substantive issue than the overall categorization. Some methods names are more verbose than I expect they should be right now, to just make everything as clear as possible.

I agree that Container and Collection are really close names, and ambiguous if they were accepted as is. More than happy to change Mutable to MutableCollection, I just assumed you guys had picked that name for a reason and left it as is. UnstructuredContainer’s also fine with me.

Internal iterators… I don’t love adding them. But without them there’s just no way to connect up collections generically. I can axe that stuff, but then we lose a lot of default and generic implementations. The *_all methods can be changed to take Iterators, at least. Should we really wait for HKT/Associated-types/whatever to have the ability to generically link together collections like that? If you guys say so, I’ll leave it, but I think that it’ll suck in the interim.

I hear you on breaking things down into too small parts, but part of the issue is that some operations can only be made available in certain contexts. For instance if I put insert in the same trait as remove, then e.g. a Vec can’t support insert unless its contents support Eq, because only a Vec<T:Eq> can possibly support remove. Forcing someone to support Eq for something they just want to push/pop with seems pretty awful.

All the Searchable*, Sortable*, and Unstructured* variants have to exist as far as I can tell because of this. I have avoided breaking some of the more concrete types like Deque/Queue/Stack into Mutable/Immutable variants because they’re basically stupid immutably. Are there any traits you think should be tossed?

Edit, Just to be clear: Some of these traits aren’t even necessarily intended to be used generically, they literally can’t be put together without reducing the usefulness of the concrete type due to the way trait constraints are declared.

I am a bit dubious on the Resizable/Mutable split I currently have for Lists, though.

Should I be trying to make unifications/additions more clear? Almost everything is just a unification, though. Sorted* and Stack/Queue/PriorityQueue really just are blatant omissions.

I'm using the Scala Collections and Java Collections as my primary basis I've never used Scala before, though. Unfortunately both Scala and Java have some things in them that let you make more assumptions and have cleaner/simpler collection hierachies. We've already got a solid split between Mutable and Immutable in the traits, though.

Feel free to submit a Pull Request to my repo if you have any concrete ideas!

The current immutable traits currently just contain reads. Persistent data structures have writes with signatures that are incompatible with their mutable equivalents. e.g.

fn add (P<Self>, T) -> P<Self>;

P being some smart pointer.

I have officially capitulated on the matter of providing some way to generically iterate collections. The core devs clearly don’t want internal iteration, unless there’s an extraordinarily good reason to include it. “It would be handy to have it right now” is not sufficient.

They of course want generic iteration eventually, but are more than willing to wait past 1.0 to get it right.

Specification updated accordingly.

[quote=“Gankro, post:34, topic:158”] The core devs clearly don’t want it, [/quote] This seems like a misunderstanding(?) It should be wanted, but it’s not possible to express in current Rust.

Oh and fyi, the old iteration protocol used fn for_each(&self, |&T| -> bool) -> bool the second bool return allows composing such loops well. That doesn’t mean that I’m in favor of going backwards in time by bringing back the protocol. Sidenote of the second degree: the .all iterator method still implements this protocol.

Sorry, I wrote the post ambiguously. They of course want generic iteration, just not internal iteration, which is the only possibility today (and for a non-trivial portion of the foreseeable future). Edited it for clarity.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.