Traversable Traits

#Problem

Currently, there is no way in Rust to specify a trait to the effect of “this has an iterator”. This makes it impossible to generically acquire an iterator, and consequently makes it impossible to generically glue together collections. If and when associated types land, this shouldn’t be an issue. But that’s post 1.0, and for now, we have nothing.

But, it is possible to generically tell something to iterate over itself using a provided function. Enter: the Traversable traits (shout outs to Scala for the name).

#Traversable

The traversable traits are a very simple shim. They each provide a single method for “give me some code to run on all of my elements”. They can be implemented today by just dropping this code in iter.rs:

pub trait Traversable <T> {
    fn foreach(&self, f: |&T| -> bool);
}
pub trait MutTraversable <T> {
    fn foreach_mut(&mut self, f: |&mut T| -> bool);
}
pub trait MoveTraversable <T> {
    fn foreach_move(self, f: |T| -> bool);
}

(the bool is to signal that you wish to break early)

And here’s a full implementation for a Vec:

impl <T> Traversable<T> for Vec<T> {
    fn foreach(&self, f: |&T| -> bool) {
        for x in self.iter() {
            if f(x) { break; }
        }
    }
}

impl <T> MutTraversable<T> for Vec<T> {
    fn foreach_mut(&mut self, f: |&mut T| -> bool) {
        for x in self.mut_iter() {
            if f(x) { break; }
        }
    }
}

impl <T> MoveTraversable<T> for Vec<T> {
    fn foreach_move(self, f: |T| -> bool) {
        for x in self.move_iter() {
            if f(x) { break; }
        }
    }
}

Super easy. Now we can write methods for collections like insert_all, which takes a collection, and inserts all of its contents. In fact, we can take anything that implements this interface. Like this:

fn insert_all <C: MoveTraversable<T>> (&mut self, other:C){
    other.foreach_move(|x|{
        self.insert(x); false
    });
}

Once RFC 24 (which is a 1.0 blocker) lands, we should also be able to drop this into iter.rs, and have all iterators be traversable:

impl <'a, T, I:Iterator<&'a T> + Clone> Traversable <T> for I {
    fn foreach(&self, f: |&T| -> bool) {
        for x in self.clone() {
            if f(x) { break; }
        }
    }
}

impl <'a, T, I:Iterator<&'a mut T> + Clone> MutTraversable <T> for I {
    fn foreach_mut(&mut self, f: |&mut T| -> bool) {
        for x in self.clone() {
            if f(x) { break; }
        }
    }
}

impl <T, I:Iterator<T>> MoveTraversable <T> for I {
    fn foreach_move(mut self, f: |T| -> bool) {
        for x in self {
            if f(x) { break; }
        }
    }
}

Then we’ll have generic methods which take either a collection or an iterator, and process their contents.

Here’s all of this working right now in the playpen. Currently without RFC 24, you can only impl all iterators, or individual concrete structures. In this example I’ve opted to impl iterators since that’s way cooler.

Bonus Round

Once associated types land, and we can implement Iterable traits, we can make Iterable inherit Traversable, and provide default Traversable impls for Iterables. Then we can rip out all the individual Traversable implementations, and kindly direct people to use the more flexible and effecient Iterables, without breaking any code! APIs can individually be migrated to use Iterable where desirable. Pretty much everything that is Traversable should be Iterable anyway.

Do we want this?

So it’s like 3am right now, so I’ve probably overlooked something serious, but this seems super easy and useful to implement (literally the entire implementation is in this post). So does everyone agree that this is worthwhile? Or is this an awful idea? Anything you’d like to see included/removed/changed?

2 Likes

Is this not a reintroduction of internal iterators, which is what Rust used to use as the basis for its for loops?

For further discussion on internal versus external iterators, I recommend the write-up here: [rust-dev] The future of iterators in Rust

(Not that I’m immediately opposed to the idea. Though I am somewhat opposed to just plugging something in with the expectation that it will be ripped out after associated items are put in.)

Ah, I wasn't aware that Rust previously had this kind of thing, but it was ripped out. Still, it's not unprecedented to have both. As alluded to, Scala has Traversable, as well as Iterable interfaces, and the latter inherits the former.

The language feature wouldn't necessarily be ripped out after associated items either, I simply note that it would be very easy to transition almost anything that really wants to use Iterable from Traversable.

Are the following details in the archived email you link to still relevant (at least, in the form I propose)? I'm not familiar with this level of internal representation detail.

This also has borrow checking implications, because the compiler can't assume those statements actually cause flow control to leave the loop immediately

and

Unlike internal iterators, external iterators only run one iteration at a time, so a for loop designed for them would always be able to succeed with break and return.

I'm on the fence about supporting iterator-style adapters for traversables. On the one hand, it's clearly possible, and potentially desirable. On the other hand, it's probably going to be messier than iterators, as discussed by the email, and I sort of like traversable as a simple construct for "I would just like to traverse the contents".

I would definitely like to see both internal and external iteration supported; (and I was disappointed to see ‘do’ notation dropped)

Internal iteration is interesting to me for data-parallel. par_foreach(), par_map() . If you write in this style with pure-closures then its’ easier to parallelise. This is finer-grain than concurrency(and solving different problems). But you need it in a state where you can change back & forth, since you’d tweak it for the granularity and costs that you have.

we did a lot of this manually on consoles even in the single threaded case (deep pipelines and simd). The mindset for optimising even for a single-thread was to find parallelism, and either purity or independence between invocations. Specialized versions of collections or iterators could handle padding for groups of 4 invocations or whoever. this would also suit GPGPU code.

The external iterators are great, sure. the chaining ability is great. But internal has other appeal & uses.

Apple’s Obj-C has an optimisation called “Fast Enumeration” which is like an external iterator, except the iterator returns arrays of values instead of single values. So the iterator function is called fewer times, and the ‘for’ can zoom through the returned arrays with inline code. This seems like a worthwhile optimisation, but I guess it may complicate the APIs?

https://www.mikeash.com/pyblog/friday-qa-2010-04-16-implementing-fast-enumeration.html

This would allow a different kind of iteration than the one we have today: An iterator that passes a reference to its internals, for example a text lines iterator that only has a single buffer and shares a string slice on each iteration. This is now allowed because we guarantee that the lives of the iterator elements don’t overlap – That would be a great addition!

Why three traits though? That’s like spilling the type parameterization into the structure of the traits.

trait Traversable<A> {
    fn advance(self, |A| -> bool) -> bool;
}

this should be enough – you can implement

  • Traversable<&A> for &Vec<A> and for &[A]
  • Traversable<&str> for &mut BufferedReader and so on

Because we want to have structures that have all three kinds of internal iteration, and the names would conflict if they just implemented the same trait 3 times with different bounds. Also the self argument is different in each one. I don't think you can genericize one function in a way that that'd work?

This strikes me as bad, because it suggests that the structure stores the iterator on itself.

Most types today already have different kinds of iterators, not just &, &mut and by-value, but also range iterators in the TreeMap etc. Either way there will be more traversals possible than available method names in the trait.

That said, I think you have a point, and I think that my proposal wouldn't work very well with &mut X as the Self type.

After discussing this with the devs, this proposal is a pretty strong No.

I suggest that anyone who believes this to be truly useful outside of the fact that “it would be handy to have right now or for 1.0”, provide a rigorous argument for the value of internal iteration.

Personally, I just wanted it to shim in generic iteration, so I’ve got nothing left to say on the matter.

provide a rigorous argument for the value of internal iteration.

it's more suited to parallelism and you might not parallelise everything simultaneously. you write it in a form where it can be parallelized - then you profile it to determine the correct granularity and type of parallelising. (threading? SIMD? GPGPU?). then swap .par_*() iterators as appropriate

I don't see why you'd want to impose "everything should be internal" or "everything should be external". Surely the language should provide both tools, and you choose whats appropriate.

They handle different cases.

1 Like

Does the println!() with a {} matcher cover the ‘toString()’ usecase?

To be more concrete, can we get away with not defining a method that gives an easy-to-visualize view into the collection because the println() macro will display these different collections in a uniform way to the user (ie me, when I am debugging).

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.