.iter() could be skipped when calling a combinator on a Vec

let numbers = vec![1, 2, 3, 4, 5];
let even_numbers: Vec<u32> = numbers.iter()
    .filter(|&n| n % 2 == 0)
    .copied() // iter() yields &u32, so copy the values before collecting into Vec<u32>
    .collect();

Since we already know that numbers is a collection, we could skip the .iter() call when applying an intermediate operation like this. The suggested syntax is:

let numbers = vec![1, 2, 3, 4, 5];
let even_numbers: Vec<u32> = numbers
    .filter(|&n| n % 2 == 0)
    .collect();

Other languages (JavaScript, Scala, ...) already work this way, so Rust could offer the same functionality.

The challenge is getting the type system semantics for this correct; you need some set of type system rules (probably mediated by traits) such that the compiler can desugar this the way you want it to.

For example, nums.iter().filter(…).copied().collect() is a nicer equivalent of:

Iterator::collect(Iterator::copied(Iterator::filter(
        <[u32]>::iter(<Vec<u32> as Deref>::deref(&nums)),
        |&n| n % 2 == 0,
    )));

To make this a useful proposal, you'd need to explain what the rules the compiler follows are to get from nums.filter(…) to Iterator::filter(<[u32]>::iter(<Vec<u32> as Deref>::deref(&nums)), …) - for current Rust, this is a combination of autoref and method call search that can be translated to a single unambiguous set of function calls by the compiler.
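To make the desugaring above concrete, here is a minimal sketch that puts the method-call form and the fully qualified form side by side. The function names method_syntax and desugared are made up for illustration, and &Vec<u32> is used as the parameter type (rather than the more idiomatic &[u32]) only so that the Deref step stays visible.

```rust
use std::ops::Deref;

/// What people normally write: autoref, deref and method lookup are implicit.
/// (Hypothetical helper name, for illustration only.)
fn method_syntax(nums: &Vec<u32>) -> Vec<u32> {
    nums.iter().filter(|&&n| n % 2 == 0).copied().collect()
}

/// The same chain with every call spelled out as a plain function call.
/// (Hypothetical helper name, for illustration only.)
fn desugared(nums: &Vec<u32>) -> Vec<u32> {
    Iterator::collect(Iterator::copied(Iterator::filter(
        // Vec<u32> derefs to [u32], and iter() is defined on the slice type.
        <[u32]>::iter(<Vec<u32> as Deref>::deref(nums)),
        |&&n| n % 2 == 0,
    )))
}
```

Both functions should return the same vector for the same input; any proposal for nums.filter(…) would need an equally unambiguous expansion.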

The reason why filter is implemented on the Iterator trait instead of the collection itself is that its implementation can be abstracted over any iterator. By doing so, we don't have to implement filter on all existing collection types.

Also, collections generally have three methods to create an iterator (into_iter, iter and iter_mut), which explicitly state whether we want to take ownership of the values, borrow them, or borrow them mutably.
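As a rough illustration of that difference, the sketch below uses each of the three methods once. The function names are hypothetical and only serve to label the three cases.

```rust
/// iter(): borrows the elements, yielding &u32. The caller keeps the data.
fn sum_by_ref(numbers: &[u32]) -> u32 {
    numbers.iter().sum()
}

/// iter_mut(): borrows mutably, yielding &mut u32, so elements can be changed in place.
fn double_in_place(numbers: &mut [u32]) {
    for n in numbers.iter_mut() {
        *n *= 2;
    }
}

/// into_iter(): takes ownership of the Vec, yielding u32 values. The Vec is consumed.
fn into_strings(numbers: Vec<u32>) -> Vec<String> {
    numbers.into_iter().map(|n| n.to_string()).collect()
}
```

An implicit numbers.filter(…) would have to pick one of these three behaviors on the caller's behalf.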


Why would that be the case?

  • a collection is not an iterator, as it doesn't know at which point the iteration currently is

    • this means you cannot have filter in your example be the filter on Iterator, it would have to be a different one.
  • why should iter be called, and not iter_mut, into_iter, drain or something else?
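The first point can be sketched as follows: the iterator, not the collection, remembers how far iteration has progressed. The function name split_after_two is made up for this example.

```rust
/// Splits the input into its first two elements and the rest, using a single
/// iterator whose position carries over between the two collect() calls.
/// (Hypothetical helper, for illustration only.)
fn split_after_two(numbers: &[u32]) -> (Vec<u32>, Vec<u32>) {
    let mut it = numbers.iter().copied();
    // by_ref() lends the iterator out, so `it` stays usable afterwards.
    let head: Vec<u32> = it.by_ref().take(2).collect();
    // `it` continues where the previous step stopped.
    let rest: Vec<u32> = it.collect();
    (head, rest)
}
```

The slice itself has no such cursor, which is why a filter defined directly on the collection could not simply be Iterator::filter.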


What does this have to do with unsafe code?

(You tagged this as "Unsafe Code Guidelines".)


That was unintentional; my apologies. Fixed.

The trouble with the somevec.filter(..) syntax is that it's ambiguous whether we should implicitly insert .iter(), .iter_mut() or .into_iter(). Maybe one could look at what the closure accepts, so that

somevec.filter(|&n| ..) becomes somevec.iter().filter(|&n| ..)

somevec.filter(|&mut n| ..) becomes somevec.iter_mut().filter(|&mut n| ..)

So in practice this could work just fine I guess?

If you want to prototype this, you could write a crate with an extension trait and publish it to crates.io. People would just need to import your trait with use, and then they would have a filter method on Vec that works like this.
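A minimal sketch of such an extension trait might look like the following. The trait name VecFilterExt is hypothetical, and this version only covers the borrowing case: it always behaves like .iter().filter(..) and makes no attempt to choose between iter, iter_mut and into_iter.

```rust
use std::iter::Filter;
use std::slice::Iter;

/// Hypothetical extension trait that adds a borrowing `filter` to Vec.
pub trait VecFilterExt<T> {
    fn filter<'a, P>(&'a self, predicate: P) -> Filter<Iter<'a, T>, P>
    where
        P: FnMut(&&'a T) -> bool;
}

impl<T> VecFilterExt<T> for Vec<T> {
    fn filter<'a, P>(&'a self, predicate: P) -> Filter<Iter<'a, T>, P>
    where
        P: FnMut(&&'a T) -> bool,
    {
        // Simply forwards to the usual iter().filter(..) chain; the result is still lazy.
        self.iter().filter(predicate)
    }
}
```

With the trait in scope, numbers.filter(|&&n| n % 2 == 0).copied().collect::<Vec<u32>>() compiles, and numbers remains usable afterwards because it was only borrowed.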


But I do think that some generic iter method which abstracts over the three of them would be a nice change. Assuming this is possible, of course.

That just raises the question: if we don't leave this as an "exercise for the reader", what would that look like, precisely?


What could, perhaps, make more sense would be to define filter and map directly on the collection, producing a collection of the same type. I.e. drop both the .iter() and .collect() calls. If you don't need the flexibility and explicitness of .into_iter(), .iter() and .iter_mut(), you probably don't need the full flexibility and laziness of iterators in the first place.

In that case the signatures of methods are easy to write:

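use std::collections::BTreeMap;
use std::iter::FromIterator; // only needed before the 2021 edition, which has it in the prelude

pub trait Filter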
where
    Self: Sized + IntoIterator + FromIterator<<Self as IntoIterator>::Item>,
{
    fn filter(self, f: impl FnMut(&Self::Item) -> bool) -> Self {
        self.into_iter().filter(f).collect()
    }
}

impl<T> Filter for T
where
    T: Sized + IntoIterator + FromIterator<<T as IntoIterator>::Item>,
{}

pub trait Map<R>
where
    Self: Sized + IntoIterator,
{
    type Output: FromIterator<R>;
    fn map(self, f: impl FnMut(Self::Item) -> R) -> Self::Output {
        self.into_iter().map(f).collect()
    }
}

impl<T, R> Map<R> for Vec<T> {
    type Output = Vec<R>;
}

impl<K, V, R> Map<(K, R)> for BTreeMap<K, V>
where
    K: Ord + Eq,
{
    type Output = BTreeMap<K, R>;
}

This can be useful when the API shape requires passing around vectors and maps, rather than iterators. Of course, this isn't particularly rusty, so I don't expect such extension traits to get into stdlib.


It's also worth noting that, at least for vectors, this kind of filter already exists as retain(), which filters the vector in place.
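For reference, a small sketch of retain() on a Vec; the wrapper function keep_even is just a made-up name for the example.

```rust
/// Keeps only the even numbers, modifying the vector in place via Vec::retain.
/// (Hypothetical helper, for illustration only.)
fn keep_even(mut numbers: Vec<u32>) -> Vec<u32> {
    // The closure receives &u32 and returns true for elements that should stay.
    numbers.retain(|n| n % 2 == 0);
    numbers
}
```

No iterator and no collect() call are involved here, which is close to what the original proposal is after for the owned case.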


Many people would take that as an invitation to slap a series of map and filter calls on a collection, causing a bunch of intermediary collections to be created. That is considered perfectly idiomatic in Scala, for example. If this does not perform well, many conclude they need lazy collections.
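To illustrate the concern with plain std types, the sketch below writes the same computation twice: once in an eager style that builds a collection after every step, and once as a single lazy chain. The function names are hypothetical.

```rust
/// Eager style: each step builds a full intermediate Vec before the next one starts.
fn eager_even_squares(numbers: &[u32]) -> Vec<u32> {
    let evens: Vec<u32> = numbers.iter().copied().filter(|n| n % 2 == 0).collect();
    let squares: Vec<u32> = evens.iter().map(|n| n * n).collect();
    squares
}

/// Lazy style: the adapters are chained and only the final result is collected.
fn lazy_even_squares(numbers: &[u32]) -> Vec<u32> {
    numbers
        .iter()
        .copied()
        .filter(|n| n % 2 == 0)
        .map(|n| n * n)
        .collect()
}
```

Both produce the same values; the difference is only in how many collections are built along the way.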


.iter() makes clear both the borrowing behavior and the fact that the operation is lazy and doesn't allocate (until collected). Removing it would not only not be a convenience but would be actively harmful by obscuring important details about the nature of the operation.
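The laziness mentioned here can be observed directly. In this sketch (the function name and the call counter are illustrative additions), the predicate is not run when the filter is created, only when an element is requested.

```rust
use std::cell::Cell;

/// Returns how often the predicate had been called (before, after) the first
/// call to next() on a filtered iterator. (Hypothetical helper.)
fn predicate_calls_around_first_next(numbers: &[u32]) -> (usize, usize) {
    let calls = Cell::new(0usize);
    let mut evens = numbers.iter().filter(|&&n| {
        calls.set(calls.get() + 1); // count every predicate invocation
        n % 2 == 0
    });
    let before = calls.get(); // the filter exists, but nothing has been evaluated
    let _first = evens.next(); // pulls elements until the first match
    let after = calls.get();
    (before, after)
}
```

For the input [1, 2, 3, 4, 5] this yields (0, 2): no calls when the chain is built, and two calls (for 1 and 2) to find the first even number.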
