Optimization for `std::vec::Drain` to directly access an indexed element

Missing Optimization for iterating through Drain

The following code is a little bit bad, but bear with me

pub fn get(mut v: Vec<i32>, i: usize) -> i32 {
    v.drain(..).nth(i).take().unwrap()
}

This is functionally identially to v[i] and also idential to

v.into_iter().nth(i).take().unwrap()

Where as the latter is successfully optimized to just an indexed access, the former still has a loop even with -C opt-level=3. Here's the disassembled code Compiler Explorer. .LBB1_3 will loop i times.

I have no idea how compiler optimization works and I am just getting started playing with compilers, but I feel like this is a optimization that is possible and wouldn't be too difficult.

Motivation

Originally I wanted to find a way to take ownership the n'th element from a vector and drop the other rest. My friend suggested using drain. It works as intended but I didn't know if it would be performant. I realised if I want to take out two elements, I will have to iterate through the iterator after taking out the first element, which might not be good if the two elements are far apart. But I think it's fine because if the elements need to be dropped (e.g. v is Vec<Vec<i32>>) the code needs to iterating through everything to drop everything eventually so I'm not paying for extra time. But for things that don't need to be dropped like i32, I wondered if the compiler could optimize it into just an indexed access.

Now I realised that I could have just used into_iter(). Oh well... but anyways I think it would be nice to have the same optimization on Drain as well.

Investigation Steps

I thought I would document how I tried to understand what the assemly code does. Just reading the assumbly code was a bit hard, so I wanted to step through it with a debugger. Also, I wanted to just compile the function without a main and call it with some custom data because I didn't want to give the compiler any more information to do better optimizations. Eventually I figured out that I could compile the rust code into an object file with

rustc -g -C opt-level=3 --crate-type=staticlib --emit=obj a.rs

And write some C code like this

#include <stdio.h>
#include <stdlib.h>

extern int get(long long *arr, long long index);

int main() {
    long long n = 100;
    int* numbers = malloc(n * sizeof(int));
    for (long long i = 0; i < n; i++) {
        numbers[i] = i+1;
    }

    long long arr[] = {n, numbers, n};
    long long index = n / 2;
    int result = get(arr, index);

    printf("Result: %d\n", result);
    return 0;
}

And then link them together with

clang -g run.c a.o -o executable /home/codespace/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-22be60875a4ac8d7.so

I know this code is excutiatingly painful to read and I could have written the C code in rust. Sorry about that :smiley:

There are multiple aspects. First of all, it's not all the compiler doing optimizations, it also depends on the library code. vec::IntoIter has a bunch of custom and fine-tuned Iterator impls that Drain doesn't have since it's less commonly used, it uses the Iterator default impls instead. So this could be easily fixed by implementing advance_by and nth on Drain.

Then there's the issue that Drain does more work: while IntoIter takes ownership and can discard the allocation and everything once you're done, Drain is designed to keep the original Vec intact and shift the remaining elements - if any - into a valid position. That's all extra work that has to be eliminated in the special case where you use drain(..).

3 Likes

I think we could add the optimizations to Drain, but also, we could add a lint on drain(..) and suggest into_iter().

3 Likes

.drain(..) can serve the purpose of preserving the Vec's allocation/capacity to be reused afterwards.

3 Likes

The lint could fire only if the vec is also dropped afterwards (like in this example).

8 Likes

Maybe a separate drain_all method could be introduced, if drain(..) is too general to be easily optimized? I feel like I see drain(..) at least as much as I see drain called with any other argument.

1 Like

if drain(..) is too general to be easily optimized?

I don't think this has been claimed anywhere in this thread. It's not as easy as intoiter, but that doesn't mean there aren't any low-hanging fruits.