Missing Optimization for iterating through Drain
The following code is a little bit bad, but bear with me
pub fn get(mut v: Vec<i32>, i: usize) -> i32 {
v.drain(..).nth(i).take().unwrap()
}
This is functionally identially to v[i]
and also idential to
v.into_iter().nth(i).take().unwrap()
Where as the latter is successfully optimized to just an indexed access, the former still has a loop even with -C opt-level=3
. Here's the disassembled code Compiler Explorer. .LBB1_3
will loop i
times.
I have no idea how compiler optimization works and I am just getting started playing with compilers, but I feel like this is a optimization that is possible and wouldn't be too difficult.
Motivation
Originally I wanted to find a way to take ownership the n'th element from a vector and drop the other rest. My friend suggested using drain
. It works as intended but I didn't know if it would be performant. I realised if I want to take out two elements, I will have to iterate through the iterator after taking out the first element, which might not be good if the two elements are far apart. But I think it's fine because if the elements need to be dropped (e.g. v
is Vec<Vec<i32>>
) the code needs to iterating through everything to drop everything eventually so I'm not paying for extra time. But for things that don't need to be dropped like i32
, I wondered if the compiler could optimize it into just an indexed access.
Now I realised that I could have just used into_iter()
. Oh well... but anyways I think it would be nice to have the same optimization on Drain
as well.
Investigation Steps
I thought I would document how I tried to understand what the assemly code does. Just reading the assumbly code was a bit hard, so I wanted to step through it with a debugger. Also, I wanted to just compile the function without a main and call it with some custom data because I didn't want to give the compiler any more information to do better optimizations. Eventually I figured out that I could compile the rust code into an object file with
rustc -g -C opt-level=3 --crate-type=staticlib --emit=obj a.rs
And write some C code like this
#include <stdio.h>
#include <stdlib.h>
extern int get(long long *arr, long long index);
int main() {
long long n = 100;
int* numbers = malloc(n * sizeof(int));
for (long long i = 0; i < n; i++) {
numbers[i] = i+1;
}
long long arr[] = {n, numbers, n};
long long index = n / 2;
int result = get(arr, index);
printf("Result: %d\n", result);
return 0;
}
And then link them together with
clang -g run.c a.o -o executable /home/codespace/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-22be60875a4ac8d7.so
I know this code is excutiatingly painful to read and I could have written the C code in rust. Sorry about that