Currently, `Vec<T>` is a (ptr, len, cap) triple, and `&Vec<T>` is a pointer to that triple. Thus, indexing an element through a `Vec` handle dereferences two pointers: one load to reach the data pointer, and a second to reach the element.
```rust
fn indexing_Vec(rhs: Vec<i32>, r: usize) -> i32 {
    assert!(rhs.len() > r);
    // movq (%rbx), %rdi        // first dereference: load the data pointer
    // movl (%rdi,%rax,4), %ebx // second dereference: load rhs[r]
    rhs[r]
    // movl %ebx, %eax
}
```
Full assembly generated by Godbolt with `-C opt-level=3`:
```asm
indexing_Vec:
.Lfunc_begin1:
.file 5 "/app" "example.rs"
.loc 5 2 0
.cfi_startproc
.cfi_personality 155, DW.ref.rust_eh_personality
.cfi_lsda 27, .Lexception0
pushq %r14
.cfi_def_cfa_offset 16
pushq %rbx
.cfi_def_cfa_offset 24
pushq %rax
.cfi_def_cfa_offset 32
.cfi_offset %rbx, -24
.cfi_offset %r14, -16
movq %rdi, %rbx
.Ltmp12:
.loc 5 3 13 prologue_end
cmpq %rsi, 16(%rdi)
.loc 5 3 5 is_stmt 0
jbe .LBB1_1
.loc 5 0 5
movq %rsi, %rax
.loc 5 4 5 is_stmt 1
movq (%rbx), %rdi
.Ltmp13:
.loc 1 487 1
movq 8(%rbx), %rsi
.Ltmp14:
.loc 5 4 5
movl (%rdi,%rax,4), %ebx
.Ltmp15:
.loc 2 241 25
testq %rsi, %rsi
.loc 2 241 12 is_stmt 0
je .LBB1_5
.Ltmp16:
.loc 3 435 16 is_stmt 1
shlq $2, %rsi
.Ltmp17:
.loc 4 250 22
movl $4, %edx
callq *__rust_dealloc@GOTPCREL(%rip)
.Ltmp18:
.LBB1_5:
.loc 5 5 2
movl %ebx, %eax
addq $8, %rsp
.cfi_def_cfa_offset 24
popq %rbx
.cfi_def_cfa_offset 16
popq %r14
.cfi_def_cfa_offset 8
retq
```
The same thing happens with `&Vec<T>`:
```rust
fn indexing_ref_Vec(rhs: &Vec<i32>, r: usize) -> i32 {
    assert!(rhs.len() > r);
    // movq (%rdi), %rax        // first dereference: load the data pointer
    // movl (%rax,%rsi,4), %eax // second dereference: load rhs[r]
    rhs[r]
}
```
Full assembly generated by Godbolt with `-C opt-level=3`:
```asm
indexing_ref_Vec:
.Lfunc_begin2:
.loc 5 7 0
.cfi_startproc
pushq %rax
.cfi_def_cfa_offset 16
.Ltmp20:
.loc 5 8 13 prologue_end
cmpq %rsi, 16(%rdi)
.loc 5 8 5 is_stmt 0
jbe .LBB2_2
.loc 5 9 5 is_stmt 1
movq (%rdi), %rax
movl (%rax,%rsi,4), %eax
.loc 5 10 2
popq %rcx
.cfi_def_cfa_offset 8
retq
.LBB2_2:
.cfi_def_cfa_offset 16
.loc 5 8 5
leaq .L__unnamed_1(%rip), %rdi
leaq .L__unnamed_3(%rip), %rdx
movl $31, %esi
callq *core::panicking::panic@GOTPCREL(%rip)
ud2
.Ltmp21:
.Lfunc_end2:
.size indexing_ref_Vec, .Lfunc_end2-indexing_ref_Vec
.cfi_endproc
```
But for an array, whether fixed-size by value or behind a reference (a slice), we only need one dereference:
```rust
#[no_mangle]
fn indexing_array(rhs: [i32; 5], r: usize) -> i32 {
    assert!(rhs.len() > r);
    // movl (%rdi,%rsi,4), %eax // single dereference: load rhs[r]
    rhs[r]
}

#[no_mangle]
fn indexing_ref_array(rhs: &[i32], r: usize) -> i32 {
    assert!(rhs.len() > r);
    // movl (%rdi,%rdx,4), %eax // single dereference: load rhs[r]
    rhs[r]
}
```
Full assembly generated by Godbolt with `-C opt-level=3`:
```asm
.section .text.indexing_array,"ax",@progbits
.globl indexing_array
.p2align 4, 0x90
.type indexing_array,@function
indexing_array:
.Lfunc_begin3:
.loc 5 12 0
.cfi_startproc
pushq %rax
.cfi_def_cfa_offset 16
.Ltmp22:
.loc 5 13 13 prologue_end
cmpq $5, %rsi
.loc 5 13 5 is_stmt 0
jae .LBB3_1
.loc 5 14 5 is_stmt 1
movl (%rdi,%rsi,4), %eax
.loc 5 15 2
popq %rcx
.cfi_def_cfa_offset 8
retq
.LBB3_1:
.cfi_def_cfa_offset 16
.loc 5 13 5
leaq .L__unnamed_1(%rip), %rdi
leaq .L__unnamed_4(%rip), %rdx
movl $31, %esi
callq *core::panicking::panic@GOTPCREL(%rip)
ud2
.Ltmp23:
.Lfunc_end3:
.size indexing_array, .Lfunc_end3-indexing_array
.cfi_endproc
.section .text.indexing_ref_array,"ax",@progbits
.globl indexing_ref_array
.p2align 4, 0x90
.type indexing_ref_array,@function
indexing_ref_array:
.Lfunc_begin4:
.loc 5 17 0
.cfi_startproc
pushq %rax
.cfi_def_cfa_offset 16
.Ltmp24:
.loc 5 18 13 prologue_end
cmpq %rdx, %rsi
.loc 5 18 5 is_stmt 0
jbe .LBB4_1
.loc 5 19 5 is_stmt 1
movl (%rdi,%rdx,4), %eax
.loc 5 20 2
popq %rcx
.cfi_def_cfa_offset 8
retq
.LBB4_1:
.cfi_def_cfa_offset 16
.loc 5 18 5
leaq .L__unnamed_1(%rip), %rdi
leaq .L__unnamed_5(%rip), %rdx
movl $31, %esi
callq *core::panicking::panic@GOTPCREL(%rip)
ud2
.Ltmp25:
.Lfunc_end4:
```
The code above shows a potential performance gain from changing the ABI of `Vec<T>`.

My proposal is to make `Vec<T>` a single raw pointer (with a `PhantomData` marker if necessary) and store the metadata in front of the element data, in the same allocation. For example:
```rust
struct Vec<T> {
    ptr: ArrayPtr<T>,
}

#[repr(C)]
struct Array<T, const N: usize> {
    capacity: usize, // must equal N
    len: usize,      // number of initialized elements
    data: [T; N],
}
```
Finally, the most important part:
```rust
use std::ptr::NonNull;

#[repr(C)]
struct ArrayPtr<T> {
    ptr: NonNull<T>, // points to Array<T, N>.data
}
```
Thus, given `ptr: NonNull<T>`, we could use

`unsafe { *(ptr.as_ptr() as *const usize).sub(1) }`

and

`unsafe { *(ptr.as_ptr() as *const usize).sub(2) }`

to obtain the len and the capacity, respectively. (This assumes `align_of::<T>() <= align_of::<usize>()`, so that there is no padding between the header and the data; higher-aligned element types would need a padded header.)
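Below is a minimal runnable sketch of that layout for `i32` elements. The names `alloc_thin`, `len_of`, and `capacity_of` are mine, purely for illustration: it places the (capacity, len) header directly in front of the element data and reads both fields back through the thin data pointer.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::mem::{align_of, size_of};
use std::ptr::NonNull;

const HEADER_WORDS: usize = 2; // capacity at data - 2, len at data - 1

// Allocate `cap` i32 slots preceded by the (capacity, len) header and
// return the thin pointer to the first element, as `ArrayPtr` would hold it.
unsafe fn alloc_thin(cap: usize, len: usize) -> (NonNull<i32>, Layout) {
    // No header padding needed here: align_of::<i32>() <= align_of::<usize>().
    let layout = Layout::from_size_align(
        HEADER_WORDS * size_of::<usize>() + cap * size_of::<i32>(),
        align_of::<usize>(),
    )
    .unwrap();
    unsafe {
        let base = alloc(layout) as *mut usize; // (real code must handle null)
        base.write(cap); // capacity lives at data - 2
        base.add(1).write(len); // len lives at data - 1
        (NonNull::new(base.add(HEADER_WORDS) as *mut i32).unwrap(), layout)
    }
}

unsafe fn len_of(data: NonNull<i32>) -> usize {
    unsafe { *(data.as_ptr() as *const usize).sub(1) }
}

unsafe fn capacity_of(data: NonNull<i32>) -> usize {
    unsafe { *(data.as_ptr() as *const usize).sub(2) }
}

fn main() {
    unsafe {
        let (data, layout) = alloc_thin(8, 3);
        data.as_ptr().write(42); // the elements are one dereference away
        assert_eq!(len_of(data), 3);
        assert_eq!(capacity_of(data), 8);
        assert_eq!(*data.as_ptr(), 42);
        dealloc(
            (data.as_ptr() as *mut usize).sub(HEADER_WORDS) as *mut u8,
            layout,
        );
    }
}
```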
---
If we make this change, a `Vec<T>` could be stored directly in a register, so there would be no need to dereference one pointer just to reach the real data pointer. What's more, we could play some magic on `&Vec<T>`: since `Vec<T>` would be pointer-sized, `&Vec<T>` could be passed as a plain copy of the `Vec<T>` (without ownership), so we would never dereference twice.
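To illustrate what that buys, here is a hypothetical pointer-sized handle; `ThinVec` is an illustrative name, not an existing type, and this is only a sketch of the access pattern:

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

// Hypothetical pointer-sized handle: one machine word, register-friendly.
struct ThinVec<T> {
    ptr: NonNull<T>, // points at the element data; the header sits in front
    _marker: PhantomData<T>,
}

impl<T> ThinVec<T> {
    fn len(&self) -> usize {
        // len is the usize immediately before the data.
        unsafe { *(self.ptr.as_ptr() as *const usize).sub(1) }
    }

    fn get(&self, i: usize) -> &T {
        assert!(i < self.len());
        // The element address is computed directly from the pointer the
        // handle carries; no (ptr, len, cap) struct has to be loaded first.
        unsafe { &*self.ptr.as_ptr().add(i) }
    }
}
```

Since `size_of::<ThinVec<T>>() == size_of::<usize>()`, the "pass `&Vec<T>` as a copy" idea above amounts to passing this single word by value.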
Currently, Rust has promised:

> Most fundamentally, `Vec` is and always will be a (pointer, capacity, length) triplet. No more, no less. The order of these fields is completely unspecified, and you should use the appropriate methods to modify these. The pointer will never be null, so this type is null-pointer-optimized.
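That null-pointer optimization is observable today, and a thin layout would preserve it, because `NonNull` keeps the null niche. A quick check:

```rust
use std::mem::size_of;

fn main() {
    // The niche makes Option<Vec<T>> cost nothing extra today; a NonNull-based
    // thin pointer would preserve this for the proposed layout as well.
    assert_eq!(size_of::<Vec<u8>>(), size_of::<Option<Vec<u8>>>());
}
```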
But Rust also says:

> Due to its incredibly fundamental nature, `Vec` makes a lot of guarantees about its design. This ensures that it's as low-overhead as possible in the general case, and can be correctly manipulated in primitive ways by unsafe code.
Thus, making such a change might be fine.

Is it possible to make this ABI change in Rust 2024?