No more VLAs?


#1

In the past I’ve asked for something like (a more principled variant of) C Variable Length Arrays for Rust, but lately the Linux kernel developers have been removing all of them from the kernel, so I am not sure it’s a good idea to add them to Rust:

But slide 8 looks suspicious; it seems like too much added code to me. So I tried the code myself:

#include <stdio.h>
#include <string.h>

void call_me1(char *stuff, int step) {
    char buf[10];
    strncpy(buf, stuff, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    printf("%d:[%s]\n", step, buf);
}

void call_me2(char *stuff, int step) {
    char buf[step];
    strncpy(buf, stuff, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    printf("%d:[%s]\n", step, buf);
}

The asm shows a smaller difference:

.LC0:
    .string "%d:[%s]\n"

call_me1(char*, int):
    push    rbx
    mov     edx, 9
    mov     ebx, esi
    sub     rsp, 16
    mov     rsi, rdi
    lea     rdi, [rsp+6]
    call    strncpy
    lea     rdx, [rsp+6]
    mov     esi, ebx
    mov     edi, OFFSET FLAT:.LC0
    xor     eax, eax
    mov     BYTE PTR [rsp+15], 0
    call    printf
    add     rsp, 16
    pop     rbx
    ret

call_me2(char*, int):
    push    rbp
    mov     rbp, rsp
    push    r12
    push    rbx
    movsx   rbx, esi
    lea     rax, [rbx+15]
    and     rax, -16
    sub     rsp, rax
    lea     rdx, [rbx-1]
    mov     rsi, rdi
    mov     rdi, rsp
    call    strncpy
    mov     rdx, rsp
    mov     esi, ebx
    mov     edi, OFFSET FLAT:.LC0
    xor     eax, eax
    mov     BYTE PTR [rsp-1+rbx], 0
    call    printf
    lea     rsp, [rbp-16]
    pop     rbx
    pop     r12
    pop     rbp
    ret

#2

VLA may cause stack overflow or clashing.

The program should check the allocation size anyway; otherwise unbounded allocations are a problem regardless of which kind of array (static, VLA, heap) you use.

The same applies to stack clash, which most compilers mitigate anyway (GCC and Clang, for instance, provide -fstack-clash-protection).
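As a rough sketch of the size check mentioned above: validate the requested length against a cap before the VLA is ever declared. The names `copy_bounded` and `MAX_STACK_BUF` and the value 256 are made up for illustration; a real program would pick its own limit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cap for this sketch; not taken from the slides. */
#define MAX_STACK_BUF 256

/* Truncates src to at most n-1 bytes using a VLA of size n, then copies
 * the NUL-terminated result into out.
 * Returns 0 on success, -1 if n is outside [1, MAX_STACK_BUF] or if
 * out is too small to hold n bytes. */
int copy_bounded(const char *src, size_t n, char *out, size_t out_size)
{
    assert(src != NULL && out != NULL);

    if (n == 0 || n > MAX_STACK_BUF || out_size < n)
        return -1;          /* reject before any stack space is reserved */

    char buf[n];            /* VLA: n has already been validated */
    strncpy(buf, src, n - 1);   /* pads with zeros if src is shorter */
    buf[n - 1] = '\0';
    memcpy(out, buf, n);
    return 0;
}
```

The same check would be needed for a heap allocation; only the value of the cap would differ.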

VLAs are slow

Still faster than allocating on the heap. That said, I can’t really defend them: we usually know the maximum size, and stack memory isn’t that precious.
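To illustrate the "known maximum size" alternative, here is a sketch of call_me2 from post #1 with a fixed-size buffer instead of a VLA. `BUF_MAX` and `call_me_fixed` are hypothetical names, and the result is written to a caller-supplied buffer rather than printed so it is easy to check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum { BUF_MAX = 64 };  /* assumed upper bound on step, chosen for this sketch */

/* Like call_me2, but the stack buffer always has BUF_MAX bytes and only
 * the first `step` bytes are used. Formats "step:[text]\n" into out.
 * Returns the snprintf result, or -1 if step is outside [1, BUF_MAX]. */
int call_me_fixed(const char *stuff, int step, char *out, size_t out_size)
{
    char buf[BUF_MAX];

    assert(stuff != NULL && out != NULL);
    if (step < 1 || step > BUF_MAX)
        return -1;

    strncpy(buf, stuff, (size_t)step - 1);
    buf[step - 1] = '\0';
    return snprintf(out, out_size, "%d:[%s]\n", step, buf);
}
```

The stack frame is now a compile-time constant, at the cost of always reserving BUF_MAX bytes.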


#3

I really like VLAs in C (although that’s partly because C lacks any usable Vec alternative).

However, the assembly example for the VLA in that PDF looks awful. Isn’t it just a compiler bug or deficiency that can be fixed? Why is there a div instruction in the VLA code!?

Even if VLAs have some overhead, they shouldn’t be compared to fixed-size arrays but to constructs like Vec/SmallVec/ArrayVec, all of which have their own overhead too.
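For a sense of what a SmallVec-style construct costs, here is a minimal C sketch of the idea: an inline buffer with a heap fallback. This is not how the Rust SmallVec crate is implemented; `small_buf`, `INLINE_CAP` and the helper functions are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { INLINE_CAP = 32 };   /* assumed inline capacity for this sketch */

/* Note: data may point into the struct itself, so a small_buf must not
 * be copied by value after initialization. */
typedef struct {
    char   inline_buf[INLINE_CAP];
    char  *data;    /* points at inline_buf or at a heap block */
    size_t len;
} small_buf;

/* Returns 0 on success, -1 if the heap fallback fails. */
int small_buf_init(small_buf *sb, size_t len)
{
    assert(sb != NULL);
    sb->len = len;
    if (len <= INLINE_CAP) {
        sb->data = sb->inline_buf;      /* no allocation needed */
    } else {
        sb->data = malloc(len);         /* overhead: branch plus malloc */
        if (sb->data == NULL)
            return -1;
    }
    return 0;
}

int small_buf_is_inline(const small_buf *sb)
{
    return sb->data == sb->inline_buf;
}

void small_buf_free(small_buf *sb)
{
    if (sb->data != sb->inline_buf)
        free(sb->data);                 /* free(NULL) is a no-op */
    sb->data = NULL;
    sb->len = 0;
}
```

Every use pays for the extra pointer, the branch on size, and the unused inline bytes, which is the kind of overhead a VLA should be measured against.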


#4

I am not quite sure why the compiler is generating this stupid code.

I tried the code: https://godbolt.org/z/HKwYka

The assembly shown in the slides is actually compiled with -O0. That is almost cheating, because nobody would use -O0 if they really cared about performance.

It seems that even -O1 is enough to make the compiler generate reasonable code. So I don’t quite get the point of the slides: -O0 more or less means a direct translation of the semantics, and discussing the quality of -O0 code seems pointless.


#5

In Rust, VLAs could help with avoiding initialization overhead for large buffers of which only a tiny part is actually used. Right now, optimizing this manually needs some code duplication.


#6

Can you give an example?