No more VLAs?

In the past I’ve asked for something like (a more principled variant of) C Variable Length Arrays in Rust, but lately they have been removing all of them from the Linux kernel, so I’m not sure it’s a good idea to add them to Rust:

But slide 8 looks suspicious to me: it seems to show too much added code. So I tried the code myself:

#include <stdio.h>
#include <string.h>

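void call_me1(char *stuff, int step) {
    char buf[10]; /* fixed-size array: frame size known at compile time */
    strncpy(buf, stuff, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    printf("%d:[%s]\n", step, buf);
}

void call_me2(char *stuff, int step) {
    char buf[step]; /* VLA: size taken from step at run time; step must be >= 1 */
    strncpy(buf, stuff, sizeof(buf) - 1); /* sizeof of a VLA is evaluated at run time */
    buf[sizeof(buf) - 1] = '\0';
    printf("%d:[%s]\n", step, buf);
}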

The resulting asm shows a smaller difference than the slide:

.LC0:
    .string "%d:[%s]\n"

call_me1(char*, int):
    push    rbx
    mov     edx, 9
    mov     ebx, esi
    sub     rsp, 16
    mov     rsi, rdi
    lea     rdi, [rsp+6]
    call    strncpy
    lea     rdx, [rsp+6]
    mov     esi, ebx
    mov     edi, OFFSET FLAT:.LC0
    xor     eax, eax
    mov     BYTE PTR [rsp+15], 0
    call    printf
    add     rsp, 16
    pop     rbx
    ret

call_me2(char*, int):
    push    rbp
    mov     rbp, rsp
    push    r12
    push    rbx
    movsx   rbx, esi
    lea     rax, [rbx+15]
    and     rax, -16
    sub     rsp, rax
    lea     rdx, [rbx-1]
    mov     rsi, rdi
    mov     rdi, rsp
    call    strncpy
    mov     rdx, rsp
    mov     esi, ebx
    mov     edi, OFFSET FLAT:.LC0
    xor     eax, eax
    mov     BYTE PTR [rsp-1+rbx], 0
    call    printf
    lea     rsp, [rbp-16]
    pop     rbx
    pop     r12
    pop     rbp
    ret

"VLAs may cause stack overflow or stack clashing."

A program should check the allocation size anyway; otherwise unbounded allocations are a problem regardless of which kind of array (static, VLA, heap) you use.
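A minimal sketch of what checking the size first could look like for a VLA, assuming a hypothetical per-function stack budget `MAX_VLA_BYTES` (name and value are illustrative, not from the thread). `copy_bounded` is a hypothetical variant of `call_me2` above that returns the copied length so it can be tested:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limit on how many bytes this function may put on the
 * stack; a real value depends on the platform and the call depth. */
#define MAX_VLA_BYTES 1024

/* Like call_me2, but the run-time size is validated before the VLA is
 * created. Returns the length of the copied string, or -1 if `len` is
 * out of range (a zero-length VLA is undefined behavior in C). */
int copy_bounded(const char *stuff, size_t len) {
    if (len == 0 || len > MAX_VLA_BYTES) {
        return -1; /* reject instead of risking a stack overflow */
    }
    char buf[len]; /* VLA: size is now known to be in [1, MAX_VLA_BYTES] */
    strncpy(buf, stuff, len - 1);
    buf[len - 1] = '\0';
    printf("%zu:[%s]\n", len, buf);
    return (int)strlen(buf);
}
```

The same check would be needed before a `malloc(len)`; the array kind only changes how an oversized request fails.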

The same applies to stack clash, which most compilers mitigate anyway (GCC, for example, has -fstack-clash-protection).

"VLAs are slow."

Still faster than allocating on the heap. That said, I can't really defend them: we usually know the maximum size, and stack memory isn't that precious, so a fixed-size array of that maximum size works just as well.
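The fixed-maximum alternative could look roughly like this. `call_me3` and `MAX_STEP` are hypothetical names chosen for illustration, and the function returns the copied length only to make it testable:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical known maximum for `step`. */
#define MAX_STEP 64

/* call_me2 rewritten without a VLA: the array is sized for the worst
 * case, so the stack frame has a fixed size known at compile time.
 * Returns the length of the copied string, or -1 if step is out of range. */
int call_me3(const char *stuff, int step) {
    char buf[MAX_STEP];
    if (step < 1 || step > MAX_STEP) {
        return -1; /* sizes outside [1, MAX_STEP] are rejected */
    }
    strncpy(buf, stuff, (size_t)step - 1);
    buf[step - 1] = '\0';
    printf("%d:[%s]\n", step, buf);
    return (int)strlen(buf);
}
```

The cost is MAX_STEP bytes of stack on every call, even for small `step`, which is the trade-off described as acceptable above.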


I really like VLAs in C (although partly because C lacks any usable Vec alternative).

However, the VLA assembly example in that PDF looks awful. Isn’t that just a compiler bug or deficiency that could be fixed? Why is there a div instruction in the VLA code?!

Even if VLAs have some overhead, they shouldn’t be compared to fixed-size arrays but to constructs like Vec/SmallVec/ArrayVec, all of which have their own overheads too.
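To make that comparison concrete, here is a rough C analog of a SmallVec-style buffer. It is a hypothetical sketch, not how the Rust `smallvec` crate is actually implemented; `struct small_buf`, `INLINE_CAP`, and the helper functions are illustrative names:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical inline capacity, playing the role of SmallVec's N. */
#define INLINE_CAP 16

/* Up to INLINE_CAP bytes live inline (e.g. on the stack when the struct
 * is a local variable); larger sizes go to the heap. */
struct small_buf {
    unsigned char inline_data[INLINE_CAP];
    unsigned char *data; /* points at inline_data or at a heap block */
    size_t len;
};

/* Returns 0 on success, -1 if the heap allocation fails. */
int small_buf_init(struct small_buf *b, size_t len) {
    b->len = len;
    if (len <= INLINE_CAP) {
        b->data = b->inline_data; /* small case: no allocation */
        return 0;
    }
    b->data = malloc(len); /* large case: fall back to the heap */
    return b->data != NULL ? 0 : -1;
}

/* Reports whether the buffer is using its inline storage. */
int small_buf_is_inline(const struct small_buf *b) {
    return b->data == b->inline_data;
}

/* Frees heap storage if any; free(NULL) is a no-op. */
void small_buf_free(struct small_buf *b) {
    if (!small_buf_is_inline(b)) {
        free(b->data);
    }
    b->data = NULL;
    b->len = 0;
}
```

Its overheads (the extra pointer and length fields, the branches in init and free, and inline capacity reserved even for tiny sizes) are the kind of costs a VLA should be weighed against. Note also that copying a `struct small_buf` by value would leave `data` pointing into the original's `inline_data`.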


I wasn’t sure why the compiler was generating such poor code.

So I tried the code here: https://godbolt.org/z/HKwYka

The assembly shown in the slides is actually compiled with -O0. That is almost cheating, because nobody who really cares about performance would use -O0.

Even just -O1 seems to be enough for the compiler to generate reasonable code. So I don’t quite get the point of the slides: -O0 essentially means translating the source semantics directly, so discussing the quality of -O0 code seems pointless.
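As a guess at where the div comes from (an assumption, since the slide's exact -O0 output isn't reproduced here): the VLA size has to be rounded up to the 16-byte stack alignment. Written literally, that is a divide and a multiply, which an unoptimized build can emit as-is. Because 16 is a power of two, an optimizer can use an add and a mask instead, which matches `lea rax, [rbx+15]` / `and rax, -16` in the optimized `call_me2` above. The function names are hypothetical:

```c
#include <stddef.h>

/* Round n up to a multiple of 16 with a divide and a multiply,
 * i.e. the "literal" form an unoptimized build might keep. */
size_t round_up_div(size_t n) {
    return (n + 15) / 16 * 16;
}

/* The same result as a mask: add 15, then clear the low 4 bits.
 * Valid only because 16 is a power of two. */
size_t round_up_mask(size_t n) {
    return (n + 15) & ~(size_t)15;
}
```

Both functions return the same value for every `n` that doesn't overflow `n + 15`; the second simply avoids the division.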


In Rust, VLAs could help avoid the initialization overhead of large buffers of which only a tiny part is actually used. Right now, optimizing this manually requires some code duplication.

Can you give an example?
