Regarding pattern matching

leonardo · December 31, 2024, 4:07pm

(I'm back here after some hiatus).

I've seen that this code:

#[inline(never)]
pub fn foo1(ch: &[u8; 13]) -> ([u8; 5], [u8; 5]) {
    let a = &ch[0 .. 5].try_into().unwrap();
    let b = &ch[8 .. 13].try_into().unwrap();
    (*a, *b)
}

gives reasonable asm, but the source contains some visible unwraps:

foo1:
    mov    rax, rdi
    mov    ecx, dword ptr [rsi]
    movzx  edx, byte ptr [rsi + 4]
    mov    edi, dword ptr [rsi + 8]
    movzx  esi, byte ptr [rsi + 12]
    mov    byte ptr [rax + 4], dl
    mov    dword ptr [rax], ecx
    mov    byte ptr [rax + 9], sil
    mov    dword ptr [rax + 5], edi
    ret

This version is OK and it avoids the unwraps (and unsafe code), but it's a bit too fiddly, with many variable names:

#[inline(never)]
pub fn foo2(ch: &[u8; 13]) -> ([u8; 5], [u8; 5]) {
    let &[a0,a1,a2,a3,a4, _,_,_, b0,b1,b2,b3,b4] = ch;
    ([a0, a1, a2, a3, a4], [b0, b1, b2, b3, b4])
}

This avoids that problem, using just a few variables:

#[inline(never)]
pub fn foo3(ch: &[u8; 13]) -> ([u8; 5], [u8; 5]) {
    let &[a@.., _,_,_, _,_,_,_,_] = ch;
    let &[_,_,_,_,_, _,_,_, b@..] = ch;
    (a, b)
}

But this (reasonable) version currently doesn't compile (two @.. aren't supported):

#[inline(never)]
pub fn foo4(ch: &[u8; 13]) -> ([u8; 5], [u8; 5]) {
    let &[a@.., _,_,_, b@..] = ch;
    (a, b)
}

So the question is: is it worth supporting this last version too, to slice and dice fixed-size arrays (not run-time-sized slices)?

With methods like .array_chunks() I'm (finally) using more arrays in my Rust code.

kornel · December 31, 2024, 4:12pm

I'd solve that with a comment // the unwrap can't fail.

The try_into code is simple enough.

The other version looks like Morse code. Multiple @s don't really solve that problem; they only happen to help in the specific case where the arrays are small and mostly overlapping. If you had &[u8; 128], it would still be unreadable even with multiple @s.

kornel · December 31, 2024, 4:19pm

Hypothetically, if Rust supported much more const generics and variadic generics, then this could look something like this:

let (a, b) = ch.get_many_chunks::<(0..5, 8..13)>();

jdahlstrom · December 31, 2024, 4:22pm

It is possible to express something like compile-time slices in stable Rust using const { assert!() }:

let b: [u8; 5] = ch.get::<8, 5>();

where 5 is the length (a version taking a (start, end) range can't be written currently because it would require const generic arithmetic). Both of these will fail to compile:

let c: [u8; 5] = ch.get::<7, 6>(); // length does not match c
let d: [u8; 5] = ch.get::<9, 5>(); // out of bounds of ch
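The post does not show how such a method is implemented. A minimal sketch of one way it might be written, assuming a hypothetical extension trait named ArrayGet with a method get_array (both names are made up for illustration, and Rust 1.79 or later is assumed for inline const blocks):

```rust
/// Hypothetical extension trait (not part of std): copy a compile-time
/// checked sub-array of length LEN starting at index START.
pub trait ArrayGet<T> {
    fn get_array<const START: usize, const LEN: usize>(&self) -> [T; LEN];
}

impl<T: Copy, const N: usize> ArrayGet<T> for [T; N] {
    fn get_array<const START: usize, const LEN: usize>(&self) -> [T; LEN] {
        // Evaluated at compile time for each instantiation; an out-of-bounds
        // START/LEN pair becomes a build error instead of a run-time panic.
        const { assert!(START + LEN <= N, "sub-array out of bounds") };
        // Cannot fail at run time: the assertion above guarantees that
        // self[START..] holds at least LEN elements.
        *self[START..].first_chunk::<LEN>().unwrap()
    }
}
```

With this sketch, a length that disagrees with the annotated type of the binding is rejected by type checking, while an out-of-bounds range is rejected by the const assertion when the function is instantiated.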

leonardo · December 31, 2024, 4:32pm

Yes, const { unwrap() } works too; it's a simple enough solution that requires no changes to const generics or variadics.

jdahlstrom · December 31, 2024, 4:36pm

Something that's frustratingly close [1] to possible is a static split_at for arrays:

fn split_at<T, const N: usize, const M: usize>(arr: &[T; N]) 
    -> (&[T; M], &[T; N - M])

[1] So close, but yet so far…

scottmcm · December 31, 2024, 4:40pm

Did you notice https://doc.rust-lang.org/std/primitive.slice.html#method.split_first_chunk and friends?

That's how I'd spell something like that these days, to get the arrays directly without the .try_into().unwrap() dance at all.

Then the only "unwrap" is the one that's equivalent to the one that you're seemingly fine with in the indexing syntax anyway.
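As an illustration of that suggestion, here is a sketch of the opening example rewritten with split_first_chunk and split_last_chunk (stable since Rust 1.77). The function name foo5 is made up here to continue the numbering from the first post:

```rust
#[inline(never)]
pub fn foo5(ch: &[u8; 13]) -> ([u8; 5], [u8; 5]) {
    // Each call yields Option<(array reference, remaining slice)>.
    // The unwraps can't fail here because 13 >= 5.
    let (a, _rest) = ch.split_first_chunk::<5>().unwrap();
    let (_rest, b) = ch.split_last_chunk::<5>().unwrap();
    (*a, *b)
}
```

The chunk length is a const generic argument, so the array types come out directly, without going through a slice and try_into.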

scottmcm · December 31, 2024, 4:49pm

There's always the inline-const version of that for now:

fn array_ends_mut<T, const N: usize, const FRONT: usize, const BACK: usize>(
    a: &mut [T; N],
) -> (&mut [T; FRONT], &mut [T; BACK]) {
    const { assert!(N >= FRONT + BACK) };
    let (front, a) = a.split_first_chunk_mut().unwrap();
    let (_a, back) = a.split_last_chunk_mut().unwrap();
    (front, back)
}

Not quite as nice as a 2-const-generics version, but not bad.
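For the read-only case from the first post, a shared-reference variant of the same idea might look like the sketch below. The names array_ends and foo6 are hypothetical and not from the thread; Rust 1.79 or later is assumed for the inline const block:

```rust
/// Hypothetical shared-reference counterpart of array_ends_mut.
fn array_ends<T, const N: usize, const FRONT: usize, const BACK: usize>(
    a: &[T; N],
) -> (&[T; FRONT], &[T; BACK]) {
    // Checked at compile time for each instantiation.
    const { assert!(N >= FRONT + BACK) };
    // The chunk lengths are inferred from the return type; the unwraps
    // can't fail because of the assertion above.
    let (front, _) = a.split_first_chunk().unwrap();
    let (_, back) = a.split_last_chunk().unwrap();
    (front, back)
}

#[inline(never)]
pub fn foo6(ch: &[u8; 13]) -> ([u8; 5], [u8; 5]) {
    // FRONT and BACK are picked up from the annotated types.
    let (a, b): (&[u8; 5], &[u8; 5]) = array_ends(ch);
    (*a, *b)
}
```

Asking for ends that don't fit, for example two 7-element ends of a 13-element array, would trip the const assertion at build time.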
