Unsized constants?

I was looking for a way to convert small amounts of hex data into an array at compile time, and I found this: https://docs.rs/hex-literal/0.2.1/hex_literal/

But we have const fn and const generics, so I've given it a try:

#![feature(const_if_match, const_loop, const_generics, const_fn, const_panic)]
#![allow(incomplete_features)]

const fn from_hex_len(arr: &[u8]) -> usize {
    let mut i = 0;
    let mut count = 0;

    while i < arr.len() {
        match arr[i] {
            b'0' ..= b'9' | b'a' ..= b'f' | b'A' ..= b'F' => { count += 1; },
            b' ' | b'\r' | b'\n' | b'\t' => {}, // Ignored.
            _ => { panic!("from_hex_len: found a disallowed byte."); },
        }
        i += 1;
    }

    if count % 2 != 0 {
        panic!("from_hex_len: count % 2 != 0.");
    }
    count / 2
}


/// Accepts only the following bytes in the input:
/// '0'..='9', 'a'..='f', 'A'..='F': hex digits, which are decoded.
/// ' ', '\r', '\n', '\t': formatting characters, which are ignored.
const fn from_hex<const N: usize>(arr: &[u8]) -> [u8; N] {
    let mut result = [b'\0'; N];
    let mut i = 0;
    let mut count = 0;
    let mut pred: u8 = 0;

    while i < arr.len() {
        match arr[i] {
            b'0' ..= b'9' => {
                if count % 2 == 1 {
                    // The low nibble is parenthesized so the intermediate
                    // u8 sum cannot overflow (e.g. for "f9").
                    result[count / 2] = pred * 16 + (arr[i] - b'0');
                } else {
                    pred = arr[i] - b'0';
                }
                count += 1;
            },
            b'a' ..= b'f' => {
                if count % 2 == 1 {
                    result[count / 2] = pred * 16 + (arr[i] - b'a' + 10);
                } else {
                    pred = arr[i] - b'a' + 10;
                }
                count += 1;
            },
            b'A' ..= b'F' => {
                if count % 2 == 1 {
                    result[count / 2] = pred * 16 + (arr[i] - b'A' + 10);
                } else {
                    pred = arr[i] - b'A' + 10;
                }
                count += 1;
            },
            b' ' | b'\r' | b'\n' | b'\t' => {}, // Ignored.
            _ => { panic!("from_hex: found a disallowed byte."); },
        }

        i += 1;
    }

    if count != N * 2 {
        panic!("from_hex: count != N * 2.");
    }
    result
}

fn main() {
    const S: &[u8] =
      b"08021661260f0028004b0405074e340c324d5b083131632811
        5112393c571128622b453004383e0051311f49374f0e1d5d47
        284335581e03310d244134465f17043c0b2a45184438012038
        472502245b161f104733433f59295c24361628281c42210d50
        182f203c63032d022c4b21354e24541423110c322062511c40
        17430a1a2628433b36464212264046431a1444023e0c145f3f
        5e273f08285b42315e1518373a054249631a61114e4e60530e
        5822593f48152417094b004c2c142d230e003d2161221f215f
        4e11351c164b1f430f5e0350043e100e0935385c1027052a60
        231f2f373a581800113618241d55395638003023475907052c
        2c252c3c153a3336113a13505144055e2f451c495c0d563411
        4d04593728043408536123631007613920101a1a4f211b6242
        58244457393e1448032e21432e370c203f5d3545042a104926
        19270b185e4812082e1d20283e4c2414452429481e1758223e
        634552433b554a0424101449231d4e1f5a014a1f3147305651
        10173905360146364753333645105c21303d2b340159134330";

    const SH: [u8; from_hex_len(S)] = from_hex(S);
    println!("{:?}", &SH[..]);
}

It seems to work well enough (I haven't tested it on larger amounts of data or with many conversions in a single program), but having two different functions in the API isn't great. Do you know if we can improve that with the features Rust offers now?

Looking ahead to possible future features, I think the problem could be solved with some kind of 'unsized constants', where the length isn't written in the source code but is still a compile-time constant:

const fn from_hex<const N: usize>(arr: &[u8]) -> [u8] {...}

const SH: [u8] = from_hex(b"08021661260f00");

"Unsized" in Rust means that the size isn't a compile-time constant, therefore unsized constants are by definition impossible.

However, you could just write a macro that infers the size of the hex constant and produces an appropriately-sized array, like include_bytes! does. Actually, you could just use include_bytes! and move the binary data to a separate file – that's even cleaner.

I don't see a reason why you would need two different functions.
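As a rough illustration of the macro suggestion, here is a minimal sketch that assumes a recent stable compiler (const generics and panicking in const fn available without feature flags). The names hex!, hex_digit and the 0xFF whitespace marker are hypothetical and are not taken from hex-literal or any other crate. The macro still calls a length function and a decoding function internally, but the call site only mentions the data once.

```rust
/// Maps one ASCII byte to its hex value; 0xFF marks ignorable whitespace.
const fn hex_digit(b: u8) -> u8 {
    match b {
        b'0'..=b'9' => b - b'0',
        b'a'..=b'f' => b - b'a' + 10,
        b'A'..=b'F' => b - b'A' + 10,
        b' ' | b'\r' | b'\n' | b'\t' => 0xFF,
        _ => panic!("hex: found a disallowed byte"),
    }
}

/// Number of bytes encoded by the hex digits in `arr`.
const fn from_hex_len(arr: &[u8]) -> usize {
    let mut i = 0;
    let mut count = 0;
    while i < arr.len() {
        if hex_digit(arr[i]) != 0xFF {
            count += 1;
        }
        i += 1;
    }
    if count % 2 != 0 {
        panic!("hex: odd number of hex digits");
    }
    count / 2
}

/// Decodes the hex digits in `arr` into exactly `N` bytes.
const fn from_hex<const N: usize>(arr: &[u8]) -> [u8; N] {
    let mut result = [0u8; N];
    let mut i = 0;
    let mut count = 0;
    while i < arr.len() {
        let d = hex_digit(arr[i]);
        if d != 0xFF {
            if count / 2 >= N {
                panic!("hex: more digits than N * 2");
            }
            // First digit of a pair lands in the low nibble, then is
            // shifted up when the second digit is OR-ed in.
            result[count / 2] = (result[count / 2] << 4) | d;
            count += 1;
        }
        i += 1;
    }
    if count != N * 2 {
        panic!("hex: fewer digits than N * 2");
    }
    result
}

/// Hypothetical wrapper: computes the length and the array in one expression.
macro_rules! hex {
    ($s:expr) => {{
        const INPUT: &[u8] = $s;
        const LEN: usize = from_hex_len(INPUT);
        const OUT: [u8; LEN] = from_hex::<LEN>(INPUT);
        OUT
    }};
}

fn main() {
    // The array length is never written at the call site.
    let sh = hex!(b"de ad BE EF");
    println!("{:?}", sh);
}
```

With a let binding the length is inferred from the macro's result. A named const item would still need a length in its type annotation, which a macro could presumably also supply by emitting the whole item.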


you could just write a macro that infers the size of the hex constant and produces an appropriately-sized array,

Yes, a macro could solve similar problems, I will think about that.

Actually, you could just use include_bytes! and move the binary data to a separate file – that's even cleaner.

This was just an example. I have other similar functions that generate arrays at compile time, and the problem is that to assign their results to an array I first have to compute the length of the result. That problem is what this thread is about.

I don't see a reason why you would need two different functions.

If I have a compile-time function that generates a const array (and currently they have some advantages: https://github.com/rust-lang/rust/issues/73780), but I don't know how many items the result will have, how do I solve the problem? So far I have solved it like this, using one function to compute the length of the result and another to compute the resulting array:

const SH: [u8; from_hex_len(S)] = from_hex(S);

The point of this thread is to ask whether a future Rust could solve that. Another possible way is to allow length inference in constants:

const DATA: [u32; _] = generate_array(data);

While it's rather bizarre to call this "unsized constants" or to suggest slice syntax for a fixed-size array, the [Type; _] syntax has definitely been suggested before and IMO is kind of a no-brainer. I suspect the only reason we don't already allow _ as a length is that much of const eval is still being worked out, and changing anything about array lengths would probably expose all of that unfinished stuff before it's safe. So it's just in the usual pile of "blocked on everything we're already working on".


Slightly off-topic, but with regard to the example, from_hex_len really should ignore the readability separator _, and from_hex should ignore _ when count % 2 == 0 (i.e., between hex bytes).
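A minimal sketch of that separator rule, under a few assumptions: the function name hex_len_with_separators is hypothetical, it applies the stricter "only between bytes" rule in the length pass as well, and it returns Option instead of panicking so that the rejected case can be observed at run time.

```rust
/// Counts the bytes encoded in `arr`, or returns `None` if the input is malformed.
/// `_` is accepted only between complete bytes (an even number of digits so far).
const fn hex_len_with_separators(arr: &[u8]) -> Option<usize> {
    let mut i = 0;
    let mut count = 0;
    while i < arr.len() {
        match arr[i] {
            b'0'..=b'9' | b'a'..=b'f' | b'A'..=b'F' => count += 1,
            b' ' | b'\r' | b'\n' | b'\t' => {} // Ignored.
            b'_' if count % 2 == 0 => {}       // Separator between bytes.
            _ => return None,                  // Includes `_` inside a byte.
        }
        i += 1;
    }
    if count % 2 != 0 {
        return None;
    }
    Some(count / 2)
}

fn main() {
    const OK: Option<usize> = hex_len_with_separators(b"de_ad be_ef");
    const BAD: Option<usize> = hex_len_with_separators(b"d_e");
    println!("{:?} {:?}", OK, BAD);
}
```

The same guard arm could be added to the match in from_hex, so that both passes agree on which inputs are valid.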


Yeah, that's a valid idea for an improvement.


That doesn't work if generate_array is generic over the length of the returned array. Rust has to infer the type of DATA before executing the generate_array function.
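To make the ordering concrete, here is a small sketch on stable Rust. The function generate_squares is a hypothetical stand-in for generate_array. The point it illustrates is that N is a type-level input: it is fixed by the declared type of the constant (or by a turbofish) before the function body is evaluated, so the body has no way to choose it.

```rust
/// Hypothetical generator: fills an array with the squares 0, 1, 4, ...
const fn generate_squares<const N: usize>() -> [u32; N] {
    let mut out = [0u32; N];
    let mut i = 0;
    while i < N {
        out[i] = (i * i) as u32;
        i += 1;
    }
    out
}

// N = 4 is read off the declared type; the function is only evaluated afterwards.
const DATA: [u32; 4] = generate_squares();

fn main() {
    println!("{:?}", DATA);
}
```

If the annotation were [u32; _], nothing would remain to determine N for a function like this one, which is the objection raised above.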


It is not the slice type [T]. What we want is a dependent sum type, exists (N : usize). [T; N], meaning an array of some dynamic length. It is not the same as the unsized slice type because its length can be extracted from the value of the sum type itself rather than from a fat reference. It is very similar to Box<[T]> but without dynamic allocation (thus it requires alloca).
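Rust has no such type today, but the idea of a value that carries its own length can be loosely approximated on stable Rust with a fixed upper bound. The sketch below is only an approximation under that assumption: HexBuf, decode and the capacity CAP are hypothetical names, and the bound means it is not a true exists (N : usize). [T; N], only "some N no larger than CAP".

```rust
/// A fixed-capacity buffer that stores its length next to the data.
/// Hypothetical type, not a standard library API.
struct HexBuf<const CAP: usize> {
    buf: [u8; CAP],
    len: usize,
}

impl<const CAP: usize> HexBuf<CAP> {
    /// The decoded prefix; the length comes from the value, not from a fat pointer.
    fn as_slice(&self) -> &[u8] {
        &self.buf[..self.len]
    }
}

/// Decodes contiguous hex digits (no whitespace handling, to keep the sketch short).
const fn decode<const CAP: usize>(arr: &[u8]) -> HexBuf<CAP> {
    if arr.len() % 2 != 0 {
        panic!("decode: odd number of hex digits");
    }
    if arr.len() / 2 > CAP {
        panic!("decode: capacity exceeded");
    }
    let mut buf = [0u8; CAP];
    let mut i = 0;
    while i < arr.len() {
        let d = match arr[i] {
            b'0'..=b'9' => arr[i] - b'0',
            b'a'..=b'f' => arr[i] - b'a' + 10,
            b'A'..=b'F' => arr[i] - b'A' + 10,
            _ => panic!("decode: found a disallowed byte"),
        };
        // Shift the previous nibble up and OR in the new one.
        buf[i / 2] = (buf[i / 2] << 4) | d;
        i += 1;
    }
    HexBuf { buf, len: arr.len() / 2 }
}

// Only the capacity is written; the actual length (7) is computed.
const SH: HexBuf<16> = decode(b"08021661260f00");

fn main() {
    println!("{:?}", SH.as_slice());
}
```

The trade-off is wasted space up to CAP and a slice rather than an exact array type at the use site, so it does not cover cases where the precise [u8; N] type matters.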


Even then, you need something like RFC #2884 so that generate_array can return arbitrarily sized arrays. No idea how hard it would be to integrate with const functions, though.

Note that in some cases we want to preserve the array type, so unfortunately a const function which returns &[u8] will not cover all use-cases. Ideally I would like to be able to write something like:

const fn calc_hex_size(data: &str) -> usize { .. }

const fn from_hex<const N: usize>(const data: &str) -> [u8; N]
    // only const arguments can be used in `where` clauses
    where N = calc_hex_size(data)
{ .. }

const arguments were proposed here for a different use-case, but they will fit really nicely here as well.

cc @gnzlbg


I believe this is also what is being discussed in rust#68436.

Hm, I guess we could indeed write:

const fn from_hex(const data: &str) -> [u8; {calc_hex_size(data)}] { .. }
// or without const arguments:
const fn from_hex<const DATA: &str>() -> [u8; {calc_hex_size(DATA)}] { .. }