Idiomatic generating function for arrays

The core question I want to ask is well presented by this stackoverflow issue:

From what I understand, they are really 2 safe ways to create an array (or [T; N]) in rust.

  • Either you create it with an element of a type that implement the copy trait
  • Or you list explicitely what will be in the array, like a tuple.

Let's say I want to initialize a big array (let's say 50000 elements) with even values in boxes.

The standard way would be:

const N: usize = 50000
evens = {
    let mut temp = [Box::Default(); N];
    for i in 0..N {
    temp[i] = Box::new(i*2);
    }
    temp
}

This is a pretty readable code, but I think you can do better:

  • First, you have to initialize the array, and in this case you have to allocate N boxes on the heap to replace them just after that. The initialization step is useless !
  • Second, I used mutability, but you don't have to: what you really want to do is generate the array from a function FnMut(usize) -> T on N indices.

So here is I think the most elegant solution currently:

impl<T, F, const N: usize> From<F> for [T; N] where F: FnMut(usize) -> T {
    fn from(f: F) -> [T; N] {
        unsafe {
            let mut result: [T; N] = std::mem::MaybeUninit::uninit().assume_init();
            for i in 0..N {
                result[i] = f(i);
            }
            result
        }
    }
}

fn main() {
    let array_of_evens : [usize, 50000]
                       = (|x| Box::new(x*2)).into()
}

Could that be implemented in the standard libs ? I think the compiler could achieve quite good optimizations around this way of generating arrays.

This is UB. Based on the reference, producing invalid value itself is UB and uninitialized memory outside of the union and struct paddings are invalid. And worse, this code drops the uninitialized value on assignment.

Easiest workaround currently possible is to utilize arrayvec crate which provides array-based container which can be made with .collect() and be converted into the inner array if full.

4 Likes

Since the values are initialized directly after, without needing to read values in the array, why would that be an issue ?

My point is that this could make sense if the compiler initialized the array for me from the function I gave to it.

The array-init crate provides a version of this function, and some variations. It has 27 reverse dependencies and over 100,000 recent downloads, which I think is decent evidence that this feature is widely needed (especially since this is just one of a few similar crates).

I think it would be good to add this to the standard library for discoverability, especially considering the pitfalls of people trying to implement it from scratch (as demonstrated in this thread!).

Assigning to the value will cause the destructor to read the previous (uninitialized) value. The MaybeUnit docs show the correct way to do this, using [MaybeUnit<T>; N].

2 Likes

Thanks !

The zip and map functions where already quite good abstractions around arrays, but I feel like it is the missing piece of the puzzle (for what I wanted to do anyway)

Also note that [T; N]::map will be stable in rust 1.55, so you'll be able to do [(); N].map(|_| /* code here */) to initialize an array. Even though this is a weird hack, it should be enough for infallible array creation.

9 Likes

I still have to manually manage the indices: an array filled with () does not have information about the indices, it's not like a range.

That's why I said it's a weird hack. However it's much better than the two ways you linked above: it avoids both having to allocate twice and having to use unsafe (with the linked UB potential)

If T has a destructor then result[i] = will run it on uninitialized data, which is very crashy.

Apart from that, arrays and slices are defined to only have initialized data. Invalid assume_init is UB by definition. It feeds self-contradictory data to the optimizer. The optimizer can then make invalid assumptions and compile bogus code (e.g. even delete the whole function, because UB is defined to never happen, so if it happens, then it's allowed to assume this code never runs).

A slightly more complicated function that someone could need is map but with the index:

With an iterator you could achieve that easily with iterator.enumarate().map(|(i, v)| /*...*/).

With the array_init you could do it (result = array_init(|i| /* something with i and old_array[i])*/))

But it will not work if old_array is a non-copy array ...

I'm trying to find a small reproducible exemple. Is there a way to convert an array to an array of references ?

The only way to get an index when using [T; N]::map is to manage it "by hand":

let mut i = 0;
let array = [(); N].map(|_| {
    let val = do_something_with(i);
    i += 1;
    val 
});
2 Likes

Since map and zip on arrays are still unstable, how can I implement them with MaybeUninit ?

Use the arrayvec crate.

use arrayvec::ArrayVec;
use std::convert::TryInto;

const N: usize = 10;

fn main() {
    let arrvec = (0..N).map(|x| x * 2).collect::<ArrayVec<usize, N>>();
    let arr: [usize; N] = arrvec[..].try_into().unwrap();
}

Since map and zip on arrays are still unstable, how can I implement them with MaybeUninit ?

This was once how unstable [T; N]::map was implemented(has be refactored a bit since). However that is far from as safe and simple as using something like array-init.

A slightly more complicated function that someone could need is map but with the index [...] I'm trying to find a small reproducible exemple. Is there a way to convert an array to an array of references ?

If you want to use some of the Iterator methods but on arrays, then there is iter_fixed. The limitation there is that main type IteratorFixed only implements the subset of Iterator's methods that affect the length in a const-predictable way.

struct NotCopy(i32);

let values = [NotCopy(0), NotCopy(1), NotCopy(2)];
let res: [(usize, &NotCopy); 3] = (&values)
    .into_iter_fixed()
    .enumerate()
    .map(|(i, v)| (2 * i, v))
    .collect();
assert_eq!(res, [(0, &NotCopy(0)), (2, &NotCopy(1)), (4, &NotCopy(2))]);