Pre-RFC: Array fill syntax

nwn · April 28, 2020, 8:42pm

Summary

Allow shorthand syntax for array literals with the same element repeated at the end. For example:

let x: [i32; 6] = [1, 2, 3, ..0]; // The last 3 elements are 0
assert_eq!(x, [1, 2, 3, 0, 0, 0]);

Motivation

Parity with Other Languages

In C and C++, it is common to define arrays by their first few elements, letting the remainder be default-initialized, like so:

int int_array[6] = {1, 2, 3}; // Final elements initialized to 0
array<string_view, 6> str_array = {"Hello", "World"}; // Final elements initialized to ""

Rust currently has no analogous syntax. We propose to fill this gap with the following syntax:

let int_array: [i32; 6] = [1, 2, 3, ..0];
let str_array: [&str; 6] = ["Hello", "World", ..""];

This syntax is in fact more powerful than the C/C++ version in that it allows an arbitrary Copy value to fill the rest of the array:

let mostly_some: [Option<f32>; 100] = [Some(0.0), Some(1.0), None, ..Some(-1.0)];

Code readability

Repetition in code can be both a source of bugs and a drag on readability. With the existing syntax, it is not obvious that most elements of a large array are identical: a reader has to scan the entire array to spot any deviation.

The proposed syntax makes such code both more convenient to write and clearer to read.

Guide-level explanation

An array expression can be written by enclosing zero or more comma-separated expressions of uniform type in square brackets. This produces an array containing each of these values in the order they are written. Such an array expression can optionally be terminated with an element of the form ..<expr>, where <expr> is of the same type as the preceding elements. This appends zero or more copies of <expr> (the fill expression) to the produced array. The length of the array is at least the number of elements before the fill expression, and is determined exactly by type inference.

If the fill expression fills more than one element, then the element type must be Copy. The fill expression is evaluated exactly once and moved or copied to the necessary number of elements.

For example:

let x: [i32; 6] = [1, 2, 3, ..0]; // The last 3 elements are 0
assert_eq!(x, [1, 2, 3, 0, 0, 0]);

This becomes useful when dealing with many repeated elements or when the elements are complex:

let x: [(Option<f32>, usize, bool); 20] = [
    (Some(1.0), 0x42, true),
    (None, 0x1a, false),
    ..(Some(0.4), 0xfe, true) // This term is repeated 18 times
];

It is possible for the fill expression to not fill any elements in the array:

let x: [char; 5] = ['H', 'e', 'l', 'l', 'o', ..'\0']; // All elements are already specified
assert_eq!(x, ['H', 'e', 'l', 'l', 'o']);

Reference-level explanation

This proposal does not affect the other types of array expressions: fully expanded array expressions without a fill expression and repeat-style array expressions still behave exactly the same. This instead introduces a third type of array expression.

Desugaring

The examples in the guide-level explanation can be considered to desugar as follows:

let x: [i32; 6] = [1, 2, 3, ..0]; // This desugars to the expression below.

let x: [i32; 6] = {
    let elem_0 = 1;
    let elem_1 = 2;
    let elem_2 = 3;
    let elem_fill = 0;
    [elem_0, elem_1, elem_2, elem_fill, elem_fill, elem_fill]
};

let x: [(Option<f32>, usize, bool); 20] = [
    (Some(1.0), 0x42, true),
    (None, 0x1a, false),
    ..(Some(0.4), 0xfe, true)
]; // This desugars to the expression below.

let x: [(Option<f32>, usize, bool); 20] = {
    let elem_0 = (Some(1.0), 0x42, true);
    let elem_1 = (None, 0x1a, false);
    let elem_fill = (Some(0.4), 0xfe, true);
    [elem_0, elem_1, elem_fill, elem_fill, elem_fill,
     elem_fill, elem_fill, elem_fill, elem_fill, elem_fill,
     elem_fill, elem_fill, elem_fill, elem_fill, elem_fill,
     elem_fill, elem_fill, elem_fill, elem_fill, elem_fill]
};

let x: [char; 5] = ['H', 'e', 'l', 'l', 'o', ..'\0']; // This desugars to the expression below.

let x: [char; 5] = {
    let elem_0 = 'H';
    let elem_1 = 'e';
    let elem_2 = 'l';
    let elem_3 = 'l';
    let elem_4 = 'o';
    let _elem_fill = '\0'; // The fill expression still gets evaluated
    [elem_0, elem_1, elem_2, elem_3, elem_4]
};

Note that the fill expression is evaluated exactly once, even if it fills no elements. This matches the behaviour of repeat-style arrays of length 0. An array containing only a fill expression behaves exactly like a repeat-style array expression of the same length. This means the following are also equivalent:

let x: [bool; 0] = [ ..{ println!("Side effects"); true } ];
let x: [bool; 0] = [ { println!("Side effects"); true }; 0 ];
let x: [bool; 0] = {
    let _elem_fill = { println!("Side effects"); true };
    []
};
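The evaluate-once behaviour of repeat-style arrays, which the fill expression is meant to match, can be checked on current Rust. The following is a minimal sketch; the function name is made up for illustration and only the existing [expr; N] syntax is used:

```rust
use std::cell::Cell;

/// Counts how often the repeat operand runs for a length-0 and a
/// length-3 repeat-style array expression.
fn repeat_operand_evaluations() -> (u32, u32) {
    let calls = Cell::new(0u32);
    // Side-effecting operand: bumps the counter each time it is evaluated.
    let bump = || {
        calls.set(calls.get() + 1);
        true
    };

    let _empty: [bool; 0] = [bump(); 0];
    let for_empty = calls.get();

    calls.set(0);
    let _three: [bool; 3] = [bump(); 3];
    let for_three = calls.get();

    (for_empty, for_three)
}
```

Both counts should come out as 1: the operand runs once and the result is copied as many times as needed, including zero times.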

Length Inference

The length of an array expression with a fill expression is determined by type inference:

If an exact length can be uniquely determined from the surrounding program context, the array expression has that length.
If the program context under-constrains or over-constrains the length, it is considered a static type error.

So this (in isolation) is a type error:

let x = [..true]; // Length is under-constrained

This is also a type error:

let x = [..true]; // Length is determined from uses of `x`
let y: [bool; 3] = x; // Fixes length of `x` to 3
let z: [bool; 4] = x; // Error: array length mismatch

But this is valid:

let x: [[bool; 4]; 2] = [[..true], [true, ..false]]; // Each sub-array has length 4
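The same inference rules already apply to const generic lengths in today's Rust, which gives a rough feel for how [..true] would behave. In this sketch, filled is a hypothetical stand-in for the proposed syntax and is not part of the proposal:

```rust
/// Hypothetical stand-in for `[..value]`: the length `N` is left to inference.
fn filled<T: Copy, const N: usize>(value: T) -> [T; N] {
    [value; N]
}

/// The length is fixed by a later use of the binding.
fn inferred_from_use() -> [bool; 3] {
    let x = filled(true); // `N` is not known yet at this point
    let y: [bool; 3] = x; // this use fixes `N` to 3
    y
}

/// Each inner call gets its length from the annotated outer type.
fn inferred_nested() -> [[bool; 4]; 2] {
    [filled(true), filled(false)]
}

// Writing `let x = filled(true);` with no further use of `x` is rejected,
// because nothing constrains `N`.
```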

Errors

Several errors can arise when using this feature.

If the length is under-constrained as in the following code:

let x = [..true];

the following error is produced:

error[E0282]: type annotations needed
 --> src/main.rs:4:9
  |
4 |     let x = [..true];
  |         ^   ^^^^^^^^
  |         |   |
  |         |   cannot infer length for array
  |         help: consider giving `x` a type

If the length has not yet been fixed, a type mismatch yields a slightly different error as follows:

let x: [bool; 3] = [true, false, true, false, ..true];

yields:

error[E0308]: mismatched types
 --> src/main.rs:4:24
  |
4 |     let x: [bool; 3] = [true, false, true, false, ..true];
  |            ---------   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected an array with a fixed size of 3 elements, found one with at least 4 elements
  |            |
  |            expected due to this

Once a length has been assigned to a filled array expression, array length errors act as normal. The following code:

let x = [..true];
let y: [bool; 3] = x;
let z: [bool; 4] = x;

yields the following error:

error[E0308]: mismatched types
  --> src/main.rs:10:22
   |
10 |     let z: [bool; 4] = x;
   |            ---------   ^ expected an array with a fixed size of 4 elements, found one with 3 elements
   |            |
   |            expected due to this

Interactions with `RangeTo`

The ..<expr> syntax is given special meaning within an array expression with higher precedence than the RangeTo operator. This only applies when the fill expression is a direct child of the array expression. Other range operators are unaffected. This means that the following hold:

let x: [u32; 6] = [1, 2, 3, ..0]; // Interpreted as a fill expression, not a range expression
assert_eq!(x, [1, 2, 3, 0, 0, 0]);

let x: [u32; 3] = [..2]; // Interpreted as a fill expression, not a range expression
assert_eq!(x, [2, 2, 2]);

// Other range expressions are unaffected
assert_eq!([0..2], [Range { start: 0, end: 2 }]);
assert_eq!([..], [RangeFull]);
assert_eq!([2..], [RangeFrom { start: 2 }]);
assert_eq!([0..=2], [RangeInclusive::new(0, 2)]);
assert_eq!([..=2], [RangeToInclusive { end: 2 }]);

When not a direct child of the array expression, the RangeTo operator is unaffected. That is, subexpressions and parenthesized expressions are unaffected. Therefore the following also hold:

// Parentheses prevent the expression from being treated as a fill expression.
assert_eq!([(..2)], [RangeTo { end: 2 }]);

// Subexpressions of an element are also treated normally.
let x: [RangeTo<u32>; 2] = [.. ..2]; // `RangeTo` expression within a fill expression
assert_eq!(x, [RangeTo { end: 2 }, RangeTo { end: 2 }]);
assert_eq!([{ ..2 }], [RangeTo { end: 2 }]); // `RangeTo` expression within a brace expression

Drawbacks

Breakage

The biggest drawback is that this is a breaking change. As discussed above, existing code using arrays of RangeTo literals would conflict with this syntax and fail to compile. To our knowledge, such code is used extremely rarely and can easily be fixed. The exact interactions are detailed above.
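To make the breakage concrete, the sketch below compiles on current Rust. The function names are illustrative only. The first body is the kind of code whose meaning would change under this proposal; the second shows the parenthesized form that keeps its meaning either way:

```rust
use std::ops::RangeTo;

/// Today this is a one-element array holding the range `..2`.
/// Under the proposal, `..2` here would be read as a fill expression instead.
fn array_of_range_to() -> [RangeTo<u32>; 1] {
    [..2]
}

/// Parentheses keep the `RangeTo` reading under both interpretations.
fn parenthesized() -> [RangeTo<u32>; 1] {
    [(..2)]
}
```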

Inferred Lengths

The proposed syntax hides the actual length of an array literal. Unlike the two existing array forms ([1, 2, 3] and [true; 5]), the length of the array cannot be determined from the expression alone. This can hinder readability. However, in use cases where one would prefer the fill-syntax, the only alternative is a fully expanded array of sufficient length that this information is effectively hidden from the reader anyway. In such cases, explicit type annotations can be used.

This would also complicate the compiler's job of type inference.

Limitations

This syntax does not work in pattern positions, where it conflicts with unstable half-open range patterns. However, there is precedent for such differences between expressions and patterns. Neither RangeTo literals nor repeat-style array literals (e.g. [true; 5]) can appear in patterns, so it is reasonable to expect that fill-style array literals (as proposed here) cannot either.

This syntax does not afford extensions to arbitrary run-length encoded arrays, as described in the alternatives.

Rationale and alternatives

Rationale

The proposed syntax was chosen mainly for its familiarity. It is more intuitive for newcomers since the .. acts similarly to the ellipsis in both English and mathematical notation. It also reflects the meaning of the .. in the struct update syntax, namely "copy/move the remaining fields/elements from what follows".

Alternatives

- Implementing this as a macro in either std or an external crate. Not sure if this is actually possible for compile-time evaluation without const loops.
- An alternative syntax:
  - Extend the repeat-syntax instead of the expanded syntax. This makes the length explicit, but the syntax would be less intuitive and noticeable:
    assert_eq!([1, 2, 3, 3, 3], [1, 2, 3; 5]) or
    assert_eq!([1, 2, 3, 3, 3], [1, 2, ..3; 5])
  - Use a syntax that doesn't conflict with RangeTo, e.g. assert_eq!([1, 2, 3, 3, 3], [1, 2, 3...]) leveraging the existing (but unused) ... token
- A more general syntax for run-length encoding array literals. This would solve the earlier drawback of multiple runs. However, in the real world, most cases involving multiple runs would require sufficient granularity that such a feature would provide little benefit.

Prior art

As described above, a similar feature is present in C and C++. In C, missing elements in an initializer are implicitly initialized to zero (NULL, etc.). C++ improves on this design by default-initializing any missing elements, allowing for more complex types in this position.

Both C and C++ suffer from the problem that arrays are silently and implicitly filled when elements are missing. This can lead to unexpected behaviour and bugs. Still, the convenience of this feature means that it continues to be used frequently. The proposed feature solves this problem while improving usability by making the behaviour explicit and opt-in, and by using only a user-defined value.

Unresolved questions

Is this the best syntax for such a feature?
Should we (or Clippy) warn when a fill expression would expand to 0 entries?
  Allowing this could have valid use cases and aligns with the lack of warning when a base struct contributes no fields.
Should this allow !Copy types if the expression does not expand to more than one entry?
  This aligns with the [vec![]; 1] syntax.

Future possibilities

This could be extended to allow middle-filling: [1, 2, ..3, 2, 1]

RustyYato · April 28, 2020, 9:22pm

Currently you can do this with:

let array: [_; count] = [a, b, c, d, e, ..fill];

// is the same as

let mut array = [fill; count];
array[..5].copy_from_slice(&[a, b, c, d, e]);

This sugar would be nice, but it is super niche given that arrays are pretty much second-class citizens in Rust (without const generics).
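For reference, the workaround can be wrapped into a generic helper on a compiler with const generics. This is only a sketch: fill_tail is a made-up name, and the length check happens at run time rather than in the type system:

```rust
/// Builds an array of length `N` that starts with `head` and is padded with
/// copies of `fill`. Hypothetical helper, not an existing std API.
fn fill_tail<T: Copy, const M: usize, const N: usize>(head: [T; M], fill: T) -> [T; N] {
    assert!(M <= N, "head is longer than the requested array");
    // Start from an array made entirely of the fill value...
    let mut out = [fill; N];
    // ...then overwrite the leading elements.
    out[..M].copy_from_slice(&head);
    out
}
```

With a type annotation on the binding, both lengths are inferred: let x: [i32; 6] = fill_tail([1, 2, 3], 0);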

comex · April 28, 2020, 9:36pm

Alternative: avoid the breaking change by gating this on the (hypothetical) 2021 edition.

matt1985 · April 28, 2020, 9:36pm

I would expect the [a, b, c, ..value] syntax to do this:

let foo = [4, 5, 6];
let bar = [1, 2, 3, ..foo];
assert_eq!(bar, [1, 2, 3, 4, 5, 6]);

comex · April 28, 2020, 9:38pm

Notably, that's how it works in JavaScript except with three dots instead of two.

josh · April 28, 2020, 10:38pm

I would like something like this as well.

I'd like to also be able to fill in entries other than the beginning, as in C:

let arr: [SomeType; 50] = [1: SomeType::Foo, 32: SomeType::Bar, ..SomeType::default()];

Also, I agree with @matt1985 that this proposal seems somewhat inconsistent with struct initializer syntax.

scottmcm · April 28, 2020, 10:51pm

Thinking about that, struct initializer syntax already supports tuple-structs using numbers:

    struct Foo(i32, i32, i32);
    let x = Foo(0, 0, 0);
    let y = Foo { 1: 2, ..x };
    dbg!(y); // Foo(0, 2, 0)

So we could do the same for tuples or arrays, if only there were a type name that could be put there...

(Here's another place where = for struct initializers would be nice, since (1= is clearly not going to be valid today, whereas (1: suggests type ascription.)

nwn · April 28, 2020, 11:56pm

That's a fair point! That version would eliminate the complexity of length inference.

In this case, the outcome of the proposed [a, b, c, ..d] would instead look like [a, b, c, ..[d; 5]].

steffahn · April 29, 2020, 9:30am

I would prefer not to use .. and create a breaking change. Even if you’d edition-gate the change, it’s still confusing to have [..a] differ from [(..a)] in such a fundamental way. Or think of [(..2), (..1), ..(..0)], crazy. It’s actually unfortunate how many different meanings .. already has. It’s part of expressions or patterns in ranges and range patterns (already two totally different things), and it is punctuation in struct update expressions, tuple (struct) patterns and struct patterns.

In a perfect world (“perfect” in my view), struct update, tuple patterns and struct patterns would already use .... Then we could have all sorts of things for arrays and slices:

Expression [1, 2, 3, 0...] for any [{integer}; N] with N ≥ 3
Expression [1, 2, 3, 0...; 6] for [1, 2, 3, 0, 0, 0]
Even expressions [1, 0..., 1] and [1, 0..., 1; 6].
Pattern [1, 2, 3, 0...] matching arrays or slices of length ≥ 3 starting with 1, 2, 3 and the (possibly empty) rest zeros.
Pattern [1, 2, 3, 0...; 6] matching slices or arrays of length 6 that match [1, 2, 3, 0, 0, 0].
Pattern [1, 2, 3, ...] with the same meaning of [1, 2, 3, _...] matching slices or arrays that start with 1,2,3.
Have consistency with tuple patterns that would look like (1, 2, ..., 2, 1) vs. the array / slice patterns [1, 2, ..., 2, 1].
This would mean that [0; 10] is a shorthand for [0...; 10]

This would introduce a syntactical distinction between ...a (in struct update) and a... (see above) where a... is repeating the element a and ...a is filling the rest with the fields of a. I find [1, 0..., 1] more readable than [1, ...0, 1], but that’s just an opinion.

Going back a step and trying not to introduce a bunch of new syntax, one could hope for const-generics supporting something like

impl<T, const N: usize> [T; N] {
    const fn append<const M: usize>(self, other: [T; M]) -> [T; { N + M }] {…}
}

in the future. This would allow [1, 2, 3].append([0; 3]) to be evaluated to an [i32; 6] at compile time. Perhaps even an overload for Add is possible, resulting in [1, 2, 3] + [0; 3], however I don’t know about const overloads of non-const methods.
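Until N + M can be written in a return type, a rough approximation is possible with a third length parameter that is inferred from the call site and checked at run time. The sketch below assumes a compiler with std::array::from_fn; concat is a made-up free function, not an existing API, and it is not a const fn:

```rust
/// Concatenates two arrays into one of length `C`.
/// `C` comes from inference and must equal `A + B`, checked at run time.
fn concat<T: Copy, const A: usize, const B: usize, const C: usize>(
    head: [T; A],
    tail: [T; B],
) -> [T; C] {
    assert_eq!(A + B, C, "output length must equal the sum of the input lengths");
    // Indices below `A` come from `head`, the rest from `tail`.
    std::array::from_fn(|i| if i < A { head[i] } else { tail[i - A] })
}
```

The example from above then reads let x: [i32; 6] = concat([1, 2, 3], [0; 3]);, at the cost of a panic instead of a compile error when the lengths disagree.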

HeroicKatora · April 29, 2020, 10:23am

When the default supplies the length, we could reuse the position where it would usually appear. This could avoid the breaking change. It's debatable if it is quite as intuitive but it looks workable to me. The confusing question would be if it supplies the tail, or the default elements.

let foo = [0; 4];
let bar = [1, 2; ..foo];
//               ^^^^^ only `usize` is accepted here currently
assert_eq!(bar, [1, 2, 0, 0]);
// OR
assert_eq!(bar, [1, 2, 0, 0, 0, 0]);

lordan · April 29, 2020, 12:01pm

Having a different token for updates/fills vs. ranges would indeed be nice!

I also like the idea to distinguish between "repeat a value" and "use this as template to fill in missing elements".

That is actually my go-to counter-example why I think conflating addition with concatenation was (and is) a mistake. To me this should be the addition of two 3-vectors: [1, 2, 3] + [4, 5, 6] == [5, 7, 9].

I'd rather prefer if we'd have a separate syntax for concatenation, e.g., ++, and could deprecate the use of Add for this in strings. That said, I'm not sure it's actually worth adding it over generalizing the existing fill syntax.

lambda-fairy · April 29, 2020, 12:50pm

While I'm aware that const generics are a long way away, I'd still love to see something like this:

impl<T: Default, const N: usize> [T; N] {
    fn resize_into<const M: usize>(self) -> [T; M] {
        // ...
    }
}

Then your example can be expressed as

let x: [i32; 6] = [1, 2, 3].resize_into();

with no new syntax.

kennytm · April 29, 2020, 6:01pm

This actually already works at the current level of const generics support in Nightly.

Unfortunately resize_default_into() cannot be a const fn, so it can't be used to initialize a const/static variable.

impl<T: Default, const N: usize> [T; N] {
    pub fn resize_default_into<const M: usize>(mut self) -> [T; M] {
        use std::mem::{MaybeUninit, forget};
        use std::ptr::copy_nonoverlapping;
        
        let mut result = MaybeUninit::<[T; M]>::uninit();
        let src = self.as_mut_ptr(); // `self` is the array itself, so there is no `.0` field
        let dest = result.as_mut_ptr() as *mut T;
        forget(self);
        unsafe {
            copy_nonoverlapping(src, dest, M.min(N));
            if M >= N {
                for i in N..M {
                    dest.add(i).write(T::default());
                }
            } else {
                for i in M..N {
                    src.add(i).drop_in_place();
                }
            }
            result.assume_init()
        }
    }
}
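A safe variant of the same idea can be sketched as an extension trait, assuming a compiler that has std::array::from_fn and by-value array iteration. The trait name is made up here, and like the version above this cannot be a const fn:

```rust
/// Hypothetical extension trait; an inherent impl on arrays is not possible
/// outside the standard library.
trait ResizeDefaultInto<T> {
    fn resize_default_into<const M: usize>(self) -> [T; M];
}

impl<T: Default, const N: usize> ResizeDefaultInto<T> for [T; N] {
    fn resize_default_into<const M: usize>(self) -> [T; M] {
        // Move elements out of the source one by one. Once it runs dry, fall
        // back to `T::default()`. Leftover source elements (when M < N) are
        // dropped together with the iterator.
        let mut source = IntoIterator::into_iter(self);
        std::array::from_fn(|_| source.next().unwrap_or_default())
    }
}
```

This trades the raw copy for element-wise moves, so it may be slower for large arrays, but it needs no unsafe code.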


kennytm · April 29, 2020, 6:14pm

BTW a macro is already possible, without any loops, if you're fine with copying the default value more times than required (I used ; instead of , as the separator before the .. to avoid ambiguity):

macro_rules! array {
    ($($value:expr),*; ..$def:expr; $len:expr) => {{
        let mut result = [$def; $len];
        let mut index = 0;
        $(
            result[index] = $value;
            index += 1;
        )*
        result 
    }}
}

const X: [i32; 6] = array![1, 2, 3; ..0; 6];
const Y: [&str; 6] = array!["hello", "world"; ..""; 6];
const Z: [(Option<f32>, usize, bool); 20] = array![
    (Some(1.0), 0x42, true),
    (None, 0x1a, false);
    ..(Some(0.4), 0xfe, true);
    20
];


kaj · May 7, 2020, 10:37am

How about using the syntax we have for this, just allowing it to be combined, like so:

let existing_1: [u8; 6] = [1, 2, 3, 4, 5, 6];
let existing_2: [u8; 6] = [0; 6];
let suggested: [u8; 6] = [1, 2, 3, 0; 3];

This may also open up for mixing the syntaxes more freely than the .. notation:

let wild: [u8; 8] = [1, 0; 3, 2, 0; 3];
assert_eq!(wild, [1, 0, 0, 0, 2, 0, 0, 0]);

This would be very powerful, and nice in certain situations, but maybe "too magic"?
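To get a feel for what the mixed form would mean, it can be emulated today with explicit (value, count) pairs. This is just a sketch with a made-up helper name, assuming a compiler with std::array::from_fn; a plain element is written as a run of length 1:

```rust
/// Expands `(value, count)` runs into an array of length `N`.
/// Panics if the counts do not add up to `N`.
fn from_runs<T: Copy, const N: usize>(runs: &[(T, usize)]) -> [T; N] {
    let total: usize = runs.iter().map(|&(_, count)| count).sum();
    assert_eq!(total, N, "run lengths must sum to the array length");
    // Lazily repeat each value `count` times, in order.
    let mut values = runs
        .iter()
        .flat_map(|&(value, count)| std::iter::repeat(value).take(count));
    std::array::from_fn(|_| values.next().expect("length was checked above"))
}
```

The wild example would then be written as from_runs(&[(1, 1), (0, 3), (2, 1), (0, 3)]) with a [u8; 8] annotation.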

L0uisc · May 7, 2020, 3:03pm

Not "too magic" IMO. It's simple to explain what it does. It doesn't depend on the phase of the moon and ten other seemingly unrelated things. It always does one thing.

zackw · May 7, 2020, 3:28pm

I like this but I vote for updating the style guide and rustfmt so that this is how you do the whitespace when there's both commas and semicolons between the square brackets:

let suggested: [u8; 6] = [1, 2, 3, 0;3];
let wild: [u8; 8] = [1, 0;3, 2, 0;3];

This makes it clearer that you're repeating just the one number right before each semicolon.

josh · May 7, 2020, 5:30pm

How common is the use case of wanting to fill "gaps" like that, rather than just "here's the one value that should be used anywhere I don't specify, and here are the values for specific indexes"?

In C99, it's common to do things like this:

some_struct array[] = {
    [5] = { .a = 2, .b = 10 },
    [8] = { .a = 3, .b = 24 },
};

And indexes 0,1,2,3,4,6,7 will all be filled with {} (a zero-filled version of some_struct).

That seems like it'd be the common case in Rust as well, rather than needing to specify each gap (and put the values in order, and count the size of the gaps).
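The "default plus specific indexes" shape can be approximated in current Rust with a small helper. This is a sketch only: with_overrides is a hypothetical name, and out-of-range indexes panic at run time instead of being rejected at compile time:

```rust
/// Builds an array of length `N` filled with `default`, then applies the
/// given `(index, value)` overrides in order.
fn with_overrides<T: Copy, const N: usize>(default: T, overrides: &[(usize, T)]) -> [T; N] {
    let mut out = [default; N];
    for &(index, value) in overrides {
        out[index] = value; // panics if `index >= N`
    }
    out
}
```

Unlike the C99 form, the array length has to come from a type annotation here, since nothing infers it from the largest index.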

L0uisc · May 7, 2020, 6:34pm

You see, I think we'd only be able to answer that question if we start using a language feature enabling it. In C99, it's also very common to then mutate the array afterwards if you couldn't initialize the array quite nicely in the initializer list thing. (C purists, please correct my terminology. I don't know what C calls it. I'm much more familiar with C++, and "initializer list" is what it's called there.) So you'd just mutate the array afterwards in a loop if you want to init it with all 1s except for a few spots.

Rust doesn't like that philosophy, but we can work around it with the following idiom:

let mut array = [0u32; 500]; // placeholder values; Rust needs the array initialized before element writes
// loop which inits all the elements as you want
let array = array; // shadow data as immutable.

However, this proposal would make such workarounds unnecessary and the code less verbose. So I think we'd have to start using it to see how common it actually is and how useful it is, and what gotchas we haven't thought of yet.

josh · May 7, 2020, 8:23pm

To clarify, I still think that we should support specifying a single default value to be filled in for any indexes that don't have a specific initializer. So if you want to have an array that's initialized to 1 everywhere except for the indexes you specify, you could do that. I'm just suggesting that I don't think we need the "repeat" syntax, and that I'd instead prefer a "single default value plus specific initializers" syntax.
