Pre-RFC: Array expansion syntax

This is an evolution of a pre-RFC I wrote a few months ago.

Summary

Allow shorthand syntax for inserting a subarray into an array literal. For example:

let x = [3, 2, 1];
let y = [4, ..x, 0]; // Insert the elements of `x` into the literal
assert_eq!(y, [4, 3, 2, 1, 0]);

Motivation

Simpler array definitions

Concatenation is a fundamental operation on arrays. Yet in current Rust, there's no way to express array concatenation at compile time. Even with the eventual stabilization of const library features like copy_from_slice, the user must specify the length of the resulting array, which is not necessarily known to them (though it is always known to the compiler).

For example, consider the concatenation of two run-time evaluated arrays. Even leaving aside the actual concatenation, the length of the resulting array cannot be determined by the user at compile time.

let a = compute_a();
let b = compute_b();
let c: [u8; a.len() + b.len()] = concat_arrays(a, b);
//          ~~~~~~~~~~~~~~~~~
//           Error: attempt to use a non-constant value in a constant

Since the contents of a and b are not known at compile time, we are unable to obtain their combined length, though this is known to the compiler. This proposal would allow the user to leverage the compiler to sidestep this issue entirely.

let a = compute_a();
let b = compute_b();
let c = [..a, ..b];

This syntax aims to provide an intuitive and generalized notation for building arrays from their constituents.
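For comparison, here is a minimal sketch of what concatenation tends to look like on stable Rust today. The helper name concat_arrays and its OUT parameter are assumptions made for illustration, not an existing std API. The point is that the caller must spell out the result length, and the relation OUT == A + B can only be checked at run time:

```rust
/// Hypothetical helper (not part of std): concatenates two arrays into one
/// whose length `OUT` has to be supplied by the caller.
fn concat_arrays<T: Copy + Default, const A: usize, const B: usize, const OUT: usize>(
    a: [T; A],
    b: [T; B],
) -> [T; OUT] {
    // `OUT == A + B` cannot be stated in the signature on stable Rust,
    // so this sketch falls back to a run-time check.
    assert_eq!(A + B, OUT, "output length must equal the sum of the input lengths");
    let mut out = [T::default(); OUT];
    out[..A].copy_from_slice(&a);
    out[A..].copy_from_slice(&b);
    out
}

fn main() {
    let a = [1u8, 2];
    let b = [3u8, 4, 5];
    // The length 5 has to be written by hand here.
    let c: [u8; 5] = concat_arrays(a, b);
    assert_eq!(c, [1, 2, 3, 4, 5]);
}
```

With the proposed syntax, the same result would be written as [..a, ..b] with the length left to the compiler.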

The RLE pattern

In C and C++, it is common to define arrays by their first few elements, letting the remainder be default-initialized, like so:

int int_array[6] = {1, 2, 3}; // Final elements initialized to 0
array<string_view, 6> str_array = {"Hello", "World"}; // Final elements initialized to ""

Rust currently has no analogous syntax. With this proposal, this gap could be filled like so:

let int_array = [1, 2, 3, ..[0; 3]];
let str_array = ["Hello", "World", ..[""; 4]];

This syntax is in fact more powerful than the C/C++ version in that arbitrary values can be inserted into the array, rather than just repeating the default-initialized value.

let alt = [Some(1.0), None];
let seq = [..alt, ..alt, ..alt];

Moreover, it allows multiple expansions to occur anywhere within an array literal. This effectively allows a run-length encoding of array literals:

let rle = [1, ..[0; 32], 2, 3, 4, ..[-1; 28]];
let zimin0 = [0];
let zimin1 = [..zimin0, 1, ..zimin0];
let zimin2 = [..zimin1, 2, ..zimin1];
let zimin3 = [..zimin2, 3, ..zimin2];
assert_eq!(zimin3, [0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0]);

Either of the above examples using the existing syntax would require care both when transcribing and when reading for patterns.
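To make the pattern behind the zimin example explicit, here is a small run-time sketch that builds the same sequence with a Vec. The function name zimin is chosen for illustration only, and a Vec is used as a stand-in because fixed-size arrays cannot be spliced this way today:

```rust
/// Builds the n-th word of the sequence used above:
/// Z(0) = [0], Z(n) = Z(n-1) ++ [n] ++ Z(n-1).
fn zimin(n: u32) -> Vec<u32> {
    let mut word = vec![0];
    for k in 1..=n {
        // Keep a copy of the previous word so it can be appended after `k`.
        let prev = word.clone();
        word.push(k);
        word.extend(prev);
    }
    word
}

fn main() {
    // Same contents as `zimin3` in the proposal's example.
    assert_eq!(zimin(3), [0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0]);
}
```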

Less redundancy

Repetition in code can lead to bugs and reduce readability. With the existing syntax, runs of repeated elements in a large array are not obvious: a reader has to scan the entire array to spot any element that deviates.

Similar problems occur when defining subarrays, where separately defined literals can be less readable and can easily fall out of sync.

const PNG_HEADER: [u8; 8] = [ 0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a ];
const PNG_IHDR: [u8; 4] = [ 0x49, 0x48, 0x44, 0x52 ];
const PNG_IDAT: [u8; 4] = [ 0x49, 0x44, 0x41, 0x54 ];
const PNG_IEND: [u8; 4] = [ 0x49, 0x45, 0x4e, 0x44 ];
const IMAGE: [u8; 73] = [
    ..PNG_HEADER,
    0x00, 0x00, 0x00, 0x0d, ..PNG_IHDR,
    0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00,
    0x37, 0x6e, 0xf9, 0x24,
    0x00, 0x00, 0x00, 0x10, ..PNG_IDAT,
    0x78, 0x9c, 0x62, 0x60, 0x01, 0x00, 0x00, 0x00, 0xff, 0xff, 0x03, 0x00, 0x00, 0x06, 0x00, 0x05,
    0x57, 0xbf, 0xab, 0xd4,
    0x00, 0x00, 0x00, 0x00, ..PNG_IEND,
    0xae, 0x42, 0x60, 0x82,
];

The proposed syntax makes such code more convenient to write and more clear to read.

Guide-level explanation

It can be useful to insert an existing array into an array being created. This can be done using array expansions.

Any of the comma-separated elements in an array expression may be replaced with an array expansion of the form ..<expr>, where <expr> is itself an array. This inserts each element of <expr> into the new array at that position as though they had been written out explicitly. The array being defined and <expr> must therefore have equivalent element types.

For example, say we wish to define array as follows:

let sub_array = [3, 4];
let array = [1, 2, sub_array[0], sub_array[1], 5, 6];
assert_eq!(array, [1, 2, 3, 4, 5, 6]);

This could be written more concisely using an array expansion:

let sub_array = [3, 4];
let array = [1, 2, ..sub_array, 5, 6];
assert_eq!(array, [1, 2, 3, 4, 5, 6]);

This notation is even necessary when inserting an array of !Copy elements. For example, the following snippet does not compile, since elements of a non-Copy type cannot be moved out of an array by indexing, so neither sub_array[0] nor sub_array[1] is allowed:

let sub_array = [String::from("1"), String::from("2")];
let array = [String::from("0"), sub_array[0], sub_array[1], String::from("3")];

Instead we can move the entirety of sub_array at once:

let sub_array = [String::from("1"), String::from("2")];
let array = [String::from("0"), ..sub_array, String::from("3")];

Reference-level explanation

This proposal does not affect existing array expressions: array expressions without an array expansion and repeat-style array expressions still behave exactly the same. This only extends array expressions.

Desugaring

The examples in the guide-level explanation can be considered to desugar as follows:

let sub_array = [3, 4];
let array = [1, 2, ..sub_array, 5, 6]; // This desugars to the expression below.

let array = {
    let elem_0 = 1;
    let elem_1 = 2;
    let [elem_2, elem_3] = sub_array;
    let elem_4 = 5;
    let elem_5 = 6;
    [elem_0, elem_1, elem_2, elem_3, elem_4, elem_5]
};

let sub_array = [String::from("1"), String::from("2")];
let array = [String::from("0"), ..sub_array, String::from("3")]; // This desugars to the expression below.

let array = {
    let elem_0 = String::from("0");
    let [elem_1, elem_2] = sub_array;
    let elem_3 = String::from("3");
    [elem_0, elem_1, elem_2, elem_3]
};
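The desugared form of the String example uses only constructs that exist today, so it can be tried on stable Rust. A minimal sketch, where the function name splice is an assumption made for illustration:

```rust
/// Hypothetical wrapper around the desugared form of
/// `[String::from("0"), ..sub_array, String::from("3")]`.
fn splice(sub_array: [String; 2]) -> [String; 4] {
    let elem_0 = String::from("0");
    // Destructuring moves both elements out of `sub_array` at once.
    let [elem_1, elem_2] = sub_array;
    let elem_3 = String::from("3");
    [elem_0, elem_1, elem_2, elem_3]
}

fn main() {
    let sub_array = [String::from("1"), String::from("2")];
    let array = splice(sub_array);
    assert_eq!(array, ["0", "1", "2", "3"]);
}
```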

Note that the array expansion is evaluated exactly once, even if it expands to no elements. This matches the behaviour of repeat-style arrays of length 0. An array containing only an array expansion behaves exactly like a copy/move of the expanded expression. This means the following are also equivalent:

let x = [ ..some_array() ];
let x = some_array();

let x = [ ..[{ println!("Side effects"); true }; 0] ];
let x = [ { println!("Side effects"); true }; 0 ];
let x = {
    let _elem = { println!("Side effects"); true };
    []
};
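The existing behaviour that this mirrors can be observed on current Rust: a repeat expression of length 0 still evaluates its (non-constant) operand once. A small sketch, with the function name count_evaluations chosen for illustration:

```rust
use std::cell::Cell;

/// Counts how often the operand of a zero-length repeat expression runs.
fn count_evaluations() -> u32 {
    let counter = Cell::new(0);
    // The block is the repeat operand; its side effect bumps the counter.
    let _empty: [bool; 0] = [{ counter.set(counter.get() + 1); true }; 0];
    counter.get()
}

fn main() {
    // The operand is evaluated once even though no element is produced.
    assert_eq!(count_evaluations(), 1);
}
```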

Length Inference

The length of an array expression with expansions is the sum of:

  • The number of non-expansion elements in the expression
  • The length of each array being expanded
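As a worked instance of this rule, take the earlier literal [1, ..[0; 32], 2, 3, 4, ..[-1; 28]]: it has 4 non-expansion elements plus expansions of length 32 and 28, so its length would be 64. The sketch below spells out the same arithmetic and builds the equivalent array by hand on stable Rust; the names SINGLE_ELEMENTS, RLE_LEN and build_rle are made up for illustration:

```rust
/// Non-expansion elements in the literal: 1, 2, 3 and 4.
const SINGLE_ELEMENTS: usize = 4;
/// Total length: single elements plus the two expanded arrays.
const RLE_LEN: usize = SINGLE_ELEMENTS + 32 + 28;

/// Builds by hand what `[1, ..[0; 32], 2, 3, 4, ..[-1; 28]]` would denote.
fn build_rle() -> [i32; RLE_LEN] {
    let mut out = [0; RLE_LEN];
    out[0] = 1;
    // Indices 1..=32 keep the 32 zeros of the first expansion.
    out[33] = 2;
    out[34] = 3;
    out[35] = 4;
    // The remaining 28 slots come from the second expansion.
    out[36..].fill(-1);
    out
}

fn main() {
    let rle = build_rle();
    assert_eq!(rle.len(), 64);
    assert_eq!(rle[36..].len(), 28);
}
```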

Errors

Several errors can arise when using this feature. Each of these is an instance of existing errors.

  • If the computed length of an array expression does not match the expected length, this is treated as a type mismatch, as usual:

    let x: [u32; 3] = [..[42; 2]];
    

    The following error is produced:

    error[E0308]: mismatched types
     --> src/main.rs:2:23
      |
    2 |     let x: [u32; 3] = [..[42; 2]];
      |            --------   ^^^^^^^^^^^ expected an array with a fixed size of 3 elements, found one with 2 elements
      |            |
      |            expected due to this
    
  • If array expansions are used to create a circular definition, the usual error results:

    const X: [u32; 1] = [..Y];
    const Y: [u32; 1] = [..X];
    

    The following error is produced:

    error[E0391]: cycle detected when simplifying constant for the type system `X`
     --> src/main.rs:2:1
      |
    2 | const X: [u32; 1] = [..Y];
      | ^^^^^^^^^^^^^^^^^^^^^^^^^^
      |
    note: ...which requires simplifying constant for the type system `Y`...
     --> src/main.rs:3:1
      |
    3 | const Y: [u32; 1] = [..X];
      | ^^^^^^^^^^^^^^^^^^^^^^^^^^
      = note: ...which again requires simplifying constant for the type system `X`, completing the cycle
    
  • If the element type of an expanded array does not match that of the other elements:

    let x = [true, false, ..[7]];
    

    yields the following error:

    error[E0308]: mismatched types
     --> src/main.rs:2:19
      |
    2 |     let x = [true, false, ..[7]];
      |                              ^ expected `bool`, found integer
      = note: expected array `[bool; _]`
                 found array `[u32; 1]`
    
  • If the expression in an array expansion is not an array:

    let x = [true, false, ..7];
    

    yields the following error:

    error[E0308]: mismatched types
     --> src/main.rs:2:19
      |
    2 |     let x = [true, false, ..7];
      |                             ^ expected array `[bool; _]`, found integer
    
  • If the element type of an array cannot be determined:

    let x = [..[]];
    

    yields the following error:

    error[E0282]: type annotations needed for `[_; 0]`
    --> src/main.rs:2:13
      |
    2 |     let x = [..[]];
      |         -      ^^ cannot infer type
      |         |
      |         consider giving `x` the explicit type `[_; 0]`, with the type parameters specified
    

Interactions with RangeTo

The ..<expr> syntax is given special meaning within an array expression with higher precedence than the RangeTo operator. This only applies when the array expansion is a direct child of the array expression. Other range operators are unaffected. This means that the following hold:

let x = [1, 2, 3, ..[0; 3]]; // Interpreted as an array expansion, not a range expression
assert_eq!(x, [1, 2, 3, 0, 0, 0]);

let x = [..[2]]; // Interpreted as an array expansion, not a range expression
assert_eq!(x, [2]);

// Other range expressions are unaffected
assert_eq!([0..2], [Range { start: 0, end: 2 }]);
assert_eq!([..], [RangeFull]);
assert_eq!([2..], [RangeFrom { start: 2 }]);
assert_eq!([0..=2], [RangeInclusive::new(0, 2)]);
assert_eq!([..=2], [RangeToInclusive { end: 2 }]);

When not a direct child of the array expression, the RangeTo operator is unaffected. That is, subexpressions and parenthesized expressions are unaffected. Therefore the following also hold:

// Parentheses prevent the expression from being treated as an array expansion.
assert_eq!([(..2)], [RangeTo { end: 2 }]);

// Subexpressions of an element are also treated normally.
let x: [RangeTo<u32>; 1] = [{ ..2 }]; // `RangeTo` expression within a block expression
let x: [RangeTo<u32>; 2] = [.. ..2]; // `RangeTo` expression within an array expansion (parses, but yields a type error)
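For reference, this is how the affected forms behave on current Rust, where a bare ..2 inside an array literal is still a RangeTo value. The function name range_to_elements is an assumption made for illustration. Under the proposal, only the first form would change meaning; the parenthesized form and [..] would keep theirs:

```rust
use std::ops::{RangeFull, RangeTo};

/// Returns today's values of `[..2]`, `[(..2)]` and `[..]`.
fn range_to_elements() -> ([RangeTo<i32>; 1], [RangeTo<i32>; 1], [RangeFull; 1]) {
    // On current Rust the bare and parenthesized forms are the same value.
    ([..2], [(..2)], [..])
}

fn main() {
    let (bare, parenthesized, full) = range_to_elements();
    assert_eq!(bare, [RangeTo { end: 2 }]);
    assert_eq!(bare, parenthesized);
    assert_eq!(full, [RangeFull]);
}
```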

Drawbacks

Breakage

The biggest drawback is that this is a breaking change to the language. As discussed above, existing code using arrays of RangeTo literals would conflict with this syntax and fail to compile. The exact interactions are detailed above.

The breakage introduced can be handled relatively harmlessly.

  • To our knowledge, broken code (RangeTo literals within array expressions) is used extremely rarely. A crater run should be performed to confirm this.
  • Such breakages can be easily detected and programmatically fixed across an edition boundary.
    • RangeTo literals can be parenthesized to avoid the conflict.
    • The resulting type mismatch can be easily recognized.
  • Code with such conflicts will almost always fail to compile, rather than miscompile. Miscompilation only occurs when:
    • The array only contains literal RangeTo expressions on array types, and
    • The array's type is not specified in any of its other uses.

This breakage can also be considered to extend to users' mental models. Users may be confused when their RangeTo expression unexpectedly yields a type error about array expansions. This can be mitigated by detecting when such errors occur in a [RangeTo<_>; _] literal, and suggesting that the element be parenthesized.

let first = ..4;
let array = [first, ..5];

error[E0308]: mismatched types
 --> src/main.rs:2:26
  |
2 |     let array = [first, ..5];
  |                           ^ expected array of `u32`, found integer
help: try using parentheses here:
  |
2 |     let array = [first, (..5)];
  |                         ^^^^^

Further overloading ..

The proposed syntax adds yet another meaning to the .. symbol, which some already consider too overloaded. Our hope here is to align with users' existing familiarity with .. in struct update contexts, making this "expanding on a meaning" rather than "adding a meaning".

Limitations

This syntax does not work in pattern positions, where it conflicts with unstable half-open range patterns. However, precedent for such differences between expressions and patterns exists. Neither RangeTo literals nor repeat-style array literals (e.g. [true; 5]) can appear in patterns, so it is not too surprising that array expansions (as proposed here) cannot either.

Rationale and alternatives

Rationale

The proposed syntax was chosen mainly for its familiarity. It reflects the meaning of the .. in the struct update syntax, namely "copy/move fields/elements from what follows". It should also be intuitive for newcomers from JavaScript since the .. acts similarly to the ... in JavaScript's spread syntax.

Alternatives

  • Implementing this as a macro in either std or an external crate. This may not currently be possible due to the lengths of non-const arrays not being available in const contexts. (For example)

  • An alternative syntax that doesn't conflict with RangeTo, e.g. assert_eq!([1, 2, 3, 3, 3], [1, 2, ...[3; 3]]) leveraging the existing (but unused) ... token.

Prior art

Javascript

Javascript has a similar, though less restrictive, feature in its spread syntax, which allows arbitrary iterables to be expanded to discrete arguments to a function call or elements to an array literal. The spread syntax also works for object expressions, where it acts similarly to Rust's struct update (FRU) syntax.

This proposal can be thought to similarly extend Rust's existing "spread" syntax to the context of array literals.

C and C++

As described above, a related feature is present in C and C++. In C, missing elements in an initializer are implicitly initialized to zero (NULL, etc.). C++ improves on this design by default-initializing any missing elements, allowing for more complex types in this position.

Both C and C++ suffer from the problem that arrays are silently and implicitly filled when elements are missing. This can lead to unexpected behaviour and bugs. Still, the convenience of this feature means that it continues to be used frequently. The proposed feature solves this problem while improving usability by making the behaviour explicit and opt-in, and by using only user-defined values.

Unresolved questions

  • Is this the best syntax for such a feature?
  • Could this be implemented as a macro without loss of usability?
  • Are array expansions "transparent" to eventual length inference?

Future possibilities

  • This syntax can be implemented for dynamically-sized slices in the vec! macro.
  • The proposed syntax does not preclude any of the existing indexed array initializer proposals.

You can do array concatenation with const-generics, and with a little more effort in const fn you can even do it at compile time (the only thing missing on nightly is ptr::write).

Here's a macro for array concatenation: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=c4de737c39cae6227ac7ac3ac850a3db

Here's a comparison against just using a literal: https://godbolt.org/z/4zjjr1

Currently the macro caps out at 26 elements, but that can be extended by editing the macro a little.


This reminds me of tuple splatting, such as Splats and tuples - Crystal -- should it work for that as well? Is there anything specific about arrays here? What about mixing? Can I convert a let p = (1, 2); into an array with [...p]?

It's not at all clear to me that a breaking change here is worth it. Why not just use three dots, say, like C++ parameter pack expansion? (Oh, and like in javascript, you mention later.)

The FRU parallel here doesn't feel that strong to me, since FRU is the same type, but this almost never would be (it wouldn't be illegal, but [..a] would just be a). In fact, the FRU parallel might have me expect that [1, ..[0;3]] is [1, 0, 0] because it only uses the rest of the elements. (If you think of an array as a tuple struct with homogeneous fields and repr(C), then ['a', ..['b';3]] is Array3 { 0: 'a', ..Array3('b', 'b', 'b') } => Array3('a', 'b', 'b'), not something producing Array4.)

I would expect const fn concat_arrays<T, N: usize, M: usize>(a: [T; N], b: [T; M]) -> [T; {N+M}] to work eventually, which would not require that the user specify the length of the resulting array.

Now, that doesn't necessarily mean this is a bad feature, but I'm not convinced by this part of the motivation.


Indeed, this isn't an original idea. The splat operator also exists in Ruby, PHP, Python, and probably more.

I agree that this is questionable. I'm not opposed to using ... instead, but I do see the value in mirroring the FRU syntax.

That's fair. However, there have already been discussions about extending FRU to differently parameterized structs. With arrays moving in the direction of being parameterized by their length, I believe this parallel becomes even stronger since we drop the expectation of preserving the exact type.

struct Spanned<T> {
    token: T,
    span: Span,
}
let ch = Spanned { token: 'c', span: Span };
let str = Spanned { token: "c", ..ch }; // Builds a Spanned<&str> from a Spanned<char>

let three = [0; 3];
let four = [1, ..[0; 3]]; // Builds an Array<u32, {4}> from an Array<u32, {3}>

That's true, and RustyYato gives an example of how a variadic macro might look to do the same. That section was written with current stable Rust in mind, but was perhaps shortsighted.


I don't think the RangeFrom-like syntax makes all that much sense. Could we just have a const implementation of + for arrays?

This is something that I think could be valuable; at least I use similar features quite often in some other languages (albeit mostly in dynamic ones where it complements or replaces iterator/comprehension syntax – notably, Python).

However, I think the following specific case could be better served by improving const evaluation:

Unless you mean that a and b are slices – if they are really arrays, then their .len() ought to be const.


The terminology gets a bit confusing here. When a and b are non-const arrays, I don't believe there's any way to get their lengths at compile time. Even though .len() is a const fn and depends only on the types of a and b, we cannot get their lengths since passing non-const arguments to const functions is not allowed at compile time. (For example)

Sure, it does not compile today, but I don't see why the second one couldn't be made to work some day. It should "only" be a matter of more precise tracing of which subexpressions and generic parameters pass constness through. (It's a nontrivial amount of work for T-compiler, but there's nothing theoretically impossible in it.)

For instance, the following compiles today in stable Rust:

fn main ()
{
    let arr1 = [0, ::std::env::args().count()]; // not const
    let arr2 = [2, 3, 4];
    let arr3: [_; 5] = concat(arr1.into(), arr2.into()).into();
    dbg!(arr3);
}

It relies on the cumbersome generics of ::generic_array and a bit of unsafe, so it is far from ideal. Still, it shows that even when the items of an array are not const, the array's length, which is encoded in its type, is.

Here is, for instance, a glimpse from the future:

fn main ()
{
    let arr1 = [0, ::std::env::args().count()]; // not const
    let arr2 = [2, 3, 4];
    let arr3: [_; 5] = concat(arr1, arr2);
    dbg!(arr3);
}

fn concat<Item, const N1: usize, const N2: usize> (
    arr1: [Item; N1],
    arr2: [Item; N2],
) -> [Item; N1 + N2]
{
    #[repr(C)]
    struct Contiguous<_0, _1>(_0, _1);
    
    unsafe
    fn transmute_unchecked<Src, Dst> (src: Src) -> Dst
    {
        ::core::mem::transmute_copy::<Src, Dst>(
            &*::core::mem::ManuallyDrop::new(src)
        )
    }
    
    unsafe {
        transmute_unchecked(Contiguous(arr1, arr2))
    }
}

Regarding the OP as a whole, I am much in favor of that kind of "array splice" operator, provided it uses a sigil distinct from that of FRU, for the reasons @scottmcm pointed out.


If we're going to have subslice destructuring patterns (Oops, we already do! I consulted obsolete docs, thanks @matt1985), there should IMO be a constructing syntax that mirrors the pattern syntax, as is normal for patterns. Indeed I think that besides constructing sized arrays from sized arrays, it should be possible to construct unsized slices from other unsized slices with the same syntax, as long as the slice being constructed is owned (i.e. it should also be possible to do something like vec![a, b, ..cs, d] and so on, because you can also destructure a Vec like that).

I find that FRU is very much analogous to this "slice splatting", and should preferably share the syntax. Optimally, that syntax should not be .. which is overloaded for ranges, and I've noted before that migrating to ... might be possible in the long term. I'd also like to lift the same-type restriction of FRU to allow differently parametrized instances of the same generic type, which would indeed be completely analogous to allowing different-sized instances of the same family of array types.


That stabilized in Rust 1.42.0 with the foo @ .. syntax.

Example: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=a4338effc5c25cd7b1938d8b334c41ea

fn main() {
    let arr = [0_i32, 1, 2, 3, 2];

    if let [a, b @ .., c] = &arr[..] {
        dbg!((a, b, c));
        // a: &i32
        // b: &[i32]
        // c: &i32
    }

    {
        let [a, subarray @ .., c] = arr;
        dbg!((a, subarray, c));
        // a: i32
        // subarray: [i32; 3]
        // c: i32
    }
}

Same code in godbolt.org, using the 1.42.0 compiler


There are semi-regular complaints about + meaning concatenation on String, and I think things get worse for arrays -- [1, 2] + [10, 20] => [11, 22] is also a reasonable and useful implementation of +.

So I suspect that + will remain unimplemented for arrays. (The SIMD work will likely provide types that provide element-wise addition and multiplication and such, but ordinary arrays probably won't.)
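To make the ambiguity concrete, here is a small sketch of the two readings of [1, 2] + [10, 20] written as ordinary functions on stable Rust. Both function names are made up for illustration, and neither corresponds to an existing operator impl on arrays:

```rust
/// Element-wise reading: `[1, 2] + [10, 20]` gives `[11, 22]`.
fn add_elementwise<const N: usize>(a: [i32; N], b: [i32; N]) -> [i32; N] {
    std::array::from_fn(|i| a[i] + b[i])
}

/// Concatenation reading: `[1, 2] + [10, 20]` gives `[1, 2, 10, 20]`.
/// A `Vec` is returned because the summed length cannot be named on stable Rust.
fn concat_to_vec(a: &[i32], b: &[i32]) -> Vec<i32> {
    [a, b].concat()
}

fn main() {
    assert_eq!(add_elementwise([1, 2], [10, 20]), [11, 22]);
    assert_eq!(concat_to_vec(&[1, 2], &[10, 20]), [1, 2, 10, 20]);
}
```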

