Append/Push syntax for arrays versus dedicated methods versus ops::{Add, Sub}

This post tries to merge the recent discussion about push, append & pop (later on truncate) methods for arrays and the late-2020 to early-2021 discussion about push & append being language syntax.

My point is to discuss whether to wait for better const expressions and get those methods into core, create an expansion syntax that does not collide with the range syntax, or overload arithmetic operators. For the syntax, my proposals are:

  • [...array, element1, element2], with chaining [...array1...array2, element1, element2]: new three-dots syntax.
  • [array @ .. element1, element2], with chaining [array1 @ .., array2 @ .., element1, element2]: overloaded at-range syntax.

The latter would integrate the at-range syntax much more deeply into the language; right now it is very niche, and I haven't seen many people who know about it. However, the three-dots one seems much more readable and is more pleasant for chaining & concatenating, which, at the end of the day, is one of the main points of the first discussion.

If this option is chosen, we could also implement it for tuples, which would let us call functions like this:

fn foo(x: i32, y: u32) -> i32 {
    unimplemented!()
}

fn bar(x: i32, y: u32, z: &str) -> String {
    unimplemented!()
}

fn baz(x: i32, y: u32, z: i128, a: usize) -> i128 {
    unimplemented!()
}

let tuple1 = (1i32, 2u32);
let tuple2 = (10i128, 5usize);
let a = foo(...tuple1); // three-dots syntax
let a = foo(tuple1 @ ..); // at-range syntax
let b = bar(...tuple1, "bar"); // three-dots syntax
let b = bar(tuple1 @ .., "bar"); // at-range syntax
let c = baz(...tuple1...tuple2); // three-dots syntax
let c = baz(tuple1 @ .., tuple2 @ ..); // at-range syntax
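
For comparison, here is a minimal sketch of how the same calls have to be written on stable Rust today, where the tuple is unpacked by hand. The bodies of `foo` and `bar` are invented purely so the example computes something, and `call_with_tuple` is a hypothetical wrapper; none of this comes from the original post.

```rust
// Hypothetical bodies: the post leaves these functions unimplemented.
fn foo(x: i32, y: u32) -> i32 {
    x + y as i32
}

fn bar(x: i32, y: u32, z: &str) -> String {
    format!("{z}:{x}:{y}")
}

// Hypothetical wrapper showing the call sites.
fn call_with_tuple() -> (i32, String) {
    let tuple1 = (1i32, 2u32);
    // No spread syntax exists, so the tuple is destructured first.
    let (x, y) = tuple1;
    (foo(x, y), bar(x, y, "bar"))
}
```

Either proposed syntax would remove the intermediate `let (x, y) = tuple1;` line.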

Personally, I don't like creating new niche syntaxes (we would be repeating the issue I just talked about when discussing the at-range), so as much as I like the three-dots, I vote for the at-range.

The last option would be overloading core::ops::Add with a const impl for each of the following:

  • [T; N] + [T; L]
  • [T; N] + T
  • T + [T; N]

However, I don't find this last option very coherent, since:

  • In Rust, patterns are much more idiomatic than overloading.
  • Truncate can't be nicely implemented overloading core::ops::Sub since there's no way to select the end to truncate. The only possible way would be by selecting the end based on the position of the array in the subtraction expression, which seems rather confusing and quite error-prone. However, it aligns with the 2 - 1 != 1 - 2 principle.

And they would still require const expressions...
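
To illustrate the const-expression problem, here is a rough sketch of concatenation on stable Rust. The return type cannot be written as `[T; N + L]` there, so this hypothetical `concat` helper takes the output length as a separate const parameter `OUT` and only checks it at run time. The `Copy + Default` bounds are an assumption made to keep the sketch free of unsafe code.

```rust
/// Hypothetical helper: concatenates `a` and `b` into an array of length `OUT`.
/// `OUT` must equal `N + L`; stable Rust cannot express that in the signature,
/// so the check happens at run time instead.
fn concat<T, const N: usize, const L: usize, const OUT: usize>(
    a: [T; N],
    b: [T; L],
) -> [T; OUT]
where
    T: Copy + Default,
{
    assert_eq!(N + L, OUT, "output length must equal N + L");
    let mut out = [T::default(); OUT];
    out[..N].copy_from_slice(&a);
    out[N..].copy_from_slice(&b);
    out
}
```

The caller has to spell out the result length, for example `let joined: [i32; 4] = concat([1, 2], [3, 4]);`. Annotating the wrong length compiles and then panics, which is exactly the kind of error better const expressions would catch at compile time.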

There's a ton of people who loathe + on String, so I suspect you'll find plenty of opposition to using Add for array concatenation.

But also, one could reasonably expect [1, 2] + [3, 4] to be [4, 6] or [1, 2, 3, 4]. And when there are multiple reasonable options for an operator, that tends to mean it shouldn't be provided in core. Instead, those things should use named methods (or different types) instead.

I also dislike overloading operators for this purpose, and the ambiguity you point out is the deciding factor for ruling them out here. However, I like the idea of having [1, 2] + [3, 4] mean element-wise addition, with separate methods for concatenating, since that add operation resembles matrix addition.

It does, and I like that too, but as soon as you get to * everything gets complicated: matrix, dot, cross, elementwise, etc. So I think it's best to leave that stuff to dedicated math or simd libraries, rather than having it on every array.

(array::zip(a, b).map(Add::add) can always exist if needed.)
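
As a point of reference, element-wise addition can be written as a named function on stable Rust with `std::array::from_fn`. This is only a sketch; the name `elementwise_add` is made up for illustration and is not an existing library function.

```rust
use std::ops::Add;

/// Hypothetical helper: adds two equally sized arrays element by element.
fn elementwise_add<T, const N: usize>(a: [T; N], b: [T; N]) -> [T; N]
where
    T: Copy + Add<Output = T>,
{
    // `from_fn` calls the closure once per index, from 0 to N - 1.
    std::array::from_fn(|i| a[i] + b[i])
}
```

With a named function there is no doubt about which of the two meanings is intended: `elementwise_add([1, 2], [3, 4])` gives `[4, 6]`, never `[1, 2, 3, 4]`.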

Exactly, word for word, what I was thinking. I guess we should try to move on to the original topic (discarding the operator overloading).

Dlang avoids the + ambiguity by having a ~ operator for appending and concatenation. In Rust the ~ sigil is free…

~ is taken by const traits, but I don't see any ambiguities. However, Rust has a long history of preferring methods over operator overloading, so the language design team should be the one to choose among those three options.

@OP: you didn't explain why you think array concatenation needs more syntax. (I don't think it does.)

My opinion is that since we have destructuring assignments for truncating, it would be coherent to have "building assignments" for appending & concatenating, which need not be limited to arrays but could also work with tuples. However, I would prefer the language design team to be the ones to discuss the best option, since they have much more experience than I do in this area.
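
For readers who have not used it, this is a small sketch of the destructuring side that already exists on stable Rust: a rest pattern inside an array pattern binds a shorter array. The function name `pop_back` and the concrete length 4 are chosen only for illustration.

```rust
/// Hypothetical helper: splits off the last element of a 4-element array.
fn pop_back(arr: [i32; 4]) -> ([i32; 3], i32) {
    // `init @ ..` binds the first three elements as a `[i32; 3]`.
    let [init @ .., last] = arr;
    (init, last)
}
```

A "building assignment" would be the reverse direction: going from `init` and `last` back to a `[i32; 4]` without listing every element.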

A very silly potential syntax, presented here for comedic effect more so than actual utility:

let a = [0; 4];
let b = [0; 12];

let concat @ [.., ..] = (a, b);
let push @ [.., _] = (a, 0);
let [pop @ .., _] = b;

Basically, saying "This tuple looks kinda like an array. Let me match it kinda as one."

This does suggest an interesting option, though it relies on tuple packs, and may have undesirable interactions with multi-typed variadic tuples:

impl<T, const N: usize> From<(..T;N,)> for [T; N] {
    // SAFETY: Look at me. I'm the compiler now.
    fn from(tuple: _) -> Self { transmute(tuple) }
}

(Completely ad-hoc syntax.)
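
A limited version of this conversion exists without any new syntax: as far as I know, recent standard library versions (Rust 1.71 and later) implement `From` between arrays and homogeneous tuples of up to 12 elements. A minimal sketch, with `tuple_to_array` as a made-up name:

```rust
/// Hypothetical helper: converts a homogeneous 3-tuple into an array using
/// the standard library's tuple-to-array `From` impl (assumes Rust 1.71+).
fn tuple_to_array(tuple: (u8, u8, u8)) -> [u8; 3] {
    <[u8; 3]>::from(tuple)
}
```

This only covers fixed small arities, so it does not replace the variadic `(..T;N,)` idea above.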


If something like this is done, using name @ .. is almost certainly the best bet syntax-wise. (Member of wg-syntax, but speaking for myself, not the wg (which hasn't done anything in a long time).)

I don't like treating tuples like arrays, it's confusing. I'd rather have building assignments on the right side:

let array = [...array1...array2, element1];
let array = [array1 @ .., array2 @ .., element1];

That'd be their main difference with destructuring assignments. That makes parsing them significantly easier and even allows us to use the existing syntax on destructuring assignments without any ambiguities. For example, this is a valid destructuring assignment:

let [a, b @ .., c] = array;

Don't you see how it is also a valid at-range syntax building assignment? The thing is destructuring creates new elements, which we name on the left side. Building assignments combine, so they shouldn't create more names. It makes no sense to include them in the side where we define new elements. They must be placed in the one where we create the values.

My proposals for building assignments are at the beginning of this reply and in the initial post.

Counterargument: this compiles fine:

let out @ [head, mid @ .., tail] = array;

and the normal ownership rules still apply, of course.
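
A minimal sketch of that counterargument, assuming a `Copy` element type and Rust 1.56 or later (where bindings on both sides of `@` are stable). The function name `split_named` is hypothetical.

```rust
/// Hypothetical helper: binds the whole array and its parts in one pattern.
fn split_named(array: [u8; 4]) -> ([u8; 4], u8, [u8; 2], u8) {
    // `out` names the entire array, while `head`, `mid` and `tail` name its
    // pieces. This works here because `u8` is `Copy`.
    let out @ [head, mid @ .., tail] = array;
    (out, head, mid, tail)
}
```

With a non-`Copy` element type the same pattern is rejected, because `out` and the inner bindings would both try to take ownership of the elements.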

The exact syntax I used was using .. to mark a splat on the pattern side; this does indeed not really work.

But I disagree that this is fundamentally a question of values rather than names; it is about names. (a, b) and [a @ .., b @ ..] are fundamentally the same value; the only difference is what you're allowed to do with it. Coercing from tuple-of-arrays to one large array would provide the desired functionality.


That said, I'm not arguing for anything along those lines. Using @ to splat is a good choice, for one specific reason: it maintains pattern-expression duality, e.g.

// like this is a noöp
let &name = &name;

// and this is a noöp
let (a, b) = (a, b);

// so is this
let [a @ .., b] = [a @ .., b];

Good point.

The duality is not complete though since this won't work:

let [a @ .., b, c @ ..] = [a @ .., b, c @ ..];

To actually be able to do the inverse operation you need a way to specify the length of a .. pattern to disambiguate multiple uses of .. in the same pattern.

The .. pattern isn't really the inverse of the spread operator though, it's more a "rest" pattern. I've wanted an actual spread pattern before which would allow that (a spread pattern being something like ".." <sub-pattern> where the sub-pattern must be array/slice-like and expands to a multi-element pattern of the element types similar to the rest pattern):

let [a @ ..[_, _], b, c @ ..[_, _, _]] = [1, 2, 3, 4, 5, 6];

Add in a shorthand for a non-binding lengthed array/slice pattern like [_; 2] and that gets quite nice IMO:

let [a @ ..[_; 2], b, c @ ..[_; 3]] = [1, 2, 3, 4, 5, 6];

EDIT: Or maybe that would be

let [..(a @ [_; 2]), b, ..(c @ [_; 3])] = [1, 2, 3, 4, 5, 6];

depending on how the spread and identifier patterns inter-operate.
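
For comparison, the closest equivalent I can think of on stable Rust allows only one rest pattern per array pattern, so the first group has to be named element by element. This is a sketch; `split_2_1_3` is a hypothetical name.

```rust
/// Hypothetical helper: splits a 6-element array into groups of 2, 1 and 3.
fn split_2_1_3(arr: [i32; 6]) -> ([i32; 2], i32, [i32; 3]) {
    // Only one `..` is allowed, so `a` is rebuilt from two named elements.
    let [a0, a1, b, c @ ..] = arr;
    ([a0, a1], b, c)
}
```

A lengthed spread pattern such as `..[_; 2]` would let `a` be bound directly instead of being reassembled by hand.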

Complexity escalates quickly, but if you want to propose it, I suggest you open a new language design thread.

Minor note, but the [...a...b, c] syntax is extraordinarily close to the JavaScript [...a, ...b, c] syntax: note the additional comma. Rust already murders my muscle memory with JS/TS function syntax, I'd rather not have this issue too!