Variadic generics design sketch

First: I'm a member of T-opsem and as such know a significant amount about the semantics of Rust and how those impact what transformations/optimizations the compiler is allowed to do without deriving further proofs about the code. For this purpose, though, I am speaking exclusively for myself and do not carry any T-opsem influence, and just offer my understanding, which will never be absolutely flawless.

I was also a member of wg-grammar while that was a thing. It effectively isn't anymore, but it's still structurally present.

Ah, must've missed that when I scanned through.

While this kind of shuffling does get optimized out, I'd lean towards uniformly always treating parameter packs differently from plain tuples. unsized_fn_params needs a different solution anyway, and "iterate pack by reference" would rather be spelled static for item in &pack (which creates a reference to the pack) than static for ref item in pack (which doesn't, but by analogy to regular for would also be a consuming iteration). The existence of the reference to a pack-tuple can still be fairly easily optimized out if the reference never escapes the local function. But given that the compiler needs to treat packs specially roughly always, and the language needs to treat packs specially when they potentially contain unsized params, it feels more consistent to treat parameter packs as "tuple-like" but a distinct concept from tuples.

There are also lifetime packs, which clearly need to always be their own thing. Variadic syntax should work for tuples and borrow from tuple vocabulary, but I think it's fair to admit that packs are always a distinct concept from tuples, with tuples being more restricted but thus offering more options. An interesting way of looking at it would be to say (A, B, C) is #[repr(Rust)] (A, B, C) but variadic packs are #[repr(RustVariadic)] (A, B, C). This mostly communicates how they're similar but distinct types (like extern "Rust" fn vs extern "C" fn), but is still an insufficient lie (packs shouldn't be described as a singular object that has a consistent layout) because of the following point.

Additionally, it's a subtle pitfall, but it's not sufficient to only prove absence of taking the address of the entire pack to justify eliding data shuffling into tuple layout. It's also required to prove absence of taking the address of any item in the pack. This is the case even with strict subobject provenance slicing, because since the items in the tuple are all part of the same Rust Allocated Object, operations which care about the containing Rust Allocated Object require that (stack) allocation to exist with the correct layout and for any subobject with its reference taken to be at the correct location in that tuple object. (Notable examples of such operations include pointer arithmetic and comparison.) If we want variadics to be a properly "zero overhead abstraction" (i.e. that going without the abstraction doesn't get you better codegen), we need these to optimize to identical code: [godbolt] (ignoring variadic typeck implications for the example)

#![allow(improper_ctypes)]

// pass-by-ref ABI
pub struct Data([u64; 8]);

extern "Rust" {
    fn sink(
        a: &Data,
        b: &Data,
        c: &Data,
        d: &Data,
    );
}

// variadic form
pub unsafe fn f1(...data: (...Data; 4)) {
    sink(...&data,)
}

// monomorphizes to
pub unsafe fn f1(
    data_0: Data,
    data_1: Data,
    data_2: Data,
    data_3: Data,
) {
    let data = (data_0, data_1, data_2, data_3);
    sink(&data.0, &data.1, &data.2, &data.3);
}
// -> a bunch of stack shuffling and then a call

// manually monomorphized
pub unsafe fn f2(
    data_0: Data,
    data_1: Data,
    data_2: Data,
    data_3: Data,
) {
    sink(&data_0, &data_1, &data_2, &data_3);
}
// -> just a tail call (jmp)

Unless packs are somehow distinct from tuples, this does mean that packs including unsized parameters are extremely difficult to work with, because you'd be limited to only ever passing them by value; any attempt to use a reference to the value could potentially force the creation of the tuple (since the use of the reference, if not inlined, can't be proven not to observe the address and do the problematic pointer things to the reference).
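To make the "observe the address" part concrete, here is a small sketch (address_distance and distance_within_tuple are hypothetical names, and the [u64; 8] element type just mirrors the Data example above). An out-of-line callee only receives two references, yet because both items are fields of one tuple, the language guarantees they sit inside that single allocation, so the callee can observe a relationship the compiler has to honor by actually laying out the tuple:

```rust
use std::mem::size_of_val;

/// Hypothetical stand-in for an opaque, non-inlined callee. It only receives
/// references, yet it can still observe where they point.
#[inline(never)]
pub fn address_distance(a: &[u64; 8], b: &[u64; 8]) -> usize {
    let a_addr = a as *const [u64; 8] as usize;
    let b_addr = b as *const [u64; 8] as usize;
    a_addr.abs_diff(b_addr)
}

/// Passes references to two fields of the same tuple. Both fields are
/// subobjects of one allocation, so their distance is bounded by the size of
/// the tuple itself, and they cannot overlap.
pub fn distance_within_tuple() -> (usize, usize) {
    let pack = ([0u64; 8], [1u64; 8]);
    (address_distance(&pack.0, &pack.1), size_of_val(&pack))
}
```

With two independent locals instead of a tuple, the language makes no such promise about their relative addresses.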

Plus, presuming that static for gets evaluates-to-tuple semantics, converting from pack to tuple isn't all that syntactically involved (static for item in pack { item }) and clearly communicates to the reader that there's a potential address-impacting data shuffling cost here. (But further down I do express trepidation about that behavior.)

The nuclear option for the compiler would be to attempt to treat packs differently than tuples until the whole pack is used as a tuple to avoid this subtle pessimization.


I've had a bit more time to look a bit closer, so here are some specific notes I didn't see addressed:

...

... is already used for C variadics (RFC #2137). This should be unambiguous, but should be kept in mind.

... is deprecated but still allowed in pattern position for inclusive ranges for edition2015 and edition2018. Edition2021 makes it a hard error semantically but it's still allowed grammatically (e.g. under #[cfg(false)] or an unused macro $:pat capture). Range-to patterns with ... are a syntax error in all editions, so there's again no strict conflict, but the potential developer impact still should be kept in mind.

However, it also needs to be noted that exclusive range patterns ..x are already stably permitted syntactically and unstably permitted semantically. This is why "rest" patterns need to be bound with ident @ .. rather than ..ident. ... has been deprecated and removed as an inclusive range pattern because of its confusability with the two-dot exclusive range, so making it such that .. and ... are ever valid in the same syntactic position is a notable confusion risk. To be completely fair, ranges and variadic packs are very different and unlikely to both be valid patterns for the same scrutinee type[0], but such a small visual difference between two valid bits of syntax is usually a poor idea.

I don't have a different proposal, though.

[0]: The one case I can think of is when matching against one member of an array/tuple pattern, with ...X as the pattern and some const X item in scope. As a pack pattern this would shadow the constant item, but at least still generate a warning for the nonstandard style, which could also point out the X in scope and ask if ..X was meant instead.
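For reference, the inclusive range pattern that ... used to spell is written with ..= on current editions. A minimal sketch (digit_count_class is a hypothetical example function):

```rust
/// Classifies a byte by its number of decimal digits. Each arm uses an
/// inclusive range pattern; older editions also accepted `0...9` here.
pub fn digit_count_class(n: u8) -> &'static str {
    match n {
        0..=9 => "one digit",
        10..=99 => "two digits",
        100..=u8::MAX => "three digits",
    }
}
```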

ref ...head

By analogy to ref head @ .., this should probably be spelled ...ref head instead. (The pattern binding mode decorates the binding, not the pattern.)

It should probably be explicitly noted somewhere whether unpacking patterns can be arbitrary patterns (e.g. ...Foo(a, b, c) for an unpack of Foos) or if they're restricted to being simple binding patterns. Arbitrary patterns should be allowed imho (and that allows ...ref head just like any other pattern rather than handling it as a special case).

For arrays, ... patterns serve as an alternative to ident @ .. patterns. [...] ... patterns also work with tuples.

Obviously, pattern matching ...rest on a tuple is producing a tuple. Tuple structs also produce tuples for a ...rest pattern. Arrays are the odd one out in that ...rest produces an array. Well, and slices would produce a slice, if permitted there, which has been left as an unresolved todo for now.

I will admit that matching kind is almost certainly preferable to rest @ .. making an array but rest @ ...
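On stable Rust, the kind-matching behavior is already visible for arrays and slices via rest @ ..; a small sketch (the function names are hypothetical):

```rust
/// Against a fixed-size array, `rest @ ..` binds an array: here `[u8; 3]`.
pub fn split_first_array(arr: [u8; 4]) -> (u8, [u8; 3]) {
    let [first, rest @ ..] = arr;
    (first, rest)
}

/// The same pattern against a slice binds `rest` as a subslice.
pub fn split_first_slice(s: &[u8]) -> Option<(u8, &[u8])> {
    match s {
        [first, rest @ ..] => Some((*first, rest)),
        [] => None,
    }
}
```

Tuples are the gap today: a tuple pattern accepts a bare .. but rejects binding it with rest @ .., so there is currently no way to get the remaining fields out as a tuple.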

[static for] is unrolled at compile-time.

It should be noted in the eventual documentation that this is macro-style expansion (so I'd word it as "expanded" instead of "unrolled"), notably to call out that the syntax is the same for each member but name resolution can give different results.

For the concrete case, anyway. Along with that, it should be noted that static for name resolution for generic parameter packs is still done using only the generic bounds.
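A rough approximation of this on stable Rust is a macro_rules! macro. The static_for_sketch name and grammar below are hypothetical, and the sketch only models the fresh-binding expansion over a literal list of expressions (evaluating to a tuple), not generic packs:

```rust
/// Hypothetical macro approximating "fresh binding" static for expansion:
/// the body is pasted once per element and the results form a tuple.
macro_rules! static_for_sketch {
    ($item:ident in ($($elem:expr),* $(,)?) $body:block) => {
        ( $( { let $item = $elem; $body }, )* )
    };
}

/// The body `x.len()` is token-for-token identical for both elements, but it
/// resolves to `str::len` for the first and `Vec::len` for the second.
pub fn lengths() -> (usize, usize) {
    static_for_sketch!(x in ("abc", vec![1u8, 2]) { x.len() })
}
```

Because this is plain token substitution over concrete types, it has no analogue of the generic-bounds-only resolution that generic packs would need.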

static for loops evaluate to a tuple.

I think it was already noted upthread, but if we don't want to require every (almost: see next point) static for to have a semicolon, we'll have to adjust the semicolon elision rules. The current rule is that expressions with blocks[2] are syntactically statements when in statement position[1] and are semantically rejected if their evaluated type is anything other than (). For static for (and probably only static for), we'd also need to accept (...(); N).

We probably won't ever change nonstatic for to be able to produce a non-unit value, so I actually slightly prefer leaving static for as always being a ()-valued statement expression rather than evaluating to a tuple. (That does weaken my pre-section-break point about easy conversion to a tuple, though.) Getting a tuple out could then be done the same way it is with nonstatic for: by creating the output place first and assigning to it. This also makes continue/break more properly consistent with their use in nonstatic for.

[1]: Certain following tokens like . can make them parse as expressions, but thankfully this is a purely syntactic decision. Also thankfully, loop { break 1 } $op 1 parses the same with either + or - (a statement and a prefix operator expression), despite prefix + not being valid.

[2]: Expressions are classified as expressions-with-block not just by having a trailing block; they also need to be able to evaluate to (). This is why async {} isn't a block statement, for example. It would be very annoying if it were: if it were considered an expression-with-block, it would be parsed as a statement when in plain "tail expression" position, so to return async {} as a tail expression you'd need to wrap it in parentheses.

[static for expansion semantics]

If static for creates "place aliases" instead of a fresh place, some things become a lot easier. To make the comparison clearer, consider this example:

static for i in (a, b) {
    Some(i)
}

There are two ways of translating this; either as a fresh binding/place, roughly:

(match a { i => {
    Some(i)
}}, match b { i => {
    Some(i)
}})

(using match instead of let because of temporary lifetime implications) or as a place alias, roughly:

({
    Some(a)
}, {
    Some(b)
})

The semantic difference becomes more obvious if we use a more interesting example, e.g. a body of Some(&i) instead:

// fresh place; value does not live long enough
(match a { i => {
    Some(&i)
}}, match b { i => {
    Some(&i)
}})

// place alias; references each item in the pack
({
    Some(&a)
}, {
    Some(&b)
})

A pattern of ref i would get the same behavior from both expansion forms, but the "place alias" form has more "do what I mean" behavior. Support for the analogous fine-grained subplace "by use" capture already exists for edition2021+ closures, so it's not unprecedented, and getting "place alias" semantics for static for wouldn't be an outsized amount of additional work compared to using fresh bindings. It would be nonzero, though, since named places aren't a thing in the frontend yet. But "autoref bindings" have been discussed as desirable for a while, at least, and have very similar implications for the frontend as place aliases.
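For comparison, the existing edition2021 closure behavior can be seen in a short sketch (the Pack type is hypothetical, and this assumes edition 2021 or later):

```rust
/// Hypothetical two-field value standing in for a pack.
pub struct Pack {
    pub a: String,
    pub b: Vec<u8>,
}

/// With edition 2021 disjoint captures, the closure captures only the subplace
/// `pack.a` (by shared reference), so a mutable borrow of the sibling field
/// `pack.b` can be live at the same time. Under edition 2018 the closure would
/// capture all of `pack`, and this would fail to borrow-check.
pub fn disjoint_capture(mut pack: Pack) -> (usize, Vec<u8>) {
    let b = &mut pack.b;
    let a_len = || pack.a.len();
    b.push(42);
    let n = a_len();
    (n, pack.b)
}
```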

The place alias semantics also make assigning to a tuple without evaluate-to-tuple semantics a lot easier, since it could allow you to static for over uninitialized places:

// variadic
let out: (...,);
static for place, item in out, pack {
    place = Some(item);
}
let tuple = (...out);

// expanded
let (out_0, out_1);
let _: ((), ()) = ({
    out_0 = Some(pack.0);
}, {
    out_1 = Some(pack.1);
});
let tuple = (out_0, out_1);

// Yes, Rust allows deferred initialization of let bindings.

It's less convenient than just evaluating to the tuple, but it does directly mirror what you would do with nonstatic for (if you aren't using iterator combinators).

You can static for over multiple unpackable values at once, as long as they are the same length.

I'm not super fond of static for a, b in as, bs as the syntax for zipping unpackables, since it's so close to a pattern of (a, b), which would unpack each item in a single unpackable. That said, , is in the follow set of both patterns and expressions, so a grammar of static for $($pat:pat),+$(,)? in $($expr:expr),+$(,)? $block:block works without any problems and there's no real alternative I can offer.

error: unexpected `,` in pattern
  -->
   |
LL |     for a, b in iter {}
   |          ^
   |
help: try adding parentheses to match on a tuple
   |
LL |     for (a, b) in iter {}
   |         +    +

This proposal uses square brackets [for lifetime packs]

Square brackets are []. <> are angle brackets, when referred to as brackets.

In terms of syntax, parenthesized list ('a, 'b, ...) is out, as () would be ambiguous in generic argument lists—is it an empty set of lifetimes, or is that set elided and it’s actually a type?

It's almost certainly not happening for edition2024, but I'm actually in favor of entirely deprecating lifetime elision in generic lists, and requiring the use of '_ to explicitly use an inferred/elided lifetime. The main benefit is that doing so would unlock the ability to have defaulted lifetime parameters instead of leaving them to inference, sidestepping the difficulties of making defaults affect inference. The main difficulty is (lack of interest and) that an explicitly empty generics list (e.g. ::<>) is (almost) equivalent to the lack of a generics list; it doesn't specify the lack of any generic arguments (and thus the use of the defaults), it just leaves them as inferred. E.g. you can use Vec::<>::new() just fine, despite Vec<> as a type annotation being an error (struct takes at least 1 generic argument but 0 generic arguments were supplied).
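Both halves of that can be seen in a small sketch (Borrowed, borrow, and empty_turbofish are hypothetical names). The explicit '_ spelling is what the existing allow-by-default elided_lifetimes_in_paths lint asks for, and the empty turbofish is accepted as plain inference:

```rust
/// Hypothetical borrowing wrapper, used only to show lifetime spelling.
pub struct Borrowed<'a>(pub &'a str);

/// The elided lifetime is written out as `'_`; a bare `Borrowed` return type
/// also compiles today, because elision in paths is still permitted.
pub fn borrow(s: &str) -> Borrowed<'_> {
    Borrowed(s)
}

/// An explicitly empty generic list in expression position just leaves the
/// arguments to inference, whereas `Vec<>` as a type annotation is rejected.
pub fn empty_turbofish() -> Vec<u32> {
    Vec::<>::new()
}
```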

But even if this were made to be the case and, due to the nonelision of lifetimes, () wouldn't be ambiguous anymore, old editions would still be unable to specify an explicitly empty lifetime variadic. Such an outcome of edition differences is considered at best undesirable.

Without #![feature(unsized_fn_params)], you can’t produce a value of a multiply-unsized tuple at all.

You need to add "safely" to this assertion; it's possible to create compound structs with an unsized tail on stable Rust, but it's unsafe and very manual. Run this through Miri to get some confidence that it's not doing anything UB (e.g. creating an allocation with incorrect layout). [playground]
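To illustrate what "unsafe and very manual" means, here is one way (a separate sketch, not the linked playground; the Packet type and make_packet are hypothetical) to heap-allocate a struct with an unsized slice tail on stable Rust. It is likewise worth running under Miri:

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::ptr;

/// Hypothetical compound type with a sized header and an unsized slice tail.
/// `repr(C)` pins the field order, so the layout computed below matches the
/// layout the compiler uses for `Packet`.
#[repr(C)]
pub struct Packet {
    pub header: u32,
    pub tail: [u16],
}

/// Builds a boxed `Packet` whose tail is a copy of `data`.
pub fn make_packet(header: u32, data: &[u16]) -> Box<Packet> {
    // repr(C) layout: header, then the tail array, then trailing padding.
    let (layout, tail_offset) = Layout::new::<u32>()
        .extend(Layout::array::<u16>(data.len()).expect("tail too large"))
        .expect("packet too large");
    let layout = layout.pad_to_align();
    unsafe {
        // The header makes `layout.size()` nonzero, as `alloc` requires.
        let raw = alloc(layout);
        if raw.is_null() {
            handle_alloc_error(layout);
        }
        // Initialize both fields through raw pointers before creating any reference.
        raw.cast::<u32>().write(header);
        let tail = raw.add(tail_offset).cast::<u16>();
        ptr::copy_nonoverlapping(data.as_ptr(), tail, data.len());
        // Attach the element count as metadata; the `as` cast keeps the length.
        let fat = ptr::slice_from_raw_parts_mut(raw, data.len()) as *mut Packet;
        // Box's destructor deallocates with `Layout::for_value`, which equals `layout`.
        Box::from_raw(fat)
    }
}
```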

We use usize to store arity, so with this proposal it’s impossible to have a tuple or tuple struct with more than usize::MAX members. Such code would be incredibly degenerate anyway, so I doubt this restriction will be a problem. But it is, in theory, a potential source of post-monomorphization errors.

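More than half of the pack items would need to be zero-sized for this to become the limiting factor: a type's size can't exceed isize::MAX bytes (roughly half of usize::MAX), so at most isize::MAX members can occupy even a single byte each. If that isn't the case, you'll run into "values of type are too big for the current architecture" errors far sooner, which are also post-mono and (currently) don't get emitted from statically unreachable code.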
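The arithmetic behind that bound, as a small sketch (min_zero_sized_members is a hypothetical helper; it assumes every non-zero-sized member takes at least one byte):

```rust
/// Lower bound on how many members of a tuple with `arity` members must be
/// zero-sized, given that the whole tuple must fit in `isize::MAX` bytes and
/// every other member takes at least one byte.
pub fn min_zero_sized_members(arity: u128) -> u128 {
    let max_bytes = isize::MAX as u128;
    arity.saturating_sub(max_bytes)
}
```

For arity usize::MAX this gives usize::MAX - isize::MAX = isize::MAX + 1, which is more than half of usize::MAX.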

Type-level for-in

Your markup breaks here.

.. inferred type parameter lists

Why isn't this ...? .. is a rest pattern, but given the pattern name @ .. can be equivalently and more succinctly written as ...name with these language extensions, I'd generally expect to see ... significantly more than .. not meaning a range.

Functions with variadic generic parameters

...ah, apparently since first reading this I had forgotten that you're (seemingly?) requiring variadic calls to be syntactically provided with a tuple. (Or is that just for explicitness in the examples? If so, an early example should call out that either tupled or not are both considered valid call syntaxes.) This seems a bit inconsistent between the generic parameters (don't use tupling) and the value parameters (do).

Coming into this apparently more fresh than I thought, I'd expect fn drop_all<...Ts>(ts: Ts) to be variadic and accept any number of parameters, and for fn drop_tup<...Ts>(ts: (...Ts,)) to take a tuple argument of any arity. (The former would have separate function argument ABI; the latter would have an ABI taking a single tuple. Neither tuples its generic arguments.) Instead, if I read correctly, those should be spelled fn drop_all<...Ts>(...ts: ...Ts) and fn drop_tup<...Ts>(ts: Ts).

Tupling at the call site also has implications similar to why place colocation for function parameters can be surprisingly difficult. As in that case, it's not necessarily a deal breaker given careful semantics, but it's still potentially surprising to ever permit observably overlapping live places. (This can matter to unsafe code even if no provenance-valid pointers alias.)

... patterns in parameter lists also work with arrays.

Do they produce array bindings here, like when in local pattern bindings, or do they produce tuples?

Combining function parameter ... and variadic generics gives us varargs.

So I got ahead of myself a bit in the section about just variadic generics but not arguments.

If variadics are tuples, then it makes sense that ts: Ts and ts: (...Ts,) would be equivalent. However, I remain (for now, at least) of the opinion that "just" doing so is still overly restrictive on the compiler unless we add significant semantic cleverness allowing variadics to behave unlike tuples (i.e. places within the pack being distinct objects rather than subobjects of a tuple). fn variadic(...args: ...Args) is certainly enough to signal that things are different, but saying that args "is" a tuple implies that it has all the semantics of being a tuple place, even if the ABI splats the components.

Given that packs want to be treated differently than tuples by the compiler, I maintain that it makes sense to expose this semantic difference to the developer. To that end, I would currently lean towards defining the variadic pieces roughly along the lines of:

  • In fn<...Ts>, Ts is a variadic type.
  • ts: Ts is never valid; a typical pattern cannot be of variadic type.
  • (...Ts,) reifies the variadic type into a tuple type.
  • ...ts is a variadic pattern.
  • A variadic pattern can be of variadic type, e.g. ...ts: Ts is valid.
  • (...ts,) is a tuple pattern which creates a variadic binding.
  • fn drop_all<...Ts>(...ts: Ts) defines a variadic function taking any number of arguments of any type.
  • fn drop_tup<...Ts>(ts: (...Ts,)) defines a unary function taking any tuple as an argument.
  • fn drop_tup<...Ts>((...ts,): (...Ts,)) has the same public signature as the previous definition, but binds the tuple argument to a variadic binding. As with any other argument pattern, it's as if the function starts by creating a binding like let (...ts,) = _arg; where _arg represents the value passed to the function call[1].
  • Invalid spellings include fn drop_ts<...Ts>(ts: Ts), fn drop_ts<...Ts>(ts: ...Ts), fn drop_ts<...Ts>(...ts: ...Ts), fn drop_ts<...Ts>(...ts: (...Ts,)), and fn drop_ts<...Ts>((...ts,): ...Ts). These are all errors.
  • As a variadic binding, ts cannot be used without either being unpacked (e.g. into a tuple or function arguments) or expanded over with static for.
  • A variadic pattern cannot be assigned from a tuple, e.g. let ...ts = (a, b, c); is invalid.
  • A variadic binding in a tuple pattern can be assigned from a tuple, e.g. let (...ts,) = (a, b, c); is valid. A variadic binding cannot be directly assigned to a tuple binding, even if its arity is statically known, e.g. let (a, b, c) = ts; is invalid.
  • To unpack a variadic binding, you can reify it into a tuple, e.g. let (a, b, c) = (...ts,); is valid.
  • The ...ts and ...Ts variadic syntaxes are generally only ever valid as part of a parenthesized comma-delimited list.
  • (I don't know about let [...ts] = array, though if supported, it would create a parameter pack, not an array.)
  • (This sacrifices the ability to use tuple/array affordances to get heterogeneous/homogeneous packs, as well as being able to simply reuse the T; N notation for known size packs.)
  • (Inferring that packs need to be the same size if they're unpacked together in the signature seems nice, but a bit more implicit than typical for Rust signatures, and doesn't really[2] handle when the zip only happens in the body.)
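For a concrete arity of three, the three accepted spellings in the list above would correspond roughly to the following hand-monomorphized stable Rust. This is only a sketch: the _3 function names are hypothetical, and the variadic syntax itself doesn't exist yet.

```rust
// fn drop_all<...Ts>(...ts: Ts), called with three arguments:
// three separate parameters, each with its own argument ABI slot.
pub fn drop_all_3<A, B, C>(_a: A, _b: B, _c: C) {}

// fn drop_tup<...Ts>(ts: (...Ts,)): a single parameter of tuple type.
pub fn drop_tup_3<A, B, C>(_ts: (A, B, C)) {}

// fn drop_tup<...Ts>((...ts,): (...Ts,)): the same signature as drop_tup_3,
// but the body destructures the tuple argument into separate bindings.
pub fn drop_tup_destructured_3<A, B, C>((_a, _b, _c): (A, B, C)) {}
```

The call sites show the difference: drop_all_3(a, b, c) versus drop_tup_3((a, b, c)), while the two drop_tup forms are indistinguishable to callers.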

I haven't spent the effort to go through the entire ripple effect of these initial choices. I know there would be multiple resulting impacts to other choices.

I am at least somewhat familiar with C++ template parameter packs, so I am at least somewhat influenced by them. I chose to keep the ... consistently on the left in no small part because of how C++'s syntax tends toward an "is it on the left or the right" ambiguity. IIRC, I originally learned C++ variadics formatted as typename ...Ts and Ts ...ts rather than typename... Ts and Ts... ts.


Modulo the notes above (almost all of which are nits, honestly), this is a really strong foundation. I have some experience authoring RFCs (though none that have actually gotten officially addressed, because of prioritization, and more that never actually got posted officially for similar reasons), and would be willing to help coauthor a proper RFC series for this if we can recruit a T-lang liaison/sponsor.

T-opsem (of which I'm a member) is a subteam of T-lang, but I/we don't have any real pull on T-lang agenda (yet?), and definitely shouldn't for something like this which is outside T-opsem's domain.

This is a big enough addition that it should probably get an "intent to experiment in tree" (MCP? eRFC? Initiative?) and a proof-of-concept partial/incomplete implementation before a full RFC. Based on (vibes of) current T-lang initiatives, I think the earliest I'd dare hope to see any real team acknowledgement of a "variadics initiative" would be after edition2024 ships (so in about 1½ years). Out-of-tree experiments (e.g. like ThePhD/PhantomDerp/JeanHeyd was doing with uwuflection) would of course still be possible in the meantime.

FWIW, the large chunks I see would be:

  • Pattern un/packing of tuples/arrays,
  • static for expansion over tuples/arrays, and
  • Variadic generics and arguments,

and I see those three as likely three separate RFCs in a series. I can't quite decide what the ideal cadence for RFCing them would be, though. Posting them simultaneously is bound to get comments on the final one about the earlier ones and run into issues with the groundwork getting rewritten out from under it, but waiting until the earlier ones are FCPd to post the later ones could easily lose context from the earlier ones about why some specific choices were made to assist later additions.

Just splitting into two (combining un/packing and static for into one) is also a reasonable division imho, and might be able to motivate the non-variadic part more strongly independently of variadics. But the two are mostly disjoint functionality only really tied together in how they are used to interact with variadics.


  1. Fun trivia: let _ = place; doesn't move from the place. Any temporaries are dropped, but if the assignment rhs is a place rather than a temporary value, it doesn't get dropped. Consistent with this, _ bindings for function arguments do not drop the argument at the start of the function! If not moved from, arguments are dropped in reverse order (the last declared is the first dropped) after any locals have been dropped. ↩︎

  2. The same stupid workaround as currently used for feature(generic_const_exprs) of an empty trait bound would work to add such a constraint to the signature, e.g. (...(As, Bs)): /*nothing*/. ↩︎
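The trivia in footnote 1 can be checked with a small drop-logging sketch (Noisy, takes_args, and drop_order are hypothetical helpers):

```rust
use std::cell::RefCell;

/// Hypothetical drop logger: records its label when dropped.
pub struct Noisy<'a>(pub &'static str, pub &'a RefCell<Vec<&'static str>>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.1.borrow_mut().push(self.0);
    }
}

/// The first argument is bound with `_`, so it is not dropped on entry; like
/// the named argument, it lives until the end of the call.
pub fn takes_args(_: Noisy<'_>, _second: Noisy<'_>, log: &RefCell<Vec<&'static str>>) {
    let _local = Noisy("local", log);
    log.borrow_mut().push("body end");
}

/// Runs both pieces of trivia and returns the recorded order of events.
pub fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    let place = Noisy("place", &log);
    let _ = place; // does not move out of `place`
    log.borrow_mut().push("after let _");
    drop(place); // still usable here, so it was not moved above
    takes_args(Noisy("first", &log), Noisy("second", &log), &log);
    log.into_inner()
}
```

The log ends with "local", "second", "first": the body's local goes first, then the arguments in reverse declaration order.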
