That could work, but either:
- It logically moves out of the tuple to the fresh places instead of iterating by reference (i.e. `static for ref item in tuple`);
- You get case (2) where packs have different semantics than proper tuples; or
- Through complicated spec trickery you say it also does "place splitting" for tuples, but
  - in practice, the address of proper tuple members won't change, so it'll still effectively look like (2),
  - this makes it a semantically meaningful pessimization to extract a temporary variable (not unprecedented, but still undesirable), and
  - allowing colocated objects in the opsem is complicated, especially if writable.
It sounds somewhat reasonable on the surface, and it might be possible to make work, but the implementation in the compiler is almost certainly still going to need to make packs their own type kind to get the desired semantics (with just assigning collected varargs to a tuple, there's no way to tell LLVM it's allowed to split the alloca into multiple; we need to do that on our side). I'm also not certain how you'd even spec it operationally to work for mutable[1] references (incl. shared mutable).
The only way I can think of to get the requested behavior would be to specifically say something like "for the purpose of operations that care about the bounds of an allocation, the bounds accessible by this pointer are constrained to this single field" (very roughly, maximally strict subobject provenance), but there are issues with taking this restrictive approach instead of a constructive/operational one. Not least that this by itself isn't enough — the addresses of the tuple fields still have a fixed and observable offset between them[2] — and I'm not fully confident that there aren't other ways to rely on the container layout without incurring UB.
I do think the more honest option is to surface packs being a different thing from tuples. But importantly, they're not a type! (Types have consistent layout, packs deliberately shouldn't.) Rather, they're a different kind of binding, binding a single name to (potentially) many values. We can still talk about the pack binding as "having" a tuple type (i.e. for the purpose of explaining what you can do with it and ascribing a type to the binding), but packs being a binding of one name to many values is imho an approachable and understandable reason as to why it wouldn't be able to be manipulated like regular bindings.
At some rough high level it kinda feels conceptually similar to references. C++ references aren't proper types the way Rust references are. IIUC, they were originally conceptualized as what I'm calling "place aliases" and as a decorator on the binding rather than on the type (e.g. `Type &name` versus `Type& name`). In C++, you can't have a reference to a reference (`T&&` is something different, and constructing `T& &` with templates immediately normalizes to just `T&`), and oftentimes templates just don't work with references (e.g. `std::optional<T&> = delete`); using `std::reference_wrapper` is required to get regular type semantics for references.
Rust absolutely enjoys benefits from references being regular types. It'd certainly be nice if packs could be regular as well. It's frustrating that they almost can be but for a few known obstacles. If "packs are just tuples" worked more generally, treating them regularly like other types would certainly be preferable. But given there needs to be special handling at least for packs of lifetimes, it seems reasonable to let packs be different rather than try to force them into being regular. Something being almost but not quite regular in some cases (e.g. C++ references, though packs would be less irregular than them) is worse than being clearly irregular from early on.
Btw, if you want to talk about this with a bit lower latency, ping me on the Discord (project or community) or the Zulip; I'll show up as CAD97 there as well. You could also post in the T-lang channel about your desire to create a variadics project at some point, if you're interested in making the (slow, extended) push towards getting a T-lang liaison/sponsor and eventually seeing this in the compiler.
A couple more minor things that popped into my head, and one major one at the end:
`core::marker::Tuple`

If it's a pure marker trait (i.e. has no members and never will), `core::marker` is a reasonable home for it. If it has associated members, though, the `core::primitive` module feels like it might be a better fit[3]. For: tuples are a primitive type, and documented as such. Against: compound primitive types generic over / including user types don't feel as primitive, and `core::primitive`'s current primary mission is primitive types with shadowable names rather than types with syntax (e.g. there's no `type Ref<'a, T> = &'a T`).
`<Ts as Tuple>::ARITY`

"Arity" is the formal name, but Rust generally just calls it the tuple length. E.g. from the primitive tuple docs:

> Tuples are finite. In other words, a tuple has a length. Here’s a tuple of length 3:
>
> `("hello", 5, 'c');`
>
> ‘Length’ is also sometimes called ‘arity’ here; each tuple of a different length is a different, distinct type.
`ref ...pack` vs `...ref pack`

Since you're using `...&pack` as the expression to expand to a reference (individually) to each item in the pack/tuple, the pattern dual to that is `...ref pack`, causing each item to be referenced (individually) by the bound pack.
Expressions and patterns aren't perfect duals, but we should avoid diverging them more than required. We should either
- use `...ref pack` for patterns and `...&pack` for expressions, or
- use `ref ...pack` for patterns and `&...pack` for expressions,

in order to maintain that dual.
Using the latter could actually potentially help somewhat in the "let packs be just tuples" problem, since now you aren't syntactically creating a reference to a tuple. (Though tbf when you have a tuple reference, `&...*tuple` is awkward when we have pattern binding modes and autoderef trying to remove this kind of "reference coercion" noise.)
It means you can have a simple syntactic rule: if a pack is ever used except directly in an unpack, it becomes a tuple, and it stays as separate places if it's only used directly by unpacks. I don't like bypassing expression composition to put that behavior onto `...&pack`, but I'd be able to accept this as workable. `static for` still uses `&pack` and not `&...pack`, so it's still not great, but it can be workable. But it'd still remain that packs are observably different from tuples; it'd just be that packs easily decay/infer to be tuples by an easily predictable rule.
Even though I agree that expansion expressions more complicated than simple list expansion should use a loop-styled expression, `(...pack,)` to unpack into a tuple is very reasonable. If converting a pack to a tuple can be as simple as `let tuple = (...pack,);` (or `let array = [...pack,];` for uniformly typed packs), automatic conversion isn't saving that much developer effort. The compiler should absolutely be able to see the use of a pack outside an unpack context and give a useful error suggesting to unpack it either inline or at the time of binding.
"place alias" `static for` semantics

I admit it's an unlikely long shot, and that it's potentially quite surprising. But it would solve some problems (e.g. it can iterate by value over a varargs pack including unsized types), and perhaps calling it `macro for` would help conceptualize that given `macro for place in pack`, mentioning `place` is equivalent to mentioning `pack.{N}`.
Combined with spelling the referencing unpack expression as `&...pack`, this would allow eliminating the need to syntactically take a reference to a pack when references to its members are what's desired. It doesn't itself solve the "containing object" observability, but it manages to constrain the required scope of "not actually a tuple" manipulation to a more intuitive minimum.
Highly tangential: `macro let`
I've argued a couple times to add some sort of `$:place` and/or `$:value` and/or `$:param` matcher for `macro_rules!`, to make it easier to get function-call-ish semantics for macros, with arguments evaluated exactly once and temporary lifetimes having the correct extent. `$:value`/`$:param` would be intended to behave exactly like a function argument and take ownership of the value, but `$:place` would behave more like `$:expr` and only capture the value as necessary (e.g. `println!` only uses its arguments by reference, not moving from them, despite using by-value syntax), just now with single upfront evaluation semantics instead of being reevaluated at each position the binder is used. I believe `$:param` can be accurately emulated with `match $param { param => { /* body */ } }`, but the extra code does come with a compilation time penalty, and macro authors need to both know and remember to use this trick.
Along similar lines, there's been some vague talk around some kind of `k#autoref` binding mode that would permit code (mostly macros) to extract names for partial expressions while preserving autoref behavior (i.e. using a place by-value, by-ref, or by-mut) determined by usage of the named binding.
A somewhat interesting alternative popped into my head: `macro let`. The point being that `macro` already carries the copy/paste semantic connotation, so when writing something like `macro let name = some.place[expr];` it shouldn't be too surprising for `name` to maintain autoref semantics for the assigned expression. It would ofc still maintain that the place is evaluated once at the `let`; syntactic repetition of the expression can and should still use `macro_rules!`. It's not fully general, though.
`impl<T: Tuple>`

While I favor this simplification for non-varargs generics, to point out an obvious consequence: they're almost certainly no longer going to be usable with multiply-unsized varargs, because for a tuple type to exist (and thus to have a layout which can be queried by `offset_of!`) it can only have a single tail unsized member.

This is honestly fine; it's an extreme edge case where this matters, and using vararg generics instead would be ambiguous. In the case where it's absolutely required for some reason, a dummy generic of a different kind (i.e. a const generic) can be used to split two vararg generics; we no longer require a strict {lifetime, type, const} ordering between generic kinds.

Yes, tuple WF could be loosened to allow multiply-unsized tuples to exist at the type level, or we could "just" have different WF rules for unpacked tuples than proper tuple types[4], but this is a very annoying thing to have leak into the rest of the language.
`where Ts::ARITY == Us::ARITY`

Some sort of additional notation is generally considered to be necessary to add `const` bounds to `where` clauses, to switch from type context to expression context. The current nightly hack would usually be `Bool<{ Ts::ARITY == Us::ARITY }>: True` (but others are possible), and allowing `const { Ts::ARITY == Us::ARITY }` has been mentioned.
I think I'd stick to having `for<T, U in Ts, Us>` in the signature imply matching arity for now, to avoid getting into the weeds of the problems with const bounds (in short, SAT solving), meaning you can just remove the `where` clause entirely from the two examples using const `==` bounds in `where`.
(I think this implication is roughly comparable in impact to implied lifetime bounds (e.g. `&'a T` implies `T: 'a`, and `for<'a> Ty<'a>` implying bounds on `'a`), so should hopefully be considered acceptable.)
`...pack`

What exactly is allowed for `pack` in an unpack expression? Is it any tuple-valued expression?
```rust
// e.g.:
call(...make_tuple(dispatch),);
// if so:
call(...static for item in make_tuple(dispatch) { item });
// potential misinterpretation:
call(...static for dispatch in dispatch { make_tuple(dispatch) });
```
Is it restricted to exactly just syntactically `...name`, `...&name`, and `...&mut name`? What about iterating `name: Box<(A, B)>` by value with `...*name`? More layers of (de)referencing?
The examples use `...(tup, ple)`, so literal syntax for unpackable types is allowed, it seems. Do these allow arbitrary nested syntax, or is it restricted somehow? Perhaps to the "pattern or expression" expression syntax allowed in pattern assignment? Or related to the "pure initializer" rules used for static promotion? If I `static for _ in (call(), call())`, do all iterated values get evaluated at once up front, or interleaved with the loop body evaluation? If up front, how can I get the interleaved seq-macro semantics?
If it's stricter than "any expression," is this a syntactic restriction or a semantic one? (I.e. does it apply when hidden under `#[cfg(FALSE)]` or not?)
How strict are the type requirements? Is it restricted to just tuples, tuple structs, (packs,) and arrays? Do we allow references (single-level) of them? Does autoderef coercion get applied?
`for` loops use the `IntoIterator` trait to consume their iterable. Should / why shouldn't `static for` use some `Unpack` trait? The output type is statically known, and variadics allow blanket implementing it for all tuples, after all. It would also allow custom types to opt into being unpackable, rather than just automatically being unpackable or not.
Can I have a return type generic over arity which isn't constrained to match an input?
These all have simple answers available but aren't entirely self-evident, so need to have those answers given. "Here's a bunch of examples" is good to explain how to use a thing, but the edge cases are just as important if not more for a good proposal.
In order to support copy elision of abi-by-ref syntax-by-value parameters, we're eventually going to have to solve a similar spec issue to permit live objects to alias in this case. But this is a much simpler case; while the parent place is (potentially) still live, it's protected (UB to read/write from) for the call scope and has been deinitialized (so observing the value in the place after the call is UB). ↩︎
It would actually be marginally easier to do for Abstract C++, because C++ makes (nonequality) comparison of pointers to different allocated objects UB, and leaves casts between pointers and integers entirely implementation-defined. Thus in C++ there'd be no spec-compliant way to determine the layout offset between two items. Rust is much more permissive here, allowing comparison between arbitrary pointers (by address, allowing you to `wrapping_offset` to an equivalent address), and guaranteeing that casting to/from `usize` doesn't change the address offset between pointers. (See provenance rules for pointer validity implications of doing such tricks.) This makes address stability more observable (thus restrictive) in Rust than in Abstract C++. (Strictly speaking, I believe a C++ implementation could be within its rights to relocate allocations to a different address so long as it can fix up any and all pointers into the allocation to point to the relocated one, at least until PNVI provenance rules in the spec make tracking such impractical; our hypothetical implementation sidesteps that being a problem by ptr/int casts always producing `1`.) ↩︎

Especially with my pre-RFC that would also add traits there. ↩︎
An interesting tidbit: for similar reasons as type aliases ignoring trait bounds, they also ignore WF; e.g. you can define a type alias to `(str, str)` or `[[u8]]` today, and it only errors when you try to use it. ↩︎