Hi all!
Please find below my (as of 2024-05-05, very early stage WIP) draft for an RFC about argument unpacking.
I've been a user of Rust for some years now, and this is my first time giving back to the community. I actually attempted to use this feature at day_job, only to find out it hasn't been implemented.
The pre-RFC below is still in a very early stage, and I still have lots of improvements planned for it, as you can probably tell from the various TODOs scattered around the document. However, I think it makes sense to share it now to gauge interest in the overall proposal and get some perspective from the community.
Edit 1: Fix some typos and Markdown syntax.
Pre-RFC - Static Function Argument Unpacking
- Feature Name: static_fn_arg_unpacking
- Start Date: 2024-05-xx
- ...
Summary
Allow call-site unpacking of tuples, structs, tuple structs, and fixed-size arrays into function arguments, using `..expr` within the function call's parentheses as the syntax. The number and types of the elements in the collection being unpacked, and, when applicable, their names or order of appearance, must match the remaining function parameters being filled and must be known at compile time.
Example:
```rust
fn main() {
    // Unpack the expression (here, a tuple) into function arguments:
    print_rgb(..hex2rgb("#123456"));
}

fn print_rgb(r: u8, g: u8, b: u8) {
    println!("{} {} {}", r, g, b);
}

fn hex2rgb(hexcode: &str) -> (u8, u8, u8) {
    let r = u8::from_str_radix(&hexcode[1..3], 16).unwrap();
    let g = u8::from_str_radix(&hexcode[3..5], 16).unwrap();
    let b = u8::from_str_radix(&hexcode[5..7], 16).unwrap();
    (r, g, b)
}
```
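For comparison, here is a minimal sketch of what the example above has to look like in today's Rust, assuming the proposed sugar simply expands to binding the tuple and passing its elements by hand (the zero-cost principle stated under Reference-level explanation):

```rust
fn print_rgb(r: u8, g: u8, b: u8) {
    println!("{} {} {}", r, g, b);
}

fn hex2rgb(hexcode: &str) -> (u8, u8, u8) {
    let r = u8::from_str_radix(&hexcode[1..3], 16).unwrap();
    let g = u8::from_str_radix(&hexcode[3..5], 16).unwrap();
    let b = u8::from_str_radix(&hexcode[5..7], 16).unwrap();
    (r, g, b)
}

fn main() {
    // Hand-written equivalent of `print_rgb(..hex2rgb("#123456"))`:
    // bind the returned tuple, then pass each element positionally.
    let (r, g, b) = hex2rgb("#123456");
    print_rgb(r, g, b); // prints "18 52 86"
}
```

The intermediate `let (r, g, b)` binding is exactly the step the proposal aims to remove.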
Motivation
Argument unpacking reduces the verbosity and increases the ergonomics of Rust.
- Improves code ergonomics by removing a repetitive, unneeded intermediate step.
- Allows more concise code, both in terms of number of lines and line length.
- Allows reducing the number of named local variables in scope.
- Is intuitive for developers accustomed to argument unpacking from other programming languages.
- Improves Rust's cohesion by adding a missing piece to a family of syntactic sugar already in Rust, used for features like struct update syntax and destructuring assignment.
- Provides groundwork for the syntax and its meaning in next steps:
  - If compatible, the proposed feature could also reduce the workload and scope of more general and ambitious initiatives by splitting up the work and iterating towards them in small steps, provided that the proposed feature would be a subset of those initiatives.
Guide-level explanation
TODO: Introduce the concept. Be explicit about what term is used: argument unpacking.
TODO: Code examples! ELI5!
Consider `..` as a machine-readable shorthand for et cetera, but used as a prefix that tells the compiler where to get the rest of the stuff from.
In the familiar context of instantiating structs, `..another_struct` is the struct update syntax that automatically fills the remaining fields of the new struct from `another_struct` of the same type. Similarly, when calling a function, argument unpacking as defined in this proposal automatically fills in the remaining arguments of the function call from a collection whose elements match the remaining function parameters.
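The struct update syntax that this analogy builds on already works on stable Rust. The sketch below uses a hypothetical `Config` type and a hypothetical `with_verbose` helper; the commented-out last line shows how the proposed call-site counterpart might read with an equally hypothetical `connect` function whose parameters are named like the struct's fields.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Config {
    verbose: bool,
    retries: u32,
    name: String,
}

/// Builds a new `Config` that differs from `base` only in `verbose`;
/// struct update syntax (`..base.clone()`) fills in the remaining fields.
fn with_verbose(base: &Config) -> Config {
    Config {
        verbose: true,
        ..base.clone()
    }
}

fn main() {
    let base = Config { verbose: false, retries: 3, name: "base".to_string() };
    let custom = with_verbose(&base);
    assert!(custom.verbose);
    assert_eq!(custom.retries, base.retries);
    assert_eq!(custom.name, base.name);

    // Proposed (does not compile today), assuming a hypothetical
    // `fn connect(verbose: bool, retries: u32, name: String)`:
    // connect(..custom);
}
```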
Reference-level explanation
This RFC proposes a zero-cost abstraction to improve the ergonomics and readability of code related to function calls, specifically the passing of arguments. In short, the proposed feature is syntactic sugar commonly known as argument unpacking.
Furthermore, this RFC proposes the use of the syntax in a restricted context: statically, when the number and types of the unpacked arguments, and, when applicable, their names or order of appearance, are known at compile time. Consequently, the proposed form of argument unpacking is infallible at run time. Note that infallibility is not intended to be part of the specification; rather, it's a side effect arising from the restricted scope of this proposal.
Guiding principles in the design are:
- Familiarity of syntax.
- Explicitness in supported use cases and scope of the proposal.
- Intuitiveness of use and the principle of least astonishment.
- Zero-cost: The idea that this is just syntactic sugar for passing the arguments by hand.
The feature proposed only relates to:
- Functions. Only function and method calls are affected. Macro calls and closures are out of scope.
- Call-site. The feature is only about argument unpacking, not parameter packing or variadic functions.
- Compile-time. Hence the word static. The feature is not about run-time behavior.
- Provably successful situations. The collection types usable for the feature are selected to make the use of the proposed feature infallible.
This is not to say that other RFCs couldn't be written to address the above situations (see Future possibilities). Just that the scope of this RFC is limited.
Syntax
Functional Record Updates (i.e., Struct Update Syntax) already allow automatically filling fields when instantiating structs. This RFC proposes to use the same familiar syntax, i.e. `..` followed by an expression, for argument unpacking. Another reason to use `..` is that some other programming languages, such as JavaScript and PHP, already use a look-alike ellipsis `...` prefix for similar language features, benefiting inter-language consistency and familiarity for new users of Rust.
Commonly in other programming languages, inside the parentheses of a function call, the collection to unpack the arguments from is prefixed by the symbol used for unpacking (e.g. `...` or `*`). Thus, the same order is proposed in this RFC. One notable exception to this rule is Julia, in which argument unpacking, known as splatting, takes the form `f(args...)`.
The unpacking operator `..` has a low precedence, allowing unpacking of whatever is produced by the expression following it.
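The existing range operator gives a feel for what low precedence means here: in today's Rust, `..` binds more loosely than arithmetic, so `..a + b` is parsed as `..(a + b)`. Assuming argument unpacking would follow the same precedence, `..hex2rgb("#123456")` would likewise unpack the result of the whole call. The helper name below is hypothetical.

```rust
use std::ops::RangeTo;

/// `..` applies to the whole expression `a + b`, not just to `a`.
fn range_to_sum(a: i32, b: i32) -> RangeTo<i32> {
    ..a + b
}

fn main() {
    let r = range_to_sum(1, 2);
    assert_eq!(r, ..3);
    assert_eq!(r.end, 3);
}
```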
This RFC proposes that argument unpacking can occur at any location in the function call and arbitrarily many times as well, as long as there are corresponding valid parameter slots left to pass the next arguments into. For example, the following is allowed:
```rust
fn f(a: u8, b: f32, c: bool, d: [u8; 5], e: &str) {
    todo!()
}

struct S {
    c: bool,
    d: [u8; 5],
    e: &'static str,
}

fn main() {
    let a_1tuple = (5,);
    let b = 6.0;
    let cde_struct = S {
        e: "foo",
        c: true,
        d: [1, 2, 3, 4, 5],
    };
    f(..a_1tuple, b, ..cde_struct);
}
```
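Spelled out by hand, the call above would correspond to something like the following sketch. It assumes that the 1-tuple fills `a` by position and that the struct fills `c`, `d` and `e` by name; unlike the original `f`, this version returns a summary string instead of calling `todo!()`, purely so the call can be observed.

```rust
/// Same parameters as `f` above; returns a summary instead of `todo!()`.
fn f(a: u8, b: f32, c: bool, d: [u8; 5], e: &str) -> String {
    format!("{} {} {} {:?} {}", a, b, c, d, e)
}

struct S {
    c: bool,
    d: [u8; 5],
    e: &'static str,
}

fn main() {
    let a_1tuple = (5,);
    let b = 6.0;
    // Field order in the literal (`e`, `c`, `d`) is irrelevant: matching is by name.
    let cde_struct = S { e: "foo", c: true, d: [1, 2, 3, 4, 5] };
    // Hand-written equivalent of the proposed `f(..a_1tuple, b, ..cde_struct)`.
    let out = f(a_1tuple.0, b, cde_struct.c, cde_struct.d, cde_struct.e);
    assert_eq!(out, "5 6 true [1, 2, 3, 4, 5] foo");
}
```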
Unpacking Rules
Unpacking of tuples, structs, tuple structs, and fixed-size arrays is proposed in this RFC. Other collections are out of scope. Whether unpacking succeeds is checked during compilation, and unsuccessful attempts are rejected, with the side effect that this initially proposed design is infallible at run time.
Successful unpacking requires that:
- All of the items inside the collection are unpacked.
- There must be at least as many unfilled parameters left in the function call as there are items inside the collection being unpacked.
- Each item inside the collection is passed as an argument matching one parameter.
- The types of the items in the collection must be compatible with the corresponding parameters.
- If there are N items in the collection being unpacked, the immediately next N parameters in the function call are filled with the collection's items as the arguments.
- One of these two rules needs to be fulfilled, depending on the kind of collection:
  - For tuples, tuple structs, and fixed-size arrays, the order of the items in the collection is the same as the order in which they are unpacked.
  - For structs, the names of the fields are the same as the names of the next parameters in sequence; only the immediately following sequence of parameters is considered.
Attempting to unpack a struct with named fields, where the number and types of the fields match but the names are different, is rejected. Technically, it would be possible to emit syntactically correct code from the sugar, but the intent is ambiguous. Therefore, it's better to leave it up to the developer to decide what they want to accomplish. It's also difficult to specify what would happen when there are multiple parameters of the same type: What should the order be when the names don't match? What would happen if one of the struct's fields was renamed to one of the parameter names?
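To make the rules above concrete, here is a toy model of the checks a compiler might perform, written as ordinary Rust. Everything in it (`Ty`, `Param`, `Source`, `check_unpack`, `f_params`) is hypothetical, "compatible" types are simplified to "equal" types, and it is a sketch of the rules as stated here, not of any actual rustc implementation. It also rejects empty collections, as described under Corner cases.

```rust
// Toy model of the proposed compile-time unpacking checks (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    U8,
    F32,
    Bool,
    U8Array5,
    Str,
}

struct Param {
    name: &'static str,
    ty: Ty,
}

enum Source {
    /// Tuples, tuple structs and fixed-size arrays: matched by position.
    Positional(Vec<Ty>),
    /// Structs with named fields: matched by name, in any field order.
    Named(Vec<(&'static str, Ty)>),
}

/// Checks unpacking `src` into the parameters starting at index `next` and
/// returns the index of the first parameter still unfilled afterwards.
fn check_unpack(params: &[Param], next: usize, src: &Source) -> Result<usize, String> {
    let n = match src {
        Source::Positional(items) => items.len(),
        Source::Named(fields) => fields.len(),
    };
    if n == 0 {
        return Err("cannot unpack an empty collection".to_string());
    }
    if next + n > params.len() {
        let left = params.len().saturating_sub(next);
        return Err(format!("{} items, but only {} parameters left", n, left));
    }
    // Only the immediately following `n` parameters are considered.
    let window = &params[next..next + n];
    match src {
        Source::Positional(items) => {
            for (p, ty) in window.iter().zip(items) {
                if p.ty != *ty {
                    return Err(format!("`{}`: expected {:?}, found {:?}", p.name, p.ty, ty));
                }
            }
        }
        Source::Named(fields) => {
            // Field names are unique and `fields.len() == n`, so matching every
            // parameter in the window by name also uses up every field.
            for p in window {
                match fields.iter().find(|(name, _)| *name == p.name) {
                    Some((_, ty)) if *ty == p.ty => {}
                    Some((_, ty)) => {
                        return Err(format!("`{}`: expected {:?}, found {:?}", p.name, p.ty, ty));
                    }
                    None => return Err(format!("no field named `{}`", p.name)),
                }
            }
        }
    }
    Ok(next + n)
}

/// Parameters of `fn f(a: u8, b: f32, c: bool, d: [u8; 5], e: &str)`.
fn f_params() -> Vec<Param> {
    vec![
        Param { name: "a", ty: Ty::U8 },
        Param { name: "b", ty: Ty::F32 },
        Param { name: "c", ty: Ty::Bool },
        Param { name: "d", ty: Ty::U8Array5 },
        Param { name: "e", ty: Ty::Str },
    ]
}

fn main() {
    let params = f_params();
    // `f(..a_1tuple, b, ..cde_struct)`: the 1-tuple fills `a` by position,
    let next = check_unpack(&params, 0, &Source::Positional(vec![Ty::U8])).unwrap();
    // `b` is passed by hand, then the struct fills `c`, `d` and `e` by name.
    let cde = Source::Named(vec![("e", Ty::Str), ("c", Ty::Bool), ("d", Ty::U8Array5)]);
    let next = check_unpack(&params, next + 1, &cde).unwrap();
    assert_eq!(next, params.len());
}
```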
Thoughts
- If unpacking a struct whose field names exactly match the parameter names, the order of the struct's fields vis-à-vis the parameters doesn't matter. Just pass the struct fields as the correspondingly named parameters.
- The struct fields need to be visible at the call site, e.g. `pub` or `pub(crate)`.
- When there's ambiguity, prefer that the developer takes control and is explicit about what they mean. This could prevent errors. The downside is that the syntactic sugar is not available in those cases.
Diagnostics
- Error: Attempt to pass the expression itself as an argument without unpacking it, if and only if the conditions that would allow argument unpacking are fulfilled. -> Suggest refactor: Did you mean (same but with the unpacking syntax)?
- Error: Attempt to unpack an expression where a specific element/field is incorrect (e.g. has the wrong type or name). -> Point out the incorrect field by underlining it, stating what it is and what is expected instead.
- Error: Attempt to unpack a slice, trait object, iterator, vector, or `HashMap`. -> Fallible unpacking of collections whose length is only known at run time is not supported.
- Lint: When unpacking an expression of type `T` in a position where a `RangeTo<T>` would also be accepted. -> Ambiguous use of argument unpacking. Use `{..expr}` to produce a range instead.
- Lint: When arguments could be unpacked directly from an expression instead of using temporary variables or accessing the elements/fields by hand. -> Suggest refactor: Use unpacking instead.
Guide/Documentation Changes
Standard library documentation that may benefit from the mention of the new syntax:
- Structs:
- stdlib keyword: struct - Rust
- Tuples:
- stdlib primitive: tuple - Rust
- Arrays:
- stdlib primitive: array - Rust
The Rust Reference:
Since Functional update syntax is documented under Struct expressions, the likely place to document argument unpacking would be under its own subheading in Call expressions.
Corner cases
Empty collections
Attempting to unpack a unit struct, the unit type, or an empty array is disallowed. It doesn't make sense to do it, since there are no arguments to unpack. A minimum of one element/field is required in the collection being unpacked.
RangeTo<T>
If the parameter being filled could also accept a `RangeTo<T>`, where `T` is the type of the collection being unpacked, and the collection's fields are named and typed correspondingly to the function's parameters so that argument unpacking could proceed, favor the new syntax of argument unpacking over instantiating `RangeTo<T>`. If a `RangeTo<T>` is actually desired, that argument could be wrapped inside curly braces: `{..expr}`.
See Possible Concern: `RangeTo<T>` below.
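To illustrate the overlap, the sketch below shows that `..expr` in argument position already constructs a `RangeTo` on stable Rust, and that the proposed `{..expr}` escape hatch is an ordinary block expression producing the same value. The `takes_range` function is hypothetical.

```rust
use std::ops::RangeTo;

/// Hypothetical callee whose only parameter is a `RangeTo<u32>`.
fn takes_range(r: RangeTo<u32>) -> u32 {
    r.end
}

fn main() {
    let x = 10;
    // Valid today: `..x` builds `RangeTo { end: 10 }`.
    assert_eq!(takes_range(..x), 10);
    // A block whose tail expression is `..x` yields the same range; under the
    // proposal, this form would remain unambiguous.
    assert_eq!(takes_range({ ..x }), 10);
}
```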
Drawbacks
Functions that accept many parameters may already be a code smell, and the proposed change would likely help calling such functions the most, becoming an enabler for anti-patterns. At the same time, unpacking three or four arguments by hand is not much work, decreasing the usefulness of the change in normal code.
A sufficiently smart language server could automate argument unpacking, also decreasing the usefulness of having the feature in the language itself when writing new code.
Although the proposed syntax is familiar from other contexts, e.g. struct instantiation, it still burdens developers with additional syntax to understand. Depending on how intuitive the syntax is and how familiar the developer is with similar features from other programming languages, this may add mental overhead when working with Rust code.
However, as the new syntax comes in the form of syntactic sugar, this shouldn't be so bad: no-one is forced to use this. Additionally, it could be reasonably argued that the proposed change makes the language a bit more consistent, since a similar feature for struct instantiation already exists. Anecdotally, the author of this RFC tried to use the syntax for the proposed feature only to notice it doesn't exist yet.
Possible Concern: RangeTo<T>
The proposed syntax overlaps with existing valid syntax: given `let x: T`, `..x` is already valid syntax for instantiating `RangeTo<T>` (the `RangeTo` struct places no trait bound on `T`). For functional record updates, these same ambiguous situations are resolved by favoring struct update syntax over range instantiation. For consistency, argument unpacking should behave the same.
If this change in syntax is found to be a breaking change, it could be stabilized in the next edition.
Note: The author of this RFC couldn't quickly come up with a struct and a function such that the struct implements `RangeBounds` and has a field (with the same name and type as the function's only parameter) of a self-referential type. A truly ambiguous situation, where both meanings (the argument is a `RangeTo<T>`, or the arguments are being unpacked) would be valid, may not occur that often. Non-working attempt below:
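```rust
use std::ops::{Bound, RangeBounds, RangeTo};

// Does not compile: `WeirdType` contains `RangeTo<WeirdType>`, which in turn
// contains `WeirdType` by value, so the type would have infinite size.
struct WeirdType {
    x: RangeTo<WeirdType>,
}

impl<T> RangeBounds<T> for WeirdType {
    fn start_bound(&self) -> Bound<&T> { todo!() }
    fn end_bound(&self) -> Bound<&T> { todo!() }
}

fn ambiguous(x: RangeTo<WeirdType>) {
    println!("???")
}

fn main() {
    // Also fails: `y` is not in scope inside its own initializer.
    let y = WeirdType {
        x: ..y,
    };
    ambiguous(..y);
}
```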
Rationale and alternatives
Aside from not implementing the proposed change at all, some subset of it could be implemented instead. For instance, only allowing unpacking of structs with fields that have exactly the same names. This could still be useful, even though it wouldn't help in some of the example use-cases.
Alternatively, unpacking could be allowed for structs that have extra fields in addition to the named fields that could be successfully unpacked. The remaining fields would simply not be used as arguments.
The proposed feature could also be implemented as a part of a more ambitious initiative of treating function arguments as distinct tokens accessible by macros, or something equally general. E.g. being able to do something like:
```rust
fn main() {
    // Hypothetical `to_args!` macro: changes (u8, u8, u8) into three u8
    // arguments in the function call
    print_rgb(to_args!(hex2rgb("#123456")));
}
```
This would have the downside of adding another macro to `std`. Providing the macro in a separate external crate could be done as a workaround, but the cost-to-benefit ratio of adding another dependency may not make it worthwhile for some users.
Some programming languages (e.g. Python and Ruby) use the asterisk `*` character in place of the proposed `..`. In Rust, such syntax would be confusing, since `*` is already used for dereferencing.
A somewhat different design, allowing the use of a bare `..` as a shorthand for passing variables in the current scope as arguments in the function call, would still make code shorter. Technically, this wouldn't conflict with the design proposed in this RFC. However, having two different but syntactically similar shorthands for closely related functionality might be confusing, which may be a reason to commit to only one of them.
Workarounds If RFC Is Not Implemented
Instead of changing the language to include the syntactic sugar, the standard library's `Fn::call` method, gated behind the unstable `fn_traits` feature (nightly Rust only), could be used. A slightly more verbose example:
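```rust
// Nightly only: calling `Fn::call` directly requires the unstable `fn_traits` feature.
#![feature(fn_traits)]

fn main() {
    // `print_rgb` and `hex2rgb` as defined in the Summary example.
    std::ops::Fn::call(&print_rgb, hex2rgb("#123456"));
}
```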
The downside of this is that the syntax diverges from a normal function call, i.e. the code seems to be calling `call`, with the actual function to be called being just one argument. Given the verbosity and unfamiliar syntax (from the point of view of argument unpacking in other programming languages), and the fact that it is unavailable on stable Rust, this option also doesn't increase ergonomics much. Directly unpacking structs, tuple structs, or fixed-size arrays isn't supported either, although `.into()` can convert a fixed-size array into a tuple first. Relying on this might also confuse language servers when trying to locate uses of the called function.
A simple way to avoid the verbosity of having to pass the arguments by hand is to change the type signature of the function being called to accept the tuple/struct instead. However, sometimes this is not possible, for instance if the function comes from a third-party crate. The proposed syntax specifically targets call-site unpacking, which avoids this problem. Of course, it should be possible to manually implement a wrapper for the third-party function in these cases.
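A declarative macro offers another stopgap on stable Rust. The sketch below assumes a hypothetical `call_unpacked3!` macro (not part of `std`) that binds a 3-tuple to temporaries and passes its elements positionally, i.e. the hand-written desugaring; `rgb_sum` is likewise a hypothetical callee. Such a macro only handles tuples and needs one variant per arity, which is part of why a language-level feature may be preferable.

```rust
/// Hypothetical macro: binds a 3-tuple to temporaries and passes the
/// elements positionally to the given function.
macro_rules! call_unpacked3 {
    ($f:expr, $tuple:expr) => {{
        let (a, b, c) = $tuple;
        ($f)(a, b, c)
    }};
}

fn hex2rgb(hexcode: &str) -> (u8, u8, u8) {
    let r = u8::from_str_radix(&hexcode[1..3], 16).unwrap();
    let g = u8::from_str_radix(&hexcode[3..5], 16).unwrap();
    let b = u8::from_str_radix(&hexcode[5..7], 16).unwrap();
    (r, g, b)
}

fn rgb_sum(r: u8, g: u8, b: u8) -> u32 {
    u32::from(r) + u32::from(g) + u32::from(b)
}

fn main() {
    // Stands in for the proposed `rgb_sum(..hex2rgb("#123456"))`.
    let total = call_unpacked3!(rgb_sum, hex2rgb("#123456"));
    assert_eq!(total, 18 + 52 + 86);
}
```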
Prior Art
Different Programming Languages
The proposed or a similar feature is known by many names in different programming languages. Various terms include unpacking, destructuring, deconstruction, exploding, splatting, and spreading. Some examples below:
Python has argument unpacking (also see the Python 3.12.3 documentation, section 6. Expressions), which allows using the `*` or `**` operator at the call site to extract values from tuples (or other iterables) or dictionaries, respectively, into distinct arguments:
```python
def hex2rgb(hexcode: str) -> tuple[int, int, int]:
    r = int(hexcode[1:3], 16)
    g = int(hexcode[3:5], 16)
    b = int(hexcode[5:7], 16)
    return r, g, b


def print_rgb(r: int, g: int, b: int) -> None:
    print(r, g, b)


if __name__ == "__main__":
    print_rgb(*hex2rgb("#123456"))
```
TODO: Another Python example showing the likeness between double-asterisk unpacking of dicts and the intended similar feature proposed here for Rust structs.
- JavaScript:
  - Spread Syntax: Spread syntax (...) - JavaScript | MDN
  - Syntax: `sum_of_three(...numbers)`
- Julia:
  - Splat: Functions · The Julia Language
  - Syntax: `sum_of_three(numbers...)`
- PHP:
  - PHP RFC: Argument Unpacking: PHP: rfc:argument_unpacking
  - Syntax: `sum_of_three(...$numbers)`
- Ruby:
  - Splat operator: calling_methods - Documentation for Ruby 3.3
  - Syntax: `sum_of_three(*numbers)`
Haskell has no dedicated syntactic sugar for argument unpacking, but various `uncurryN` functions can be implemented, where `N` is the number of items in a tuple, e.g.:

```haskell
uncurry3 :: (a -> b -> c -> d) -> (a, b, c) -> d
uncurry3 f (a, b, c) = f a b c
```
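The same approach can be written as a library function in stable Rust. Below is a minimal sketch of a hypothetical generic `uncurry3` helper (not part of `std`) mirroring the Haskell version. Like the `Fn::call` workaround, it changes how the function is called rather than providing call-site syntax.

```rust
/// Hypothetical helper mirroring Haskell's `uncurry3`: turns a
/// three-parameter function into one that takes a single 3-tuple.
fn uncurry3<A, B, C, R>(f: impl Fn(A, B, C) -> R) -> impl Fn((A, B, C)) -> R {
    move |(a, b, c): (A, B, C)| f(a, b, c)
}

fn sum_of_three(a: i32, b: i32, c: i32) -> i32 {
    a + b + c
}

fn main() {
    let numbers = (1, 2, 3);
    // The tuple is passed as one argument and unpacked inside the wrapper.
    assert_eq!(uncurry3(sum_of_three)(numbers), 6);
}
```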
Notable differences to existing implementations
For example, in Python, fallible unpacking occurs dynamically, at run time. Use cases such as unpacking data structures created at run time with a varying number of elements are supported. On the other hand, whether unpacking can succeed at all is not known until it is attempted during program execution. The feature proposed in this RFC is different, only allowing unpacking when it is proven to succeed during compilation, which makes the feature infallible and static.
(To be clear, a similar-looking Python feature, parameter packing, e.g. `def f(*args)`, is unrelated to this proposal and belongs to the distinct concept of variadic functions.)
Existing Rust Work on Subject
TODO: Any urlo, reddit, github links for this?
IRLO:
Rust GitHub:
- Language feature: flat tuple as function arguments: Language feature: flat tuple as function arguments · Issue #2667 · rust-lang/rfcs · GitHub
- Draft RFC: variadic generics: Draft RFC: variadic generics · Issue #376 · rust-lang/rfcs · GitHub
- This might in the end solve the same problem. However, this looks like a more ambitious feature, whose progress seems to have stalled. Meanwhile, would we want to have the proposed solution, which essentially provides a subset of the consequences of variadic generics?
See related: rfcs/text/2909-destructuring-assignment.md at master · rust-lang/rfcs · GitHub
Stack Overflow questions:
- Is it possible to unpack a tuple into function arguments?
- Is it possible to unpack a tuple into method arguments?
Unresolved questions
TODO: Work through these and put the results under "Reference-level explanation".
- What to do with references? Same as when building structs? Same as when normally passing arguments?
  - Should these be "intelligently" selected to match the order/names' type definitions? I.e. if the parameter type is `&i32`, pass a reference automatically if the struct field is `i32`?
- What to do with mutability?
  - If unpacking directly from the return value of a function, use the same mutability as defined for the function parameters?
    - Consider whether mutability in this case makes any sense at all...
    - If the tuple or struct instance is defined in scope with a name, is there something related to interior mutability that we'd specifically need to worry about in this context?
- What to do when function parameters are generic, using `<T>`, `impl` or `dyn`?
  - Exactly the same as when the arguments are passed by hand!
- What to do when unpacking structs with named fields into macro call arguments?
- What to do when unpacking unions? Is this a supported use case at all? Why/Why not?
- What to do when the function has more parameters than are being unpacked?
  - Should be a valid use case. E.g. `set_color(my_alpha, ..get_rgb());`
- What to do when the collection being unpacked is a reference, smart pointer or something else containing the collection type?
  - Depends on whether this can fail. If it can be shown to provably succeed at compile time, then it should work.
- Closures are omitted; would they be possible future work? Is there even anything additional to do about them?
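For the references question, one existing precedent that might be worth aligning with is destructuring through a reference: with default binding modes ("match ergonomics"), a pattern applied to `&s` binds each field by reference. A possible, not yet decided, reading would be that unpacking from `&s` passes the fields by reference in the same way. The `describe` function below is hypothetical.

```rust
struct S {
    c: bool,
    d: [u8; 5],
    e: &'static str,
}

/// Hypothetical callee that only needs to borrow its arguments.
fn describe(c: &bool, d: &[u8; 5]) -> String {
    format!("{} {:?}", c, d)
}

fn main() {
    let s = S { c: true, d: [1, 2, 3, 4, 5], e: "foo" };
    // Destructuring through `&s` binds each field by reference (default
    // binding modes): `c: &bool`, `d: &[u8; 5]`. `..` ignores the extra field.
    let S { c, d, .. } = &s;
    assert_eq!(describe(c, d), "true [1, 2, 3, 4, 5]");
    // Nothing was moved out, so `s` remains fully usable.
    assert_eq!(s.e, "foo");
}
```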
Future possibilities
Macros, callable with the `macro_name!(...)` syntax, have been omitted from the scope of this proposal. The only reason for the omission is time constraints related to differences in design. For example, some macros (e.g. `println!`) accept an indefinite number of arguments. Unpacking structs, where the field names play an important role, may be unsuitable for some macros, but unpacking tuples, tuple structs, and fixed-size arrays may make sense. Further design, meriting a separate RFC, is needed.
The scope of argument unpacking could be expanded to dynamic contexts as well. Run-time unpacking of `dyn Trait` trait objects, slices, `Vec`s, `HashMap`s, iterators in general, etc. would be fallible, since the existence of a correct number, order, typing and naming of items to match the parameters can't be guaranteed at compile time. Something like `..expr?` could be considered to improve ergonomics for those cases as well, but that would definitely merit a separate RFC.
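For a sense of what such fallible unpacking involves, the sketch below does by hand what something like `..expr?` might desugar to for a slice, assuming it would convert the slice into a fixed-size array and propagate the length error with `?`. `TryFrom<&[T]> for [T; N]` and `TryFromSliceError` are existing standard library items; `rgb_string` and `rgb_from_slice` are hypothetical.

```rust
use std::array::TryFromSliceError;
use std::convert::TryInto;

/// Hypothetical callee with three positional parameters.
fn rgb_string(r: u8, g: u8, b: u8) -> String {
    format!("{} {} {}", r, g, b)
}

/// Hand-written version of a hypothetical fallible `rgb_string(..bytes?)`:
/// the length check happens at run time and may fail.
fn rgb_from_slice(bytes: &[u8]) -> Result<String, TryFromSliceError> {
    // Succeeds only if the slice has exactly 3 elements.
    let [r, g, b]: [u8; 3] = bytes.try_into()?;
    Ok(rgb_string(r, g, b))
}

fn main() {
    assert_eq!(rgb_from_slice(&[18, 52, 86]).unwrap(), "18 52 86");
    // Wrong length: the error surfaces at run time instead of at compile time.
    assert!(rgb_from_slice(&[1, 2]).is_err());
}
```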