Placeholder Syntax - Revisited

(This is a follow-up to Placeholder Syntax)

Hey Rustaceans,

Can we get placeholder syntax in Rust?

One of the things I really miss from Scala is the placeholder syntax, and I'd love to see at least a limited version of it in Rust.

To plagiarize this 2021 post on the topic, the syntax lets you transform code like this:

users
   .iter()
   .filter(|x| mypredicate(42, x))
   .map(|x| x.name)
   .collect();

into this:

users
   .iter()
   .filter(mypredicate(42, _))
   .map(_.name)
   .collect();

Semantically, all it does is replace the underscores (the anonymous arguments) with named closure arguments, so foo(_) desugars to |x| foo(x), _.bar desugars to |x| x.bar, _.baz(_) desugars to |x, y| x.baz(y), etc.
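To make the desugaring concrete, here is roughly what those forms would expand to using today's closures. (The placeholder forms themselves are hypothetical syntax; `foo` and `i32::pow` are illustrative stand-ins.)

```rust
fn main() {
    let foo = |x: i32| x + 1;

    // foo(_)    would desugar to |x| foo(x)
    let f = |x: i32| foo(x);
    assert_eq!(f(1), 2);

    // _.baz(_)  would desugar to |x, y| x.baz(y)
    // (using i32::pow as a stand-in for baz)
    let g = |x: i32, y: u32| x.pow(y);
    assert_eq!(g(2, 3), 8);
}
```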

Besides automatically providing an elegant partial application syntax, I love this syntax because it allows me to avoid naming things that don't need to be named, which would otherwise detract from the logical flow of the code. Code should be simple, and simple code shouldn't contain irrelevant details.

When this was previously discussed in 2021 the main apparent blocker was how the closure arguments would be scoped, e.g. this post.

But, I think this is a solved problem - just use the same rule as Scala. Specifically, bind at the level of the closest outer scope. Or, to quote from the Scala semantics:

An expression e of syntactic category Expr binds an underscore section u, if the following two conditions hold: (1) e properly contains u, and (2) there is no other expression of syntactic category Expr which is properly contained in e and which itself properly contains u.

E.g. foo(bar(42, _)) becomes foo(|x| bar(42, x)).

Slight tangent

The previous thread considers the possibility that Scala's simple scoping rule wouldn't prove sufficiently flexible, and suggests elaborations. I think the best idea was to use a special token, e.g. |..|, as a scope marker. I think it would work like this:

  1. If the scope marker is not present, default to Scala rules.
  2. Else, anonymous arguments are scoped to the level of their closest scope marker. E.g. foo(|..| bar(baz(42, _))) would become foo(|x| bar(baz(42, x))).
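Rule 2 desugared into today's syntax might look like this (`foo`, `bar`, and `baz` are illustrative stand-ins):

```rust
fn baz(a: i32, b: i32) -> i32 { a + b }
fn bar(x: i32) -> i32 { x * 2 }
fn foo(f: impl Fn(i32) -> i32) -> i32 { f(1) }

fn main() {
    // foo(|..| bar(baz(42, _))) would desugar to the line below:
    assert_eq!(foo(|x| bar(baz(42, x))), 86); // bar(baz(42, 1)) = bar(43) = 86
}
```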

However, while I would like to elide argument names in arbitrarily nested scopes, it's not a hill I'm willing to die on, and I'm open to the criticism that this special syntax doesn't add enough benefit to justify its inclusion.

2 Likes

TBH, this is still my biggest thought about the whole feature.

If Rust's lambda syntax was something as horrific as C#'s delegate syntax, then I'd be all in favour of doing something to make it more convenient -- like how C# added => syntax.

But |x| x + 1 just isn't bad: anything short enough for placeholders not to be confusing can just use a short name (like x) for the variable. Not to mention that names avoid questions like what _ << _ means.

And that avoids all the scoping questions. For example,

This still surprised me, since it's unclear why it's outside the bar, rather than foo(bar(42, |x| x)). If it was foo(bar(42, _ + 1)), would that still be foo(|x| bar(42, x + 1)), or would that make it foo(bar(42, |x| x + 1))?

This one definitely doesn't seem worth it, since it's not even shorter.

19 Likes

Hey Scott, thanks for taking the time to write a detailed response.

First to answer your question, foo(bar(42, _ + 1)) would be foo(bar(42, |x| x + 1)) using the Scala binding rule. So that's probably not what you wanted to do there - the Scala binding rule is very simple, generally quite useful, but not generally expressive.

Regarding your main point, that |x| x + 1 isn't too bad

I bet you'd agree that one of the indicators of elegant code is when an API, class, etc. asks for exactly what it needs to accomplish its task, and no more. E.g. a function shouldn't take a Vec argument when anything implementing IntoIterator would do, and it shouldn't take a T argument when an &T would have sufficed.

For me, the problem with |x| x + 1 isn't about length, it's about elegance. The language is asking for something it doesn't actually need, in this case the name of the positional argument. Consider again the following example:

users
   .iter()
   .filter(|x| mypredicate(42, x))
   .map(|x| x.name)
   .collect();

Ask yourself - What meaning does the x convey? The answer is nothing - the x could just as well be y. This is in contrast to everything else in this expression, which has actual semantic meaning (users indicates a specific variable, mypredicate is a specific function, iter, filter, etc. are well-known iterator methods).

What |x| x + 1 is attempting to convey is simply how a positional argument coming in should be bound to a positional value in the expression. And that is exactly what the _ syntax specifies.

Going general with a binding token (|..|)

It's possible to provide the language with exactly what it needs, and no more, using binding tokens, e.g. |..|. The binding token tells you in which scope positional arguments are bound, and the underscores tell you where to put them.

If you wanted full control over where to place the positional arguments, you could adopt a syntax with tokens like _0, _1, etc, so you could write things like |..| foo(_1, _0 + _1) to get |x, y| foo(y, x + y).
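A possible desugaring of that example, written with today's closures (`foo` is an illustrative stand-in):

```rust
fn foo(a: i32, b: i32) -> i32 { a * 10 + b }

fn main() {
    // |..| foo(_1, _0 + _1) would desugar to |x, y| foo(y, x + y)
    let f = |x: i32, y: i32| foo(y, x + y);
    assert_eq!(f(1, 2), 23); // foo(2, 1 + 2) = 2 * 10 + 3 = 23
}
```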

Final thoughts

If none of this really resonates with you, no worries. I've been writing a lot of OCaml, and I've grown accustomed to pipelined code (very few variables, mostly just functions with syntax specifying how values should flow). Perhaps I just need to make my peace with Rust.

2 Likes

One minor thing: this is arguably a breaking change, since foo(_ = 4) can compile today.

TBH, I'm quite enamoured, theoretically, by concatenative programming. I do really like avoiding useless variable names and having things in dataflow order. And in some places, following those principles can be great for Rust -- see .await, for example.

But it's unclear to me how far along these lines Rust should go. For example, a pipelining operator comes up here regularly: When will Rust adopt the syntax of pipline operator "|>" in ocaml

What would a big change in this direction look like? Is it worth churning the idiomatic way of doing things? Dunno.

Here's what I said on the previous thread.

This is in fact the scoping ambiguity I was referring to: what of the surrounding context expr, if any, is included in the lambda, and why? The _ syntax makes that as clear as mud, and I don't think it's helpful to have to go figure it out at every occurrence. And even if the proposal says the closest scope, that's still something that would have to be looked up and learned. That then has to be weighed against the benefit, which, other than a 2nd set of syntax for closures, is what exactly? That's still unclear to me.

In the meantime I could add another issue: The _ syntax is already used for pattern matching. Now patterns aren't exprs, and so on a technical level it might be able to be squeezed into the grammar. But that leaves the human interface problem: suddenly _ means 2 different and only tangentially related things depending on whether they're used in pattern or expr context. This seems likely to be confusing and thus doesn't make much sense to me.

That was one of my criticisms as well: even in the best, most optimal scenario, using _ rather than a 1-character identifier like x isn't even going to be shorter, except for maybe eliding the param list - but that param list is precisely what clearly demarcates the body of the lambda in the first place. Even if not grammatically speaking, it's just easier to parse as a human being.

What it provides is a name for each param, just like everywhere else in the language. In other words, it's more consistent with the rest of Rust, making it less surprising and easier to work with.

As for the |..| syntax, there we run into the pattern/expr duality confusion again. And once again, even if grammatically feasible, it isn't clear why this would be worth having at the expense of clarity: does the .. mean something lambda related, or is it saying "I don't care about the rest of the (sub-)pattern", like it does in patterns? Note that patterns can be used in param lists today, which allows for destructuring of params.

I've seen this convention used before in an academic setting. I found it insufferably inelegant then, and I notice that my opinion on that front hasn't changed. But granted, this is subjective.

If such a proposal were to ever be accepted, these things would need to be addressed first I think.

2 Likes

(NOT A CONTRIBUTION)

I want this feature all the time. I regularly write one liner lambdas that are just a bit too long and force multi-lining the expression and it always feels like it would just be so much nicer to have some kind of syntax like .map($.foo(bar)) or whatever instead.

1 Like

Long-time Scala user here: placeholder syntax is arguably one of the worst mistakes of that language, precisely due to the implicit and non-obvious scoping rules. Rust already has very concise lambda syntax, I’d say it is minimal under the condition of being hard to misread.


That said: I have occasionally wished for a slightly smaller feature that allows a method or function call to be turned into a lambda by omitting some of the arguments (called eta expansion in Scala). One possibility could be to do this when _ is used as the complete expression supplied as an argument. This feature would still require additional learning for newcomers, but at least it is unambiguous without having to also learn obscure scoping rules.

7 Likes

The specific case of things like .map(|x| x.foo()) where that's using autoderef and can't be written as .map(T::foo) I've run into many times and been mildly annoyed by each time. I've even written .map(Deref::deref).map(T::foo) at least once to stay pointfree.
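A minimal reproduction of that annoyance, with an illustrative `Thing` type:

```rust
use std::ops::Deref;

struct Thing;
impl Thing {
    fn foo(&self) -> i32 { 7 }
}

fn main() {
    let v: Vec<Box<Thing>> = vec![Box::new(Thing)];
    // `.map(Thing::foo)` alone fails: it's fn(&Thing) -> i32,
    // but the iterator yields &Box<Thing>. The closure works via auto-deref:
    let a: Vec<i32> = v.iter().map(|t| t.foo()).collect();
    // Staying pointfree requires an explicit deref step:
    let b: Vec<i32> = v.iter().map(Deref::deref).map(Thing::foo).collect();
    assert_eq!(a, b);
}
```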

Generalized placeholder based closures are fundamentally (human) ambiguous unless the language is designed around supporting it, e.g. like Kotlin or Swift, where closures are always {}-delimited. (And even there, nesting closures with implicitly and explicitly named arguments is problematic.)

I think Rust could get away with and probably benefit from specifically allowing pointfree single-argument closures only for method calls with the argument as a receiver. Allowing _ as a single complete argument to a function call is potentially arguable (but is very context dependent imho, which makes it problematic), but anything beyond that is too far into (human) ambiguity to justify.

4 Likes

I’m similarly annoyed in some cases where the types do match, but it’s more a comparison of

.map(TheTypeWeAreOperatingOn::foo)
.map(|x| x.foo())

Arguably, this case can more realistically be solved by simply[1] introducing a way to allow the compiler to infer the type, e.g.

.map(_::foo)

But I wouldn’t mind a generally usable

.map(_.foo())

either.

Though with such a feature, one would probably expect

.map(_.bar(argument))

to work, too, in which case … well … we are kind-of introducing the full complexity of closures (i.e. variable capturing). E.g. it would need to be discussed whether

.map(_.bar(compute_argument(xyz)))

evaluates the argument when the closure is constructed or when it’s called… (probably when it’s called?) and then there would be no way to make it a move closure, which – I mean – might be rarely useful to begin with? But at least it’s something to consider.
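The eager-vs-lazy question can be spelled out with today's closures (`compute_argument` is illustrative; a call counter makes the difference observable):

```rust
use std::cell::Cell;

fn main() {
    let calls = Cell::new(0);
    let compute_argument = |x: i32| { calls.set(calls.get() + 1); x + 1 };

    // Eager capture: compute_argument runs once, when the closure is built.
    let arg = compute_argument(1);
    let eager = |x: i32| x + arg;
    // Lazy capture: compute_argument runs at every call.
    let lazy = |x: i32| x + compute_argument(1);

    assert_eq!(eager(10), 12);
    assert_eq!(lazy(10), 12);
    assert_eq!(lazy(10), 12);
    assert_eq!(calls.get(), 3); // 1 eager + 2 lazy evaluations
}
```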

Looking at this syntax for longer, I cannot stop feeling like this is weirdly confusing w.r.t. order of evaluation, since the |…|s are missing that would clearly mark everything after them as “lazy” / not directly evaluated. Maybe a maximally conservative approach (without ruling out providing arguments altogether) would limit captured args to constants. (Using some reasonably conservative notion of const.) Which has the nice benefit of allowing full Fn compatibility, and zero-sizedness, and castability to fn; whilst staying equivalent to the desugared version, and also compatible with the naive interpretation/intuition that the arguments could have been eagerly evaluated upon construction, because it looks like an ordinary expression.

Allowing the _ in other places in functions and methods… well…

.map(foo(_))

doesn’t seem too bad I guess(?) though, the _ do somewhat look like they’re on different levels

e.g.

.fold(0, _.add(_))

isn’t 100% obvious to me that the _s belong together, especially considering how whilst

_.add(_)

is

|x, y| x.add(y)

the somewhat similar-looking

_.add(_.f())

would be

|x| x.add(|y| y.f())
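For reference, the two-placeholder fold above can be written today like this (with std::ops::Add in scope for the .add method):

```rust
use std::ops::Add;

fn main() {
    // .fold(0, _.add(_)) would desugar to .fold(0, |x, y| x.add(y))
    let sum = (1..=4).fold(0, |x: i32, y| x.add(y));
    assert_eq!(sum, 10);
}
```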

As a mostly unrelated side note… I’ve not looked into fun allowed unicode symbols that we can use today. Not to be taken seriously, of course. But truly I find something like

.map(|ˑ| ˑ + 1)
.map(|ˑ| ˑ.pow(2))

somehow looks a lot “cleaner”[2] than coming up with a variable name :joy:[3]

Rust Playground


  1. on second thought… is it so simple? With methods, there is prior art in that we only consider methods, no other associated functions; and there are existing resolution rules, all the possible ambiguities are already possible, no room for making things worse… ↩︎

  2. awww… this doesn’t work properly on my phone. Well… if you see funny boxes… it’s supposed to look like this:

    [screenshot of the rendered code]

    :wink: ↩︎

  3. by the way, no worries… the thing is very much still “literally” pointfree: if you look closely, the symbol I’m using is a triangle! ↩︎

3 Likes

@scottmcm, I hadn't heard of concatenative programming, but I think the OCaml code I admire can be at least roughly described as such. Essentially, you build a system of pipelines (i.e. a DAG) through which data flows - when it is done well it can make certain logic very clear and quite beautiful. But, any concept can be overused, and I think there's still a place for the judicious use of bound values.

RE the pipeline operator, I just started using tap and so far it's been enough.

I do think we should just fix that, though practically it might need to wait for the new trait solver. Allowing "lambda coercions" that generate new lambdas applying coercions to the arguments and return types seems entirely reasonable to me.

8 Likes

@jjpe, thank you for your detailed feedback.

To address your points:

The binding scope for _ is ambiguous, or at least hard for humans to grok

It's been years since I regularly used Scala, but I don't recall being confused by the behavior of _. I expect it's something most people would get used to, but I could be wrong. (As a counterpoint, @rkuhn is clearly not a fan.)

FWIW, Scala also uses _ for pattern matching and I never found it confusing.

The _ syntax might not be shorter

Both you and @scottmcm brought this up, but it's not actually part of my argument. I don't care about length so much as what I perceive as inelegance. Consider the comment by @CAD97 here - they're willing to write the super-long .map(Deref::deref).map(T::foo) just to avoid using a meaningless name.

The meaning of x

That the name x doesn't have any meaning is, I think, proven by the fact that it's possible to write the same code in pointfree style.

I agree that a complete synthesis of argument destructuring with the _ anonymous argument concept is a further complication that should be sorted out.

Just to highlight the specific confusion I was referring to:

some_fun(x, _ + 2) // this calls some_fun with a lambda as second argument
some_fun(x, _)     // this is a lambda supplying its argument as second argument to some_fun

Given how expressions work (in Scala as well as in Rust), no sane person expects this difference before having tripped over it, fallen, mended their broken nose, and got up again. This is why I argue for not permitting the first one, even if the second one should be added to the language.
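Spelled out with today's closures, the two cases desugar quite differently (some_fun is split into two illustrative functions here so that both readings typecheck):

```rust
fn main() {
    // some_fun(x, _ + 2): the placeholder's lambda is the second argument.
    fn takes_lambda(x: i32, f: impl Fn(i32) -> i32) -> i32 { f(x) }
    assert_eq!(takes_lambda(5, |a| a + 2), 7);

    // some_fun(x, _): the lambda wraps the whole call.
    fn takes_value(x: i32, y: i32) -> i32 { x + y }
    let g = |a: i32| takes_value(5, a);
    assert_eq!(g(2), 7);
}
```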

5 Likes

@steffahn, I might be crazy, but I might actually use .map(|ˑ| ˑ.pow(2)) or at least something similarly nameless.

1 Like

Whether something is eagerly or lazily evaluated is not an irrelevant detail, particularly in a low-level language like Rust, which deals with lots of things considered "irrelevant" in higher-level languages. Rust has generally eschewed introducing syntax sugar which could lead to ambiguities, even if some could consider it less pretty.

Well, you've lost the bet. In my opinion, you should strive for clear, concise and concrete interfaces, rather than overgeneralizing them in pursuit of some subjective elegance. A function which can work fine with a Vec in 99% of cases should just accept a Vec and not IntoIterator. The majority of cases which are fine dealing with a Vec can just use it as-is, the minority which has just an Iterator can collect it into a Vec before passing it into the function, and the small minority which can't do that for some reason should use a different API.

Remember that Rust isn't Scala. It doesn't have dynamic dispatch in its generics, it doesn't have a JIT to eliminate inefficiencies at runtime, and its syntax reflects those constraints. Overgeneralized interfaces cause more problems than they solve, due to code bloat and increased compile times.

That's because you chose the names poorly. You can name the variable user instead of x, and suddenly it helps to keep track of the intermediate transformations. If you name your variables x and your functions mypredicate, you just end up with unreadable code. Write it differently, and suddenly the explicit variables are not as useless:

users
   .iter()
   .filter(|user| user_age_is_greater(42, user))
   .map(|old_enough_user| old_enough_user.name)
   .collect();

Or use filter_map, which lets you omit an intermediate variable if you want to.
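For example, filter_map can combine the predicate and the projection in one step (the User type and the age check are illustrative):

```rust
struct User { age: u32, name: String }

fn main() {
    let users = vec![
        User { age: 50, name: "a".into() },
        User { age: 30, name: "b".into() },
    ];
    let names: Vec<&String> = users
        .iter()
        // One closure does the filtering and the projection.
        .filter_map(|user| (user.age > 42).then_some(&user.name))
        .collect();
    assert_eq!(names, vec![&"a".to_string()]);
}
```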

The minor syntactic conciseness of placeholder syntax doesn't justify the type and parsing ambiguities it can cause. Neither does it compose well with Rust's multiple binding modes, or with the importance of understanding variable lifetimes.

Sure, it looks fine in trivial synthetic examples, like all the _ + 1 ones. It isn't as nice with complex nested expressions involving generic functions.

8 Likes

I'd like to mention that I frequently do end up writing code like .map(|c| c + 0.5) in my mathematical code. .map(|vector_component| vector_component + 0.5) would not improve its readability, and the most conventional placeholder x would be bad because it implies the x-axis (the thing being operated on is a 3-dimensional vector or point).

(I don't mean to argue for the proposed placeholder syntax; only to remind that “trivial” cases are still things that can appear in real code, not solely examples-for-arguing-with.)

2 Likes

There is a common placeholder name it.

4 Likes

To be clear, this was in quick 'n dirty code, and the process went something like

  • Wrote what I wanted, .map(Thing::foo);
  • Compiler said "found fn(&Thing) -> Foo, expected fn(&Box<Thing>) -> Foo";
  • Oh, I need a deref;
  • Add .map(Deref::deref);
  • It works, move on.

When I'm in writing "elegant" code mode, I'll prefer |thing| thing.foo() 9 times out of 10. The time I wouldn't requires at least that the "pipeline" is both already multiline and entirely pointfree. Another motivating factor is that the natural thing name (the literal type/field name) is long and/or already bound in the containing scope. (I don't like pure placeholder names in "elegant" code either, but "identity" names are fine.)

Yes please! If the method syntax works, the compiler "knows" what I mean (and I think it even suggests using the closure form sometimes).

If .call(|a, b, c| ufcs(a, b, c)) works, then ideally .call(ufcs) should work. Though this may have knock-on effects on inference; are there any places where .call(ufcs) works but .call(|a, b, c| ufcs(a, b, c)) causes an inference error? If not, it might make sense to give both equivalent inference semantics; we already explicitly allow multiple fn() pointers to the same function item to compare unequal[1], so we don't lose any guarantees there. The new closure item is also still zero-sized and convertible to fn pointer.
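The eta-expansion being discussed, written out by hand (ufcs and call are illustrative names; in this simple case both forms already compile, while the thread is about cases where only the closure form does):

```rust
fn main() {
    fn ufcs(a: i32, b: i32, c: i32) -> i32 { a + b + c }
    fn call(f: impl Fn(i32, i32, i32) -> i32) -> i32 { f(1, 2, 3) }

    // Passing the fn item directly, and its manual eta-expansion:
    assert_eq!(call(ufcs), 6);
    assert_eq!(call(|a, b, c| ufcs(a, b, c)), 6);
}
```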

Then I suppose you also have method autoref to ask about, not just method autoderef. E.g. calling &self trait methods on Copy types being pipelined by value.

Some problems/difficulties/limitations remain though; I thought of at least:

  • These coercions can't work if you're using a different trait (e.g. bevy's IntoSystem) than the actual Fn* traits, since the "actual" argument types aren't known... but that's I think just a specific instance of type inference for the closure form not working either.
  • If it's implemented as just a desugar, it generates useless monomorphization bloat since code which works today gets given a fn(&T) {closure#0} type instead of fn(&T) {crate::foo}.
    • Losing the fn item name from the debug typeinfo is also an error message quality regression.
  • It adds difficulty to any future features allowing naming fn items[2] (e.g. to implement additional traits), for the same reason other coercions tend to also interact poorly with type inference.

Other than potential method autoref, would _::foo behave any different than today's <_>::foo? Does that find inherent functions or just trait functions? (Does it find associated items which aren't functions (types, consts)?)

_::foo probably falls under the general feature of "allow type inference in more places" that could allow _::Variant for enum variants or _ { fields } for struct literals, where the type is concretely inferable syntactically before[3] that position.

... yeah. My PL Formality hat would expect _.call($expr) to desugar as { let _1 = $expr; move |_0| _0.call(_1) } (but applying coercions before binding to _1, not at the call site), which severely impacts when you'd want to use it (similar to the clippy lint shape from .unwrap_or to .unwrap_or_else to defer computation[4] of the fallback value).

Especially since Rust uses a lot of, and encourages using, mutable values. Scala is a functional language first (IIUC); Rust is a (data-oriented) imperative language first, although one strongly multi-paradigm and encouraging the use of a lot of functional techniques.

Which is still a poor choice when in a context where it could be the iterator. (Rust conventions usually use iter for "the" iterator, but that's not universal.)


  1. This is IIRC an area of "nondeterministic but consistent". Any source conversion from fn item to fn pointer will always give the same value (as will copies of that value, obviously), but a separate source conversion of that same fn item to fn pointer gives another nondeterministic value.

    .... What's the provenance of a fn pointer? If you have nonitem fn (e.g. from runtime compilation or dynamic linking), can you have two fn pointers which compare equal but one's provenance is invalidated? The more corners I consider about nonstatic fn the more cursed it gets. Despite being able to convert between data and fn pointers, they're very different beasts on the Abstract Machine. The Harvard Architecture seems to be a more correct model, even if machines follow Von Neumann architecture. (wasm doesn't! wasm is fun with its multiple tables/memories.) ↩︎

  2. My favored is just being able to say fn foo in unambiguous type position (i.e. where you don't need turbofish) to name the fn item type. With another partial stabilization, allowing writing <fn foo<Generics,> as FnOnce>::Output :slightly_smiling_face: ↩︎

  3. The same restriction as field/method access, where the "type must be known at this point" (E0282) even if a latter expression (or even a different control flow arm) would concretize the type. ↩︎

  4. The lint as currently implemented doesn't fire if the expression is const compatible, IIRC, to reduce noise for cheap fallback values like Box::new. As more things become const compatible, though, and we claw back rvalue promotion, this becomes less correct of a heuristic. Will we need some sort of #[clippy::trivial] annotation/analysis for this lint in the future, so it can suggest to add an inline const block for nontrivial fallback computation? ↩︎

This is way off-topic, but I presume this is for rounding. In IEEE arithmetic, you actually want to add 0.5.next_down() (a nightly feature), because otherwise the input 0.5.next_down() rounds to 1 rather than 0: adding 0.5 lands exactly halfway between two representable values, and the round-to-even rule rounds it up to 1.0. This is somewhere I would really recommend a method or function to handle (because computers and continuous fields are…not very compatible in the details…<insert QM/GR joke here>).

Another place where simple function passing annoyingly doesn't work is with tuple arguments, e.g.

iter::zip(xs, ys).map(|(a, b)| Add::add(a, b))

I would prefer to just write .map(Add::add), but I can't, because the input is a tuple rather than a sequence of arguments, and Rust doesn't have currying.
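A small adapter can restore the pointfree style by uncurrying the function (the `uncurry` helper is not in std; it's a sketch of one possible workaround):

```rust
use std::iter;
use std::ops::Add;

// Turn a two-argument function into one taking a single tuple argument.
fn uncurry<A, B, R>(f: impl Fn(A, B) -> R) -> impl Fn((A, B)) -> R {
    move |(a, b)| f(a, b)
}

fn main() {
    let xs = [1, 2, 3];
    let ys = [10, 20, 30];
    let sums: Vec<i32> = iter::zip(xs, ys).map(uncurry(i32::add)).collect();
    assert_eq!(sums, vec![11, 22, 33]);
}
```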

I agree that it would be nice if simple function/closure passing would work in more cases. It's just that coercions are a very scary part of the language, which is hard to reason about. It's very easy to break type inference with new coercions, or worse, introduce ambiguity.

2 Likes