Weird syntax idea(s) for UMCS

UFCS: Unified Function Call Syntax, i.e. why you can do Type::method(&this, params). UMCS: Unified Method Call Syntax, i.e. allowing you to do foo.function(params) in some fashion for free functions instead of requiring associated method lookup.
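For anyone less familiar with the UFCS half, here is a minimal sketch in today's Rust of the same method called through method syntax and through path syntax. The helper name both_ways is only for illustration.

```rust
// Calls `str::len` through method syntax and through path (UFCS) syntax.
fn both_ways(s: &str) -> (usize, usize) {
    let via_method = s.len(); // method lookup on the receiver
    let via_path = str::len(s); // path lookup, receiver passed explicitly
    (via_method, via_path)
}

fn main() {
    let owned = String::from("hello");
    // Method syntax borrows the receiver automatically (autoref).
    assert_eq!(owned.len(), 5);
    // Path syntax needs the borrow written out.
    assert_eq!(String::len(&owned), 5);
    // `&String` coerces to `&str` at the call site.
    assert_eq!(both_ways(&owned), (5, 5));
}
```

UMCS would be the mirror image: keeping the receiver-first reading order while resolving the callee as a path rather than as a method.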

Some sort of "pipe" operator has been considered desirable for a while, e.g. item |> function(param) translating to function(item, param), potentially including autoref behavior. The thing is, Rust already has a "pipe" operator: the applicative dot (.). The "only" issue is that item.function locks you into method name lookup instead of function name lookup.

So I have two ideas I'd like to see general temperature for:

UMCS. Importantly, item.function(params) still always gives method lookup, and (item.function)(params) still always gives field lookup. We add a third form: item.(function)(params), which looks up the path function using function/item name lookup rules, and attempts to call it as function(item, params), including applying autoref rules to item if it's "immediately obvious" that autoref is needed. (Autoref is complicated. The rules should mirror method autoref as closely as possible. The main caveat for users is that if a parameter is generic, autoref doesn't apply: the by-value use of the receiver place is attempted and checked against the trait bounds, completely ignoring whether autoref would satisfy them.)

Extension functions. Allow function items to be written with their first parameter binding as self, e.g. fn function(self: String). When doing method lookup (e.g. s.function()), any extension functions imported into scope are included in the name resolution iff the receiver is a potential match (i.e. is a receiver for the type). For the purpose of name resolution priority, extension functions rank equal to trait methods, and an extension function behaves much as if it were defined with a trait. The function is of course still callable with function syntax.
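For comparison, this is roughly what has to be written today to get the same effect: an extension trait with a single impl. It is only a sketch; the names StringExt and shout are made up for the example and are not part of the proposal.

```rust
// Hypothetical extension trait standing in for `fn shout(self: String)`.
trait StringExt {
    fn shout(self) -> String;
}

impl StringExt for String {
    fn shout(mut self) -> String {
        self.make_ascii_uppercase();
        self.push('!');
        self
    }
}

fn main() {
    // Method syntax works because the trait is in scope.
    assert_eq!(String::from("hello").shout(), "HELLO!");
    // Function (path) syntax still works too.
    assert_eq!(StringExt::shout(String::from("hi")), "HI!");
}
```

The extension function idea would collapse the trait plus impl into the single fn item, while keeping both call forms.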

Both have their weirdness and edge cases, but I think both are useful and potentially desirable. item.(path::to::func)(args) looks significantly weirder than item.path::to::func(args), but some syntactic signal to switch from method to path lookup is needed, and I personally think item.(f)(args) is better than item.self::f(args) to get local scope lookup (is that "path self" or "receiver self"?). Extension functions probably need to answer whether a receiver of &Type acts like a trait implemented for Type (i.e. with a &self method) or for &Type (i.e. with a self method); a pure desugar (or proc macro) definitely needs to, but a language feature might be able to paper over the small differences?

You described UMCS as taking a path, but would it be feasible to extend it to any parenthesized expression, just like regular call syntax takes any expression? For one, that would allow using closures and function pointers in method chains.

And to square the circle, allowing parenthesized operators to stand for their functions (e.g. (+) as an alias for (std::ops::Add::add)) would finally allow for natural looking inline arithmetic operators:

(2).(+)(3)

I kind of love this idea

let x = some()
    .chain_of()
    .operations()
    .(|x| x + 92)()
    .more_things()

Edit: it has to be .(|x| ...)()

I don't really think that an inline definition of a closure should be used. Instead of

let x = some()
    .chain_of()
    .operations()
    .(|x| x + 92)()
    .more_things();

(note the necessary extra () not present in @toc's post) it would imho be clearer to write this as

let f = |x| x + 92;
let x = some()
    .chain_of()
    .operations()
    .(f)()
    .more_things()

although this might have an impact on closure type inference? Though on the other hand because of autoref support it's somewhat likely that closures can't ever infer their param type from usage in UMCS and need to have it inferable from just the closure definition.

And actually, I think the reason I originally came up with the this.(f)(args) syntax for UMCS was in fact specifically such that f can be interpreted as an expression (analogous to the way parenthesization does so for the function item in UFCS call syntax), permitting the use of things like closures, function items, and function pointers with this syntax.

So yes, it should grammatically accept any expression, and "desugar" $this.($f)($($arg),*) as essentially the equivalent of ($f)(k#autoref $this, $($arg),*), but with the evaluation order {$this, $f, $($arg),*} for the former instead of the latter's {$f, $this, $($arg),*}.

Definitely a good idea imo, though maybe a bit "interesting" to include in the syntax. But unfortunately not perfectly unambiguous, e.g. does (-) refer to infix (sub) or prefix (neg) minus? Would (..) be allowed? It can be a prefix, infix, or suffix operator.

... for some definition of "natural looking". In a bigger chain .(+)(3) is probably preferable to + 3) (and whatever havoc that causes to the rustfmt formatting), but standalone like this it looks rather silly imho.

I must say, this looks amazing.

Can that not be inferred from the argument list?

Can you say a little more about why, please? To me the version with the closure hoisted out into let f reads less smoothly than the version with the closure inline, because it makes me stop and scan back up, possibly scroll back up, to see what f does.

If the closure isn't a one-liner, and/or if it can be given a name more meaningful than f, my opinion might change. In my head it's a trade-off among ease of grokking the entire pipeline, ease of grokking each individual operation, and number of single-caller subroutines to remember and think of good names for, and not something with a single answer strong enough to put in a style guide.

Neg/Sub probably easily, unary range not so much.

More generally if the arity differs it's unambiguous (which is the case for pre/post-fix vs infix) but not when the arity is the same (like prefix vs postfix).

It can, but it requires the (-) function item to be overloaded, which is not something Rust currently exposes.

Yes, you can (unstably) implement the Fn traits for different argument lists and it works as an overloaded function item — and you can even observe this possibility on stable (e.g. where F: Fn(A) + Fn(B)) — but as far as I'm aware this isn't a designed capability and it's discouraged to actually utilize it.

The closest to proper overloading that Rust has is generic dispatch, e.g. that you can call both Neg::neg(1i32) and Neg::neg(1i64). But if you assign it to a function item and try to use it like this (e.g. let neg = Neg::neg; neg(1i32); neg(1i64);) you'll get a mismatched type error. Function names can be generic (i.e. appear overloaded), but function values are always concrete (have a single nongeneric signature), even the ZST function items which could theoretically multiply and/or generically implement the Fn traits.
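To make the name-versus-value distinction concrete, here is a small sketch in today's Rust. The function name negate_both is just for the example; the commented-out line is the one that triggers the mismatched type error described above.

```rust
use std::ops::Neg;

fn negate_both() -> (i32, i64, i32) {
    // The *name* `Neg::neg` dispatches generically on the argument type.
    let a = Neg::neg(1i32);
    let b = Neg::neg(1i64);

    // The *value* bound to `neg` is a single concrete function item;
    // inference pins it to `<i32 as Neg>::neg` from its first use.
    let neg = Neg::neg;
    let c: i32 = neg(5i32);
    // neg(1i64); // would not compile: mismatched types, expected `i32`

    (a, b, c)
}

fn main() {
    assert_eq!(negate_both(), (-1, -1, -5));
}
```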

Allowing (-) to be a "function name" overloaded on arity (as opposed to on generic dispatch) would be new functionality to Rust in at least one dimension. Even in the face of generics (and even specialization!) it's the case that a "function name" always logically refers to "one" function (even if that function just immediately does a generic dispatch), and overloading (-) to mean either Sub::sub or Neg::neg is a meaningful departure from that.

Then there's also the fact that postfix deref is desirable, which muddies whether (*) should be mul or deref, and whether a postfix deref (*) should just be a call to Deref::deref (i.e. not work on pointers) or perform a builtin place deref (i.e. work on pointers and not create (and retag, impacting provenance UB) an intermediate reference), and it just gets more complicated.

File it as a future possibility. It can be added to UMCS later without any compatibility issues.

For little one-line closures it's not too bad, but it gets out of hand quickly. It's for much the same reasons as clippy::redundant_closure_call.

Furthermore, it can't be written as just .(|x| x + 92)(), it needs to be written like .(|x: i32| x + 92)(), where the parameter type is fully constrained. This is because method call syntax does autoref, so without the type being fully constrained it's ambiguous whether it should be pipe, pipe_ref, or pipe_ref_mut (self, &self, or &mut self).
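For reference, the three pipe flavors can be emulated in today's Rust with a blanket extension trait. This is only a sketch of the idea (the trait below is written out here, not taken from any particular crate), but it shows the choice that UMCS autoref would have to make implicitly.

```rust
// Hypothetical blanket trait providing the three receiver modes.
trait Pipe: Sized {
    fn pipe<R>(self, f: impl FnOnce(Self) -> R) -> R {
        f(self)
    }
    fn pipe_ref<R>(&self, f: impl FnOnce(&Self) -> R) -> R {
        f(self)
    }
    fn pipe_ref_mut<R>(&mut self, f: impl FnOnce(&mut Self) -> R) -> R {
        f(self)
    }
}

impl<T> Pipe for T {}

fn demo() -> (i32, usize, Vec<i32>) {
    // By value: the closure receives `i32`.
    let by_value = 8_i32.pipe(|x| x + 92);

    // By shared reference: the closure receives `&Vec<i32>`.
    let v = vec![1, 2, 3];
    let by_ref = v.pipe_ref(|v| v.len());

    // By mutable reference: the closure receives `&mut Vec<i32>`.
    let mut w = vec![1];
    w.pipe_ref_mut(|w| w.push(2));

    (by_value, by_ref, w)
}

fn main() {
    assert_eq!(demo(), (100, 3, vec![1, 2]));
}
```

Here the method name picks the receiver mode, so the closure parameter type can be inferred. With a single .(f)() form there is no such name to pick from, hence the need for the closure's own signature to settle it.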

Combining with extension functions, it could perhaps be written as .(|self| self + 92)() to remove the autoref ambiguity, but this is now again another new mostly-orthogonal feature, and almost certainly the wrong meaning for self in closures[1].


  1. |self, ...| should imo mean the closure's self receiver, a) permitting explicitly controlling whether you get FnOnce (self), FnMut (&mut self), or Fn (&self), and b) allowing closures to be recursive (borrowck permitting).

About Extension functions

Unfortunately, self creates ambiguity, but trait does not:

item.trait::f(args)

One thing to highlight is that Rust almost doesn't need pipe syntax, and this idea shows why: traits, and the ability to add methods to any type, satisfy almost every use case, and the . is, as you say, the pipe operator, as seen in how we use iterators in Rust.

Why functions? Wouldn't there always be a way to write the same thing with a trait method? Any real-world examples?

Personally some flavors of custom operators as functions do come to mind. I'm sorry if this is too unstructured.

Case: Iterator .zip()

  • As a method, a.zip(b) requires a to be an Iterator, while b only needs to be IntoIterator.
  • As a function, zip(a, b) lets both a and b be IntoIterator.

This has been implemented in std and shows the slight freedom given to functions (and the nice symmetry).

Example use, then:

    vec![1, 2, 3]
      .(std::iter::zip)([1, 2, 3])
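In today's Rust the difference looks like this (a small sketch; the function name zip_both_ways and the sample values are just for illustration):

```rust
use std::iter::zip;

fn zip_both_ways() -> (Vec<(i32, i32)>, Vec<(i32, i32)>) {
    let a = vec![1, 2, 3];
    let b = [4, 5, 6];

    // Method form: the receiver must already be an Iterator,
    // so `a` needs an explicit `into_iter()` first.
    let via_method: Vec<(i32, i32)> = a.clone().into_iter().zip(b).collect();

    // Function form: both arguments only need to be IntoIterator.
    let via_function: Vec<(i32, i32)> = zip(a, b).collect();

    (via_method, via_function)
}

fn main() {
    let (m, f) = zip_both_ways();
    assert_eq!(m, f);
    assert_eq!(f, vec![(1, 4), (2, 5), (3, 6)]);
}
```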

Case: Custom operators

Follows the previous example. The same advantage applies: instead of a strict receiver type, we can use a “convert to operand” trait bound for all function arguments, including the first.

a.(matrix_mul)(b).(vector_mul)(c)
fn matrix_mul<A, B>(a: A, b: B) -> ..
where
    A: LooksLikeAMatrixToMe,
    B: LooksLikeAMatrixToMe,
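To flesh that out a little, here is one way the function side could look in today's Rust. Everything beyond the names matrix_mul and LooksLikeAMatrixToMe is an assumption made for the sketch: the Matrix2 type, the to_matrix method, and the fixed 2x2 size are all invented here.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Matrix2([[i64; 2]; 2]);

// Assumed shape of the "convert to operand" trait.
trait LooksLikeAMatrixToMe {
    fn to_matrix(self) -> Matrix2;
}

impl LooksLikeAMatrixToMe for Matrix2 {
    fn to_matrix(self) -> Matrix2 {
        self
    }
}

impl LooksLikeAMatrixToMe for &Matrix2 {
    fn to_matrix(self) -> Matrix2 {
        self.clone()
    }
}

impl LooksLikeAMatrixToMe for [[i64; 2]; 2] {
    fn to_matrix(self) -> Matrix2 {
        Matrix2(self)
    }
}

// Every operand, including the first, only needs the conversion bound.
fn matrix_mul<A, B>(a: A, b: B) -> Matrix2
where
    A: LooksLikeAMatrixToMe,
    B: LooksLikeAMatrixToMe,
{
    let (a, b) = (a.to_matrix().0, b.to_matrix().0);
    let mut out = [[0i64; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            for k in 0..2 {
                out[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    Matrix2(out)
}

fn main() {
    let m = Matrix2([[1, 2], [3, 4]]);
    let swap: [[i64; 2]; 2] = [[0, 1], [1, 0]];
    // Nested calls today; with UMCS this would read m.(matrix_mul)(swap).(matrix_mul)(swap).
    let product = matrix_mul(matrix_mul(&m, swap), swap);
    assert_eq!(product, m);
}
```

Note how the first operand can be a &Matrix2, a Matrix2, or a plain array, which a method with a fixed receiver type would not allow without extra impls.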

This reminds me of the infix function call syntax proposal ("Why Rust does not provide several custom operators?", post #13 by bluss, on The Rust Programming Language Forum), but the syntax in this thread, while maybe not prettier, at least doesn't leave any doubt about operator precedence.

As a user of Rust (as opposed to a developer on Rust itself), this would be very nice. One use case where I ran into this was when using itertools, specifically the intersperse method from itertools. There is an unstable intersperse method in the std Iterator trait.

If I don't use the fully qualified name for the itertools intersperse, I get a warning (I forget whether it was from clippy or rustc). If I do use the fully qualified name, I can't use the chain of method calls (i.e. a.iter().something(...).intersperse(...).otherthing().collect()).

Currently I'm forced to break up the chain, either by writing it as Itertools::intersperse(a.iter().something(...), ...).otherthing().collect(), which means the flow of reading code reverses direction in the middle and hurts readability, or by stopping halfway and assigning to a temporary variable. The latter is better, but still introduces extra mental load on whoever is reading the code and trying to figure out why I did this seemingly unneeded step.
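The three shapes can be shown with std alone. In this sketch Iterator::map is only a stand-in for the colliding method (itertools is not assumed to be available here), and the function names are made up for the example.

```rust
// What one would like to write: a single left-to-right chain.
fn chained(words: &[&str]) -> Vec<usize> {
    words
        .iter()
        .filter(|w| !w.is_empty())
        .map(|w| w.len())
        .collect()
}

// Fully qualified call: the reading order reverses in the middle.
fn qualified(words: &[&str]) -> Vec<usize> {
    Iterator::map(words.iter().filter(|w| !w.is_empty()), |w| w.len()).collect()
}

// Temporary variable: linear again, but with an extra name to track.
fn with_temporary(words: &[&str]) -> Vec<usize> {
    let non_empty = words.iter().filter(|w| !w.is_empty());
    Iterator::map(non_empty, |w| w.len()).collect()
}

fn main() {
    let words = ["a", "", "abc"];
    assert_eq!(chained(&words), vec![1, 3]);
    assert_eq!(qualified(&words), chained(&words));
    assert_eq!(with_temporary(&words), chained(&words));
}
```

With the proposed syntax the qualified form could stay in the chain, along the lines of .(Itertools::intersperse)(...).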

Couldn't you also just silence the warning?
