Idea: Simpler method-syntax private helpers

Reading the discussion about infix method calls made me wonder how much of the problem is that it’s just annoying to make a private helper function that uses method syntax.

For example, suppose you wanted to write Vec::sorted for yourself. Today it looks something like this:

trait VecEx {
    fn sorted(self) -> Self;
}
impl<T: Ord> VecEx for Vec<T> {
    fn sorted(mut self) -> Self {
        self.sort();
        self
    }
}

What if, instead, you could just write this? (Note the self.)

fn sorted<T: Ord>(mut self: Vec<T>) -> Vec<T> {
    self.sort();
    self
}

That would expand to something like this:

mod anonymous239048724 {
    pub trait ExtensionMethod<T: Ord> {
        fn sorted(self) -> Vec<T>;
    }
}
impl<T: Ord> anonymous239048724::ExtensionMethod<T> for Vec<T> {
    fn sorted(mut self: Vec<T>) -> Vec<T> {
        self.sort();
        self
    }
}
use anonymous239048724::ExtensionMethod as _;

That expansion means the trait cannot be named, so it’s only usable as a method and cannot be exported.
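
For anyone who wants to poke at it, here is a minimal sketch of that expansion that compiles on current stable Rust. The module name `anonymous_sorted` is a placeholder I made up to stand in for the compiler-generated name; the rest follows the expansion above.

```rust
// Hypothetical stand-in for the compiler-generated anonymous module.
mod anonymous_sorted {
    pub trait ExtensionMethod<T: Ord> {
        fn sorted(self) -> Vec<T>;
    }
}

impl<T: Ord> anonymous_sorted::ExtensionMethod<T> for Vec<T> {
    fn sorted(mut self) -> Vec<T> {
        self.sort();
        self
    }
}

// Bring the trait's methods into scope without binding a name for the trait.
use anonymous_sorted::ExtensionMethod as _;

fn main() {
    let v = vec![3, 2, 5, 1, 4];
    // Callable with method syntax only; there is no free `sorted` function.
    println!("{:?}", v.sorted());
}
```

Because the trait is imported as `_`, the only way to reach `sorted` from this scope is through method syntax, which is the property the proposal relies on.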

Thoughts?


At some point I had the idea of using a special member operator to simply allow calling a free fn as a method:

fn sorted<T: Ord>(mut vec: Vec<T>) -> Vec<T> {
    vec.sort();
    vec
}

let result = vec![2, 3, 4]
    .^sorted();

I liked that variant because you could easily make it call closures as well.
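
Something close to this can be approximated in today's Rust without new syntax. The sketch below uses a blanket trait that I am calling `Pipe` (a name chosen for illustration, not something from std) to pass the receiver to any free function or closure.

```rust
// Hypothetical helper trait: forwards the receiver to any function or closure.
trait Pipe: Sized {
    fn pipe<R>(self, f: impl FnOnce(Self) -> R) -> R {
        f(self)
    }
}

// Blanket impl so every sized type gets `.pipe(...)`.
impl<T> Pipe for T {}

fn sorted<T: Ord>(mut vec: Vec<T>) -> Vec<T> {
    vec.sort();
    vec
}

fn main() {
    // A free function used in method position.
    let result = vec![3, 2, 4].pipe(sorted);
    println!("{:?}", result);

    // Closures work the same way.
    let reversed = vec![1, 2, 3].pipe(|v| v.into_iter().rev().collect::<Vec<_>>());
    println!("{:?}", reversed);
}
```

It is noisier than `.^sorted()`, but it shows that the closure case falls out naturally once the callee is just a value.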

The intriguing part about your proposal is that that might be possible today with something like an #[extension] attribute rewriting the fn to a trait.

A simpler (from the user’s perspective), but more radical approach could be to change the method call syntax resolution algorithm such that when resolving:

receiver.method(args..)

if every resolution method (inherent impls, trait methods) fails, the compiler will look for a free function in scope:

method(receiver, args...)

This requires zero changes to user code, and it should be backwards compatible since it is used as a fallback.

Another addition, for when the function is not in scope, is to allow the caller to write:

receiver.path::to::method(args..)

This also has benefits for traits.

receiver.Itertools::flatten()

The first proposed mechanism has the following benefits:

  • It is general
  • It requires no changes to user code

It has the following drawbacks:

  • It could lead to questions for an unsuspecting reader:

    “where is this method defined? uh… can’t find it; <wait a bit> ooh, it is a free function!”

    I believe the main proposal in this thread should also suffer these drawbacks. (?)

    A mitigating factor is RLS and IDEs which can make it easier to find the definition of the called function.

Unresolved questions:

  • Does this interact in weird ways with Deref coercion?
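
To make the Deref question concrete, here is a small sketch of how the two call forms differ today. The function `shout` is a made-up example; the point is only that function arguments get deref coercion but need an explicit borrow, while method receivers are auto-referenced and auto-dereferenced.

```rust
// Hypothetical free function taking a string slice.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let owned = String::from("hi");

    // Function-call syntax: `&String` coerces to `&str` via deref coercion,
    // but the `&` has to be written out.
    let a = shout(&owned);

    // Method-call syntax: the receiver is auto-referenced and auto-dereferenced,
    // so `str::to_uppercase` is found on a `String` without any `&`.
    let b = owned.to_uppercase();

    println!("{} {}", a, b);
}
```

Under the fallback proposal, whether `owned.shout()` would borrow and deref the receiver the way a real method call does is exactly the part that would need to be specified.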

In the interests of prior art, there is (was?) a very similar proposal for C++ that went by the name "unified call syntax": https://isocpp.org/blog/2016/02/a-bit-of-background-for-the-unified-call-proposal As far as I know, this hasn't gone anywhere, but I can't find solid evidence of that or what the reasoning was for not pursuing it.


The first time I read your post I completely missed the self and almost started writing a "no marker at all would be too magical" response. So I think this specific syntax is a little too subtle.

But between special call syntax (or "method cascading" as this thread called it) and implicit fallback to free functions, I'm not sure I have a preference. I suspect both have massive ambiguity landmines associated with them. But more importantly, I need convincing that they would significantly improve enough code that it's worth the added language complexity, even if there were no landmines, so I think we're blocked on some kind of data gathering exercise like this guy did.

Incidentally, can this be done with a macro today? It sure seems like it should be possible.


As a canonical example, the Itertools trait could / would go poof and you would just have the free functions in that crate.

Might require a gensym mechanism for the name of the module?

EDIT:

This happened to me too, though it could just be my current habit.

What stuck with me from Bjarne's isocpp post was:

Furthermore, David Vandevoorde pointed out that because of the two-phase lookup rules having x.f(y) find f(x,y) would complicate the task for compiler writers and possibly be expensive in compile time.

I'd like some input from compiler engineers on how bad this aspect would be for Rust.

One advantage of my proposal not actually making free functions is that one only needs to meet trait method resolution rules, not "only one function of that name" rules of functions, so this is allowed:

fn sorted<T>(self: Vec<T>) -> Vec<T> { ... }
fn sorted<T>(self: VecDeque<T>) -> VecDeque<T> { ... }

One could also consider that a disadvantage, I suppose. But it must still meet all the same rules as "method overloading" does today, so it's not introducing anything new, and the chance for confusion here feels lower to me (especially since they're private) than the annoyance of needing to make different names.

(Similarly, use stats::minmax; use itertools::minmax; has a name conflict issue that use itertools::Itertools as _; avoids, assuming they apply to different types -- and it's no worse if they're on the same type.)
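
As a rough check that the desugared form really allows this, here is a sketch with two hand-written traits standing in for the generated ones. The module names `ext_vec` and `ext_deque` are placeholders, and I have added `T: Ord` bounds so the bodies compile.

```rust
use std::collections::VecDeque;

// Placeholder modules standing in for the two generated anonymous modules.
mod ext_vec {
    pub trait Ext<T> {
        fn sorted(self) -> Vec<T>;
    }
}
mod ext_deque {
    pub trait Ext<T> {
        fn sorted(self) -> std::collections::VecDeque<T>;
    }
}

impl<T: Ord> ext_vec::Ext<T> for Vec<T> {
    fn sorted(mut self) -> Vec<T> {
        self.sort();
        self
    }
}

impl<T: Ord> ext_deque::Ext<T> for VecDeque<T> {
    fn sorted(mut self) -> VecDeque<T> {
        // Sort the deque in place through its contiguous slice view.
        self.make_contiguous().sort();
        self
    }
}

// Both traits are in scope; the receiver type picks the method.
use ext_deque::Ext as _;
use ext_vec::Ext as _;

fn main() {
    println!("{:?}", vec![3, 1, 2].sorted());
    println!("{:?}", VecDeque::from(vec![3, 1, 2]).sorted());
}
```

Two free functions named `sorted` in the same scope would be rejected, whereas the two trait methods coexist because resolution goes through the receiver type.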

I was hoping that, since self is a keyword, standard highlighting would help distinguish

fn sorted<T>(self: Vec<T>) -> Vec<T> { ... }

from

fn sorted<T>(v: Vec<T>) -> Vec<T> { ... }

I certainly agree that no marker for the proposed rewrite is unacceptable :stuck_out_tongue:


It’s almost possible to fake extensions as a macro_rules macro:

#![feature(macro_at_most_once_rep)]
#![feature(macro_vis_matcher)]

macro_rules! extension {
    (
        $vis:vis fn $name:ident $(<$($generics:ident),*>)? (
            mut $this:ident : $This:ty,
            $($args:tt)*
        ) $(-> $ret:ty)?
        $(where $($where_lhs:ty : $where_rhs:path),* $(,)?)?
        { $($body:tt)* }
    ) => {
        #[allow(non_camel_case_types)]
        $vis trait $name $(<$($generics),*>)?
        $(where $($where_lhs: $where_rhs),*)?
        {
            fn $name (self, $($args)*) $(-> $ret)? ;
        }
        
        impl $(<$($generics),*>)? $name $(<$($generics),*>)? for $This
        $(where $($where_lhs: $where_rhs),*)?
        {
            fn $name (self, $($args)*) $(-> $ret)? {
                let mut $this = self;
                $($body)*
            }
        }
    }
}

extension!{
fn sorted<T>(mut this: Vec<T>,) -> Vec<T>
where T: ::std::cmp::Ord
{
    this.sort();
    this
}
}

fn main() {
    let v = vec![3, 2, 5, 1, 4];
    print!("{:?}", v.sorted());
}

You could probably make it a bit better, but the current limitations:

  • Generics list is overly restrictive: you can only have $(<$($:ident),* $(,)?>)? (comma separated identifiers)
    • This is solvable but I didn’t feel like spelling out the generics list grammar further
  • Would require a separate match arm for each of self, mut self, &self, &mut self, and any other combination that’s technically legal
    • Again, trivially solvable
  • The where grammar is probably too restrictive
    • Probably solvable? I’m not sure, it took me a while to find one that worked for this case and generalized
  • Probably some $:ty captures should be $:path to better align with Rust’s grammar.

So it’s quite possible, even within just macro_rules, but not exactly friendly and requires re-implementing the grammar for function declarations. It’d be much clearer as a #[extension] proc-macro attribute, anyway, assuming those are allowed to remove + rewrite what they’re bound to.

Actually, I’d be happy to publish that as a crate if someone’d be willing to help me iron out the edge cases (though that’d mean I’d have to spell out all of the macro_at_most_once_rep repetitions… so maybe I’d have to block on that being available).


Do we actually need this feature? My gut feeling is that we probably don’t, for two reasons:

  • free standing functions are not just good enough, they are great because they use the minimal amount of language mechanisms to achieve abstraction.

  • method resolution in Rust is a hilariously complicated process, probably even more complicated than in Scala (in Scala, you import implicits, in Rust, you import traits, but have to look for impls all over the project). So we’ve probably blown past the complexity budget in this area already.


I think we need something that solves the problem here; (because I do think there's one, and the isocpp post confirms it in my view).

Free functions are awesome, yes; but they are at a disadvantage due to the inability to call them infix with method call syntax, which helps to stave off parenthesis build-up. In my view, free functions are much nicer to work with in a language like Haskell.

The technical complexity budget, or the one for learning the language? Are you differentiating between mine and @scottmcm's proposals here or does your argument apply to both, and if so, why?

I think that doesn't happen all too often (?); when it does, the vec.Vec::sorted(..) syntax can be used -- or just going back to calling free functions as they were. Admittedly, it isn't a silver bullet, but I think it solves more problems in practice.

Sure; but as you know, that also limits the usefulness severely. For me it becomes a bit "meh" to have it only work for private stuff. Having to write self as the argument is also a new rule that the programmer must learn and the non-public stuff is far from obvious.

The more universal fun(foo, bar) <=> foo.fun(bar) seems like an easier rule to remember to me and has more expressive power and gives the language uniformity from a user's perspective. Presumably, this would also allow the user to even call associated functions of traits infix like: recv.Type::bar() or even recv.<Type as Trait>::bar().

(I'll start calling this the Uniform Method Call Syntax (UMCS))


There is clearly a problem solved here: Currently utility crates have to provide a kitchen-sink trait like Itertools or FutureExt°. If you’re using two of these crates and they both decide to add a method with the same name to their trait (both non-breaking changes), you suddenly get an error because the call is now ambiguous. If these utility methods could be imported one at a time as needed, such an error cannot occur. Playground example
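
A minimal sketch of that clash, with two local modules standing in for the two utility crates. The names `util_a`, `util_b`, `ExtA`, `ExtB` and `tally` are all invented for illustration.

```rust
// Two hypothetical utility "crates", each with a kitchen-sink trait
// that happens to gain a method with the same name.
mod util_a {
    pub trait ExtA: Iterator + Sized {
        fn tally(self) -> usize {
            self.count()
        }
    }
    impl<I: Iterator> ExtA for I {}
}
mod util_b {
    pub trait ExtB: Iterator + Sized {
        fn tally(self) -> usize {
            self.count()
        }
    }
    impl<I: Iterator> ExtB for I {}
}

use util_a::ExtA;
use util_b::ExtB;

fn main() {
    let v = vec![1, 2, 3];

    // With both traits in scope, the plain method call is rejected:
    // v.iter().tally(); // error[E0034]: multiple applicable items in scope

    // Naming the trait explicitly still works.
    let a = ExtA::tally(v.iter());
    let b = <std::slice::Iter<'_, i32> as ExtB>::tally(v.iter());
    println!("{} {}", a, b);
}
```

The workaround today is the fully qualified call, which gives up method syntax in exactly the place where the extension trait was supposed to provide it.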

@Centril But, I must say that I’m a bit skeptical about whether this should be enabled for all functions. It’s really not intended for all functions. In particular, these functions that act as methods need to be listed appropriately inside the documentation and therefore be marked accordingly. Also you can’t have more than one free function with any particular name in scope (unless you use as to give it another name).

@scottmcm One problem I see with your proposal is that importing the helper function looks exactly like importing a normal function. Instead, it is really a method and has to be called like a method. That seems a bit weird. Edit: You define it as private. Hadn’t read that when I wrote this. Hmm. This makes it less useful…

I’m not so sure whether either of these ideas fits nicely into the language. (Edit: Assuming it works for public helper functions) Maybe it’d be better if utility crate authors defined one trait per method instead of one big trait with all methods.

°Edit: I should clarify. I list FutureExt here more as placeholder for future (pun intended) crates with combinators for Futures. It’s likely that we’ll get a few of these crates.

How so? In Haskell, every function I write is subject to infix notation; I don't see why some functions should be called with free notation and some with method notation.

If everything is a method, then it doesn't really need to be mentioned in any standard library documentation (of course, it would be taught in the reference and the book...).

What I'm suggesting we do is eliminate the distinction "function" and "method" entirely, making the language more uniform :wink:

I also think there's a knock-on benefit that you can write: foo.Arc::clone() (even if you don't have to) which has a smaller delta than between foo.clone() and Arc::clone(&foo) but is still explicit.
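
For reference, this is how the two existing spellings behave today. The helper name `strong_count_after_two_clones` is just something I made up so the sketch has a value to return.

```rust
use std::sync::Arc;

// Hypothetical helper: clones an Arc with both spellings and reports the count.
fn strong_count_after_two_clones() -> usize {
    let foo = Arc::new(String::from("shared"));

    // Method syntax: short, but reads like it might be a deep clone.
    let a = foo.clone();
    // Path syntax: explicit that only the reference count is bumped.
    let b = Arc::clone(&foo);

    // Both clones point at the same allocation.
    assert!(Arc::ptr_eq(&a, &b));
    Arc::strong_count(&foo)
}

fn main() {
    println!("{}", strong_count_after_two_clones());
}
```

The proposed `foo.Arc::clone()` would sit between these two: the receiver stays on the left, and the type is still named.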

Sure; That is a drawback! But I'm not sure it arises terribly often (we could experiment and see if it does...), and as you mention, you could use foo as bar to fix this or disambiguate with receiver.path::to::fun(args..).

Because the distinction between a free function and a method is generally helpful. Here's a free function that doesn't make sense as a method:

fn write_file(filename: &str, text: &str) { ... }
...
"text.txt".write_file("Lorem ipsum..."); // Weird

The self argument of a method serves a dual purpose: It's not only an argument, it also serves as a kind of logical context. E.g. person.talk() makes more sense than talk(person) because talk is something a Person can do. This distinction might not exist in Haskell, but it exists in most mainstream programming languages with which the apps that we use every day are built (C++, JavaScript, Java, Swift, etc.). I think this distinction is especially helpful when talking about the function, e.g. in documentation. Throwing it away is IMO not a good idea.

That implies that impl blocks are optional. Otherwise there's still a distinction.

Currently everything is done through traits. So, you won't find many clashes today. Instead let's imagine Rust two years from now: Stream (async iterator) is now stable. Helper methods for it and Iterator are going to clash all the time.

I've found something interesting: Kotlin's extension functions

Also C#'s extension methods, where you use the this keyword to mark something as method-callable.


It’s hard to quantify, but I personally don’t think this problem is really significant on average (for cases where it really is, you already can define an extension trait). Like, if you have many nested parentheses, introduce a local variable!

I also don’t think that C++ motivation directly applies. In C++, it is very important to have uniform syntax because templates work syntactically. For example, C++11 specifically added begin(container) form, in addition to container.begin(), because only the former works for arrays (I haven’t double checked this, might be wrong), and you want to write a template which applies both to arrays and STL containers. I think this might be the case for D as well?
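
For contrast, here is a small sketch of why Rust generics don't need that kind of syntactic uniformity: the bound is on a trait, so one function covers both arrays and Vec. The function `total` is a made-up example, and passing an array by value assumes a reasonably recent compiler (arrays implement IntoIterator by value since Rust 1.53).

```rust
// Hypothetical generic function: works for anything that can become
// an iterator of i32, regardless of how its methods are spelled.
fn total<I: IntoIterator<Item = i32>>(items: I) -> i32 {
    items.into_iter().sum()
}

fn main() {
    // A built-in array and a standard container go through the same bound.
    println!("{}", total([1, 2, 3]));
    println!("{}", total(vec![1, 2, 3]));
}
```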

The third one: complexity of reading the code for an experienced Rust programmer. If I see x.foo() in Rust, it is already pretty hard for me, as a human, to understand what piece of code is called. Adding more rules there would make this even more complicated. We’ll add specialization to the mix in some form, which would make matters worse, but specialization expands expressiveness, and is not just a minor syntactic convenience.


Note that one of the design goals of Kotlin is to be able to express type-safe embedded DSLs (example: https://kotlinlang.org/docs/reference/type-safe-builders.html). Extension functions, extension lambdas, implicit receivers, Ruby-like “return from enclosing function” semantics of lambdas, and a specially crafted lambda-as-a-code-block syntax all work together, in synergy, towards that goal. Extension functions are there not simply to make syntax nicer; they are part of the mechanism to achieve a specific language goal.

I’ve written a fair amount of code in Kotlin, and, when not specifically building a DSL, my preference was to use a free function, because they are easier to understand at the call site :slight_smile:


This example doesn't seem the least bit weird to me. The decision process for where "object sends message" and "send object message" should be used feels quite arbitrary.

What is it that you are trying to document?

This is a distinction with respect to overloading, and you can still deal with that by holding IntoIterator generic in the case of itertools. Notably, impl Foo { .. } and impl Bar for Foo { .. } also differ since you must use in the latter case.

This could be mitigated by having a trait resolution mechanism where the last / most locally imported trait / function wins instead of erroring; This also improves the itertools => libstd situation (.flatten(), etc.). I'm wary of prognosticating at this point.

This is interesting indeed! This could be a better solution.

This cure is worse than the plague for me and exactly what I'm trying to avoid. I profusely dislike temporaries (and will start naming them with one/two letter identifiers eventually..).

One potential option here is to just allow ~ as a character in identifiers, and possibly provide some mechanism to derive these pass-through methods automatically for specific functions or types. At that point using vec.~sorted() as a functional-style passthrough would be little more than a matter of convention.

This would also make these things easier to explain and document, since it wouldn’t be a new operator, just a new name for a function. It could then also be implemented manually, in case there is some reason to do so.

"text.txt".write_file("Lorem ipsum...") looks weird because it doesn't work like a method on &str. For example to_lowercase is a method that operates in the logical context of a &str and so it makes sense that it is called using the method call style. This however is instead a function which does not operate in the logical context of a &str. Instead it is concerned with writing a file. It just happens to take a &str as filename parameter.

I'm talking about rustdoc being able to create a good index for everything. Listing the write_file function from my example under str doesn't make sense. However, rustdoc wouldn't be able to tell.


@MajorBreakfast If you write it as "Lorem ipsum".write_to_file("text.txt") then I think it feels more natural. In the context of a file, "text.txt".append("Lorem ipsum") makes more sense; You have to think about which parameter should be first (what is the receiver?) in any case and a proper naming from the perspective of the receiver.

I don't see this distinction of free or not as mattering for the notion of a logical receiver, a message to send and the configuration of the message (arguments).

Of course, using free functions could be a way to say "this really doesn't belong to either context" but I find that seldom applies (even in Haskell).

rustdoc could do a search for free functions where the first argument is of the relevant type and then list those in a particular section on the type. In general, rustdoc could do a much better job of allowing you to find relevant free functions that operate on / return your type, irrespective of whether UMCS would get implemented or not.

Compared to http://hoogle.org/ and https://hayoo.fh-wedel.de/ I find our mechanisms for type based searching in documentation to be lacking at the moment.

Exactly. And here is where we disagree. To me a string slice cannot be seen as the receiver. If you really wanted to define a receiver, then it'd be the file system. But, not a string slice.

Probably yes. However, and I continue to use my example, as I say above, I consider the write_file function not relevant in the context of string slices. It feels unrelated and its first param doesn't act directly as the receiver. I think it should not be listed there. Listing it would just obscure the relevant stuff.