[Pre-RFC] Unify bindings to callables


#1

See also rust-lang/rfcs/issue1287 for more background and some discussion.

Summary

The purpose of this RFC is to improve the ergonomics and teachability of Rust by making it easy to create bindings to object methods that can be called with the same arguments as when calling the method through the object:

let result = object.method(args...);
let object_method = object.method; 
assert_eq!(result, object_method(args...));

This is achieved by desugaring a binding to an object’s method, object.method, into a closure that appropriately captures its environment: |args...| object.method(args...).

This abstraction is zero cost, and symmetric to the creation of bindings to free functions and their usage.

Motivation

Consider the following piece of semi-generic code (bear with me a bit):

fn apply0<U,         F: Fn()       -> U>(f: F                ) -> U { f()       }
fn apply1<U, A0,     F: Fn(A0)     -> U>(f: F, a0: A0        ) -> U { f(a0)     }
fn apply2<U, A0, A1, F: Fn(A0, A1) -> U>(f: F, a0: A0, a1: A1) -> U { f(a0, a1) }

It defines functions that take another function as an argument and apply some arguments to it by calling it with those arguments (these are called higher-order functions: functions that take functions as arguments; they are used everywhere, e.g., Option::map). Using these higher-order functions with free functions is very nice in Rust:

fn f0(              ) -> i32 { 2     } 
fn f1(x: i32        ) -> i32 { x     } 
fn f2(x: i32, y: i32) -> i32 { x + y } 

assert_eq!(2, apply0(f0));
assert_eq!(2, apply1(f1, 2));
assert_eq!(4, apply2(f2, 2, 2));

However, if we are given a struct S with the following methods:

struct S { x: i32 }
impl S {
 fn f0(&self                ) -> i32 { self.x         }
 fn f1(&self, x: i32        ) -> i32 { self.x + x     } 
 fn f2(&self, x: i32, y: i32) -> i32 { self.x + x + y } 
}

we need to qualify the method impl using S:: and pass an object value with the appropriate “reference-ness”:

let s = S { x: 2 };
assert_eq!(2, apply1(S::f0, &s));
assert_eq!(4, apply2(S::f1, &s, 2));
// there is no apply3 to call f2!

This differs a bit from how we would normally call those functions, e.g., s.f1(2) vs S::f1(&s, 2). We can get back the normal method call syntax by using closures, which are a bit more flexible:

assert_eq!(2, apply0(|    | s.f0(    )      ));
assert_eq!(4, apply1(|x   | s.f1(x   ), 2   ));
assert_eq!(6, apply2(|x, y| s.f2(x, y), 2, 2));

Still, wouldn’t it be even nicer if we could just write:

assert_eq!(2, apply0(s.f0));
assert_eq!(4, apply1(s.f1, 2));
assert_eq!(6, apply2(s.f2, 2, 2));

This RFC proposes to allow this by desugaring the object.method expressions into the closures of the previous example. While one can always write those closures by hand, that is a tedious task that the compiler can do for you. This RFC specifies how the compiler does it.

Detailed design

A binding to an object method let f = object.method desugars into a closure that captures the object appropriately in its environment: let f = |args...| object.method(args...) iff there is no struct field that shares the same name as the method (otherwise it produces a binding to the field). Depending on the method signature: method(self, ...) | method(&self, ...) | method(&mut self, ...), the object is either moved into the environment (in the case of self), or captured with a &Type or &mut Type reference (for &self and &mut self, respectively).
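To make the three capture modes concrete, here is a minimal sketch that writes out by hand the closures the proposed desugaring would produce. The Counter type and its methods are hypothetical and exist only for illustration; the object.method syntax itself is not valid Rust today, so only the hand-written closures appear.

```rust
// Hypothetical type used only to illustrate the three receiver kinds.
struct Counter {
    count: i32,
}

impl Counter {
    fn get(&self) -> i32 { self.count }
    fn add(&mut self, n: i32) -> i32 { self.count += n; self.count }
    fn into_inner(self) -> i32 { self.count }
}

// `&self` receiver: the closure captures a shared reference.
fn demo_shared() -> i32 {
    let c = Counter { count: 1 };
    let get = || c.get();
    let a = get();
    let b = c.get(); // `c` is still usable alongside the closure
    a + b
}

// `&mut self` receiver: the closure captures a mutable reference.
fn demo_mut() -> i32 {
    let mut c = Counter { count: 1 };
    let mut add = |n| c.add(n);
    add(2);
    add(3);
    c.get() // the mutable borrow ended with the last use of `add`
}

// `self` receiver: the object is moved into the closure, which can
// therefore be called only once.
fn demo_move() -> i32 {
    let c = Counter { count: 7 };
    let into_inner = || c.into_inner();
    into_inner()
}

fn main() {
    println!("{} {} {}", demo_shared(), demo_mut(), demo_move());
}
```

Under the proposal, `|| c.get()`, `|n| c.add(n)` and `|| c.into_inner()` would be what c.get, c.add and c.into_inner expand to.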

This is a backwards compatible improvement to the Rust language, since referring to an object method via object.method is currently not valid Rust syntax and is rejected by rustc.

A zero-cost abstraction

This abstraction does not introduce any extra cost because object.method desugars into a closure (not a function pointer or a fat pointer) which has its own unique anonymous type.

This means that the object’s method is always statically dispatched within the closure, and that inlining the closure completely eliminates the abstraction. Whether this is profitable will depend on what the program is being optimized for.

It is worth remarking that while the closure environment will contain either the object, or a reference to it, dispatching on a value or a reference is exactly what Rust methods always do. The closure does not introduce any new kind of indirection. Even when one binds on a trait object’s method using trait_object.method, the closure still does perform static dispatch on the trait object; the dynamic dispatch is then performed by the trait object itself.

Finally, one can type-erase the closure e.g. using a trait object, which would introduce dynamic dispatch before calling the closure, but the closure itself would still perform static dispatch on the object’s method.
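As a rough check of the "no new indirection" claim, the sketch below compares the size of a hand-written closure that captures an object by reference with the size of a type-erased reference to it. The names S, f1 and closure_and_erased_sizes are illustrative. Closure layout is not formally guaranteed by the language, so the sizes are what current rustc produces in practice rather than a specification.

```rust
use std::mem::size_of_val;

struct S {
    x: i32,
}

impl S {
    fn f1(&self, y: i32) -> i32 { self.x + y }
}

// Returns (size of the closure, size of a `&dyn Fn` pointing at it).
fn closure_and_erased_sizes() -> (usize, usize) {
    let s = S { x: 2 };
    // Hand-written form of what `s.f1` would desugar to: captures `&s`.
    let f = |y| s.f1(y);
    // Type erasure is opt-in: the reference becomes a fat pointer
    // (data pointer plus vtable pointer).
    let erased: &dyn Fn(i32) -> i32 = &f;
    assert_eq!(f(1), erased(1)); // both paths call the same method
    (size_of_val(&f), size_of_val(&erased))
}

fn main() {
    let (closure_size, erased_size) = closure_and_erased_sizes();
    println!("closure: {closure_size} bytes, &dyn Fn: {erased_size} bytes");
}
```

The closure itself carries only the captured reference; the vtable pointer appears only once the user asks for a trait object.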

@nagisa made a mention of function pointers in the issue, and I want to remark that this solution is potentially more efficient than storing a struct with a value/reference to an object and a binding of the method to a function pointer, and then using that to dispatch, because the function pointer essentially type-erases the method (although the compiler is probably able to see through this, it might not always be true through ABIs without LTO and whole program optimization).

Corner cases

Field with the same name

Having a field with the same name as a method is valid Rust. To preserve backwards compatibility we must return bindings to the struct field in this case.

This corner case can be mitigated by providing lints in rustc or clippy that warn on (public) methods / fields sharing the same names. When the user creates a binding to a struct field that does not type check, rustc could look for methods of the same name and provide a useful suggestion ("did you mean to create a binding to this method? write |x, y, z| obj.method(x, y, z) instead").
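The collision is easy to reproduce in today's Rust. In the sketch below (the Button type, its on_click field and method, and the double function are all made up for illustration), the bare path b.on_click already means the field, which is why the proposal has to keep that meaning.

```rust
// Hypothetical type with a field and a method that share a name.
struct Button {
    on_click: fn(i32) -> i32,
}

impl Button {
    fn on_click(&self, x: i32) -> i32 { x + 1 }
}

fn double(x: i32) -> i32 { x * 2 }

// Returns (result through the field, result through the method).
fn field_vs_method() -> (i32, i32) {
    let b = Button { on_click: double };
    // Without call parentheses, `b.on_click` is the field: a fn pointer.
    let via_field = b.on_click;
    // With call parentheses, `b.on_click(..)` is the method.
    (via_field(10), b.on_click(10))
}

fn main() {
    println!("{:?}", field_vs_method());
}
```

A binding to the method in this situation would still have to be written as an explicit closure, |x| b.on_click(x).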

Adding a public field to a struct becomes a breaking change

Before, if the type had private fields, adding a new public field was not a breaking change. Since adding new public fields can collide with method names, and these will override them, adding a new public field becomes a breaking change.

TODO: These two drawbacks warrant doing a crater run to evaluate how much of the ecosystem uses struct fields with the same name as struct methods. With this information in hand, and depending on the result, it might be possible to propose an RFC that deprecates this behavior, which would further reduce the impact of these two caveats on Rust programmers.

Multiple methods apply

If multiple methods apply, and the binding can be uniquely deduced from its points of usage, then the binding is uniquely determined. Otherwise creating the binding is ambiguous. In this case rustc could suggest the available bindings and the syntax required to disambiguate them (e.g. |x,y,z| Trait::method(object, x, y, z)).
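For illustration, here is a small sketch of the ambiguous case and of the explicit closures a diagnostic could suggest. The traits Metric and Imperial and the type Ruler are hypothetical.

```rust
// Two hypothetical traits that both provide a method named `scale`.
trait Metric {
    fn scale(&self, x: i32) -> i32;
}

trait Imperial {
    fn scale(&self, x: i32) -> i32;
}

struct Ruler;

impl Metric for Ruler {
    fn scale(&self, x: i32) -> i32 { x * 100 }
}

impl Imperial for Ruler {
    fn scale(&self, x: i32) -> i32 { x * 12 }
}

fn disambiguated() -> (i32, i32) {
    let r = Ruler;
    // `r.scale(2)` is rejected today because multiple applicable items
    // are in scope, so a bare `r.scale` binding would be ambiguous too.
    // Naming the trait in a hand-written closure picks one.
    let metric = |x| Metric::scale(&r, x);
    let imperial = |x| Imperial::scale(&r, x);
    (metric(2), imperial(2))
}

fn main() {
    println!("{:?}", disambiguated());
}
```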

How do we teach this?

Creating a binding to any kind of function in Rust returns a callable that can be called with the arguments that the syntax used expects. Since a method can be called with object.method(args...), creating a binding using object.method returns a callable that can be called with (args...). A method or associated function can also be called with Type::function(other_args...), so creating a binding using Type::function returns a callable that expects other_args....

Drawbacks

Besides increased complexity in the language, compiler… (as with every new feature), the main drawback is the special rule that “if a struct field of the same name exists, the struct field is preferred”. Lints and suggestions can reduce the impact of this drawback.

Another drawback is that there is arguably more happening implicitly when one creates a binding to a callable. Before, one might need to manually create a closure, state the closure arguments, etc. With this RFC the compiler figures it out for you. This will borrow or move the object value, arguably more silently than if the closure had been written by hand. Compiler errors will catch these problems though, and arguably, closures already capture their environment “silently” (ehm… I mean… “appropriately”).

Alternatives

Do nothing.

Unresolved questions

TBD.

Acknowledgements

Oliver Schneider raised the issue of collision with struct field names and proposed a solution that is used in this RFC. @petrochenkov opened the original issue in the RFC repo and proposed some alternative solutions, one of them (variant 1) is the solution pursued in this RFC. @ker for raising the static vs dynamic dispatch issue and the issue of the layout of the closures capturing their environment. @Nemo157 for convincing me that object.associated_fn was a very bad idea.


#2

Generally endorsed. Some explicit discussion of lifetime issues would be nice (does this make it easier to get into one of the tangles discussed at Accepting nested method calls with an `&mut self` receiver, for instance?)


#3

These can already be represented by Type::function. I’d prefer that not to be possible, because object.method shows that something dynamic is going on, while object.function() doesn’t even work in Rust right now.

Implementation detail, but it could simply turn into a fat pointer of Fn type, with the pointer part pointing to the object and the vtable part pointing to a closure vtable with a pointer to the method. No intermediate closure object necessary.


#4

I’ve thought about this a bit, but I don’t know exactly what makes sense to mention here. One situation I have in mind is the following.

fn foo(obj: &mut T, Type{option, enum_}:Type) -> Type {
  match enum_ {
    E::A => Type{option.map(|v| obj.foo(v)), E::A },
    E::B => Type{option: option.map(|v| obj.foo(v)), enum_(obj)},
  }
}

we could try to “simplify” this code by moving the closures out of the match:

fn foo(obj: &mut T, Type{option, enum_}:Type) -> Type {
  let m = |v| obj.foo(v);
  match enum_ {
    E::A => Type{option.map(m), E::A },
    E::B => Type{option: option.map(m), enum_(obj)},
  }
}

but now we run into lifetime issues because m borrows obj mutably for the whole scope of foo, but we also want to use obj in case B when we pass it to enum_. However, using this RFC and doing:

fn foo(obj: &mut T, Type{option, enum_}:Type) -> Type {
  match enum_ {
    E::A => Type{option.map(obj.foo), E::A},
    E::B => Type{option: option.map(obj.foo), enum_(obj)},
  }
}

we don’t run into any lifetime issues, since after desugaring we have the original code.
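A compilable variant of the first snippet, under stated assumptions: Acc, Mode and step are hypothetical stand-ins for T, E and foo, and a tuple replaces the Type struct. It only shows the per-arm form, where each closure is created and consumed inside its own arm, so obj is free to be used again afterwards. How the hoisted form is treated may differ between compiler versions, and the sketch does not depend on it.

```rust
// Hypothetical stand-ins for the types in the snippets above.
struct Acc {
    total: i32,
}

impl Acc {
    fn add(&mut self, v: i32) -> i32 { self.total += v; self.total }
}

enum Mode {
    A,
    B,
}

fn step(obj: &mut Acc, option: Option<i32>, mode: Mode) -> (Option<i32>, i32) {
    match mode {
        // Hand-written form of what `option.map(obj.add)` would desugar to.
        Mode::A => (option.map(|v| obj.add(v)), 0),
        // The closure is consumed by `map`, so the mutable borrow of `obj`
        // has ended by the time `obj.add(1)` is evaluated.
        Mode::B => (option.map(|v| obj.add(v)), obj.add(1)),
    }
}

fn main() {
    let mut acc = Acc { total: 10 };
    println!("{:?}", step(&mut acc, Some(5), Mode::B));
}
```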

That’s a good remark, but that intuition is wrong. The binding object.method desugars into a closure that captures its environment as usual. That means that, depending on the signature of the method, self will be moved into the closure, or the object will be captured by storing a &self or &mut self reference in the closure. In those last two cases, the closure is not a fat pointer: it is just a single pointer, either &self or &mut self.

Since the method is then statically dispatched, a pointer to the method is not required. In fact, the proposed desugaring can never result in dynamic dispatch. To get dynamic dispatch, the user would need to either explicitly store the closure in a trait object, or create a binding to a trait object’s method, trait_object.method. But in this latter case the closure is still doing static dispatch; it does so on a reference to a trait object, and that reference then performs the dynamic dispatch.


#5

I assume an associated method here means an associated function based on it capturing no environment (I wasn’t able to find anywhere else that refers to associated methods)?

I’m against being able to use an object to get a reference to an associated (non-method) function; the existing Type::func syntax, or introducing an explicit typeof(obj)::func syntax, seems preferable to distinguish between “methods” and “associated functions”. I’m also not sure why they would need to desugar to a closure, instead of just being the function pointer like normal.


#6

@Nemo157 said what I wanted to say, in a better way.

One additional issue not mentioned is that adding a public field is a breaking change. Before, if the type had private fields, adding a new public field was not a breaking change.


#7

I’ve replaced all mentions of “associated method” with “associated function” (sorry about that, I thought I had removed them all but I had actually missed a lot).

A closure without environment is just a function pointer. Having said this, you and @ker raise good points here: Type::func already creates a callable that can be used exactly as one would use it at the original call site. The only advantage of introducing object.associated_function is that one would not need to look up the type of an object to create such a callable, but given that object.associated_function(args...) is not a valid call syntax for associated functions, allowing one and not the other does not make sense.

I think it would be out of scope for this RFC to argue about allowing object.associated_function(args...) syntax for associated functions, so I am going to strip the mention of associated functions from the pre-RFC.


#8

@ker @Nemo157 so I’ve removed the mention of associated functions except from the “How do we teach this” section, which I think has gotten even simpler.

@ker could you check if the “corner cases.layout” sub section answers the questions you had about layout / fat pointers / etc. ? I don’t know if it is better now.


#9

Nit: Nobody knows what “binding” is exactly, there’s no clear definition and everybody has their own understanding. I suggest to avoid it and use human language.