Looking for RFC coauthors on "named impls"

I’m planning to write an RFC on named impls and am looking for collaborators. If this interests you and you’d like a more direct conversation, you can find me in #rust-lang on irc.mozilla.org (mostly during CET hours).

What are named impls?

Consider a Monoid trait. For usize there is more than one valid monoid:

  • (usize, 1, *)
  • (usize, 0, +)

But which one do we impl? Well, what if we didn’t have to pick? With named impls, you could write:

mod my_mod {
extern crate frunk;

use frunk::monoid::Monoid;

impl Sum of Monoid for usize { .. }
impl Product of Monoid for usize { .. }

pub use Sum as _;  // Reexports Sum as anonymous (which are all hitherto valid impls).
}

In another module you may then:

use my_mod::Product; // Imports Product into the module scope.

fn main() {
   let x = <usize as my_mod::Sum>::combine(1, 2); // UFCS
   let x = <usize as my_mod::Sum of Monoid>::combine(1, 2);  // Equivalent
   let x = usize::combine(1, 2); // refers to <usize as my_mod::Product>::combine(1, 2)
}

Named impls must be referred to explicitly (via UFCS or an explicit import), so there should be no backwards-compatibility issues. The feature is also a (hopefully sound) escape hatch from the orphan rule.

There are plenty of issues and kinks to work out, including:

  • is this always sound?
  • how do we deal with trait objects?
  • what is the syntax when generics are involved, both for UFCS and for defining impls?
  • what is the syntax for associated items?
  • how do we make this maximally ergonomic?

Thus I can’t promise that we will solve everything and make an RFC PR. However, it will be a fun experience =)

This could be “sugar” for this:

trait Monoid<T> { .. }

struct Sum;
struct Product;

impl Monoid<Sum> for usize { .. }
impl Monoid<Product> for usize { .. }

In that case it is sound but, in my opinion, hard to justify as sugar when you can already do it with type parameters.
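For reference, here is a minimal runnable sketch of that type-parameter encoding in today's Rust. The method names empty and combine and the helper fold are assumptions made for illustration; the snippet above only names the trait and the two marker types.

```rust
// Marker types that play the role of impl "names".
struct Sum;
struct Product;

// The extra parameter `Name` selects which monoid is meant.
trait Monoid<Name> {
    fn empty() -> Self;
    fn combine(self, other: Self) -> Self;
}

impl Monoid<Sum> for usize {
    fn empty() -> Self { 0 }
    fn combine(self, other: Self) -> Self { self + other }
}

impl Monoid<Product> for usize {
    fn empty() -> Self { 1 }
    fn combine(self, other: Self) -> Self { self * other }
}

// Generic code has to carry the name along as an explicit parameter.
fn fold<Name, M: Monoid<Name>>(items: Vec<M>) -> M {
    items.into_iter().fold(M::empty(), |acc, x| acc.combine(x))
}

fn main() {
    let xs = vec![1usize, 2, 3, 4];
    // The caller picks the impl by naming the marker type.
    println!("sum = {}", fold::<Sum, _>(xs.clone()));
    println!("product = {}", fold::<Product, _>(xs));
    println!("2 * 3 = {}", <usize as Monoid<Product>>::combine(2, 3));
}
```

Both impls coexist because Monoid<Sum> and Monoid<Product> are different traits as far as coherence is concerned.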

In contrast, if you are allowed to create named impls for traits which have not explicitly opted into having named impls, that is decidedly unsound. The common explanation for this is the “hash table problem”: HashMap assumes that <usize as Hash> will resolve to the same impl in every context. If it doesn’t, our HashMap is unsound.

That is to say that coherence is an invariant that unsafe code is allowed to assume. Given the same set of input types, a trait instantiation will always resolve to the same impl. If a “name” is just another input parameter, then that’s fine, but again, not more expressive than what exists today.

You could search “named impls” on this forum and find previous discussions.

With the sugar idea, is the trait “copied”, or does the original trait have to support a name parameter? The sugar idea is nice, and of course it is not more expressive than what exists today, but I think it would be more ergonomic than creating a bunch of newtypes and then implementing every trait the base type had, save for one.

With GeneralizedNewtypeDeriving (Haskell), newtypes would be less unergonomic, but they would still incur a lot of boilerplate.

Very interesting! A few months ago I came up with basically the same idea on Reddit. Luckily I found this thread.

Rambling about motivation

I really like this general idea because it is very similar to how we treat name conflicts. As I said on Reddit:

[...] That would shift the problem of "no overlapping impls in the whole world" to "no overlapping impls in scope". Which is actually exactly what we do with names: same names are allowed to co-exist in the world as long as they are not in scope at the same time.

And having similar rules for different parts of the language always helps with learning and makes the language feel more consistent IMO.

Additionally, I think a solution is really needed. Today, we need to force crates to be coupled although they shouldn't need to know about one another. One typical example is chrono (time and date types) and diesel (database abstraction). diesel has traits like ToSql and FromSql that need to be implemented to store/load something from the database. So who is responsible for implementing diesel::ToSql for chrono::Date? Neither of those crates! But due to orphan rules, there doesn't exist a good solution. So right now, diesel has a chrono feature and implements all traits for chrono types if the feature is activated.

This coupling has already led to a couple of breakages across the webdev ecosystem. Also: what if a new chrono alternative comes along?

Having a named, importable impl in a third crate would solve this problem nicely.

Rambling about problems

Anyway, apparently many people have already come up with this idea.

One main problem that is always brought up is basically the hashmap problem (as @withoutboats also brought up here): a hash map has to be able to assume that the Hash implementation for its key is always the same. At least it always has to be the same within one instance of a hash map. It would be fine to have two hash map instances where each instance uses a different hashing algorithm.

A possible solution that is often proposed is to store not only the type (of the key) but also its Hash impl in the hashmap type. So the compiler internal type wouldn't be HashMap<usize, String> but HashMap<<usize as Hash with $this_specific_hash_impl>, String>. That would make it possible to always use the same impl, regardless of what other impls are imported into scope later.

Sadly, this becomes complicated when the bound is not present at instance creation. For example:

let mut v = vec![0u32, 1, 2, 0]; // no bounds on `T` when creating a `Vec`

{
     use SomePartialEqImplForU32;
     v.dedup();
}
{
     use AnotherPartialEqImplForU32;
     v.dedup();
}

The Vec::dedup() method is in an impl<T: PartialEq> Vec<T> block. We don't know about this bound at object creation.

To solve this (always use the same impl of all traits for all generic types), one could:

  • Store a list of all trait impls of T that are in scope at creation time in the Vec type.
  • Store the point of creation in the Vec type and look up each trait impl lazily.

However, this doesn't sound too great.

Unfortunately, there are more problems. Consider the following function:

fn foo(v: &mut Vec<u32>) {
    v.dedup();
}

This function is not generic, so we would expect it to be compiled exactly once and result in only one version. But if we store more information about the impls in the type of a HashMap or Vec, functions like foo() would basically be generic.

Furthermore, this is spooky action at a distance: only looking at the foo definition, we might think we know exactly what's going on in the function. But that might not be the case, because we pass hidden "behavior" into the function. That's probably not a good idea.

So maybe it's not a good idea to store any information about specific impls in the type? Maybe that should always be resolved at call site?
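To illustrate what resolving at the call site looks like with today's standard library: Vec::dedup always uses the one coherent PartialEq impl, while Vec::dedup_by takes the comparison as an explicit argument, so the behavior is visible where it is used. The function names and the last-digit comparison are made up for this sketch.

```rust
// Uses the single, coherent PartialEq impl for u32.
fn dedup_default(mut v: Vec<u32>) -> Vec<u32> {
    v.dedup();
    v
}

// The "alternative equality" is passed explicitly at the call site:
// two numbers count as equal if they share the last decimal digit.
fn dedup_last_digit(mut v: Vec<u32>) -> Vec<u32> {
    v.dedup_by(|a, b| *a % 10 == *b % 10);
    v
}

fn main() {
    let v = vec![0u32, 1, 2, 2, 12, 3];
    println!("{:?}", dedup_default(v.clone()));
    println!("{:?}", dedup_last_digit(v));
}
```

Nothing about the comparison is hidden in the type of the Vec, so a non-generic function like foo stays non-generic.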


I think my point is: given my text above and what @withoutboats said about unsafe code being able to assume coherence, I'm pretty sure that the only way to make named impls sound, backwards-compatible and not dangerous is by annotating the trait itself. In other words: the trait has to allow named impls. And by allowing named impls, the programmer is restricted in certain ways of using the trait.

I'd love to see development in this area as I honestly think Rust is too restrictive and thus lacking in this regard. However, while reading the other threads and writing this (too long, sorry!) post, I noticed that there are in fact quite a few problems. But I hope someone is willing to dive into this to find a solution.

As for being a coauthor: I don't think I know remotely enough about type theory, compiler internals or other languages like Haskell for this, and available time is also a problem. But I'll certainly keep an eye on this discussion :)

I like the idea of “implicit newtypes”: conflicting impls of Hash can coexist, but their respective HashMaps are incompatible.

HashMap<Foo + my_mod::MyHashImpl> vs HashMap<Foo + AnotherHashImpl>

(I’ve mentioned this previously)

If you want your function to allow arbitrary named impls, either make it generic, or add a + _ or something to the type. I don’t believe named impls are useful outside generics.
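With today's Rust the same effect needs an explicit newtype. A minimal sketch, where CaseInsensitive is a hypothetical wrapper invented for this example: it carries its own Hash and Eq impls, and a map keyed by it is a different type from a map keyed by String, so the two can never be mixed up.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical newtype: a String compared and hashed without regard to ASCII case.
struct CaseInsensitive(String);

impl PartialEq for CaseInsensitive {
    fn eq(&self, other: &Self) -> bool {
        self.0.eq_ignore_ascii_case(&other.0)
    }
}
impl Eq for CaseInsensitive {}

impl Hash for CaseInsensitive {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the lowercased bytes so that equal keys hash equally.
        for b in self.0.bytes() {
            state.write_u8(b.to_ascii_lowercase());
        }
    }
}

// Number of distinct keys under String's own Hash/Eq.
fn count_plain(words: &[&str]) -> usize {
    let mut m: HashMap<String, u32> = HashMap::new();
    for w in words {
        *m.entry(w.to_string()).or_insert(0) += 1;
    }
    m.len()
}

// Number of distinct keys under the wrapper's Hash/Eq.
fn count_case_insensitive(words: &[&str]) -> usize {
    let mut m: HashMap<CaseInsensitive, u32> = HashMap::new();
    for w in words {
        *m.entry(CaseInsensitive(w.to_string())).or_insert(0) += 1;
    }
    m.len()
}

fn main() {
    let words = ["Rust", "rust", "RUST", "Go"];
    println!("plain: {}", count_plain(&words));
    println!("case-insensitive: {}", count_case_insensitive(&words));
}
```

The cost is the wrapping and unwrapping at every boundary, which is the boilerplate an implicit version would be meant to remove.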

I pointed out previously that "named impls" can already be implemented using an additional "Name" parameter on the trait, which allows any crate to define a new "Name" and then implement the trait using that name for the type. In other words, the feature already exists, with exactly the semantics it would have if it were implemented with a more obvious syntax.

And yet it's very rare to see anyone do this to get around the orphan rules. Possibly it's just not well known enough, but I think a big factor is that there are other solutions to orphan rules issues that resolve the same problems named impls would solve (like the newtype pattern, which conveniently can be applied to any trait).

I guess it is not that the technique is too little known, but that we didn't even think about it when designing the trait. And adding a name parameter afterwards seems to be a breaking change, and people hate breaking changes.

For example, is it possible to define Clone as follows:

trait Clone<Name=()> {
    fn clone(&self) -> Self;
}

without breaking existing code?
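A small experiment on a local stand-in trait (not on std's Clone, which cannot be changed from user code) hints at what a defaulted name parameter does and does not cover. Dup, Doubled, dup_default and dup_doubled are hypothetical names. In an impl header or a bound, a bare Dup is read as Dup<()>, so impls and bounds written before the parameter existed keep compiling. Whether every existing call site stays unambiguous once a second impl is in play is a separate question that this sketch does not settle.

```rust
// Stand-in for a trait that later gains a defaulted name parameter.
trait Dup<Name = ()> {
    fn dup(&self) -> Self;
}

// Written as if the parameter did not exist: this is an impl of `Dup<()>`.
impl Dup for u32 {
    fn dup(&self) -> Self { *self }
}

// A second, "named" impl for the same type.
struct Doubled;
impl Dup<Doubled> for u32 {
    fn dup(&self) -> Self { *self * 2 }
}

// An old-style bound: `T: Dup` means `T: Dup<()>`.
fn dup_default<T: Dup>(t: &T) -> T {
    t.dup()
}

// A bound that asks for the named impl explicitly.
fn dup_doubled<T: Dup<Doubled>>(t: &T) -> T {
    t.dup()
}

fn main() {
    println!("{}", dup_default(&21u32));
    println!("{}", dup_doubled(&21u32));
}
```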

Along this line I tried the following:

// Assuming the trait does have implementable items.
// Otherwise it is a marker trait and does not need named implementations.
impl<T> My Clone for T where T: Clone {
    fn clone(&self) -> Self {
        self.clone()
    }
}
///////////////desugar to///////////////
trait MyClone {
    //repeat the definition of Clone, with all items renamed
    fn my_clone(&self) -> Self;
}
impl<T> MyClone for T where T: Clone {
    fn my_clone(&self) -> Self {
        self.clone()
    }
}

At the call site, the sugar

   // use ~ to separate the prefix
   v.my~clone();

desugars to

   MyClone::my_clone(&v);

Unresolved problems

Some traits are used in special contexts.

  • Index/IndexMut for index access
  • Add/Sub/Mul/Div... for operators
  • Fn/FnMut/FnOnce for function calls
  • Iterator/IntoIterator for for loops
  • Drop, which is invoked implicitly
  • Deref/DerefMut for the dereferencing operator
  • Try for the try operator (? and try blocks)
  • ... (I hope this list is complete, but I am afraid it is not)

These use sites are special cases, and we need a way to specify that the renamed/prefixed instance of the trait should be used instead.

#[naming] trait Foo {}

impl Foo for usize {}
impl<T> Foo for T {}

impl Hash for <usize as Foo> {
}

impl<T> Debug for <T as Foo> {
}

how does this look?

in places where it’s relevant, you use it as:

let hm: HashMap<<usize as Foo>, Bar> = ...;

note that the type of the HashMap is HashMap<<usize as Foo>, Bar>, NOT HashMap<usize, Bar>.

for special traits, just:

let x: <usize as Foo> = a;
let y: <usize as Foo> = b;
let z = x + y;

let w = a: <usize as Foo> + b: <usize as Foo>;

We can impl traits for trait objects now, right? So

impl Hash for dyn Foo { .. }

I’d think this achieves soundness similarly, by operating on different types, except using monomorphisation instead of the vtable.
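For anyone who wants to experiment, this much already compiles today. A minimal sketch with a hypothetical local trait ByLen used as a view of String; the helpers view and hash_of are also invented for the example. Because dyn ByLen is a local type, it may carry its own Hash impl, and that impl never conflicts with String's.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// A local trait used purely as a "view" of String.
trait ByLen {
    fn text(&self) -> &str;
}

impl ByLen for String {
    fn text(&self) -> &str {
        self
    }
}

// `dyn ByLen` is a local type, so it can get its own Hash impl:
// here, strings viewed this way hash by length only.
impl<'a> Hash for dyn ByLen + 'a {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.text().len().hash(state);
    }
}

// Coerces a String reference into the view type.
fn view(s: &String) -> &dyn ByLen {
    s
}

// Hashes any value with the standard library's default hasher.
fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let a = String::from("abc");
    let b = String::from("xyz");
    // As plain Strings the two values are hashed by content.
    println!("same as String: {}", hash_of(&a) == hash_of(&b));
    // Through the view they are hashed by length.
    println!("same as dyn ByLen: {}", hash_of(view(&a)) == hash_of(view(&b)));
}
```

This version goes through a vtable; whether the compiler could monomorphise such views is exactly the open part of the suggestion.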

I’d propose the name view over naming and

// We must declare supertraits if we want to be able to access them from a view trait
#[view] trait Foo : Deref+DerefMut { } 
#[view] trait Bar : Hash+Eq { }

impl Hash for <usize as Foo> { .. }

let hm: HashMap<<usize as view Foo+Bar>, Baz> = ...;  
// Error: the impl of Hash for usize viewed as Foo conflicts with the impl of Hash for usize viewed as Bar, so view Foo+Bar is not a valid composite view type for usize.

In essence, we’re now just making wrapper types with more convenient delegation here, so maybe some real delegation scheme would be better, not sure.

I’m nervous about wandering syntactically towards first-class module land like this without actually doing much, though. It’s obviously useful to explore that territory, but premature stabilization carries bigger risks.

True – but if you're in a situation where the orphan rules are an obstacle, you probably don't own the crate that defines the trait in question, or else you'd usually be able to just add any necessary impls there. Thus, you're also not in a position to modify the trait to add a Name parameter.

On the other hand, you could imagine an idiom where crate authors, even if they didn't have a specific use case for crate-local impls, would add a Name parameter to all their traits just to maximize flexibility. But that would come at a significant ergonomic cost. To be fully compatible with alternate impls, anything that uses the trait would have to explicitly carry that parameter around. For example, instead of

struct Foo<M: Monoid> { m: M }

you'd need

struct Foo<T, M: Monoid<T>> { m: M, _name: std::marker::PhantomData<T> } // PhantomData because T is otherwise unused

That includes "infecting" any traits which have blanket impls for types implementing the first trait. For example, instead of

trait Semigroup { … }
impl<M: Monoid> Semigroup for M { … }

you'd need

trait Semigroup<T> { … }
impl<T, M: Monoid<T>> Semigroup<T> for M { … }

It would definitely be nicer if the compiler could automatically make this transformation for all traits.
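A compilable sketch of that manual transformation, to make the cost concrete. The method names combine and op, the marker type Sum, and the helper total are assumptions added for illustration. Note that the struct also needs a PhantomData field, because the compiler rejects a type parameter that appears only in a bound.

```rust
use std::marker::PhantomData;

// Marker type acting as the impl name.
struct Sum;

trait Monoid<T> {
    fn empty() -> Self;
    fn combine(self, other: Self) -> Self;
}

impl Monoid<Sum> for u64 {
    fn empty() -> Self { 0 }
    fn combine(self, other: Self) -> Self { self + other }
}

// The name parameter "infects" the trait with the blanket impl...
trait Semigroup<T> {
    fn op(self, other: Self) -> Self;
}

impl<T, M: Monoid<T>> Semigroup<T> for M {
    fn op(self, other: Self) -> Self { self.combine(other) }
}

// ...and every struct that stores a monoid.
struct Foo<T, M: Monoid<T>> {
    m: M,
    _name: PhantomData<T>,
}

impl<T, M: Monoid<T>> Foo<T, M> {
    fn new(m: M) -> Self {
        Foo { m, _name: PhantomData }
    }

    // Combines the stored value with `x` under the impl named `T`.
    fn push(self, x: M) -> Self {
        Self::new(<M as Semigroup<T>>::op(self.m, x))
    }
}

// Every user has to spell out the name, here `Sum`.
fn total(a: u64, b: u64) -> u64 {
    Foo::<Sum, u64>::new(a).push(b).m
}

fn main() {
    println!("{}", total(40, 2));
}
```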

Named impls would also have to work exactly the same way to be coherent. The Semigroup instance would need to take the name of the Monoid instance in that code in order to know how to resolve it. Probably you’d want to have some kind of higher rank parameter there, rather than parameterizing the semigroup type class by a monoid instance name. And that has all the implementation issues that higher rank type parameters have in their normal shape.

It also connects to the other big issue: with named instances you throw instance resolution inference out the window because you need to somehow guide the resolver with which named instances it should account for and which it shouldn’t. (This gets pretty hairy when some of those instances might be more generic, and you want two different types to resolve to two overlapping instances).

I don’t like to rule out possibilities, but I would be stunned by a new insight that makes this seem worth doing to me. Rust chose to have Wadler type classes instead of ML modules specifically so that we could have inferred method resolution, and the fact that instances are unnamed & must be coherent is the consequence of that decision.

I don’t really see an issue with automatic, generic newtypes.

My vague idea of a design is that the Semigroup instance would take the name of the Monoid instance. You could see it as reifying the T: Trait bound into an 'evidence' parameter, in the form of an impl name, which is implicitly passed by all users – this would apply everywhere that trait bounds are evaluated. Kind of like how Scala implicits are used to simulate typeclasses.

Which doesn't necessarily make much sense for Monoid… but actually, I think Monoid is an unrepresentative example of the most compelling use cases for named impls. I think a more common use case would be just to work around the orphan rules – to allow a crate to impl a foreign trait for a foreign type. In that case, no potentially competing impls would be in scope, so there'd be no problem with instance resolution inference. In a case like Monoid where there are multiple impls in scope, sure, inference would break down, but that's only natural; if I call 2.mappend(3), the compiler can't guess whether I want a sum or product. Instead, I should have to explicitly specify the name at the point of instantiation. On the other hand…

(This gets pretty hairy when some of those instances might be more generic, and you want two different types to resolve to two overlapping instances).

This sounds like a use case that would be better served by specialization, not named impls. Or, if the generic instances are actually disjoint, it could be served by better ways to prove disjointness to the compiler, such as your own "mutually exclusive traits" proposal from years ago.

I envision that at any given instantiation point, if the compiler can't narrow the list of potential impls down to a single name, it should complain and force the user to write the name explicitly. Thus, named instances wouldn't be very useful as a way to manage sets of overlapping impls; but in exchange, they wouldn't be all that hairy either. If you're trying to figure out which impl is being used at a given point in the program, assuming an impl name isn't explicitly specified, it would largely be the same job as today: find an impl somewhere in the crate whose requirements are satisfied for the type. The main change is that you'd have to disregard named impls that aren't in scope. However, idiomatic code shouldn't play games with scopes; normally there shouldn't be any conflicting impls anywhere in the current crate or its dependency tree (again, except if the impl name is explicitly specified).

Though, one potential issue would be with backwards compatibility. If I use named impls to impl a foreign trait for a foreign type, but rely on inference rather than writing out the impl name everywhere in my crate… if one of those foreign crates later adds a normal impl for the same type, my crate will now be full of ambiguities. Arguably that could be acceptable, since adding impls can already be a breaking change today. However, it might be worth having a way to indicate that a local impl should implicitly override impls from imported crates. This would be hairy, but it would be intended for use solely as a temporary measure: the compiler could warn if a conflicting impl was found, and I would be expected to update my crate to either remove the impl (which is probably no longer necessary), or write the impl name explicitly at all usage points.

I do think the original named impl syntax has way too much strangeness cost without any clear benefit. Anyone who wants this should consider experimenting with newtype delegation via impl Trait for dyn Name and .. as dyn Name like I mentioned above.

There are possible advantages to intelligently monomorphising dyn Name, and work on them does not impact syntax, so maybe dyn should be the syntax even if people want to use it this way.

At minimum, if trait Name is private and impl Name for T only exists for one T, then dyn Name should be monomorphized, giving everything this RFC proposes.

For the hashmap situation (and similar cases), wouldn’t it be a good solution to capture the Hash impl that is in scope where HashMap’s constructor is called? The constructor would take an implicit instance parameter (an instance of the Hash trait, filled in with whichever Hash impl is in scope at the call) and record that instance in a PhantomData member of the hashmap. The other methods (insert etc.) would then use it, no matter which other Hash impl happens to be in scope where they are called.
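The closest thing expressible today is to make the choice part of the container's type, much like HashMap already does with its hasher parameter S. A rough sketch under that assumption, reusing the earlier dedup scenario: KeyEq, Exact, LastDigit, DedupVec and collect are hypothetical names, and the strategy is fixed when the container type is chosen rather than picked up implicitly from scope as suggested above.

```rust
use std::marker::PhantomData;

// A comparison strategy, selected by a zero-sized marker type.
trait KeyEq<T> {
    fn same(a: &T, b: &T) -> bool;
}

struct Exact;
impl KeyEq<u32> for Exact {
    fn same(a: &u32, b: &u32) -> bool { *a == *b }
}

struct LastDigit;
impl KeyEq<u32> for LastDigit {
    fn same(a: &u32, b: &u32) -> bool { *a % 10 == *b % 10 }
}

// The strategy is part of the type, so every method uses the same one.
struct DedupVec<T, E: KeyEq<T>> {
    items: Vec<T>,
    _strategy: PhantomData<E>,
}

impl<T, E: KeyEq<T>> DedupVec<T, E> {
    fn new() -> Self {
        DedupVec { items: Vec::new(), _strategy: PhantomData }
    }

    // Pushes `x` unless it equals the last element under strategy `E`.
    fn push(&mut self, x: T) {
        let duplicate = self.items.last().map_or(false, |last| E::same(last, &x));
        if !duplicate {
            self.items.push(x);
        }
    }
}

// Feeds `input` through a DedupVec that uses strategy `E`.
fn collect<E: KeyEq<u32>>(input: &[u32]) -> Vec<u32> {
    let mut v = DedupVec::<u32, E>::new();
    for &x in input {
        v.push(x);
    }
    v.items
}

fn main() {
    let input = [2u32, 2, 12, 3];
    println!("{:?}", collect::<Exact>(&input));
    println!("{:?}", collect::<LastDigit>(&input));
}
```

DedupVec<u32, Exact> and DedupVec<u32, LastDigit> are distinct types, which is the explicit form of the incompatibility discussed earlier in the thread.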

So, in other words, as I’ve said before, named impls are just a more ergonomic alternative for newtypes?
