Making more out of traits


#1

Hi, I’m starting a new thread because the related ones are quite old, but for reference: named instances, traits + ML modules, revisiting modules (this one mostly discusses syntax/usability).

What this is all about: I don’t like the hard restrictions that Rust puts on implementing instances and the pile of newtype patterns they imply; I also prefer the ML way of expressing interfaces.

Comparison of alternatives

Okay, so in the field of languages with an expressive static type system there are two big contenders for expressing interfaces (a set of operations that multiple types can provide):

  • Haskell with typeclasses. I won’t expand on this; it’s basically the same as Rust’s traits, but an important fact is that this system can’t handle two implementations of an interface for a given type.

    The biggest downside of Haskell-like typeclasses is that you have to ensure you never get two clashing instances, but because instances are not named you can’t let the user manage them properly: you either behave weirdly on imports (see orphan instances in Haskell) or place drastic restrictions on where you can implement an instance for a given interface and type (Rust). Also, lots of people say we rarely need two implementations of the same interface for a given type. This is just not true, given how much the newtype pattern is used in Rust and Haskell (sure, newtypes are also used for other things, but even then), and it is especially true for combinations of basic types and basic interfaces like Hash, Ord or Add on types like integers, floats or strings.

  • the ML family has powerful modules that are typed. The core language doesn’t know anything about interfaces, but you can express one via a module type item:

    module type Display = sig
        type t
        val display: t -> string
    end
    module DisplayInt: Display = struct
        type t = int
        let display n = ...
    end
    

    On the other hand, I can’t argue with the fact that ML modules are too verbose: you must specify which instance you use every time (and writing functions that take modules as arguments – Rust calls this virtual dispatch – is quite a pain).

  • Enter a third contender: explicitly implicit arguments! This is the path taken by Scala (implicit parameters) and Agda (instance arguments); Idris has named implementations that look like it, and some people are working on modular implicits for OCaml, which is again the same idea. The definition is ML-style: implementations are named, and each boils down to a module conforming to some module type (the interface), but the usage is typeclass-style. The trick is to let the user explicitly mark instances to be added to the local implicits pool, which is used to resolve which instance to use in function calls with trait bounds. This pool maintains the invariant that a unique instance should be available.
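To make the newtype point above concrete, here is the workaround today’s Rust forces when you want a second Ord instance for i32 — a small sketch with made-up names (`Descending`, `sort_descending`):

```rust
use std::cmp::Ordering;

// The only way to get a second `Ord` "instance" for `i32` in today's Rust:
// wrap the type in a newtype and implement the trait on the wrapper.
#[derive(PartialEq, Eq)]
pub struct Descending(pub i32);

impl PartialOrd for Descending {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for Descending {
    fn cmp(&self, other: &Self) -> Ordering {
        other.0.cmp(&self.0) // reversed comparison: the "alternate instance"
    }
}

pub fn sort_descending(mut v: Vec<i32>) -> Vec<i32> {
    // Wrap, sort under the alternate instance, unwrap.
    let mut wrapped: Vec<Descending> = v.drain(..).map(Descending).collect();
    wrapped.sort();
    wrapped.into_iter().map(|d| d.0).collect()
}
```

With named instances the wrapping and unwrapping boilerplate would disappear: you would just name a descending Ord instance at the call site.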

Surface level

The main thing is that impl items would be named, but there are a few different ways (not exclusive of each other) to make this implicits pool happen at the surface level:

  • The marking is done automatically on definition with impl trait_type: Trait for Type {...}. When writing use foo::bar, every impl item in foo::bar is added to the pool, and there would be an opt-out such as use foo::bar hiding disturbing_impl.
  • As Python says, “explicit is better than implicit”, so the marking could be done on import with, say, implicit use foo::bar::useful_impl, but this line could also be added to some prelude to make things less painful. There could also be a convention that impl items for a module foo::bar live in foo::bar::impls, so that one can do implicit use foo::bar::impls::*;.

Note that in any case, this may also lift the requirement of use-ing a trait when calling a method (not completely sure about that): one would need to implicitly use an instance instead.

There would be a slight semantic change to the meaning of <T as Trait>, which would now mean “the concrete module derived from the local implicits pool” (the same would happen to Trait::method(arg) and arg.method(), with T and Trait inferred from the type of arg and from the method identifier, as is currently done).

Instances, modules and anonymous traits

In the ML style, instances are plain modules, but in Rust (unlike Haskell) every trait has one special parameter, Self, so instances are tied to a particular type. It may be valid to drop that, but I don’t think so, because almost every typeclass is used to describe an interface, and interfaces always have this single special Self parameter. So I’m suggesting to keep current modules as they are (namespaces) and only make impl items more like ML modules (but this could be interesting to discuss).

There is still one special thing to worry about: anonymous traits. What I call an anonymous trait is the unnamed trait in the current impl Type {...}. A good idea might be to have Type as a magic name for the anonymous trait associated with Type, to keep the Type::method syntax. I think that these impls should still be named because, apart from the fact that the trait definition is inferred and unnamed, they are impl items. This would make it possible to get rid of the TypeExt (SliceExt, FileExt) pattern, in which you define a trait that is only ever implemented for one type.
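For readers who haven’t met it, the TypeExt pattern mentioned above looks like this in today’s Rust — a minimal sketch with an illustrative trait name (`StrExt`):

```rust
// The `TypeExt` pattern: a trait defined solely to hang extra methods on a
// single (often foreign) type, here `str`. It will never have a second impl.
pub trait StrExt {
    fn shout(&self) -> String;
}

impl StrExt for str {
    fn shout(&self) -> String {
        // Uppercase the string and add an exclamation mark.
        format!("{}!", self.to_uppercase())
    }
}
```

The trait exists only so the impl block is legal; with named anonymous-trait impls, the method could be attached to str directly.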

Misc

  • Trait objects should be left unharmed by this (just to mention I considered it).

Quick mockup

I don’t care about the syntax/keywords here (you may like use impl better); this is just to make it clear what it might look like for people who have never encountered this kind of implicit mechanism:

trait Foo<T> {
    fn bar(&self) -> Option<T>;
}

struct Baz<T> { data: i32, more_data: T }

mod impls {
    impl foo_baz<T: Clone>: Foo<T> for Baz<T> {
        fn bar(&self) -> Option<T> {
            Option::Some(self.more_data.clone())
        }
    }
}

mod third_party_crate {
    use super::{Foo, Baz};
    
    // this is not currently added to the implicits pool
    impl my_foo_baz<T>: Foo<T> for Baz<T> {
        fn bar(&self) -> Option<T> {
            Option::None
        }
    }

    // this should fail because the implicits pool is empty
    fn test_1<T>(arg: &Baz<T>) -> Option<T> {
        arg.bar()
    }

    // use the foreign one by implicit use
    fn test_2<T: Clone>(arg: &Baz<T>) -> Option<T> {
        implicit use super::impls::*;
        arg.bar()
    }

    // use the local one by implicit use
    fn test_3<T>(arg: &Baz<T>) -> Option<T> {
        implicit use my_foo_baz;
        arg.bar()
    }

    // this lets the caller choose the impl
    fn test_4<T, U: Foo<T>>(arg: &U) -> Option<T> {
        arg.bar()
    }

    // explicitly overriding the implicit resolver
    fn test_5<T: Clone>(arg: &Baz<T>) -> Option<T> {
        implicit use super::impls::*;
        test_4::<T, my_foo_baz<T>>(arg)
        // same as my_foo_baz::bar(arg)
    }
}
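As a point of comparison, the test_4/test_5 style of naming instances can be roughly emulated in today’s Rust by making each instance a zero-sized marker type; all names below are made up for illustration:

```rust
// Each "named instance" of the interface is a zero-sized marker type
// implementing an instance-trait, and callers pick one explicitly.
pub trait FooInstance<T, U> {
    fn bar(arg: &T) -> Option<U>;
}

pub struct Baz<U> {
    pub more_data: U,
}

pub struct CloneInstance; // plays the role of `foo_baz`
pub struct NoneInstance;  // plays the role of `my_foo_baz`

impl<U: Clone> FooInstance<Baz<U>, U> for CloneInstance {
    fn bar(arg: &Baz<U>) -> Option<U> {
        Some(arg.more_data.clone())
    }
}

impl<U> FooInstance<Baz<U>, U> for NoneInstance {
    fn bar(arg: &Baz<U>) -> Option<U> {
        None
    }
}

// The caller names the instance in a turbofish, much like `test_5` above.
pub fn call_bar<I: FooInstance<Baz<U>, U>, U>(arg: &Baz<U>) -> Option<U> {
    I::bar(arg)
}
```

The difference with the proposal is that here every call site must name the instance; the implicits pool would make that explicit selection optional.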

Please comment! Note that this is not even a pre-RFC; I just want to know whether the ML cheerleaders back me up here, why you think this is bad/difficult (because similar things have already been discussed, even if not in this form afaik), or what awesome use case you would have for such a feature.


Parameterized Modules
#2

This idea has been brought up several times before, but it is not coherent. Coherence is critical to Rust’s use of traits: for example, if you use different Hash impls in different functions, a HashMap passed between them will not behave reasonably.

This decision has downsides, but it is baked into Rust at this point.
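The hash-table problem can be sketched in plain Rust by modelling two Hash “instances” as ordinary functions (`hash_a` and `hash_b` are deliberately trivial, made-up hashes that always disagree):

```rust
// Two incompatible hash "instances", modelled as plain functions.
pub fn hash_a(s: &str) -> u64 {
    s.bytes().map(|b| b as u64).sum()
}

pub fn hash_b(s: &str) -> u64 {
    hash_a(s) + 1 // a different "instance": always lands in another bucket
}

fn bucket_of(hash: fn(&str) -> u64, key: &str, n: u64) -> usize {
    (hash(key) % n) as usize
}

// Insert a key under one instance, then look it up under both.
pub fn demo_mismatch() -> (bool, bool) {
    const N: u64 = 16;
    let mut buckets: Vec<Vec<&str>> = vec![Vec::new(); N as usize];
    buckets[bucket_of(hash_a, "coherence", N)].push("coherence");
    let found_same = buckets[bucket_of(hash_a, "coherence", N)].contains(&"coherence");
    let found_other = buckets[bucket_of(hash_b, "coherence", N)].contains(&"coherence");
    (found_same, found_other)
}
```

The lookup under the second instance silently misses the key, which is exactly the misbehaviour coherence rules out.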


#3

Just dropping this here…

(there was also a blog post I remember reading that nicely summarized how coherence and two other desirable properties of interface implementations are mutually incompatible, and did a survey of which property is sacrificed by various languages… but now I can’t find it. :V)
(oh, I was thinking about this, and I apparently entirely misremembered everything about it)


#4

It’s not that it isn’t coherent (I do put more trust in OCaml’s type system than Rust’s, even though I know you people do things right here!); it’s just that it would have a slightly different type. More precisely, the particular instance would be contained in the type.

I hinted at that in my mockup, but we can see a trait bound T: Trait as being an instance, not just a type; thus the instance(s) satisfying the bounds would be baked into the type. For example, struct HashMap<K: Eq + Hash, V> = {...} would contain in its type the particular instance that was used, because its “kind” would more or less be HashMap: (T: type) -> Eq<Self=T> -> Hash<Self=T> -> type -> type. Note that the type of hash maps in ML is usually exactly that: it has only one parameter for the values and sits behind a functor taking what we would call here an implementation of K: Eq + Hash, i.e. it would be written 'value HashMap(TheInstance).t. You may say this is a dependent type: it is, and it does nothing to hide it, but every function with trait bounds already is one (at least before monomorphisation occurs, afaik).

Edit: I think the last sentence is wrong (at least in Rust, probably not Haskell) but it’s not that important anyway.

Also, I’m quite aware that this is probably not the kind of thing that will ever change (this is also why I said this is not a pre-RFC), so let’s make the context clear: please comment as if Rust were a new language not already implemented! :wink: I just want to talk about whether Rust could have made another decision, whether you think this one is better, what implementation problems you see, etc. One day I will be tired of thinking about Rust’s design (while legitimately not being able to change anything) and I’ll go bother @ubsan and his café.

Note, as a side remark on the real Rust, that my point about baking the implementation into the type is already what I see as good practice: reflecting trait bounds that are necessary for any implementation on the struct’s or enum’s type parameters (Copy, or whatever bounds merely enable specialized efficient impls, are not in this category, because you can implement the core functions of your type without them).
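Incidentally, today’s Rust already bakes one piece of the “instance” into HashMap’s type: the hasher state parameter S. A small sketch of this (`FixedState` and `demo` are illustrative names):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// A map built with one `BuildHasher` has a different *type* from a map built
// with another, so the two can never be confused at runtime.
type FixedState = BuildHasherDefault<DefaultHasher>;

pub fn fixed_hasher_map() -> HashMap<String, i32, FixedState> {
    // `HashMap<String, i32, RandomState>` (the default) and
    // `HashMap<String, i32, FixedState>` are distinct types; passing one
    // where the other is expected is a compile-time error.
    HashMap::with_hasher(FixedState::default())
}

pub fn demo() -> Option<i32> {
    let mut m = fixed_hasher_map();
    m.insert("a".to_string(), 1);
    m.get("a").copied()
}
```

The proposal generalizes this: not just the hasher state but the Eq and Hash instances themselves would be part of the map’s type.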


#5

You might be thinking ‘coherence’ means ‘soundness’, but this is not how we are using the term: ‘coherence’ here means that the rules must resolve to a single canonical instance globally, no matter how modules, crates, or imports are composed. Modular implicits in OCaml are sound but not coherent, although they do have much simpler and easier-to-understand implicit resolution behaviour than other non-canonical resolution systems, such as Scala’s.

The ‘hash-table problem’ is covered in the modular implicits paper, on page 53, under the heading: 4.4 An alternative to canonicity as a feature. This would probably require considerable reworking of the HashMap/HashSet API to work in Rust…


Edit: oh woops - missed this!


#6

Oh, then you’re right: my very goal was to propose something incoherent! That’s the restriction I don’t like!

Actually, on reflection, I’m not even sure of this. If the language gives up global coherence, I think it is sufficient (and necessary) for soundness to desugar trait bounds in function signatures or in type parameters into plain instances of the said bounds.

Just to make things clear, suppose we have a function + on instances that merges two instances, so that if Inst1: Trait1 and Inst2: Trait2 then Inst1 + Inst2: Trait1 + Trait2; I’ll also use Trait<Self=T> as a slightly more logical syntax for constraining the implicit Self parameter.

So, naming things completely (remember, you can specify things like this, but you could also implicit use and keep the same lightweight code as before with local canonicity), a particular instance of the type HashMap<K: Eq + Hash, V> could be HashMap<String::eq + String::hash, V>, where String::eq: Eq<Self=String> and String::hash: Hash<Self=String> are the named instances of the traits for String.

The thing is that this changes the story a bit for functions, because the current syntax:

fn foo<K: Eq + Hash, V>(&K, &HashMap<K, V>) -> Option<&V>

would actually under the hood mean

fn foo<K, V, eq_hash_k: Eq<Self=K> + Hash<Self=K>>(&K, &HashMap<eq_hash_k, V>) -> Option<&V>

Notice how, because K has bounds, the meaning of K on the right-hand side changes to an instance iff the position it appears in has bounds too. The first syntax is really more succinct, and I don’t think it’s that bad to have it mean something slightly different; we can still think of K as a type bundled together with the instances satisfying the bounds. Maybe the error message for a failed bound would need a slight rephrasing to satisfy expert users (from “the trait Foo is not implemented for Bar” to “no instance of Foo found for Bar”).
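The desugaring above is essentially dictionary passing; it can be sketched in today’s Rust by making the “instance” an explicit value (`EqDict` and the two constants are made-up names):

```rust
// An "instance" of an Eq-like interface, reified as a plain value: this is
// roughly what a `K: Eq` bound desugars to under the hood.
pub struct EqDict<K> {
    pub eq: fn(&K, &K) -> bool,
}

// `fn contains<K: Eq>(...)` with the instance made explicit as an argument.
pub fn contains<K>(dict: &EqDict<K>, needle: &K, haystack: &[K]) -> bool {
    haystack.iter().any(|k| (dict.eq)(k, needle))
}

fn i32_eq(a: &i32, b: &i32) -> bool {
    a == b
}

fn abs_eq(a: &i32, b: &i32) -> bool {
    a.abs() == b.abs()
}

// Two distinct, named instances for the same type, chosen per call site.
pub const EXACT: EqDict<i32> = EqDict { eq: i32_eq };
pub const BY_ABS: EqDict<i32> = EqDict { eq: abs_eq };
```

The proposal would let the compiler thread such dictionaries automatically from the implicits pool, rather than forcing you to pass them by hand.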

I never really touched the rustc codebase, so I don’t know how doable this would be (here I’m still in the context of “would it be doable”, not “do we want to do this”), but it seems to me that it is just a matter of uniquely naming instances (just like types). There would be a few things to watch for when coding the equality judgement between instances – the modular implicits paper has a section on this, p. 21 – mainly that if you include/extend an instance Inst1 of Trait1 into an instance Inst2 of Trait2: Trait1, the inner part of Inst2 pointing at Inst1 should still be equal to the original Inst1. Anyway, if done carefully, they say a simple definitional equality is sufficient, so this doesn’t seem insurmountable.


As a side remark, I wonder why the definition of Hash isn’t trait Hash: Eq { ... }. In the language I’m suggesting this would definitely be required (because you can’t just compose any Hash and Eq: they have to be compatible, and you can’t be compatible with every possible instance of Eq), but even in the real Rust I can’t see a valid reason to have Hash without Eq (if you want a weird Hash without guarantees you can just use &T: Into<u64>).
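The suggested trait Hash: Eq hierarchy can be sketched with local stand-in traits (`MyEq` and `MyHash` are made-up names, to avoid clashing with std):

```rust
// A local Eq-like trait, standing in for std's `Eq`.
pub trait MyEq {
    fn eq(&self, other: &Self) -> bool;
}

// `Eq` as a supertrait of `Hash`: every hashable type must first fix its
// equality instance. The invariant the supertrait is meant to protect is:
// a.eq(b) implies a.hash() == b.hash().
pub trait MyHash: MyEq {
    fn hash(&self) -> u64;
}

impl MyEq for i32 {
    fn eq(&self, other: &Self) -> bool {
        self == other
    }
}

impl MyHash for i32 {
    fn hash(&self) -> u64 {
        *self as u64
    }
}
```

With named instances, tying each Hash instance to one particular Eq instance like this is what makes composing bounds such as K: Eq + Hash safe.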


Also thanks for linking the modular implicits paper I forgot to provide links, I updated the OP with links to other relevant papers too (scala, agda).


#7

Personally, I’m pretty excited about OCaml-style modular implicits, especially if they were added to a 1ML-style module system, but it remains to be seen how well they actually work in practice. I know many Scala programmers are immediately allergic to anything called ‘implicits’ due to the usability nightmare of that language’s implicit resolution rules.

Unfortunately Rust’s trait system is already complex enough as it is, and unless a radical proposal such as this does much to simplify the underlying concepts, I would be hesitant to support it. I would look into @nikomatsakis’s work on Chalk to see if this slots in nicely with those plans (be sure to familiarise yourself with the blog posts).