Traits + ML modules


#1

The basic idea is traits/type classes and modules a la ML both have their advantages, and it is possible to have the best of both worlds. I choose to put this here rather than RFC as it is not concrete enough, and I’d like some feedback before I make my first RFC. If this is “cheating” or otherwise frowned upon, my apologies.

First, let me link www.mpi-sws.org/~dreyer/papers/mtc/main-long.pdf where this idea basically comes from. It seems like Ocaml Labs is currently working on adding this to OCaml under the name “Modular Implicits”

Second, I believe the changes proposed by @glaebhoerl in https://github.com/rust-lang/rfcs/pull/135#issuecomment-47713550 are a prerequisite to any progress in this arena.

To summarize the paper: Traits are modular signatures, and trait implementations are modules. The key to type classes is the coherence rules mean that there is exactly one implementation per Sig, so that the compiler can route the required instances to functions that need them (and in Rust’s cases inline/monomorphize everything). ML modules and traits are are combined by “separat[ing] the definition of an instance from its adoption as a canonical instance”. Programmers can create as many impls as they want, but they only offer one per scope to the compiler for propagation.

My only interesting change to what they propose in the paper is to represent traits today as Type -> Sig, rather than Sig. This better distinguishes between “input” and “output” types, which are / should be a keep part of the coherence definition and instance resolution algorithm.

Why add to Rust?

I can think of two reasons:

The first is with things like libstd facade, and https://github.com/rust-lang/rfcs/pull/185. The current system of putting undefined declarations in crates IMO makes for somewhat subtle-looking code in comparison to ML functors. The way errors aren’t caught until some sort of rlib link time also reminds me of one of the most obvious disadvantages of templates vs traits, in that one isn’t aware of any problems until “downstream” in build process.

The second is for allocators. As far as I can tell from trying to find previous discussion of the topic is basically sometimes one wants to statically define what allocator is used, e.g. Box<T, LibcMalloc>, and sometimes one wants it to be determined dynamically, with the allocator vtable, and perhaps some extra allocation metadata, as extra fields. Of course nobody wants to have to duplicate their allocator code to support both use-cases.

This sounds a lot like the problem trait objects solve, and I thought a while about how trait objects could be used to hack this together with things like https://github.com/rust-lang/rfcs/pull/9, and defining a bunch of unit types for each allocate instance. But everything I could think of seemed like hacks, and hacks that raise the bar for the implementation of allocators and result in lousy errors.

A better solution in my mind would be to define allocators as a parameter-less trait (Sig, not Type -> Sig like single - parameter traits). Then, give rust a better notion of existential types: exists (<ID : SIG> | <ID>) TYPE. Normal “object types” desugar / are rewritten as exists <T> (some-trait<T>, T). Boxes with dynamic allocators are defined something like type DynAllocBox<T> = exists<A : Alloc>Box<T, A>.

Applying trait impls with <IMPL> makes sense to me as a), trait implementations like types are “applied statically” in that all functions are monomorphized over trait and type arguments already, and b) in the conventional dependent type modelling of ML sigs as types in a higher universe, both sigs and and trait implementations are technically types. In a language like Rust, TYPE, each trait, (and arguably LIFETIME) are kinds. SIG itself would be a “super-kind”.

Lastly, let me add that I have much more Haskell than ML experience. I am sure somebody who has used ML modules systems more than I could enumerate many more advantages they would bring to Rust.

Why now?

As far as I can tell the trait coherence rules only depend on the adaption of implementations as canonical implementations, so this could be done after 1.0 in a backwards-compatible way by adding a new syntax both for canonicalization, and definition without canonicalization. However I think this change would have far reaching implications in the way the libraries and interfaces are designed, so would be best do do this before 1.0.

Alternatives

Some other system could be devised where functors were crate -> crate or rust module -> rust module instead of trait -> trait. Or do nothing of course.


Making more out of traits
#2

Which undefined declarations? Which errors exactly? As far as I know, our metadata should ping any missing functions.

I think this can just be handled by implementing the Allocator for Allocator trait objects, so one can just store Box<T, Rc<Allocator>>.

There’s no mention of object sizes here; Rust doesn’t have a uniform interface for values like OCaml or Haskell, so extential types need to be explicit placed/stored behind a pointer. I.e. this would have to be Box<exists<A: Alloc> Box<T,A>> (or something).


#3

Let me address each in a post.

Granted it has been a few weeks since I last was messing around with and https://github.com/pczarn/rustboot so the state of messages and what not could have changed. As far as I could tell with the current system it is assumed once the crates are linked the missing pieces will be provided, so each crate will happily compile.

With the ML system you need to explicitly apply the functor to the module, you can’t get the functions at all. Basically it is turning a link error into a type error. This gets most interesting if you want to put impls in e.g. libnative and libgreen to make sure they implement some sig in a third crate. Right now the only way I know of keeping two crates in sync is to make some dummy crate that uses everything they should have in common and try linking both ways.

http://www.openmirage.org/blog/ has some good entries on how their module hierarchy is designed for their vm exokernel. I’d like to organize any rust exokernel code I write similarly but I am not sure the current system is capable of making such a design explicit.

(note I am using impl in the sense of ML module, which in my proposal subsumes the current notion of impl.)


#4

Hmm. I had seen that before, and I can’t think of any problem with it now. Feeling a bit silly. That doesn’t give any way to put local state with each allocated item and support global allocator state, but multi paramter traits would solve that.

I don’t think sizing is too big of a problem. On the other hand, trying to limit the amount of ridiculous things one can write is:

enum List<A : Allocator, T> 
{
    Node(T, Box<A, List<A, T>>),
    Nil
}

Type Huh<T> = exists<A : Allocator, Box<A, List<A, T>

The idea is behind this seems reasonable, a list of T where ever node is allocated the same way, and thus the vtable is only stored with the outer existential. But code gen is a bit weird. If one can destructure, and recur on the list freeing as they go, how does the allocator get passed through the function? If one can’t destructure, then the only thing one could do would be deallocate the whole list I guess, not very useful.

I’d say forget the existential types part of what I proposed if allocators are fine without it. That can be done later and it makes this a bit less gargantuan a change.


#5

Oh, so you’re talking about language items, giving errors like

#![no_std]

extern crate core;

#[start]
fn main(_: int, _: *const *const u8) -> int {
    0
}
error: language item required, but not found: `begin_unwind`
error: language item required, but not found: `stack_exhausted`
error: language item required, but not found: `eh_personality`
error: aborting due to 3 previous errors

FWIW, this is actually a proper compile time error: the compiler is accurately detecting exactly what needs to be provided by the current program; the fact it’s not declared in a signature anywhere seems like a relatively minor point, compared to just focusing on it as a compiler diagnostic/documentation issue (i.e. it could be described as “using crate core requires the language item … to be defined”).

If this isn’t what you mean, could you provide a concrete example of the code and the error messages?

I don’t know enough OCaml to get a lot of value out of that blog (I don’t even know if I was reading the posts you were hinting at…), but it seems like you may be able to at least get the “sense” of it via traits/generics.

Why not? Either you can have each allocator argument be something like MyAlloc { local: ..., global: Rc<Allocator> } (possibly generic, MyAlloc<Rc<Allocator>>), or store the local state opaquely/internally inside the trait object pointer. (Maybe I’m misunderstanding… I also don’t quite understand how multi-parameter traits relate.)

I was careful to avoid saying it was a problem, because it’s not; it just needs to be explicitly mentioned and handled when talking about existentials. In any case, the current plan for how Rust will handle this is DST (mostly implemented, AIUI).


#6

An allocator primarily needs mutable global state. I’m anticipating this usage:

#[deriving(Clone)]
struct MyAlloc {
    state: Rc<RefCell<MyGlobalState>>
}
impl Allocator for MyAlloc { ... }

#7

Huon: So as far as I know, there are 5 kinds of vaguely import-like mechanism one can use in Rust:

  • extern crate
  • use (module)
  • lang items
  • extern function declarations
  • trait implementation

lang items shouldn’t be used in normal code so we can ignore those.

Use and extern crate mean exactly one satisfying module or crate can be used as a dependency within a program. More implementations can be made with conditional compilation, but that doesn’t work for instantiating the dependency different ways in one program.

trait implementation isn’t normally thought of as a import mechanism to my knowledge, but it does fit the bill in that by defining a few things local, one gets access to functionality defined elsewhere. Also it is the only one that allows multiple instantiations in that there can be multiple impls.

However what does one do if they want multiple instantiations where the functions to be imported have no common argument (that could be used to dispatch on for the trait). The best that can be done is add an argument of the same type to all of them and make a trait around that, with the idea that that each impl would involve a new zero-width type.


#8

type DynAllocBox<T> = exists <A> (Alloc<A>, Box<T, A>) with selfless traits/Alloc, which IMO is much cleaner.


#9

If you want to allow for dynamically-dispatched allocators, then allocators become non-zero-size (because of the vtable), and you do need to pass the allocator around as a parameter (because there are multiple allocators with the same type).