Blog series: Lowering Rust Traits to Logic

So I wrote some stuff about traits: =)

http://smallcultfollowing.com/babysteps/blog/2017/01/26/lowering-rust-traits-to-logic/

Over the last year or two (man, it’s scary how time flies), I’ve been doing quite a lot of thinking about Rust’s trait system. I’ve been looking for a way to correct a number of flaws and shortcomings in the current implementation, not the least of which is that its performance is not that great. But also, I’ve been wanting to get a relatively clear, normative definition of how the trait system works, so that we can better judge possible extensions. After a number of false starts, I think I’m finally getting somewhere, so I wanted to start writing about it.

In this first post, I’m just going to try to define a basic mapping between Rust traits and an underlying logic. In follow-up posts, I’ll start talking about how to apply these ideas toward an improved, more capable trait implementation.

I'm opening this thread to talk about the post and any questions that may arise. I hope to be writing more posts covering other things; I'll re-use the thread in that case.

28 Likes

To elaborate on my pithy reddit comment:

Isn't your notion of equality through normalization equivalent to Homotopy Type Theory's Univalence Axiom, which treats equality and equivalence as...equivalent?

While you are approaching things from a different foundation, I suspect that some implications of the univalence axiom are relevant.

1 Like

Your parallel to LLVM IR brings to mind the issue of quality error messages. LLVM proper does not generate any errors or warnings; they all come from the language front-end. In the case of Clang, this leads to a fair bit of duplicated functionality such as control flow graphs and dominator trees which are computed both in Clang and LLVM.

LLVM can’t produce good diagnostics because it is working on a lower level representation, and it is not always clear if a construct comes directly from user code or a previous transformation. For example, some Clang warnings are disabled for code that results from expanding a preprocessor macro, and LLVM has no idea what the preprocessor is.

It’s important that a type checker can give better error messages than “No!”, so I think it is worth considering early on how such a solver might communicate with the user.

As an aside, LLVM can produce one error: “Ran out of registers during register allocation”. It means that you wrote inline assembly with a set of operand constraints that the register allocator couldn’t solve. Clang can’t type-check the constraints early because the formal definition of correct inline asm is “whatever works”, and it depends on many obscure code generation flags. It’s not a great user experience.

6 Likes

The relationship between the IR and the source code seems much more direct than in the case of LLVM IR. For example, each impl’s well-formedness corresponds to one Horn clause, and each bound it depends on is one predicate. In fact, because this is specifically an IR for typechecking, the fundamental primitive is really a representation of a potential type error.

It’s an important thing to keep in mind when designing the system, though.
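
To make the impl-to-clause correspondence concrete, here is roughly what I mean (the clause notation below is just my own shorthand, not necessarily the exact lowering the blog post uses):

struct MyVec<T>(Vec<T>);

// A conditional impl...
impl<T> Clone for MyVec<T> where T: Clone {
    fn clone(&self) -> Self { MyVec(self.0.clone()) }
}

// ...corresponds to a single Horn clause, roughly:
//
//     Implemented(MyVec<T>: Clone) :- Implemented(T: Clone).
//
// Each bound in the where-clause contributes one predicate to the clause body.

fn main() {
    let v = MyVec(vec![1, 2, 3]);
    // This call type-checks only because `Implemented(i32: Clone)` is provable.
    let _w = v.clone();
}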

Very nice! Having official formal rules about the trait system should be a great starting point for reasoning about its abstract properties, and a stand-alone reference implementation may even be amenable to program verification - certainly more than any part of rustc itself.

Regarding equality modulo reduction (or "computation"), that would be the standard definitional/judgemental equality in Martin-Löf type theory. A quick example in Lean:

class Clone (Self : Type) :=
-- some part of Lean gets confused if you don't use the class parameters at all
(clone : Self → Self)

instance : Clone ℕ :=
{ clone := id }

class Iterator (Self : Type) :=
(Item : Type)
(next : Self → option (Self × Item))

structure IntoIter (A : Type) : Type

instance (A : Type) : Iterator (IntoIter A) :=
{ Item := A, next := λ x, none }

structure Enumerate (T : Type) : Type

-- `[Iterator T]` will be resolved by typeclass resolution
instance (T : Type) [Iterator T] : Iterator (Enumerate T) :=
{ Item := ℕ × Iterator.Item T, next := λ x, none }

-- we can explicitly ask things to be reduced...
eval Clone (Iterator.Item (IntoIter ℕ)) -- "Clone ℕ"
-- or rely on implicit reduction, such as during typeclass resolution
def f (it : Iterator.Item (IntoIter ℕ)) := Clone.clone it

(or play with it here)

Now, if you also want propositional equalities to be respected, your type theory must be extensional, such as the one in NuPRL. Which I know absolutely nothing about. If you are stuck with an intensional one... well, you'll have to get creative. Lean does, however, feature a novel implementation of congruence closure in its proof automation.

For a nice overview of the different kinds of equality, see also this article.

7 Likes

Ah, I long had a nagging suspicion that the (current) MIR, as a core language for perhaps the operational semantics of Rust, was only half the battle, but I struggled (and still struggle, I’m not at all sure “operational semantics” is what I’m looking for) to express that thought clearly. I’m glad @nikomatsakis evidently agrees, and a solution is on the way!

As an aside, I’d be curious to know if anyone has suggestions for related work around this area of “customizable equality”. In particular, I’m not aware of logic languages that have to prove goals to prove equality.

Indeed, as @tupshin mentioned, this topic is covered by Homotopy Type Theory, where type equality is the real cornerstone. I strongly advise you to have a look.

By the way, the book is freely distributed on the official site: https://homotopytypetheory.org/

1 Like

I think that the encouragements to look at Homotopy Type Theory (HoTT) are well-meaning but rather off-topic. What you are looking for, for associated types, is type-level computation, in a form that has been formalized in various ways in programming language theory (for example, research on ML module systems uses singleton types to compute with type definitions) and is essentially unrelated to the higher-dimensional nature of HoTT. Equality in general is a deep problem in type theory, and HoTT has a lot of exposure right now, so it is natural for people to see a relation, but I don’t think there is one here. (In technical terms, F-omega already has a type-equality relation in its type system that does type-level computation, while it doesn’t have any notion of propositional equality.)
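
For what it's worth, the kind of type-level computation in question is already visible in plain Rust, in the same spirit as the Lean example above (this is just an illustrative sketch; the names are mine):

use std::marker::PhantomData;

struct IntoIter<A>(PhantomData<A>);

impl<A> Iterator for IntoIter<A> {
    type Item = A;
    fn next(&mut self) -> Option<A> { None }
}

fn needs_clone<T: Clone>(_: T) {}

fn main() {
    // `<IntoIter<i32> as Iterator>::Item` "computes" (normalizes) to `i32`,
    // so the `Clone` bound below is checked against `i32`.
    let x: <IntoIter<i32> as Iterator>::Item = 0;
    needs_clone(x);
}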

On the other hand, my comment on reading the blog post was that there was a bit too much Prolog and a bit too little type theory, in the following sense: you are not only solving a proof search problem, you are building a program fragment that has computational meaning – the execution behavior of your program depends on the fragment that is inferred during trait elaboration.

By elaboration, I am referring to the idea of transforming the source code that people write into a more explicit form in which the calls to trait-bound functions are replaced/enriched/annotated with the path of the specific implementation of the function that is actually going to run, and calls to methods with trait assumptions on their generic parameters are enriched/annotated with an extra argument carrying the specific implementation of the trait for the call’s instance. (This does not mean that trait dispatch has to be implemented this way in practice; you can always explain the specialization as inlining and specialization of these extra explicit arguments.) In particular, while the code before the elaboration does not satisfy the type erasability property (if you erase all information about types, you cannot execute the program anymore), the explicit program after elaboration has a type-erased semantics that is faithful to the language semantics. Understanding this transformation pass is important because (1) having an intermediate form that permits type erasure is an interesting property of your language (it tells you the minimum information a programmer needs to understand to accurately predict program execution) and (2) this intermediate form is a good way to think about human-program interactions in an IDE: it is precisely what you want to show (under some nice syntax) when a programmer wonders “what is going to happen with this trait-bound call here?”, and understanding the syntax of this elaborated form also helps you write better error messages when elaboration fails (because there is an ambiguity, for example).

If you are lucky, you can design this elaboration in such a way that the post-elaboration program is expressed in exactly the same syntax as the pre-elaboration program that users wrote. This property trivially holds of Scala’s implicit parameters (an implicit parameter is just a value/term that the user could have written themselves), but it is a bit less obvious for Haskell’s type classes or Rust’s traits. Using Haskell syntax, which I am more familiar with, you could elaborate something like:

class Eq a where
  eq :: a -> a -> Bool

instance Eq Int where
  eq n1 n2 = ...

instance Eq a => Eq (List a) where
  eq [] [] = True
  eq (x:xs) (y:ys) = eq x y && eq xs ys
  eq _ _ = False

discr :: Eq a => a -> a -> Int
discr x y = if eq x y then 1 else 0

assert (discr [1] [2] == 0)

into the more explicit transformed program

type eq a = (a -> a -> Bool)
eqInt :: eq Int = ...
eqList :: eq a -> eq (List a)
eqList eqA [] [] = True
eqList eqA (x:xs) (y:ys) = eqA x y && eqList eqA xs ys
eqList eqA _ _ = False

discr :: eq a -> a -> a -> Int
discr eqA x y = if eqA x y then 1 else 0

assert (discr (eqList eqInt) [1] [2] == 0)

Notice that:

  • a class/trait becomes the type of a value carrying its operation(s): the class Eq a becomes the type eq a of comparison functions
  • a ground instance (Int is in the class Eq) is turned into a value of that type: eqInt :: eq Int
  • a conditional instance (if a is in the class Eq, so are lists of a) becomes a function from values of type eq a to values of type eq (List a)
  • the call discr [1] [2] in the source version has a behavior that depends on “which definition of the equality will be chosen”. In the explicit version, we can easily describe which definition has been chosen by a source term: (eqList eqInt). I could show this to the user; if there was a conflict with two possible definitions, I could show both sides to explain the conflict.
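
The same elaboration can be spelled out in Rust itself, which may make the witness-as-term idea more concrete (a hedged sketch; `EqDict`, `eq_i32` and `eq_vec` are names I am making up, not anything rustc or chalk produces):

// the class/trait becomes the type of a dictionary carrying its operation(s)
struct EqDict<A> {
    eq: Box<dyn Fn(&A, &A) -> bool>,
}

// the ground instance `i32: Eq` becomes a value of that type
fn eq_i32() -> EqDict<i32> {
    EqDict { eq: Box::new(|a, b| a == b) }
}

// the conditional instance `Eq A => Eq (Vec A)` becomes a function
// from an `EqDict<A>` to an `EqDict<Vec<A>>`
fn eq_vec<A: 'static>(inner: EqDict<A>) -> EqDict<Vec<A>> {
    EqDict {
        eq: Box::new(move |xs, ys| {
            xs.len() == ys.len() && xs.iter().zip(ys).all(|(x, y)| (inner.eq)(x, y))
        }),
    }
}

fn discr<A>(dict: &EqDict<A>, x: &A, y: &A) -> i32 {
    if (dict.eq)(x, y) { 1 } else { 0 }
}

fn main() {
    // the elaboration witness, spelled out as an ordinary term: eq_vec(eq_i32())
    let dict = eq_vec(eq_i32());
    assert_eq!(discr(&dict, &vec![1], &vec![2]), 0);
}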

The prolog rules presented in the blog post explain whether there exists a solution to this elaboration problem, but not what the solution is. There are two ways to describe the relation between these two questions:

  • if you are familiar with logic programming and proof theory, you will say that the proof term(s) implicitly generated by the Prolog search has exactly the structure of the elaboration witness you are looking for; the only thing you have to do, after search succeeded (or possibly found two candidates, for example), is to transform these proof derivations into terms
  • if you are familiar with functional programming and type theory, you will say that you are actually doing program search, searching for a program (of type eq (List Int) in my example), and that the reason this looks like Prolog is the Curry-Howard isomorphism (the type system of your elaboration terms also describes a logic).

Of course, neither of these two views is intrinsically better than the other: they are both perfectly correct, and it is interesting to keep both perspectives in mind. Which one matters depends on the specific question you are asking about the system you are studying. (I think this is important to point out because the original blog post is not balanced in this respect: it only presents the logic-programming view of the problem.)

Here are some reasons why I think also working out the type-theoretic view (that elaboration witnesses are terms that we are searching for) is important:

  • The terms in this story really influence the runtime semantics of your programs; they are not just formal objects whose existence matters but whose precise details can be ignored (as is common in logic, and sometimes is the perspective on Prolog derivations).
  • You will want to show those terms to the user, and explain the behavior of the search by using them. (This is especially helpful if the terms can be expressed in Rust itself, as you end up talking about program search in the language the user knows about, and not about programming in a completely different logic language.) This is important for debugging, and also for understanding the compilation, optimizations opportunities, and performance models of the programs – which matters to the Rust community.
  • If there are cases where you cannot enforce consistency of trait resolution (that is, there are locally two distinct solutions to a trait elaboration problem), the term view is essential, I think, to reason about what should be done about it. For now Rust has fairly strict restrictions on trait declarations to avoid inconsistency issues, but those are in tension with modularity, so you will get requests to relax them and may end up having to tolerate some amount of inconsistency as the design evolves.
  • Finally, and I think this is maybe the most important point, the equalities that you are concerned with are the equalities between the terms and the types of your language. Equalities between terms matter to understand when two elaboration paths are in fact equivalent (and thus unambiguous), for example Eq Int compared to Ord Int => Eq Int in Haskell (the witness of the latter resolution path is equivalent, as a term, to the witness of the former resolution path). Equalities between types matter as part of the type-directed search for witnesses. The blog post explains that trait queries are lowered to propositions in some unspecified logic whose propositions are given a somewhat Rust-like syntax; but what really happens is that the goals of elaboration precisely use Rust types, and the equalities are the equality relations in the Rust type system.

If you see trait search as elaboration into an existing logic, you will be able to apply existing proof search techniques exactly unchanged, but every desirable property of your design will have to be checked through the encoding from types to logic propositions, and from logic proof terms to programs or elaboration witnesses. If you see trait search as term search in your existing type system, you will have to re-interpret existing proof search methods inside a slightly different setting, but the terms and the types manipulated will be those that users know and think about. You need to combine both views to productively think about the problem.

6 Likes

Heh, I'm glad you think so, because to be honest I know absolutely nothing about homotopy type theory and I don't relish reading into another big area of work. :stuck_out_tongue:

Yes, so -- to start, I agree with everything you wrote, for the most part. =) I started out taking the "logic-centric" view that you described, where the proof tree defines the program semantics. But I've since shifted gears into a different approach.

The key is that, in Rust, every (monomorphized) function definition has a unique type. This is true of closures (of course), but also of function items. These types are zero-sized, but they can be "reified" into a function pointer. You can see that in the following program:

use std::mem;
fn foo() { }
fn main() {
    let f = foo;
    println!("{}", mem::size_of_val(&f)); // prints 0
}

The only time that trait resolution affects program semantics is basically when it decides what code will execute. For example, when I call x.clone(), we will take the (monomorphized) type of x (let's call it X) and then find the appropriate version of clone for X. In my view, this is just associated type normalization.

In other words, you can view the impl of Clone as supplying a (special) associated type clone which is guaranteed to point to one of these unique function types after normalization (this could be modeled as a normal associated type with a further bound, if you cared to do so). My point is that "resolving" x.clone() to a specific bit of code, in this view, is basically equivalent to normalizing the associated type <X as Clone>::clone (where X is the type of x).

So let's say there is a normalization predicate: <X as Clone>::clone ==> T (as, indeed, there is). In this case, it doesn't matter what the proof tree is; if I can prove the normalization judgement, then because I know that T will be some unique function type, I have figured out what code to call -- I just have to reify T into a function pointer. Of course all of this relies on coherence to ensure that, indeed, every normalization has a specific and unique outcome.
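
To illustrate the "unique function type" part of that (just a tiny sketch of the idea, not of how chalk models it):

use std::mem;

fn main() {
    // `<i32 as Clone>::clone` names a unique, zero-sized function item type...
    let f = <i32 as Clone>::clone;
    println!("{}", mem::size_of_val(&f)); // prints 0

    // ...and knowing that type is enough to know what code runs: reify it
    // into an ordinary function pointer and call it.
    let ptr: fn(&i32) -> i32 = f;
    println!("{}", ptr(&42)); // prints 42
}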

(For trait objects, the way I view it is that there is an (implicit) impl of Trait for the type Trait. That implementation is fixed to extract a pointer out of a vtable and call it.)
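
Here is that view in miniature, with a toy object-safe trait of my own (Clone itself isn't object safe):

trait Speak {
    fn speak(&self) -> &'static str;
}

struct Dog;

impl Speak for Dog {
    fn speak(&self) -> &'static str { "woof" }
}

fn main() {
    // Conceptually there is an implicit `impl Speak for dyn Speak` whose
    // `speak` just pulls the right function pointer out of the vtable half
    // of the fat pointer and calls it.
    let d: &dyn Speak = &Dog;
    assert_eq!(d.speak(), "woof");
}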

What this means for the proof search is that we don't mind if there are multiple proof trees, so long as they all prove the same judgement (i.e., they all normalize <X as Clone>::clone to the same type T). But we have to be very careful for those cases where we infer some parts of the judgement from others. As an example, if you have only impl Foo for i32, and you ask us to prove ?T: Foo, we will infer that (a) it is true and (b) ?T = i32 must hold. It would be bad if we did that even in the case where there existed more impls, since then we'd be arbitrarily deciding on a semantics for you. But in any case this process of inferring bits of the judgement is exactly the elaboration you were talking about, I believe.

Does that make sense to you?

1 Like

So Rust’s traits have always felt less type-theoretic to me too. I think a big reason is that in Haskell dictionaries often do get passed around so the proof tree does matter computationally even if coherence guarantees that the evaluated value is unique. Now, I’d like to really be able to explicitly work with vtables and existential types in Rust someday, but until then, yes, the approaches here will work.

BTW, the conversation for https://github.com/ghc-proposals/ghc-proposals/pull/32 is very interesting and somewhat tangentially relevant, and I do recommend it.


@nikomatsakis [hope this isn’t veering off-topic too much.] The function types approach does work, but IMO ultimately doesn't feel completely satisfactory because working with singleton-type proxies alone is quite indirect. Thinking back to https://github.com/rust-lang/rfcs/issues/1037, what if every function singleton type inhabited a trait like

trait Fn<'a, Params, Ret> where Self::Size = 0 {
    const ptr: &'a TrueFn<Params, Ret>; 
}

Where 'a would always be 'static today, and TrueFn is a custom Meta = () unsized type. One thing I like about this is it puts tupled parameter types “at the root”, which I think is good for any future variadics work.

Not sure what to do for unsafety and ABI, but heavy solutions do exist (e.g. Haskell’s data promotion for an ABI enum).

So Rust's traits have always felt less type-theoretic to me too

Actually, logic programming can be very type-theoretic. Twelf is a great example: in this language, propositions are literally types and programming is specifying types.

See Introduction to Twelf and the Twelf wiki for more details.

1 Like

After further reflection (and a lot of experimentation) on Rust’s trait system I believe its logic ends up being very similar to that made possible by Scala’s Path Dependent Types, and that there are enough similarities that it would be worth reviewing http://lampwww.epfl.ch/~amin/dot/fpdt.pdf as some very relevant prior art.

1 Like

New post on how chalk handles associated type normalization.

7 Likes

On the applicative/projective bikeshed: “applicative projections” are quite an odd animal, because they aren’t necessarily distinct from other applicative projections. IIRC the term for non-projection applicatives is “nominal types”.

1 Like

In Haskell, I think they are called generative type functions.

Definition (Generativity). If f and g are generative, then f a ∼ g b implies f ∼ g.

Source: R. Eisenberg, Dependent types in Haskell: theory and practice, §4.2.4, p. 58

Dead link

Indeed. Unification gives a negative answer -- but just because A = B cannot be proven does not mean that A != B can be proven. (Relevant code in chalk here.) The same is true of skolemized type parameters.

In terms of the "simplified grammar", at least, it may be worth making a category for "normalized projection" and skolemized type parameters that is distinct from other nominal types. I was debating about presenting it that way, but I couldn't think of a good reason to distinguish them.

Huh, yeah, in retrospect that seems like the obvious term to use. =) Not sure why I didn't adopt that, I think because my head was in the "classifications of functions" space (e.g., injective etc) and I wasn't really thinking about terms for types.

1 Like

Yeah, so I was using the term "generative" at one point -- specifically, in the congruence closure code I was writing -- but it didn't seem quite right to me here. I think because I had in mind applicative vs generative functors. If you then view Vec as a module, it acts more like an applicative functor (that is, Vec<T> = Vec<U> if T = U). Anyway, generative also seems like an ok term, but I think @arielb1 is right that the best term is nominal type, which has no other use.

So having thought about it, I think that probably the best way to think about things is that there are four categories of types:

  • Unnormalized types
  • Nominal types (e.g., Vec)
  • Skolemized types (e.g., T within a generic function, or T::Item when we can’t normalize in any other way)
  • Existential variables (e.g., ?T)

The unification procedure will normalize unnormalized types when it encounters them. Normalizing a nominal type has no effect (and we can “eagerly” do it). Normalizing a projection works as I described, but the fallback is to a skolemized type (both skolemized and nominal types fall into the “applicative” category).

The difference between skolemized and nominal types is slight, and has to do with the amount of certainty we get in a negative result. In terms of the yes-maybe-no results that @aturon discussed in his post on negative and modal logic, if you have some skolemized type T, then T = U (where U is some other type) is not provable, but it yields a “maybe” result. In contrast, Vec<T> = Rc<T> yields a “no” result. (This is not how chalk currently behaves, actually, it will give a No result for both; this is because I had not imagined the Maybe results being used in quite this way, and instead imagined a distinct A != B predicate – I’m not quite sure how this all works out, but I think @aturon’s approach will wind up more compact.)
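
Here is a toy sketch of the yes/maybe/no behavior I have in mind (my own simplified types, emphatically not chalk's actual representation or logic):

#[derive(Clone)]
enum Ty {
    Nominal { name: &'static str, args: Vec<Ty> }, // e.g. Vec<T>, Rc<T>
    Skolemized(u32),                               // e.g. a generic `T` in scope
    Var(u32),                                      // e.g. an inference variable `?T`
}

#[derive(Debug, PartialEq)]
enum Answer { Yes, Maybe, No }

fn unify(a: &Ty, b: &Ty) -> Answer {
    use Ty::*;
    match (a, b) {
        // An inference variable can still be bound so that the equation holds.
        (Var(_), _) | (_, Var(_)) => Answer::Yes,
        // Two distinct skolemized types are not provably equal, but some caller
        // might instantiate them to the same type: "maybe", not "no".
        (Skolemized(x), Skolemized(y)) if x == y => Answer::Yes,
        (Skolemized(_), _) | (_, Skolemized(_)) => Answer::Maybe,
        // Distinct nominal heads can never be equal: a definite "no".
        (Nominal { name: n1, args: a1 }, Nominal { name: n2, args: a2 }) => {
            if n1 != n2 || a1.len() != a2.len() {
                return Answer::No;
            }
            // Same head: the question reduces to equating the arguments.
            let mut result = Answer::Yes;
            for (x, y) in a1.iter().zip(a2) {
                match unify(x, y) {
                    Answer::No => return Answer::No,
                    Answer::Maybe => result = Answer::Maybe,
                    Answer::Yes => {}
                }
            }
            result
        }
    }
}

fn main() {
    let t = Ty::Skolemized(0);
    let u = Ty::Skolemized(1);
    let vec_t = Ty::Nominal { name: "Vec", args: vec![t.clone()] };
    let rc_t = Ty::Nominal { name: "Rc", args: vec![t.clone()] };

    assert_eq!(unify(&vec_t, &rc_t), Answer::No);      // Vec<T> = Rc<T>: definitely not
    assert_eq!(unify(&t, &u), Answer::Maybe);          // T = U: not provable, but only "maybe"
    assert_eq!(unify(&Ty::Var(0), &rc_t), Answer::Yes); // ?T = Rc<T>: provable by binding ?T
}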

Where does Vec<T> = Vec<U> fit on that nominal/skolemized line? i.e. can Vec be nominal while the T inside is still skolemized?