Blog post: Maximally minimal specialization: always applicable impls

nikomatsakis · February 9, 2018, 8:44pm

Promising idea for specialization -- now tell me why it's totally broken.

http://smallcultfollowing.com/babysteps/blog/2018/02/09/maximally-minimal-specialization-always-applicable-impls/

I've been thinking about the upcoming Rust Epoch. We've been iterating over the final list of features to be included and I think it seems pretty exciting. But there is one "fancy type system" feature that's been languishing for sometime: specialization. Accepted to much fanfare as RFC 1210, we've been kind of stuck since then trying to figure out how to solve an underlying soundness challenge.

As aturon wrote, I think (and emphasis on think!) I may have a solution. I call it the always applicable rule, but you might also call it maximally minimal specialization.

Let's be clear: this proposal does not support all the specialization use cases originally envisioned. As the phrase maximally minimal suggests, it works by focusing on a core set of impls and accepting those. But that's better than most of its competitors! =) Better still, it leaves a route for future expansion.

jmst · February 9, 2018, 9:31pm

Why not just let specialization depend on lifetimes, and implement it by preserving lifetimes until monomorphization? (and maybe doing monomorphization earlier)

That doesn’t require any cleverness, is simple to understand and more expressive than not doing so.

The rules outlined in the post might work, but they don’t seem particularly intuitive (esp. for people who never thought about type theory, which are probably the vast majority of Rust users) nor flexible (requiring the !Clone -> !Vec<Clone> annotation and the specialization_predicate annotation to be already present on traits defined in other crates to be able to do some specializations).

Of course, making lifetimes matter at run-time might not be great, but doesn’t necessarily seem to be a problem in practice (especially if specialization is used sparingly and for optimization and not to cause radically different behavior).

That said, if implementing these rules is easier than implementing lifetime preservation until monomorphization, it might be sensible to start with this approach since it’s always possible to remove all restrictions later with the lifetime preservation approach.

nikomatsakis · February 9, 2018, 9:32pm

In retrospect, I suppose maximally mimimal was overstating the case. You could get a lot more minimal than this. But it does drop some existing use cases. =)

nikomatsakis · February 9, 2018, 9:34pm

This is covered in detail in many blog posts. Let me name a few reasons though:

The region and borrow checker currently has a lot of freedom to pick regions for you. It tries to pick very minimal regions. If we tied them to specialization, it would be making decisions about what code executes. That is bad.
Resulting code-size explosion would be extreme.
We would not be able to have virtual methods that are lifetime parametric (which are omnipresent), just as you cannot have virtual methods that are type parametric today. That only works because we erase (we might be able to fashion a dictionary passing based solution, I suppose).

UPDATE: Sorry, don't like my tone here. I didn't mean to be dismissive. It's a good question!

arielb1 · February 9, 2018, 9:46pm

A few nitpicks:

Extension: Refining always applicable impls to consider the base impl

At the moment, these kinds of predicates fall outside the “Hereditary Harrop” (HH) predicates that Chalk can handle.

Why not? Can't you lift the existential out of the implication, and get the following equivalent predicate:

// τ,σ are "type schemes", and φ,ψ are "bound schemes".
// ρ is a "linearized" σ with no type parameter appearing twice.
//
// base impl: impl<B_0, B_1, ...> Foo<τ_1[B_0, B_1], ..> for τ_0[B_0, B_1] where φ[B_0, B_1]
// child impl: impl<C_0, C_1, ...> Foo<σ_1[C_0, C_1], ..> for σ_0[C_0, C_1] where ψ[C_0, C_1]
//
// "linearized" child impl: impl<D_0, D_1, ...> Foo<ρ_1[D_0, D_1], ..> for ρ_0[D_0, D_1]

forall<A_0, A_1, ..., B_0, B_1, ..., C_0, C_1, ...>
  if (
    for every i, τ_i[B_0, B_1, ...] = A_i,
    φ[B_0, B_1],
    for every i, ρ_i[D_0, D_1, ...] = A_i,
    WF(A_0: Foo<A_1, ...>),
    WF(B_i),
    WF(D_i)
  ) {
    exists<C_0, C_1, ...> {
      for every i, σ_i[C_0, C_1, ...] = A_i,
      ψ[C_0, C_1, ...]
    }
  }
}

i.e., in words, if the trait-ref is WF, the base impl matches, and the "linearized" child impl matches, then the real child impl matches.

This is exactly what we need for sound lifetime-free dispatch: monomorphization only needs to pick the highest-in-specialization-order child whose linearized impl matches, and we know from type-checking that the base impl matches, so this condition exactly says that the child impl holds.

Extension: Reverse implied bounds rules

This is more like an "inversion principle for impls", rather than an implied bounds rule (I don't see any WF bounds here that might be implied).

For example, it says that you can't add an impl<T: Clone> for Vec<MyNonCloneType> in the crate that defined Vec. OTOH, this is a potentially-useful generalization of negative impls.

Extension: Designating traits as “specialization predicates”

I consider TrustedLen as rather important, and it would have to be marked as a "specialization predicate" (or, as I prefer, a "decidable predicate").

jmst · February 9, 2018, 9:46pm

Doesn't necessarily seem bad as long as it chooses regions consistently, and ideally in a way that is best (choosing regions to be the same as much as possible).

It's true however that this makes the borrow checker harder to implement and maintain (which could indeed be a very serious problem).

Don't think so?

Pathological examples with lifetimes can be equivalently written with types, so they should not be worse.

But the key thing is the only thing that matters are not actual lifetimes, but rather whether lifetimes are distinct, whether they are 'static and their relationships, so it should be possible to canonicalize a lot.

E.g. if there is only one lifetime parameter, then only two versions are needed at most ('static and non-'static) and for 2 lifetimes parameters only 7 at most ('static 'static, 'static 'a, 'a 'static, 'a 'a, 'a 'b where 'a < 'b, 'a 'b where 'b < 'a, 'a 'b where 'a and 'b are incomparable).

And ideally the implementation would delay generating multiple versions until it can tell that there is a specialized impl that requires to do so.

Also if lifetime specialization is not used, the code generated should not change, so there is no reason for code-size explosion with an efficient implementation

You can still erase lifetimes by only generating the method with all lifetimes being incomparable although it means virtual methods will not be able to take advantage of specialization in the types/methods they use. It also means that code needs to handle lifetimes being incomparable, but I don't think that would be a problem, and it's also a nice feature to have in general to be able to have "synthetic lifetimes" for self-referencing, etc.

It's also possible to create all possible versions if the number of lifetime parameters is small, since there are only 2 versions for 1 parameter and 7 for 2 parameters as described above.

EDIT: and for higher numbers of lifetime parameters runtime dispatching is possible, by passing a representation of the partial order between lifetime parameters and 'static ether as an O(n^2) matrix of comparisons for smaller numbers of lifetimes or by representing each lifetime as an integer interval with the interval partial order (which is O(n log n) space)

Overall, the big advantage I see with this approach is that while it's still tricky to implement, it's much easier to learn and use since there are no restrictions. However the implementation complexity may be too high, so it may be better to start with the OP's proposal and postpone this as a possible later expansion.

nikomatsakis · February 9, 2018, 9:49pm

I was a bit hand-wavy there -- what it really can't handle right now is if (A = B) constructs. That requires a more expressive notion of equality. I suspect we can add it, but the current universe treatment assumes (for example) that the following is unprovable, regardless of the value of C:

exists<T> {
  forall<X> {
    if (C) {
      X = T
    }
  }
}

However, clearly if C could be X = T, then it ought to be provable. But supporting such equality in the general complicates things. In this particular case where we don't have goals that are outside of the if, I think we can still handle it in a fairly simple way.

(Or maybe there's a simple overall solution, but I don't see it just now.)

arielb1 · February 9, 2018, 9:55pm

I think you can just unify τ and ρ? This might work crappily in the presence of associated types, but that can probably be observed from "normal code" too.

leonardo · February 9, 2018, 11:45pm

Are instance chains with else/fail useful here?

https://ghc.haskell.org/trac/ghc/ticket/9334

comex · February 10, 2018, 1:22am

Hmm… Seems similar to the ideas in the "Can we make a knob" section from Aaron's post that you linked. But as he said:

But saying that you can never repeat a type within a specialization would be very restrictive.

Also reminds me of Haskell's type-parameter roles, which I compared to Rust's situation recently. In fact, I believe that requiring a particular impl to be "always applicable" is equivalent (in some formal sense) to requiring that all its type parameters be representational (AKA parametric), versus nominal. And putting #[specialization_predicate] on a trait is equivalent to requiring that its Self 'parameter' is parametric. However, Niko's proposed rules provide more flexibility than Haskell's in some cases, which I think is sound, as I'll describe later.

In both languages, we want a generic definition to stay representationally equivalent if you change one of its parameter between two types/lifetimes that are themselves representationally equivalent. In both languages, there's a specific language feature that provides a "base case" for representational equivalence - if two different struct definitions happen to have the same field types, that doesn't count; we only care about a case where the language guarantees that two types have the same representation.

In Haskell, that's newtype: newtype A = B defines a type A which is equivalent to B. The closest Rust equivalent would be something like #[repr(transparent)] struct A(B);. Importantly, Haskell lets you coerce between any two types with a representational equivalence guarantee. So you can coerce between A and B, and also between Foo A and Foo B if Foo's parameter is representational. (This was originally an accidental consequence of GeneralizedNewtypeDeriving, but later became an explicit feature.)
In Rust, that's lifetimes. Trans's blindness to lifetimes effectively allows you to coerce from Foo<'a> to Foo<'anonymous>, though it's not as general as Haskell's coercion mechanism.

So let's look at Haskell's rules for inferring representational versus nominal. To copy and paste from the GHC wiki page:

Next, we descend into each defining equation of the type family and inspect the left-hand and right-hand sides. The right-hand sides are analyzed just like the fields of a data constructor; see the Role inference section above for more details.

This example isn't on that page, but if you write something like

type family Foo x where
    Foo x = Bar x

where Bar is an existing type family, Foo's parameter inherits the representational/nominal decision from Bar. So if Bar's parameter is nominal, Foo's parameter is implied to be nominal, and ditto for representational.

This is roughly equivalent to "Condition 1: Relies only on implied bounds." By default, type parameters to Rust traits are nominal (i.e. not required to be parametric), including the Self 'parameter', so adding random trait bounds to an always-applicable impl would force its parameters to be nominal as well, contradicting the requirement that they be representational. But since marking a trait as #[specialization_predicate] asserts that its Self parameter is representational, it's okay to depend on those.

How is a restriction on bounds equivalent to a restriction on the right-hand side? Well, type families aren't the same as traits - they're basically a type-to-type lookup table, most directly analogous a trait with one associated type, which happens to be implemented for all types. As it happens, Haskell has a much much closer equivalent of traits, namely typeclasses, and they can even have associated types. But GeneralizedNewtypeDeriving has a blanket ban on typeclasses with associated types, regardless of representational-ness, so I can't use those as an example.

But there's a pretty simple transformation to simulate a predicate trait using a type family. This code:

trait SomeTrait {}
impl SomeTrait for isize {}

could be implemented as:

// define two types named True and False
data True
data False
// and the lookup table:
type family IsSomeTraitImplemented x where
  IsSomeTraitImplemented Int = True
  IsSomeTraitImplemented _ = False

And then if you want to add a "bound",

trait AnotherTrait {}
impl<T> AnotherTrait for T where T: SomeTrait {}

could be

type family IsAnotherTraitImplemented x where
  IsAnotherTraitImplemented x = IsSomeTraitImplemented x

(and you could define an "and" combinator if you wanted to simulate multiple bounds, etc.)

So the "bound" T: SomeTrait turns into a type expression IsSomeTraitImplemented x on the right-hand side.

Anyway, continuing on with the Haskell rules:

The more interesting analysis comes when inspecting the left-hand sides. We want to mark any type variable that is scrutinized as nominal. By "scrutinized", we mean a variable that is being used in a non-parametric fashion. [..]

To accomplish this, we check for any occurences of the either of the following sorts of scrutinization:

[I'm going to quote these out of order:]

Type patterns that are syntactically equal are all marked as nominal. For instance:
type family Eq w x y z where
  Eq a b (Either b a) c = a
Here, we have two variable names that are used in multiple places: a and b. As a result, the type variables which they inhabit (w, x, and y) are all marked as nominal.

That's easy: it corresponds to "Condition 3: Each type parameter can only be used once."

A type pattern that is not a single type variable. For instance, all of these equations provde examples of type patterns which do scrutinize a particular type variable:
type family Inspect x where
  Inspect Int          = Bool
  Inspect (Either a b) = a
  Inspect (f a)        = a
Any type variable that is scrutinized in this fashion (x in the above example) is marked as nominal.

This is the part where Niko's proposed rules are more flexible.

For lifetime parameters, this is covered by "Condition 2: Fully generic with respect to lifetimes." Similarly to how the above disallows Inspect Int = Bool, using a concrete type where we want a type parameter, an always-applicable impl can't say impl SomeTrait for Foo<'static>, using a concrete lifetime where we want a lifetime parameter.

But Niko's rules do allow for a more direct equivalent to the above example: we could impl SomeTrait for isize, and separately impl<A, B> SomeTrait for Either<A, B>. This is safe, though, and I think it would be safe in Haskell as well - i.e. Haskell is being overly conservative. After all, in both languages we know that Int and Either a b cannot possibly be representationally equivalent:

In Haskell, that's because Int is not a newtype of Either or vice versa; they are distinct data types. (Though there may be backwards-compatibility concerns, generally speaking, in allowing code to make these kinds of assumptions. It's unlikely that Int's implementation would ever be changed into a newtype of Either, but you could imagine that happening for other pairs of types...)
In Rust, that's because isize and Either<A, B> are not identical barring lifetimes. (No backwards-compatibiliy concerns here, since there's no newtype feature. Transparent structs don't count, because the type system doesn't treat them specially.)

Therefore, it should be fine for Inspect Int and Inspect (Either a b) to also not be representationally equivalent. (edit: but I might be missing something, in which case I’d love to be corrected.)

Is there a point to this post?

Maybe. If Niko's plan is equivalent in some sense to a simplified version of type roles, then that suggests it's on good theoretical ground. And it opens the possibility of moving toward a more full-featured version, either now or (more likely) as a backwards-compatible extension.

On the other hand, I'll again quote @glaebhoerl, regarding Haskell type roles:

The opinion of many leading members of the Haskell community is that this solution was ill-conceived and brings more complexity than it’s worth.

So perhaps, even if they are equivalent, that should be regarded as an unfortunate coincidence rather than a path for future exploration.

withoutboats · February 10, 2018, 1:24am

This is awesome Niko, and it will cover all of the most pressing needs for specialization that I know about. Really stoked!

pcwalton · February 10, 2018, 1:34am

I would be pretty terrified if the particular lifetime the borrow check assigned could cause arbitrary different code to be executed at runtime. How could anyone hope to understand code that relies on that?

canndrew · February 10, 2018, 3:18am

Nit: in the blog post

impl<T> From<T> for ! { .. }

should be

impl<T> From<!> for T { .. }

Sgeo · February 10, 2018, 7:57am

Can HList search by type (without the type inference trick) be implemented with this?

If I recall correctly, I’d need the following impls to work

impl<Head, Tail> Has<Head> for Cons<Head, Tail> { .. }
impl<T, Head, Tail> Has<T> for Cons<Head, Tail> where Tail: Has<T> { .. }
impl<Head, Tail> Has<Head> for Cons<Head, Tail> where Tail: Has<T> { .. }

As far as I understood when last seriously thinking about this, I would need the “lattice rule” for this, and this proposal is a lattice rule that excludes some scenarios?

glaebhoerl · February 10, 2018, 11:35am

(FWIW I haven't kept up and I don't know what the current situation is - that was a while ago now. One of the reasons people didn't like them is that they're not first class - you can't talk about things being nominal or representational beyond marking individual type parameters as being one or the other - but also making them first-class seemed too elaborate relative to the benefit. Here are a couple of relevant-seeming links, which themselves appear to be several years old already. In any case, I would ask a GHCer or two before taking my old word as gospel on this. It may still be true but I don't know.)

asajeffrey · February 10, 2018, 6:47pm

Ooh this is interesting! Linear uses of type variables was the original Generic Java proposal, but turned out not to be sound (http://www.seas.upenn.edu/~sweirich/types/archive/1999-2003/msg00849.html). The problem there was the interaction between linearity and subtyping: you can construct a type B<T> <: A<T,T> then use B linearly, but oops it’s non-linear in the supertype.

But yay Rust doesn’t have subtyping except on lifetimes, so reconstructing this example in Rust is tricky to say the least. The nearest I could come to it is:

trait A<'a, 'b> {}
trait B<'a> : A<'a, 'a> {}

impl<'a, 'b> A<'a, 'b> for Foo<'a, 'b> {}
impl<'a, 'b> B<'a> for Foo<'a, 'b> {}

With current Rust this fails to compile (https://play.rust-lang.org/?gist=ccf9c99a2ab26d0899c828b5ae3a255f&version=stable), due to lifetime errors, not overlap.

So the good news is that the obvious attempts to recreate the Java example have failed, another victory for not much subtyping!

asajeffrey · February 10, 2018, 9:37pm

One thing that occurred to me is how much this is related to proof objects for identities in type theory. For example we can code up the semi-decision procedure for type equality:

struct True;
struct False;

where a P:MaybeEq<T,U> acts as a witness for the the semi-decision procedure for type equality, in that if P is True then T == U.

trait MaybeEq<T, U> { fn maybe_cast(x: T) -> Option<U>; }
impl<T> MaybeEq<T, T> for True { fn maybe_cast(x: T) -> Option<T> { Some(x) } }
impl<T, U> MaybeEq<T, U> for False { fn maybe_cast(_: T) -> Option<U> { None } }

The thing rustc can’t do, since this would allow rustc to decide type equality is infer P:

trait DecideEq<U> { type P: MaybeEq<Self, U>; }
impl<T, U> DecideEq<U> for T { type P = False; }
impl<T> DecideEq<T> for T { type P = True; }

This fails the requirements for overlapping impls since the specialization (the True case) uses T twice, where the base impl (the False) case uses T only once.

Code at https://play.rust-lang.org/?gist=b67b1219269348be21831f3e1ce2edbf&version=stable

RReverser · February 12, 2018, 10:20am

It’s not arbitrary though - as was mentioned above, you only need special-casing for ‘static lifetime, and it’ s pretty easy to distinguish already.

We can already implement some traits only for 'static versions of structs, it would be neat to let them co-exist with impls for any other lifetime for specialization.

rpjohnst · February 12, 2018, 8:38pm

With lifetime inference and constraints it is arbitrary. Special casing 'static might be reasonable but I would like to see some actual use cases for that.

nikomatsakis · February 12, 2018, 9:02pm

In terms of the work @scalexm did, the real form of the rule, I believe, would be this:

FromEnv(T: Clone) :- FromEnv(Vec<T>: Clone)

That is, if the environment tells us that Vec<T>: Clone, it also tells us that T: Clone. This is why I called it an implied bounds rule. But it's not particularly important. I'd prefer not to pursue said changes too eagerly anyway.

Topic		Replies	Views
Specialization, coherence, and API evolution language design	9	1575	March 25, 2019
Why isn't trait specialization a higher priority for Rust development? traits working group	13	1110	December 20, 2024
Shipping specialization: a story of soundness language design	30	6967	March 25, 2019
[Idea] About soundness in specialization language design	10	2297	May 22, 2022
Blog post: Supporting Blanket Impls in Specialization language design	18	4094	March 25, 2019

Blog post: Maximally minimal specialization: always applicable impls

Is there a point to this post?

Related topics