Variable dependent types?

From time to time I've found myself wanting to express some weird concept that I have never found expressible in any language I know, something that might be called "variable dependent generic types". For example:

struct VectorSpace {…}
struct Vector<let v: VectorSpace> {…}

impl<let v: VectorSpace> Add for Vector<v> {…}

fn main() {
    let a = VectorSpace::new();
    let b = VectorSpace::new();

    let x = Vector<a>::new(…);
    let y = Vector<a>::new(…);
    let z = Vector<b>::new(…);
    // x and y have the same type but type of z is distinct
    let u = x + y; // ok
    let v = x + z; // compilation error
}

Disclaimer: this is not a (pre-)RFC and is not intended to become one in the near future. I'm not even sure there is a workable idea in the design space. I just hope to get feedback from other Rust users about similar design issues, how they can be addressed, whether related type system concepts are already known and whether these concepts might be considered for Rust in the future.

The problem I'm trying to solve is that sometimes we have a bunch of objects depending on what we could loosely call a common context (= a common set of data that all these objects "share" and rely on). The crux of the matter is that objects related to different contexts shouldn't mix meaningfully, so any attempt to connect them should be invalidated somehow. By "connecting them" I mean using them as parameters of the same method, just like the pairs of vectors that are added in the previous example. Generally this constraint is not (and as far as I can tell cannot be) encoded in the type system. As a consequence the only possible behavior left is to make every object keep a pointer/reference to the appropriate context, then to check that contexts are consistent, and finally to fail at runtime when incompatible contexts are erroneously mixed.
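For concreteness, here is a minimal sketch of that runtime-checked fallback (the Rc-based sharing and the panic on mismatch are my own assumptions, not part of any proposal): every vector drags a handle to its space along, and every operation has to compare handles before doing any work.

use std::rc::Rc;

struct VectorSpace {
    dim: usize, // shared context data
}

struct Vector {
    space: Rc<VectorSpace>, // every object keeps a reference to its context
    coords: Vec<f64>,
}

impl Vector {
    fn add(&self, other: &Vector) -> Vector {
        // Consistency can only be checked (and can only fail) at runtime.
        assert!(
            Rc::ptr_eq(&self.space, &other.space),
            "vectors belong to different vector spaces"
        );
        debug_assert_eq!(self.coords.len(), self.space.dim);
        let coords = self.coords.iter().zip(&other.coords).map(|(a, b)| a + b).collect();
        Vector { space: Rc::clone(&self.space), coords }
    }
}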

In certain situations, the context may be represented by a value known at compile time. In this case the const generics RFC might be enough. But this implies two strong limitations:

  • Compile time constants … must be known at compile time. This means the compiler must already know the complete list of actually used types from the abstract family A<const K: T>. As a consequence, it is impossible to create contexts dynamically.
  • With such an approach contexts cannot have internal mutable state: they must be pure values in the mathematical sense.

The example above may look a bit abstract but I can think of many situations where this could be used:

  • Imagine that you have two iterators pointing to different items of a collection. You may want to define some sort of CollectionSlice representing the sequence of items between these two positions. Of course you need the boundaries to belong to the same collection.
  • Similarly one can consider the task of finding the shortest path between two vertices of a given graph.
  • One can only apply arithmetic operations to square matrices if their sizes match. Const generics can be used if you know the size at compile time (a sketch follows below), but if you're loading matrices from files or a database the size may only be known at runtime. Still it would be nice to check size consistency once early on, encode it inside the type system and then apply whatever complex sequence of operations is needed.
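For the compile-time half of that last point, const generics as they exist today are already enough: the dimension lives in the type, so mixing sizes is a compile error. (This is my own illustrative sketch, not code from the thread.)

use std::ops::Add;

// The dimension is part of the type and must be known at compile time.
struct SquareMatrix<const N: usize> {
    data: Vec<f64>, // N * N entries, row-major
}

impl<const N: usize> Add for SquareMatrix<N> {
    type Output = SquareMatrix<N>;

    fn add(self, rhs: SquareMatrix<N>) -> SquareMatrix<N> {
        let data = self.data.iter().zip(&rhs.data).map(|(a, b)| a + b).collect();
        SquareMatrix { data }
    }
}

fn main() {
    let a = SquareMatrix::<3> { data: vec![0.0; 9] };
    let b = SquareMatrix::<3> { data: vec![1.0; 9] };
    let _sum = a + b; // ok: both are SquareMatrix<3>
    // let c = SquareMatrix::<4> { data: vec![0.0; 16] };
    // let _oops = _sum + c; // compile error: expected SquareMatrix<3>, found SquareMatrix<4>
}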

Something that strikes me is the similarity between this kind of dependency on variables and lifetimes: Vector<b> should not outlive the scope of b just like Foo<'a> should not outlive lifetime 'a. This seems to suggest Rust may be a good contender to express this kind of idea. I'm aware many aspects would need to be clarified (how would type unification work? can we move/borrow the variable and under which conditions? does it require monomorphization?) and investigation might well show that in the end this cannot work. Or there might already exist other good-enough strategies to encode similar constraints. Just asking…

2 Likes

This is something I've often vaguely wanted, but you can easily simulate it with locally-defined marker structs. In fact, I'm not even sure whether this has any less expressive power:

use std::marker::PhantomData;

trait Tag {}
struct VectorSpace<V: Tag>(PhantomData<V>);
struct Vector<V: Tag>(PhantomData<V>);

macro_rules! new_vector_space {
    ($name:ident) => {{
        #[allow(nonstandard_style)]
        enum $name {}

        impl Tag for $name {}

        VectorSpace::<$name>::new()
    }};
    () => { new_vector_space!{UnnamedVectorSpace} };
}

fn main() {
    // or 'let a = new_vector_space!{a};' for better error messages
    let a = new_vector_space!{};
    let b = new_vector_space!{};

    let x = a.new_vector(…);
    let y = a.new_vector(…);
    let z = b.new_vector(…);

    let u = x + y; // ok
    let v = x + z; // forbidden
}

This last example sounds like a use for "standard" dependent types. I'm not sure the variable needs to be included.

3 Likes

I'm suspicious of type information that is determined at runtime. There are a lot of people who excitedly mutter about dependent types, but I question what the monomorphization strategy is. Something like

// how do you monomorphize this without erasure (ew)
// or runtime monomorphization? (ewww)
fn add_zero<let V: VSpace, let W: VSpace>(v: Vector<V>) -> Vector<{V + W}> {
    v + W.zero()
}

I'm not convinced that a low-level language like Rust with a tiny std can deal with this kind of thing - the only languages I've seen with dependent types have monstrous runtimes and GC.

3 Likes

How about Idris? Idris compiles to C; it has a GC, but I don't see why that's strictly necessary.

1 Like

I don't know Idris's internals - my fear is that there may be costs that can't be paid at compile time if a type has runtime-dependent components. Here's a better example of what I'm worried about:

fn foo(x: A) -> B<x> { .. }

At least, I don't see what benefits something like this offers over const generics (as proposed right now, and they're not even in their final form).

This depends on how Vector is defined. If you have something like:

struct Vector<T, let N: usize> {
    vec: ::std::vec::Vec<T>,
}

then N should have no impact on monomorphization (assuming we don't let N affect the choice of some associated type). I think you can get far with restricting things this way. This talk seems relevant:

Here's a paper:

Also relevant:

4 Likes

Thanks @ExpHP. This is a very intriguing solution. Actually I didn't know you could define types locally inside functions. It shares some aspects with what I'd like to express but it is not as expressive. In your example Tag implementations can only provide information known at compile time. They cannot, for example, contain the vector space dimension if that dimension is only determined at runtime.

One could think about solving the previous issue by letting the Vector<> type handle the dimension, for example by changing the signature to new(dim: usize). Now a consumer is indeed allowed to encode the distinction between two distinct VectorSpace<> in the type system and to give them distinct dimensions. However, he is not forced to do it. He can just as well declare:

let a = VectorSpace::<MySingleTag>::new(2);
let b = VectorSpace::<MySingleTag>::new(3);

This makes the whole idea far less appealing. Nothing prevents the user from creating some a and b that actually represent incompatible data while making this distinction invisible in the type system. Then as the provider of a hypothetical vectorspace crate, when I want to provide impl Add for Vector I cannot rely on the desired invariant that vectors with the same tag necessarily have the same dimension.
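Concretely, the crate author is left with something like the following under that scheme (my own sketch; the field names are assumptions): because the tag does not pin the dimension, the Add impl still needs a runtime check.

use std::marker::PhantomData;
use std::ops::Add;

trait Tag {}

struct Vector<V: Tag> {
    coords: Vec<f64>,
    _tag: PhantomData<V>,
}

impl<V: Tag> Add for Vector<V> {
    type Output = Vector<V>;

    fn add(self, rhs: Vector<V>) -> Vector<V> {
        // "Same tag" does not imply "same dimension", so the invariant
        // has to be re-checked at runtime after all.
        assert_eq!(self.coords.len(), rhs.coords.len(), "dimension mismatch");
        let coords = self.coords.iter().zip(&rhs.coords).map(|(a, b)| a + b).collect();
        Vector { coords, _tag: PhantomData }
    }
}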

I was considering something similar to what @Centril is suggesting: no impact on monomorphization. Dependency on variables should be a pure compile-time concept just like const or lifetimes. Vector<vectorspace> is then compiled as a single type.

@mcy: I suppose that V + W represents the direct sum of vector spaces? If so, I'm not considering something as powerful. The idea was to rely on variables only, where V + W could either be a value or an expression depending on the point of view. The idea of using expressions crossed my mind but it seems to be a far more complicated issue.

1 Like

VectorSpace::new can be made doc(hidden) or renamed to something scarier-sounding, and the macro can be pushed forward as the proper way to generate them. Because it encapsulates the struct in a block (the second { hiding in the => {{), the macro is incapable of producing two spaces with the same type.

If you want to be able to rely on this assumption even in unsafe code, new can be made unsafe as well.

@ExpHP That is indeed a possibility but I found it dubious to consider macros as valid safe wrappers around unsafe code.

Another downside of your method is that the Tag trait feels like a very ad hoc solution (semantically it does not represent anything) and, more importantly, it must be anticipated in the design of the actual context type (= VectorSpace in the current example). If you consider the CollectionSlice example in the opening post, I wouldn't expect all pre-existing collection types to include such a mechanism, but I would still like to be able to define such slices in a child crate.

Anyway I must acknowledge this pattern goes further than what I would have expected to be doable with current Rust. I'll try to make additional experiments.

@Centril I've started to read your references and they do seem relevant, but even with some familiarity with Haskell I found the syntax of Idris quite difficult to decipher. Just a vague question: is there any type unification possible in this context? If yes, how and when can the compiler determine that Palindrome (makeList x y z) and Palindrome [1, 2, 3] are or aren't the same type?

The Tag trait was unnecessary, but it could mean something semantically if you wanted to add methods to the tag types (a common use case I have is an associated fn name() -> &'static str that is used in some custom Debug impls).
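For illustration, something along these lines (my own sketch of the pattern just described, not code from the post):

use std::fmt;
use std::marker::PhantomData;

trait Tag {
    // Gives each marker type a human-readable name.
    fn name() -> &'static str;
}

struct Vector<V: Tag> {
    coords: Vec<f64>,
    _tag: PhantomData<V>,
}

impl<V: Tag> fmt::Debug for Vector<V> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The tag's name shows up in debug output, which helps when
        // several spaces are floating around.
        write!(f, "Vector in {}: {:?}", V::name(), self.coords)
    }
}

enum EuclideanPlane {}

impl Tag for EuclideanPlane {
    fn name() -> &'static str {
        "EuclideanPlane"
    }
}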

I think the “Phantom Type” solution is the right one to do this statically.

If you want to do it at run time why not just use ‘if’ statements to test at runtime?

The problem with dependent types is that they cause a failure at runtime that looks just like a compile-time type-system failure. Is that really the user experience you want? If you use plain old 'if' statements to catch the conditions you want to avoid at runtime then you can fail gracefully or give meaningful errors.

Really the question you are asking is can the compiler prove my code cannot ever call the matrix operation with inappropriate types. As soon as runtime polymorphism is concerned the compiler cannot make that guarantee.

There are static solutions using refinement types that can help: what they do is force you to put the 'if' statement in to get the code to compile. So if the vector operation is called with runtime polymorphic arguments, such that the compiler cannot statically prove the function preconditions, you will need to insert an 'if' statement to check the precondition to get it to pass type-checking. I think this is the best solution, because there is no hidden cost, you can see any checks that you need to add in your code, and the type-checker will prevent any runtime errors and enforce the invariant you want.
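In today's Rust one can approximate that discipline by hand (my sketch, not from the post): the 'if' lives in a single checked constructor, and the operation is only defined on the already-checked pair, so the compiler insists that the check happens somewhere.

// A pair of vectors whose dimensions have already been checked to match.
struct SameDim(Vec<f64>, Vec<f64>);

impl SameDim {
    // The only way in: the precondition check cannot be skipped.
    fn new(a: Vec<f64>, b: Vec<f64>) -> Option<SameDim> {
        if a.len() == b.len() {
            Some(SameDim(a, b))
        } else {
            None
        }
    }

    // No check needed here; the invariant was established in `new`.
    fn add(&self) -> Vec<f64> {
        self.0.iter().zip(&self.1).map(|(a, b)| a + b).collect()
    }
}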

2 Likes

I suppose I'm the addressee.

What I had in mind is apparently distinct from the standard academic concept of dependent types. I'm definitely not an expert here, but my understanding from the few links provided by @Centril is that type unification for such dependent types depends on runtime values, whereas I was considering an alternative idea that only needs checks at compilation time: types dependent on variables. My idea was that if you define:

let x = ...;
let y = x.clone();

let u = Foo<x>::new(...);
let v = Foo<y>::new(...);

variables u and v are not of the same type even if x == y.

Not sure what you're calling runtime polymorphism here. If it is just the usual stuff (dyn Trait or any form of OOP dynamic binding) I've never proposed that. In the example I considered for matrices, something would certainly need to check the actual dimension of the loaded matrices somewhere at runtime. But it certainly makes a difference whether you check this compatibility for a bunch of matrices once and then encode it in the type system (provided the type system can express this information), or you have to check the compatibility of operands every time some arithmetic operation is involved.

It seems to me this has similarities with what I was actually proposing. But again I'm not an expert. I don't know what refinement types are, for example.

I'm by no means an expert on the type theory of dependent types... But... Dependently typed languages have some varying rules for unification... For example, to unify Palindrome (makeList x y z) and Palindrome [1, 2, 3] the compiler could run (by reductions / normalization) both sides and determine that they are structurally equal (think #[structural_match]) (see definitional equality, Equality).

To ensure that such unification is sound and is erasable, many dependently typed languages will require such type level expressions to be both pure and total (they can't diverge by any mechanism such as panics or infinite loops). A notable exception is Dependent Haskell (which hasn't been fully implemented yet, but you can read about it in Richard Eisenberg's thesis, [1610.07978] Dependent Types in Haskell: Theory and Practice)

Where the compiler can't determine by itself that two types are equal, because the language's theory of definitional equality is not strong enough, you usually have some library-encodable mechanism for propositional equality of types with which you can massage proofs yourself:

data a :~: b where
    Refl :: a :~: a

http://hackage.haskell.org/package/base-4.9.1.0/docs/Data-Type-Equality.html

data (=) : a -> b -> Type where
   Refl : x = x

http://docs.idris-lang.org/en/latest/proofs/pluscomm.html

data _≡_ {a} {A : Set a} (x : A) : A → Set a where
  refl : x ≡ x

http://agda.readthedocs.io/en/v2.5.2/language/built-ins.html#equality
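(For comparison, a much weaker shape of this idea can be written in today's Rust with PhantomData; this is my own sketch, and without dependent types the only proof one can ever construct is the trivial reflexive one.)

use std::marker::PhantomData;

// A value of `TyEq<A, B>` is evidence that `A` and `B` are the same type.
struct TyEq<A, B>(PhantomData<(A, B)>);

impl<A> TyEq<A, A> {
    // Reflexivity is the only constructor, mirroring `Refl`.
    fn refl() -> Self {
        TyEq(PhantomData)
    }
}

fn main() {
    let _ok: TyEq<u32, u32> = TyEq::refl();
    // let _bogus: TyEq<u32, u64> = TyEq::refl(); // does not type-check
}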

Some more links:

Now, you can also add more axioms to the language, such as the commutativity of addition for usize. This all depends on what primitive types exist in the language and so on. Typically, the more axioms a type theory has, the more it can do for you and the less you will need to prove yourself.

I don't agree; Using PhantomData pervasively for type level programming typically leads to encoding of monstrosities such as singletons: A framework for generating singleton types (N.B. Eisenberg who authored this package agrees that this is a bad approach and that it is a hack).

They don't have to. As previously mentioned, most dependently typed languages ensure termination and thus you can't fail (diverge) at run-time. Now, given user input, most programs will have some notion of "failure" such as when an incorrect input is given by the user, but this is normal program behavior and not logic bugs.

For the same reason as I've noted above this does not seem true.

That does not necessarily entail that type checking must wait until the actual runtime values are known; for example, if you have two abstract values x: usize and y: usize, then the type checker could unify x + y and y + x even though the values of x and y are not known until run-time.

If the user has no recourse to prove this manually and the type system can't do this for us automatically, then such a system does not seem particularly useful.

Refinement types are a type system feature with which you can refine existing base types by a predicate. For example, you may state that let foo : {x: usize | even(x)} = make_even_number(); and then the type of foo is all even usizes. In contrast, dependent types are usually built up by-construction.
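Rust has no refinement types, but the closest everyday encoding is a newtype whose only constructor checks the predicate once (my sketch, not from the post):

// Poor man's `{x: usize | even(x)}`: the predicate is checked once,
// at construction, and afterwards carried by the type.
struct EvenUsize(usize);

impl EvenUsize {
    fn new(x: usize) -> Option<EvenUsize> {
        if x % 2 == 0 {
            Some(EvenUsize(x))
        } else {
            None
        }
    }

    fn get(&self) -> usize {
        self.0
    }
}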

See:

Thinking a bit more on the topic of monomorphization, it seems to me that what we really want to avoid is a situation such as:

data Bool = True | False

-- These types have different memory representations...
data TypeA = ...  
data TypeB = ...

pickType : Bool -> Type
pickType True = TypeA
pickType False = TypeB

We can get the same situation with:

// These types have different memory representations...
struct TypeA(Vec<&'static str>);
struct TypeB;

trait Func { type Output; }

struct Input<let cond: bool> { ... }

impl Func for Input<{true}> { type Output = TypeA; }
impl Func for Input<{false}> { type Output = TypeB; }

fn pick_type(cond: bool) -> <Input<{cond}> as Func>::Output {
    match cond {
        true => TypeA(vec!["foo", "bar", ..]),
        false => TypeB,
    }
}

It seems to me that we want to prevent the pattern matching on types that is occurring at the implementations of Func for Input<{x}>. Two ways come to mind to prevent such pattern matching:

  1. Ensure that x is polymorphic in the input and that you can't specialize on let in impls.
  2. Ensure that let parameters and associated types can't be combined in implementations.

I think that if you ensure either 1. or 2. you will avoid having runtime values affect the memory layout of types.
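For contrast, the purely compile-time half of the pattern above is already expressible with const generics, and there it is harmless, because every use site fixes the value before codegen (a sketch of mine in today's Rust, not part of any proposal):

// These types have different memory representations...
struct TypeA(Vec<&'static str>);
struct TypeB;

trait Func {
    type Output;
}

struct Input<const COND: bool>;

impl Func for Input<true> {
    type Output = TypeA;
}
impl Func for Input<false> {
    type Output = TypeB;
}

// Each call site names a concrete `COND`, so the output type
// (and its layout) is fixed at compile time.
fn make_a() -> <Input<true> as Func>::Output {
    TypeA(vec!["foo", "bar"])
}

fn make_b() -> <Input<false> as Func>::Output {
    TypeB
}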

1 Like

I don't agree; Using PhantomData pervasively for type level programming typically leads to encoding of monstrosities such as singletons: A framework for generating singleton types (N.B. Eisenberg who authored this package agrees that this is a bad approach and that it is a hack).

You can certainly add some syntactic sugar and some helper functions to make it nicer, but this is the underlying representation that you would use. We used a lot of phantom types in the HList paper. The realisation after writing that paper, and the follow-up OOHaskell paper, was that you don't actually want to write all the boilerplate, but that the techniques are logically sound. In effect it is working in the "types that depend on types" corner of the lambda cube. Using polymorphic recursion you can even reify values to types, like in this Haskell example:

class T t where
    count :: t -> Int

instance T () where
    count () = 0

instance T t => T [t] where
    count [x] = 1 + count x

f :: T t => Int -> t -> Int
f x t | x == 0 = count t
f x t | x >= 1 = f (x - 1) [t]

main = do
    v <- getLine
    putStrLn (show (f (read v :: Int) ()))

Consider a refinement type like Int < 23, and suppose we read a normal integer from stdin and then try to assign it to a variable of type Int < 23. If the value entered by the user is greater than 22, then the program will fail with a type-system error at runtime, and we do not really have any good mechanisms for catching type errors at runtime.

let v : Int < 23
let x = userInput
v = x // can be a badly typed expression at runtime.

What we can do is use path-dependent types to do this:

let v : Int < 23
let x = userInput
if x < 23 {
    v = x // with path-dependent types x has type 'Int < 23' here
} else {
    print "integer too small" // x has type 'Int >= 23' here
}

So we can use path-dependent types to prevent runtime errors from IO. However we can see that the if test has to look very similar to the type refinement. If we have something like:

if mandelbrot_converges(x,x) {
   v = x
}

Now we have given the compiler the task of proving or disproving that whether the Mandelbrot set converges at co-ordinates (x, x) is equivalent to x being less than 23. In effect we end up having to prove general function equivalence, which is unsolvable.

I'm suggesting that we eventually (and by eventually I mean in plenty of years -- we should start with GADTs and such things first...) move out of that corner and enter a limited subset of λP2 (not λPω because of the lack of HKTs) such that we can encode a function:

fn replicate<T: Clone>(len: usize, elem: T) -> Vector<T, {len}> {
    ...
}

as well as:

struct Sigma<T, H: Func> {
    fst: T,
    // We still need to prevent picking types with
    // different memory layouts based on `fst` here.
    snd: H::Proj<{fst}>,
}

Of course, when you encode such a thing, we can't erase len at runtime, but I find that desirable in this case. Imposing rules 1 & 2 from my previous comment should prevent us from having memory layouts depend on run-time values.

The solution is to prevent the unification of v : Int < 23 and userInput : Int until we have a proof that userInput : Int < 23 which your second example illustrates.

As for catching type-errors at runtime (not suggesting that we do this for Rust, since it would probably be a bad fit...), Haskell has this: 7.13. Deferring type errors to runtime. It is not necessarily a good approach, but it exists :wink:

Yes, but this is also a trivial example... If we say something like:

fn replicate<T: Clone, let n: Int>(elem: T) -> Vec<T, {n}> { .. }

let n = userInput : Int;
let v : Vec<Vec<String, {23}>, {n}>;

let x = userInput : Int;
if x == _ {
    // We know that:
    //     v : Vec<Vec<String, {23}>, {n}>;
    // We know that:
    //     replicate(replicate("foo".to_owned())) : Vec<Vec<String, {?0}>, {?1}>;
    // We unify ?0 with 23
    // We unify ?1 with n
    v = replicate(replicate("foo".to_owned()));
} else {
    diverge();
}

Of course, you need to make this inference happen (which Idris / Agda should be capable of at least), but even though this example is simple, the test does not look at all similar to the refinement to me.

So the compiler does not have the axiom of function extensionality (and we presumably don't add the univalence axiom so we can't encode it as a lemma either...) and inference will give up when faced with solving that... The user will have to use manual proofs using library encoded propositional equality / refl as is usually done in Idris / Agda / friends...

@Centril. Thanks for all the information.

Nevertheless, your earlier sentence ("such a system does not seem particularly useful") sounds like a fallacy to me. You seem to deduce from "any extension of the type system of Rust that can express dependent types would be particularly useful" that "any extension of the type system of Rust that cannot express dependent types is necessarily useless". I don't plan to engage in a debate on the design / benefits / drawbacks of the idea I had in mind compared to dependent types, because @ExpHP's solution, though a bit ad hoc, meets my current needs reasonably well. But I still see much value, for multiple problems (some mentioned in the OP), in having some Foo<let x : Bar> type where all that counts is the variable used (which the compiler can reason about), not the value it represents (which might only be known at runtime and, even more strikingly, could change during the execution flow).

1 Like

This is sounding a lot like Scala's path dependent types.

Not at all; in fact, I have several orthogonal type system extensions in mind that I will eventually propose...

... but if we are going to have dependent types, then unification seems vital to it being usable enough to warrant the increase in complexity. If all you have is that you can refer to variables and fields of those in scope, then you can at most encode things like:

let u = Foo<a + b>::new();
let v = Foo<a + b>::new();
let v = u; // OK, these types unify.

However, not even this will work:

let u = Foo<a + b>::new();
let v = Foo<b + a>::new();
let v = u; // Nope.

let u = Foo<(a + b) + c>::new();
let v = Foo<a + (b + c)>::new();
let v = u; // Nope.

Further, keep in mind that we're adding const generics, a form of value-dependent types limited to compile time values. It would be quite inconsistent to have unification in const generics but not for runtime-value-dependent-types if we ever add the latter.

Fair enough :slight_smile: I'm happy to end the discussion to here if you don't want to continue.

From what I have read so far it does not seem related. The feature explained in this paper does not seem to give you the possibility to express that leo.Food and simba.Food should be different types. On the contrary, if leo and simba are actually both instances of Lion these types will be the same (Meat). This is not what was proposed here.

I didn't mean that there was no point in going on with this discussion. Sorry if I gave you this impression. I am interested in discussing whatever extension of the type system that might be useful. Just that now that I know how to move on with my pet projects, I was just worried about extending a thread that potentially no one else would care about.

I'm aware it won't work, that dependent types would apparently allow this piece of code and that they are a more natural extension of const generics. But at the same time I consider these questions: do I really need this? What price (in terms of code complexity, increase of compilation time, binary bloat) am I ready to pay for something like this? And even more interestingly I'm considering this other piece of code:

let u = Foo<a>::new();
a.modify();
let v = Foo<a>::new();
let equal = v == u; // valid

As already said, I have no deep understanding of how dependent types work, but my gut feeling is that such code would be rejected because - correct me if I'm wrong - the values those types depend on must be immutable. They must be mathematical values with no inner state, which suits Haskell (and I suppose Idris and Agda) very well. But Rust is certainly not this kind of language, right? It deals with state all the time. Why should it be a better idea to support your case rather than mine? Not to say the answer is obvious; at least it is not to me (and in the end the two ideas don't even have to exclude each other).

1 Like