Variable dependent types?

This depends on how Vector is defined. If you have something like:

struct Vector<T, let N: usize> {
    vec: ::std::vec::Vec<T>,
}

then N should have no impact on monomorphization (assuming we don’t let N affect the choice of some associated type). I think you can get far with restricting things this way. This talk seems relevant:

Here’s a paper:

Also relevant:


Thanks @ExpHP. This is a very intriguing solution. Actually I didn’t know you could define types locally inside functions. It shares some aspects with what I’d like to express but it is not as expressive. In your example Tag implementations can only provide information known at compile time. They cannot, for example, contain the vector space dimension if it is only defined at runtime.
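
For reference, a minimal sketch of the local-type trick in current Rust (all names here are invented):

fn scope() {
    // `LocalTag` exists only inside this function body; no other code
    // can name it, so values tagged with it cannot be mixed with others.
    struct LocalTag;
    let _v: Vec<LocalTag> = vec![LocalTag];
}

fn main() {
    scope();
}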

One could think about solving the previous issue by letting the Vector<> type handle the dimension, for example by changing the signature to new(dim : usize). Now a consumer is indeed allowed to encode the distinction between two distinct VectorSpace<> in the type system and to give them distinct dimensions. However, they are not forced to do so. They can just as well declare:

let a = VectorSpace::<MySingleTag>::new(2);
let b = VectorSpace::<MySingleTag>::new(3);

This makes the whole idea far less appealing. Nothing prevents the user from creating some a and b that actually represent incompatible data while keeping this distinction invisible in the type system. Then, as the provider of a hypothetical vectorspace crate, when I want to provide impl Add for Vector I cannot rely on the desired invariant that vectors with the same tag necessarily have the same dimension.
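
To make the problem concrete, here is a compilable sketch (Vector, MySingleTag and the field names are all illustrative): the addition type-checks, yet the dimension invariant must still be re-checked at runtime.

use std::marker::PhantomData;
use std::ops::Add;

// The tag relates vectors in the type system, but the dimension is a
// plain runtime value.
struct Vector<Tag> {
    data: Vec<f64>,
    _tag: PhantomData<Tag>,
}

impl<Tag> Vector<Tag> {
    fn new(dim: usize) -> Self {
        Vector { data: vec![0.0; dim], _tag: PhantomData }
    }
}

impl<Tag> Add for Vector<Tag> {
    type Output = Vector<Tag>;
    fn add(self, rhs: Self) -> Self {
        // Equal tags do not guarantee equal dimensions, so the invariant
        // has to be asserted here anyway.
        assert_eq!(self.data.len(), rhs.data.len());
        let data = self.data.iter().zip(&rhs.data).map(|(x, y)| x + y).collect();
        Vector { data, _tag: PhantomData }
    }
}

struct MySingleTag;

fn main() {
    let a = Vector::<MySingleTag>::new(2);
    let b = Vector::<MySingleTag>::new(3);
    let _ = a + b; // type-checks, but panics at runtime
}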

I was considering something similar to what @Centril is suggesting: no impact on monomorphization. Dependency on variables should be a purely compile-time concept, just like const or lifetimes. Vector<vectorspace> is then compiled as a single type.

@mcy : I suppose that V + W represents the direct sum of vector spaces? If yes, I’m not considering something as powerful. The idea was to rely on variables only, whereas V + W could be either a value or an expression depending on the point of view. The idea of using expressions crossed my mind but it seems to be a far more complicated issue.


VectorSpace::new can be made doc(hidden) or renamed to something scarier-sounding, and the macro can be pushed forward as the proper way to generate them. Because it encapsulates the struct in a block (the second { hiding in the => {{), the macro is incapable of producing two spaces with the same type.

If you want to be able to rely on this assumption even in unsafe code, new can be made unsafe as well.
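
A minimal compilable sketch of that macro shape, with invented names:

use std::marker::PhantomData;

struct VectorSpace<Tag> {
    dim: usize,
    _tag: PhantomData<Tag>,
}

impl<Tag> VectorSpace<Tag> {
    // The "scary" constructor that the macro wraps.
    fn new(dim: usize) -> Self {
        VectorSpace { dim, _tag: PhantomData }
    }
}

// Each expansion defines a fresh tag type inside its own block (the inner
// braces of `=> {{ ... }}`), so two invocations can never yield the same type.
macro_rules! vector_space {
    ($dim:expr) => {{
        struct UniqueTag;
        VectorSpace::<UniqueTag>::new($dim)
    }};
}

fn main() {
    let a = vector_space!(2);
    let b = vector_space!(3);
    // `a` and `b` have distinct, unnameable tag types.
    println!("{} {}", a.dim, b.dim);
}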

@ExpHP That is indeed a possibility but I found it dubious to consider macros as valid safe wrappers around unsafe code.

Another downside of your method is that the Tag trait feels like a very ad hoc solution (semantically it does not represent anything) and, more importantly, it must be anticipated in the design of the actual context type (VectorSpace in the current example). If you are considering the CollectionSlice example in the opening post, I wouldn’t expect all pre-existing collection types to include such a mechanism, but I would still like to be able to define such slices in a child crate.

Anyway I must acknowledge this pattern goes further than what I would have expected to be doable with current Rust. I’ll try to make additional experiments.

@Centril I’ve started to read your reference and it does seem relevant, but even with some familiarity with Haskell I found the syntax of Idris quite difficult to decipher. Just a vague question: is there any type unification possible in this context? If yes, how and when can they determine that Palindrome (makeList x y z) and Palindrome [1, 2, 3] are or aren’t the same type?

The Tag trait was unnecessary, but it could mean something semantically if you wanted to add methods to the tag types (a common use case I have is an associated fn name() -> &'static str that is used in some custom Debug impls).
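
A small sketch of the mentioned use case (names invented here):

use std::fmt;
use std::marker::PhantomData;

trait Tag {
    fn name() -> &'static str;
}

struct Euclidean3;

impl Tag for Euclidean3 {
    fn name() -> &'static str { "Euclidean3" }
}

struct Vector<T: Tag> {
    data: Vec<f64>,
    _tag: PhantomData<T>,
}

impl<T: Tag> fmt::Debug for Vector<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The tag contributes compile-time metadata to the output.
        write!(f, "Vector<{}>({:?})", T::name(), self.data)
    }
}

fn main() {
    let v = Vector::<Euclidean3> { data: vec![1.0, 2.0, 3.0], _tag: PhantomData };
    println!("{:?}", v); // Vector<Euclidean3>([1.0, 2.0, 3.0])
}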

I think the “Phantom Type” solution is the right one to do this statically.

If you want to do it at run time why not just use ‘if’ statements to test at runtime?

The problem with dependent types is that they cause a failure at runtime that looks just like a compile-time type-system failure. Is that really the user experience you want? If you use plain old ‘if’ statements to catch the conditions you want to avoid at runtime, then you can fail gracefully or give meaningful errors.

Really the question you are asking is: can the compiler prove my code cannot ever call the matrix operation with inappropriate types? As soon as runtime polymorphism is involved, the compiler cannot make that guarantee.

There are static solutions using refinement types that can help: they can force you to put the ‘if’ statement in to get the code to compile. So if the vector operation is called with runtime polymorphic arguments, such that the compiler cannot statically prove the function preconditions, you will need to insert an ‘if’ statement checking the precondition to get it to pass type-checking. I think this is the best solution, because there is no hidden cost: you can see any checks that you need to add in your code, and the type-checker will prevent any runtime errors and enforce the invariant you want.
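
Rust does not have refinement types, but a rough sketch of the ‘forced if’ idea using a checked wrapper type (all names illustrative) looks like this:

// The only way to obtain a `NonZero` is to pass the runtime check.
struct NonZero(usize);

impl NonZero {
    fn check(n: usize) -> Option<NonZero> {
        if n != 0 { Some(NonZero(n)) } else { None }
    }
}

// The precondition is now visible in the signature.
fn divide(total: usize, parts: NonZero) -> usize {
    total / parts.0
}

fn main() {
    let n: usize = 3; // imagine this came from user input
    match NonZero::check(n) {
        Some(parts) => println!("{}", divide(12, parts)),
        None => eprintln!("cannot divide into zero parts"), // graceful failure
    }
}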


I suppose I’m the addressee.

What I had in mind is apparently distinct from the academic standard concept of dependent types. I’m definitely not an expert here, but my understanding from the few links provided by @Centril is that type unification for such dependent types depends on runtime values, whereas I was considering an alternative idea that only needs checks at compile time: types dependent on variables. My idea was that if you define:

let x = ...;
let y = x.clone();

let u = Foo<x>::new(...);
let v = Foo<y>::new(...);

variables u and v are not of the same type even if x == y.

Not sure what you’re calling runtime polymorphism here. If it is just the usual stuff (dyn Trait or any form of OOP dynamic binding) I’ve never proposed that. In the example I considered for matrices, something would certainly need to check the actual dimension of loaded matrices at some point during runtime. But it certainly does make a difference whether you check this compatibility for a bunch of matrices once and then encode it in the type system (provided the type system can express this information), or whether you have to check compatibility of operands every time some arithmetic operation is involved.

It seems to me this has similarities with what I was actually proposing. But again, I’m not an expert. I don’t know what refinement types are, for example.

I’m by no means an expert on the type theory of dependent types… But… Dependently typed languages have some varying rules for unification… For example, to unify Palindrome (makeList x y z) and Palindrome [1, 2, 3] the compiler could run (by reductions / normalization) both sides and determine that they are structurally equal (think #[structural_match]) (see definitional equality, http://adam.chlipala.net/cpdt/html/Equality.html).
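
As a loose analogue in today’s Rust (array lengths only, nothing like full definitional equality), the compiler already unifies const expressions by evaluating them:

fn main() {
    // Both `1 + 2` and `3` normalize to the same value, so the two
    // array types are one and the same type.
    let a: [u8; 1 + 2] = [0; 3];
    let b: [u8; 3] = a;
    println!("{:?}", b);
}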

To ensure that such unification is sound and is erasable, many dependently typed languages will require such type-level expressions to be both pure and total (they can’t diverge by any mechanism such as panics or infinite loops). A notable exception is Dependent Haskell (which hasn’t been fully implemented yet, but you can read about it in Richard Eisenberg’s thesis, https://arxiv.org/abs/1610.07978).

Where the compiler can’t determine that two types are equal by itself, because the language’s theory of definitional equality is too weak, you usually have some library-encodable mechanism for propositional equality of types by which you can massage proofs yourself:

data a :~: b where
    Refl :: a :~: a

http://hackage.haskell.org/package/base-4.9.1.0/docs/Data-Type-Equality.html

data (=) : a -> b -> Type where
   Refl : x = x

http://docs.idris-lang.org/en/latest/proofs/pluscomm.html

data _≡_ {a} {A : Set a} (x : A) : A → Set a where
  refl : x ≡ x

http://agda.readthedocs.io/en/v2.5.2/language/built-ins.html#equality

Some more links:

Now, you can also add in more axioms into the language such as the commutativity of addition for usize. This all depends on what primitive types exist in the language and so on. Typically, the more axioms a type theory has, the more it can do for you and the less you will need to prove yourself.

I don’t agree; using PhantomData pervasively for type-level programming typically leads to encodings of monstrosities such as https://hackage.haskell.org/package/singletons-2.2 (N.B. Eisenberg, who authored this package, agrees that this is a bad approach and that it is a hack).

They don’t have to. As previously mentioned, most dependently typed languages ensure termination and thus you can’t fail (diverge) at run-time. Now, given user input, most programs will have some notion of “failure” such as when an incorrect input is given by the user, but this is normal program behavior and not logic bugs.

For the same reason as I’ve noted above this does not seem true.

That does not necessarily entail that type checking must wait until the actual runtime values are known; for example, if you have two abstract values x: usize and y: usize, then the type checker could unify x + y and y + x even though the values of x and y are not known at run-time.

If the user has no recourse to prove this manually and the type system can’t do this for us automatically, then such a system does not seem particularly useful.

Refinement types are a type system feature with which you can refine existing base types by a predicate. For example, you may state that let foo : {x: usize | even(x)} = make_even_number(); and then the type of foo is the type of all even usizes. In contrast, dependent types are usually built up by construction.
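
A hand-rolled Rust approximation of such a refinement (illustrative only; the predicate is checked once at construction and then trusted):

#[derive(Clone, Copy, Debug)]
struct EvenUsize(usize);

impl EvenUsize {
    // The runtime check plays the role of the predicate `even(x)`.
    fn new(x: usize) -> Option<EvenUsize> {
        if x % 2 == 0 { Some(EvenUsize(x)) } else { None }
    }
}

fn main() {
    let foo = EvenUsize::new(42).expect("not an even number");
    println!("{:?}", foo);
}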

See:

Thinking a bit more on the topic of monomorphization, it seems to me that what we really want to avoid is a situation such as:

data Bool = True | False

-- These types have different memory representations...
data TypeA = ...  
data TypeB = ...

pickType : Bool -> Type
pickType True = TypeA
pickType False = TypeB

We can get the same situation with:

// These types have different memory representations...
struct TypeA(Vec<&'static str>);
struct TypeB;

trait Func { type Output; }

struct Input<let cond: bool> { ... }

impl Func for Input<{true}> { type Output = TypeA; }
impl Func for Input<{false}> { type Output = TypeB; }

fn pick_type(cond: bool) -> <Input<{cond}> as Func>::Output {
    match cond {
        true => TypeA(vec!["foo", "bar", ..]),
        false => TypeB,
    }
}

It seems to me that we want to prevent the pattern matching on types that is occurring at the implementations of Func for Input<{x}>. Two ways come to mind to prevent such pattern matching:

  1. Ensure that x is polymorphic in the input and that you can’t specialize on let in impls.
  2. Ensure that let parameters and associated types can’t be combined in implementations.

I think that if you ensure either 1. or 2. you will avoid having runtime values affect the memory layout of types.


I don’t agree; using PhantomData pervasively for type-level programming typically leads to encodings of monstrosities such as https://hackage.haskell.org/package/singletons-2.2 (N.B. Eisenberg, who authored this package, agrees that this is a bad approach and that it is a hack).

You can certainly add some syntactic sugar and some helper functions to make it nicer, but this is the underlying representation that you would use. We used a lot of phantom types in the HList paper. The realisation after writing that paper, and the follow-up on OOHaskell, was that you don’t actually want to write all the boilerplate, but that the techniques are logically sound. In effect it is working in the “types that depend on types” corner of the lambda cube. Using polymorphic recursion you can even reify values to types, like in this Haskell example:

class T t where
    count :: t -> Int

instance T () where
    count () = 0

-- Each level of list nesting adds one.
instance T t => T [t] where
    count [x] = 1 + count x

-- Polymorphic recursion: the recursive call is at type [t], not t, so a
-- runtime integer is reified into the nesting depth of the list type.
f :: T t => Int -> t -> Int
f x t | x == 0 = count t
f x t | x >= 1 = f (x - 1) [t]

main = do
    v <- getLine
    putStrLn (show (f (read v :: Int) ()))

Consider we have a refinement type like Int < 23; now consider we get a normal integer from stdin, and we then try to assign this integer to a variable of type Int < 23. If the value entered by the user is greater than 22, then the program will fail with a type-system error at runtime, and we do not really have any good mechanisms for catching type errors at runtime.

let v : Int < 23
let x = userInput
v = x // can be a badly typed expression at runtime.

What we can do is use path-dependent types to do this:

let v : Int < 23
let x = userInput
if x < 23 {
    v = x // with path-dependent types x has type 'Int < 23' here
} else {
    print "integer too large" // x has type 'Int >= 23' here
}

So we can use path-dependent types to prevent runtime errors from IO. However we can see that the if test has to look very similar to the type refinement. If we have something like:

if mandelbrot_converges(x,x) {
   v = x
}

Now we have given the compiler the task of proving or disproving that whether the Mandelbrot set converges at coordinates (x, x) is equivalent to x being less than 23. In effect we end up having to prove general function equivalence, which is undecidable.

I’m suggesting that we eventually (and by eventually I mean in plenty of years – we should start with GADTs and such things first…) move out of that corner and enter a limited subset of λP2 (not λPω because of the lack of HKTs) such that we can encode a function:

fn replicate<T: Clone>(len: usize, elem: T) -> Vector<T, {len}> {
    ...
}

as well as:

struct Sigma<T, H: Func> {
    fst: T,
    // We still need to prevent picking types with
    // different memory layouts based on `fst` here.
    snd: H::Proj<{fst}>,
}

Of course, when you encode such a thing, we can’t erase len at runtime, but I find that desirable in this case. Imposing rules 1 & 2 from my previous comment should prevent us from having memory layouts depend on run-time values.

The solution is to prevent the unification of v : Int < 23 and userInput : Int until we have a proof that userInput : Int < 23 which your second example illustrates.

As for catching type-errors at runtime (not suggesting that we do this for Rust, since it would probably be a bad fit…), Haskell has this: https://downloads.haskell.org/~ghc/7.6.2/docs/html/users_guide/defer-type-errors.html. It is not necessarily a good approach, but it exists :wink:

Yes, but this is also a trivial example… If we say something like:

fn replicate<T: Clone, let n: Int>(elem: T) -> Vec<T, {n}> { .. }

let n = userInput : Int;
let v : Vec<Vec<String, {23}>, {n}>;

let x = userInput : Int;
if x == _ {
    // We know that:
    //     v : Vec<Vec<String, {23}>, {n}>;
    // We know that:
    //     replicate(replicate("foo".to_owned())) : Vec<Vec<String, {?0}>, {?1}>;
    // We unify ?0 with 23
    // We unify ?1 with n
    v = replicate(replicate("foo".to_owned()));
} else {
    diverge();
}

Of course, you need to make this inference happen (which Idris / Agda should be capable of, at least), but even in this simple example the test does not look at all similar to me.

So the compiler does not have the axiom of function extensionality (and we presumably don’t add the univalence axiom, so we can’t encode it as a lemma either…) and inference will give up when faced with solving that… The user will have to write manual proofs using library-encoded propositional equality / refl, as is usually done in Idris / Agda / friends…

@Centril. Thanks for all the information.

Nevertheless, the previous sentence sounds like a fallacy. You seem to deduce from “Any extension of the type system of Rust that can express dependent types would be particularly useful” that “Any extension of the type system of Rust that cannot express dependent types is necessarily useless.” I don’t plan to engage in a debate on the design / benefits / drawbacks of the idea I had in mind compared to dependent types, because @ExpHP’s solution, though a bit ad hoc, meets my current needs reasonably well. But I still see much value for multiple problems (some mentioned in the OP) in having some Foo<let x : Bar> type where all that counts is the variable used (which the compiler can reason about), not the value it represents (which might only be known at runtime and, even more strikingly, which could change during the execution flow).


This is sounding a lot like Scala’s path dependent types.

Not at all; in fact, I have several orthogonal type system extensions in mind that I will eventually propose…

… but if we are going to have dependent types, then unification seems vital to it being usable enough to warrant the increase in complexity. If all you have is that you can refer to variables and fields of those in scope, then you can at most encode things like:

let u = Foo<a + b>::new();
let v = Foo<a + b>::new();
let v = u; // OK, these types unify.

However, not even this will work:

let u = Foo<a + b>::new();
let v = Foo<b + a>::new();
let v = u; // Nope.

let u = Foo<(a + b) + c>::new();
let v = Foo<a + (b + c)>::new();
let v = u; // Nope.

Further, keep in mind that we’re adding const generics, a form of value-dependent types limited to compile time values. It would be quite inconsistent to have unification in const generics but not for runtime-value-dependent-types if we ever add the latter.
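
For reference, here is roughly what unification through const generics looks like with the proposed syntax, where the array lengths are inferred and unified by the type checker:

fn first<T: Copy, const N: usize>(arr: [T; N]) -> T {
    arr[0]
}

fn main() {
    println!("{}", first([1, 2, 3]));  // N is inferred as 3
    println!("{}", first([9.0, 8.0])); // N is inferred as 2
}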

Fair enough :slight_smile: I’m happy to end the discussion here if you don’t want to continue.

From what I have read so far it does not seem related. The feature explained in this paper does not seem to give you the possibility to express that leo.Food and simba.Food should be different types. On the contrary, if leo and simba are actually both instances of Lion, these types will be the same (Meat). This is not what was proposed here.

I didn’t mean that there was no point in going on with this discussion. Sorry if I gave you that impression. I am interested in discussing whatever extension of the type system might be useful. It’s just that, now that I know how to move on with my pet projects, I was worried about extending a thread that potentially no one else would care about.

I’m aware it won’t work, that dependent types would apparently allow this piece of code, and that they are a more natural extension of const generics. But at the same time I consider these questions: do I really need this? What price (in terms of code complexity, increase of compilation time, binary bloat) am I ready to pay for something like this? And even more interestingly, I’m considering this other piece of code:

let u = Foo::<a>::new();
a.modify();
let v = Foo::<a>::new();
let equal = v == u; // valid

As already said, I have no deep understanding of how dependent types work, but my gut feeling is that such code would be rejected because - correct me if I’m wrong - the values those types depend on must be immutable. They must be mathematical values with no inner state, which suits Haskell (and I suppose Idris and Agda) very well. But Rust is certainly not this kind of language, right? It deals with state all the time. Why should it be a better idea to support your case rather than mine? Not to say the answer is obvious; at least it is not to me (and in the end the two ideas don’t even have to exclude each other).


If you haven’t seen it before, this might interest you:

I’m perhaps an oddball, but dependent types interest me deeply :slight_smile:

Naturally “need” is a subjective matter, but I would note a few things:

  • Just because you have dependent types does not mean you must use them everywhere; you can use them in the parts of your code where it is most natural and where it is most important to prove properties about your program behavior.
  • If you already specify QuickCheck properties about your code, the step to dependent types is natural as you’ve already specified invariants to some degree.
  • With any advanced type-system feature you’ll get increased compile times, but only if you use the features so you pay for what you use. You can already do a bunch of type hackery with traits and such, but with dependent types, these are expressed more naturally.
  • As for binary bloat, I think you can avoid that if you avoid monomorphization, and I detailed how this could be avoided. When it comes to functions such as fn replicate(length: usize, elem: Foo) -> Vec<Foo, {length}> {..}, this should be zero cost in the sense that you couldn’t have hand-written better code. When the values are known at compile time, the compiler should be able to use this information to optimize further.
  • Languages like Ada have refinement typing and it is used in industry for safety-critical applications (cf. aerospace). I think the notion of upping our ability to give more fine-grained guarantees fits well with Rust’s overall focus on correctness and safety.

The snippet you wrote brings a few questions to the fore. First, it seems to me that writing Foo<a>::new() immutably borrows (or moves) a until u goes out of scope. v should do the same thing. I think therefore that a.modify(); must be rejected unless typeof(a): Copy. It further occurs to me that a must not have interior mutability. It is highly non-trivial to integrate linear types or region-typing with runtime-value-dependent types. Neel Krishnaswami discusses this wrt. linear types in a lecture series, https://www.youtube.com/watch?v=5i3YDgQyIwE. Still, even if we impose these restrictions, a whole bunch of types are Copy or do not have interior mutability.

I think we can more or less regain your use-case here by having some a.token() which is initialized when a is constructed and then even if you do modify a, the token stays the same. Thus we have:

let token = a.token();
let u = Foo<{token}>::new();
a.modify();
let v = Foo<{token}>::new();
let equal = v == u; // valid

Not necessarily. It may be desirable to borrow in some cases (or even possibly to move a) but it does not have to be either mandatory or even the default case. The minimal semantics are just that u and v are somehow related to a. More specifically, u and v may be used together as parameters of certain functions only because their types match (i.e. they are related to the same a). This naturally implies that:

  • a must be specified when u and v are created;
  • neither u nor v can outlive a.

But that does not mean that the value of a needs to be used anywhere in functions that consume items of type Foo<a>. And even when such a value is actually needed, the borrow can perfectly well start when the function is called, not when u is created. Certainly a dedicated syntax would be necessary to specify which behavior is intended.

Let’s imagine something like this (the syntax proposed here is cumbersome and ugly, but let’s agree this is not the point):

// a is not used anywhere inside bar_tag
fn bar_tag<let a: MyContext>(u: &Foo<a>, v: &Foo<a>) { … }
// a is borrowed when calling bar_ref
fn bar_ref<let ref a: MyContext>(u: &Foo<a>, v: &Foo<a>) { … }
// a is mutably borrowed when calling bar_mut
fn bar_mut<let mut ref a: MyContext>(u: &Foo<a>, v: &Foo<a>) { … }

…

let a : MyContext = …
let u = Foo::<a>::new( … );
let v = Foo::<a>::new( … );

// a is not borrowed when we reach this point
bar_tag(u, v); // u and v are borrowed at this point but a is not impacted
bar_ref(u, v); // a is immutably borrowed here
// ... and released here
bar_mut(u, v); // ... and reborrowed mutably here
// but neither u nor v can outlive a

An interesting aspect is that u and v do not necessarily have to carry any internal reference to a (that would require a to be locked). However they do represent data that may only be interpreted in the light of a (sometimes the actual value of a is required, sometimes we just need to know that it is the same a, independently of its actual value). Of course you may also want to borrow a from the start (when u or v are created) but this can be handled independently by lifetimes.

That may work indeed, although it requires that token() copies some information.

It seems to me that if we have dependent types (with unification), when you want to ensure that u and v are related to the same a, you must prevent a from having been modified in-between. This is because I’m imagining that you would be able to use a inside bar_tag and that the language does not prevent this. Your snippet is interesting; I’ll have to think it over some more.

Sure; but I would like to have sigma types in the language as well; and then you could carry an internal reference to a.

True; hopefully the compiler is smart enough to optimize it away when it realizes that it is just a constant value used the same way in both places. But since compiler optimization is black magic, you never know.


Bluss has written a very nice indexing library that uses a lifetime, 'id, to express that an index Index<'id> belongs to the static interval Range<'id, P>. In other words, the lifetime acts as a kind of type-level reference from the index to the range.

I don’t know if this approach can be tailored to solve your problem, and it may get annoying to have lifetimes in all structs, but have a look.
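
A condensed sketch of the mechanism (hypothetical names; the real indexing crate is more sophisticated):

use std::marker::PhantomData;

// `*mut` makes 'id invariant, so two distinct brands never unify.
#[derive(Clone, Copy)]
struct Id<'id>(PhantomData<*mut &'id ()>);

struct Range<'id> {
    len: usize,
    id: Id<'id>,
}

#[derive(Clone, Copy)]
struct Index<'id> {
    i: usize,
    _id: Id<'id>,
}

impl<'id> Range<'id> {
    // Only this range can mint indices carrying its brand.
    fn index(&self, i: usize) -> Option<Index<'id>> {
        if i < self.len { Some(Index { i, _id: self.id }) } else { None }
    }
}

// Each call gets a fresh 'id that cannot be unified with any other call's.
fn with_range<R>(len: usize, f: impl for<'id> FnOnce(Range<'id>) -> R) -> R {
    f(Range { len, id: Id(PhantomData) })
}

fn main() {
    with_range(10, |r| {
        if let Some(ix) = r.index(3) {
            println!("index {} carries the range's brand", ix.i);
        }
    });
}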
