[Pre-RFC] Genericity over static values

The idea of including types generic over static values in Rust has been around for a while, and I would like to ask for some feedback on the current state of it, as I have recently been working on improving a draft RFC. Note that there is an open issue on the rfcs repo and a working branch for this idea.

Summary

Implement genericity over values, similar to C++ and D, by allowing generics to take static values as parameters in addition to lifetime and type parameters.

Motivation

Generic types are very useful for ensuring type safety when writing generic code for a variety of types. Parametrisation over static values extends this feature to use cases where type safety must be maintained in terms of these values.

To illustrate this further, consider the following use cases for this feature:

  • Algebraic types: algebraic vectors and matrices generally have a certain dimensionality, which changes their behaviour. For example, it does not make sense to add a 2-vector to a 3-vector or to multiply a 3x4 matrix by a 5-vector. But the algorithms can be written generically in terms of the dimensionality of these types.
  • Compile-time sizes: define sizes for small buffers and other data structures at compile-time. This can be important for allocating larger chunks of memory only when they are really required; if large sizes are unlikely, avoiding the allocation can have a meaningful impact on performance. For instance, Servo currently resorts to a range of macro-generated types to deal with this issue.
  • Physical units: in science one often deals with numerical quantities equipped with units (meters, hours, kilometers per hour, etc.). To avoid errors when dealing with these units, it makes sense to include them in the data type. For performance reasons, however, carrying this information around at run-time might be prohibitively slow. In this context value parameters would allow converting between units and checking formulae for consistency at compile-time.
  • Range and mod types: this would allow range and mod types similar to Ada's in Rust, which enforce certain range constraints and implement modular arithmetic respectively. In general, this feature allows for nicer abstractions that are dealt with at compile-time and thus have no cost in terms of run-time performance. It is also a very expressive feature that Rust currently lacks in comparison with similar languages such as C++ and D.

Drawbacks

First of all, allowing another kind of parameter for generics adds a certain degree of complexity to Rust. The potential merits of this feature have to justify that added complexity.

Furthermore, it is not entirely clear how well this feature fits with Rust's current approach to meta-programming. Implemented completely, it requires compile-time function execution (CTFE), which has been discussed in the past without a clear outcome. It would also open the door to template metaprogramming. However, Rust's type system is already Turing-complete.

Detailed design

Basic syntax

Currently, generics can be parametrized over type and lifetime parameters. This RFC proposes to additionally allow static values as parameters. These values must be static, since they encode type information that must be known at compile-time.

To propose a concrete syntax, consider these examples of a function and a struct:

// Add a static integer to another one
fn add_n<static n: int>(x: int) -> int {
    x + n
}

// An nĂ—m matrix
struct Matrix<T, static n: usize, static m: usize> {
    arr: [[T, ..n], ..m],
}

The syntax "static" ident ':' type closely resembles the syntax for type parameters with trait bounds. There is the additional keyword static to clearly distinguish it from type parameters. This keyword is already used in a different context. This use has a semantic relation to other uses of static as a keyword. There is no ambiguity due to the context of generic parameters in which it appears here either. Therefore it makes sense to use it in this context.

Alternatively, one could omit static or use a different keyword. Since traits would probably not be allowed as the type of a value parameter (they could not be statically resolved at compile-time), the parser should be able to distinguish type parameters from value parameters even without static. However, the difference between type parameters and value parameters might become less obvious that way.

Here are a couple of alternatives:

fn add_n<n: i32>(x: i32) -> i32 { /* ... */ }
fn add_n<static n: i32>(x: i32) -> i32 { /* ... */ }
fn add_n<const n: i32>(x: i32) -> i32 { /* ... */ }
fn add_n<let n: i32>(x: i32) -> i32 { /* ... */ }
fn add_n<value n: i32>(x: i32) -> i32 { /* ... */ }
fn add_n<val n: i32>(x: i32) -> i32 { /* ... */ }

The static variant is arbitrarily chosen for the remainder of this document.

Generic instantiation

When a generic item is instantiated, a concrete value needs to be known statically. The simplest way to achieve this is to provide the value explicitly:

let x = add_n<3>(2);

It may also be inferred from the context, much as types can be inferred. Consider this function returning a struct generic over a value parameter n: usize:

struct Bar<static n: usize> { /* ... */ }
fn bar<static n: usize>(a: [f64, ..n]) -> Bar<n> { /* ... */ }

Given an array of statically known size, the function should be instantiated automatically, so that the following type-checks:

let x = bar([1.0, 2.4, -5.3]);

Here, x should have type Bar<3>.

Default arguments

It is reasonable to specify a default value for a value parameter. This could be expressed completely analogously to default type and lifetime parameters as "static" ident ':' type '=' expr:

struct SmallVec<T, static size: usize = 16> {
    data: [T, .. size],
}

Ordering of the kinds of parameters

When both value and type or lifetime parameters are present, the question arises how they should be ordered. Lifetime parameters always come first in the current system, so it makes sense to place value parameters either between lifetime and type parameters or at the very end. This proposal adopts the latter rule.

Alternatively, one could allow type and value parameters to be mixed freely. This may be beneficial for API design, since grouping parameters by meaning rather than by kind can make sense. However, it would be inconsistent with the existing restriction that lifetime parameters come first. With the adopted rule, a signature looks like this:

fn foo<'a, T, static n: usize>(a: &'a [T,..n]) -> &'a T;

The adopted rule also has the potential advantage that the keyword could be omitted for all but the first value parameter:

fn foo<T, U, static n: i32, x: f32, b: bool>();

Allowed types

The value used to instantiate a generic must be known at compile time, since it becomes part of a type. This naturally limits the types available for value parameters. Most obviously, such a type must have a 'static lifetime.

The most conservative approach would be to only allow usize. This is effectively what C++ does and covers many use cases. However, there are still a number of use cases that cannot be addressed this way.

A generalization of this is to allow concrete, sized types with total equality (those implementing Eq).

/// A function with some compile time flag
fn foo<static flag: bool>(a: i32) -> i32;

/// An integer within a range
struct Range<static lower: i32, static upper: i32> {
    n: i32,
}

/// A range type, which may be bound
struct OptionRange<static lower: Option<i32>, static upper: Option<i32>> {
    n: i32,
}

There are a couple more possible generalizations. First of all, it might be enough to require the type of a value parameter to be PartialEq. This would extend the eligible types to f32, f64, etc. However, this means that there are instantiations which can never successfully type-check.

struct Foo<static x: f64>;

#[test]
fn nan_type() {
    // This is a type mismatch, since NaN != NaN
    let x: Foo<0.0 / 0.0> = Foo<0.0 / 0.0>;
}

To resolve this, the compiler could allow only the subset of values of a type for which x == x holds, i.e. check this property for every value used to instantiate a type.

Another issue is whether types such as arbitrarily large integers and rational numbers should be allowed.

Furthermore, it is conceivable to be generic over the type of a value parameter, e.g.

/// A more generic range type
struct Range<T, static lower: T, static upper: T> {
    n: T,
}

A special trait could signal to the compiler that a type is eligible for use as a value parameter. In the example above, this trait could be imposed as a bound T: Value.
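For illustration, a minimal sketch of what this might look like; the name Value and its exact bounds are assumptions for the sake of the example, not part of this proposal:

// Hypothetical marker trait for types eligible as value parameters;
// it bundles the restrictions discussed above.
trait Value: Eq + 'static {}

impl Value for usize {}
impl Value for i32 {}

// The generic range type with the bound imposed:
struct Range<T: Value, static lower: T, static upper: T> {
    n: T,
}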

Functions in arguments (CTFE)

A very desirable feature is to do some algebra with the parameters, like this:

fn concatenate<T, static n: usize, static m: usize>
    (x: Vector<T, n>, y: Vector<T, m>) -> Vector<T, n + m>
{
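    // Note: zero-initializing assumes a numeric T; a real version would
    // need a bound like T: Copy + Default for this to make sense.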
    let mut new_data: [T, ..n + m] = [0, ..n + m];
    for (i, &xx) in x.data.iter().enumerate() { new_data[i] = xx; }
    for (i, &yy) in y.data.iter().enumerate() { new_data[i + n] = yy; }
    Vector::new(new_data)
}

This feature should be restricted to a certain subset of functions, namely pure functions without side-effects. Compile-time function execution (CTFE), as tracked in this RFC, is required to instantiate types parametrized with value parameters that are the result of some function. Describing CTFE is beyond the scope of this RFC, but it is certainly required for this particular feature. However, this feature can be omitted, as the rest of the proposal works nicely without it.

Type inference can potentially become complex, especially when arbitrary functions must be executed. It seems reasonable to restrict the use of algebraic expressions in types in such a way that it is still possible to immediately infer a type from an expression without resorting to any kind of function inversion.

Consider the following functions:

fn new<static n: i32>() -> T<n>;
fn inc_explicit<static n: i32>(x: T<n>) -> T<n + 1> {...}
fn inc_implicit<static n: i32>(x: T<n - 1>) -> T<n> {...}

#[test]
fn test_inc() {
    // This should work with x: T<4>
    let x = inc_explicit(new<3>());
    // This cannot be inferred
    let y = inc_implicit(new<3>());
    // However, with explicit annotations it should type check
    let z = inc_implicit<4>(new<3>());
}

Traits and associated items

Given a complete implementation of associated items, and particularly associated constants, the canonical use cases for value parameters in generic traits can be replaced by these concepts. Consider this example of a trait generic over value parameters p: P, q: Q, …:

trait Abstract<static p: P, static q: Q, ...> {
    /* ... */
}

struct Concrete<static p: P, static q: Q, ...> {
    /* ... */
}

impl<static p: P, static q: Q, ...> Abstract<p, q, ...> for Concrete<p, q, ...> {
    /* ... */
}

This can be rewritten as:

trait Abstract {
    static p: P;
    static q: Q;
    // ...
    /* trait methods, other associated items */
}

struct Concrete<static p: P, static q: Q, ...> {
    /* fields */
}

impl<static p: P, static q: Q, ...> Abstract for Concrete<p, q, ...> {
    static p: P = p;
    static q: Q = q;
    // ...
    /* method impls, other associated items */
}

It is also possible to generically require a particular kind of Abstract in a where-clause here:

fn foo<T: Abstract, static p: P, static q: Q, ...>() -> T
    where T::p == p, T::q == q, ...
{ /* ... */ }

There are examples, though, that would be impossible without value parameters in traits, e.g.

struct Foo;

impl<static p: P, static q: Q, ...> Abstract<p, q, ...> for Foo {}

So for completeness this aspect should be included as well.

Extended where clauses

In the previous section, where clauses comparing generic value parameters and associated items were briefly mentioned. Given CTFE, another extension could be to allow arbitrary boolean expressions containing value parameters in where clauses.

An example:

/// Returns the middle element of a slice with odd length
fn center<T, static n: usize>(a: &[T, .. n]) -> T
    where n % 2 == 1
{ /* ... */ }

There are possible collisions between impls:

impl<static n: u32> Foo<n> for Bar<n>
    where n % 2 == 1
{ /* ... */ }

impl Foo<3> for Bar<3>
{ /* ... */ }

How should this be resolved?

Alternatives

Parts of the functionality provided by this change could be achieved using macros, which is currently done in some libraries (see for example Servo, or Sebastien Crozet's libraries nalgebra and nphysics). However, macros are fairly limited in this regard.
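To make this limitation concrete, here is a minimal sketch of the macro-based workaround (the names are illustrative, not taken from those libraries): a separate type has to be generated for every size, and nothing can be written generically over all of them.

macro_rules! small_vec {
    ($name:ident, $size:expr) => {
        pub struct $name<T> {
            len: usize,
            data: [T; $size],
        }
    };
}

// One invocation, and hence one distinct type, per supported size:
small_vec!(SmallVec8, 8);
small_vec!(SmallVec16, 16);
small_vec!(SmallVec32, 32);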

Unresolved questions

  • Which exact syntax should be used? (static, const, value etc.)
  • To what extent is compile-time function execution acceptable to support this?
  • How does this proposal interact with other planned language concepts, such as higher-kinded types or variadic generics?
  • A grammar specification is required.

Immediate suggestion: call these "generic value parameters". At first glance, I thought this was about something like static<T> FOO: T;.

Also, to say this up front: I'm not an expert on the type inference Rust uses. Personally, I would want to see the RFC go into specifics on how this is going to impact the inference rules.

I think this proposal should probably stick to things that can be done without CTFE. If (hopefully when) Rust gets CTFE, it should be a reasonably natural expansion of this. Predicating anything here on CTFE just seems like a really bad idea in terms of defining something that is likely to be implemented.

For what it's worth, it should be const. static implies storage, which in this case does not exist. A const doesn't imply storage at all, so I feel it's the correct keyword.

This should probably be let x = add_n::<3>(2);.

And that should be [f64; n]. Actually, it should probably be [f64; N] since the convention is to write constants in ALL_CAPS.

My only concern here is: how would this interact with variadic generics?

How do you propose to implement this without CTFE? This implies that I can do this:

#[derive(Eq)]
struct Blah(i32);
impl PartialEq for Blah {
    fn eq(&self, other: &Blah) -> bool {
        /* insert arbitrarily complex code here */
    }
}

Also: whatever types are allowed, the compiler has to be able to serialise them as part of a symbol's mangled name. This is why D limits them to (if I remember correctly) the basic atomic types, strings and maybe tuples and arrays thereof.

Aside: this led to people making templated types so complicated that the object file format couldn't store the mangled name, which is why some D symbols are unreadable gibberish: I believe Walter ended up DEFLATE compressing them over a certain size! :smiley:

Another aside: it might be worth considering D's concept of an alias. This could be used to basically pass a variable or function "by name". You could use it to write generic wrappers around functions that allowed the compiler to use static dispatch. In Rust, the closest analogue I can think of would be something like a use param. Anyway...

You don't need CTFE for this. You just need to be clear about what constant folding is supported. Honestly, I really like the idea, but I'm terrified about the possible implications. Because type inference in Rust goes both ways, I'm not sure there's a difference between the two examples.

I mean, what do you do about N / 2? For most values, there are going to be two possible values of N it could have been.
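For instance, given a hypothetical signature like this (illustrative syntax):

fn halve<const N: usize>(x: [u8; N]) -> [u8; N / 2];

// Knowing only that the result is [u8; 3] does not determine N:
// both halve::<6> and halve::<7> would produce [u8; 3].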

Yeah, I suspect the coherence checker would need to be very conservative, if this is even feasible.

  • What, exactly, are the changes to the type inference algorithm?
  • How does coherence work?

In addition, I'd like to see more examples, in terms of breadth and complexity. The application to fixed-size arrays is obvious, but what else could you do with it, practically speaking? I just feel that for something which should be as useful as I want this to be, solid motivating examples should be abundant.

Closing thoughts: I want this, but I think this is skipping too many big, important, and hard details.


@aepsil0n Thank you for starting this work; this is an important feature covering a major hole in the expressiveness of the language. I had some comments, but most of them were already voiced by @DanielKeep.

Not only that, but mangling and implementation of Eq should be guaranteed to correspond to each other.

When associated const items are implemented, we will have an ugly poor man’s substitute for value generic parameters, which nevertheless covers the main use scenarios:

trait Extent {
    const extent: usize;
}

impl Extent for [(); 0] {
    const extent: usize = 0;
}
impl Extent for [(); 1] {
    const extent: usize = 1;
}
// ...
impl Extent for [(); 100] {
    const extent: usize = 100;
}

struct SmallVec<T, N: Extent> {
    // `N::extent` becomes available here as a number
}

let sv = SmallVec::<u8, [(); 10]>::new();

I don't think relying on value-level Eq is likely to be the right direction here. We shouldn't rely on the user-defined implementation of the == operator; instead we should rely on the built-in language facilities for deciding whether a value matches a pattern (which, in this case, is the other constant value). I'm not sure about mangling and whatnot, but from a type system perspective any type which implements Copy (and perhaps doesn't have private members) seems like it might be reasonable as the type of a generic constant.
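As a sketch of the distinction (all names illustrative): matching a value against a constant pattern is decided structurally by the language, not by calling the user's eq:

#[derive(Copy, Clone, PartialEq, Eq)]
struct Pair(u8, u8);

const P: Pair = Pair(1, 2);

fn is_p(x: Pair) -> bool {
    match x {
        P => true,  // built-in structural match against the constant
        _ => false,
    }
}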


I think we should follow Haskell’s approach. That is, type-level nats are of a different kind and actually have nothing to do with normal value-level integers.

First, introduce type enum, which declares a new kind:

#[lang="type-nat"]
type enum Nat {
    Z,
    S<Nat>,
}

Then we need a new set of type-level operator traits for types of any kind:

// `K Foo` means "Foo is of kind K"
// Are we going to allow LHS and RHS (and Output) to be of different kinds?
#[lang="type-add"]
trait TypeAdd<Kind RHS> where Kind Self {
    type Output where Kind Output;
}

// 0 == Z
impl<Nat LHS> TypeAdd<0> for LHS {
    type Output = LHS;
}

impl<Nat LHS, Nat N> TypeAdd<S<N>> for LHS {
    // `A + B` is just syntax sugar for <A as TypeAdd<B>>::Output
    type Output = S<LHS> + N;
}

...

Now we can do some type-level crazy stuff such as fn foo<T, Nat N1, Nat N2>(v1: Vector<T, N1>, v2: Vector<T, N2>) -> Vector<T, N1 + N2> { ... }

Not sure how name mangling or coherence would work, though.

const is for compile-time constants, static is for a guarantee of being placed in static memory.


Urgh. This reminds me of classical C++ templates, where you can do any kind of computation with templates that you can do with values, but the syntax is completely different and far more verbose. (C++ and Rust’s type systems are Turing complete; Haskell’s isn’t, but that doesn’t mean it can’t do practical computations in a sufficiently ugly fashion that quicksort is “advanced”.) C++14 constexpr functions are going in the right direction - as much as possible should be done using normal value semantics, reserving generic types for actual data types (containing data) that need to be generic. (This is likely the only aspect in which C++'s type system can be said to be better designed than Haskell’s! I guess solving that properly is the point of dependently typed languages.)

Also, your example doesn’t require special kind syntax. Nat can just be a trait; I think this is implementable today minus the + operator sugar, and with HKT we’ll be able to make things prettier in the future. But not pretty enough.
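A rough sketch of that encoding, with plain types and traits instead of a new kind (all names illustrative):

use std::marker::PhantomData;

struct Z;
struct S<N>(PhantomData<N>);

trait Nat {}
impl Nat for Z {}
impl<N: Nat> Nat for S<N> {}

// Type-level addition via an associated type, standing in for the `+` sugar:
trait NatAdd<RHS> {
    type Output;
}
impl<RHS> NatAdd<RHS> for Z {
    type Output = RHS;
}
impl<N: NatAdd<RHS>, RHS> NatAdd<RHS> for S<N> {
    // S<N> + RHS = S<N + RHS>
    type Output = S<<N as NatAdd<RHS>>::Output>;
}

// e.g. 1 + 1 = 2:
// <S<Z> as NatAdd<S<Z>>>::Output == S<S<Z>>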

I suspected as much to be honest. But it's really helpful to get some critical feedback, so thank you for the extensive reply. I might look into this in more detail, when I have the time.

Nonetheless, if there are more people interested in this (esp. those who have more experience in language design & implementation than me), please feel free to advance this, sketch out details, etc. I am mostly trying to get this process started, because it is a feature whose lack feels really limiting, imho similar to higher-kinded types or variadic generics.

Haskell is also Turing complete with those extensions, and even lambda calculus is implemented in the link you posted. The kind system won’t conflict with CTFE, and maybe people can figure out some hybrid approach and make them interoperable. Now I’m thinking that const values of a type should automatically be promoted to types of a corresponding kind when appearing in a type parameter position.

If Nat weren’t a kind but a trait, people can do nonsense such as implementing Nat for their own type, so it would be harder to guarantee coherence. Nat being a kind is more intuitive because it tells people, “Don’t put any normal type parameter here.”

My thoughts:

  1. I would really like to see this stripped down to focus on integers (or even just one or two types, say i64 and u64), since that’s where I think there are solid use cases that can be demonstrated. Allowing arbitrary types leads to all the issues mentioned above, and I think could lead to a lot of user confusion.

    I am also strongly against allowing types to be parameterized by floats, ever. It’s not just the NaN issue you mention, it’s the fact that floating point arithmetic is inexact. I’m kind of horrified by the mere thought of debugging a program where the precision of a floating-point representation is key to whether or not things type check.

    Parameterizing by any other types can be raised in a future RFC, if there are identifiable use cases that can’t be addressed with integers.

  2. This would be highly useful for certain types of numerical code, for reasons well beyond the ones you gave. As an example, let’s say that I have a function defined over a three-dimensional grid, and I want to do an integral over one of those dimensions. So I have some 3D array type, then I do some possibly-weighted sum to get a 2D array type, which is simple enough to code by itself.

    But then let’s suppose that I have a whole host of different multidimensional arrays. This could be because:

    • I have to do reductions on many arrays of different dimensions (say, anywhere from 1-10)
    • I am classifying multidimensional arrays by implementation details that need to be encoded in the type for performance reasons (such as whether they are completely contiguous in memory or broken up internally into various chunks).
    • I am distinguishing between array types according to some semantic characteristic not directly related to the implementation details.

    In this case, I don’t want to go through every combination and say that Array3<T> reduces to Array2<T>, Array5<T> reduces to Array4<T>, FooArray3 reduces to FooArray2, and so on (and then repeat this for every other operation that has to be implemented for each type and can’t be expressed using traits). Instead I want to say that Array<T, const N> reduces to Array<T, const N-1>, and FooArray<const N> reduces to FooArray<const N-1>, and that’s it. Which, importantly, means also specializing Array<T, 1> to reduce to a scalar.

    A simpler example would be if I want to index ArrayN<T> by [usize; N], which is not possible to do generically right now. (A sketch of the signatures I have in mind follows after this list.)

    (I am actually experimenting with this sort of thing. The problem is that I have to ridiculously abuse macros or resort to outright code generation in some cases. In this instance, Rust takes more work for the same type guarantees than C++, or in some cases even Fortran!)

  3. As @DanielKeep says, you don’t need CTFE as long as you only allow parameterizing by certain primitive values, and as long as the compiler will do basic arithmetic. He is also correct in pointing out that your proposed restriction is not obviously going to fix most inference issues; Rust uses inference in both directions, so inc_explicit and inc_implicit are equally problematic:

    #[test]
    fn test_inc_backwards() {
        /* The compiler can't figure out which `inc_explicit` or `new` to call without inverting the arithmetic expression. */
        let x: T<4> = inc_explicit(new());
    }
    

    What’s even worse is that with more constants in play, you can encode some really nutty stuff:

    /* Boring function signatures that seemed plausible, off the top of my head. */
    struct Tensor<const N: usize> {}
    fn new<const N: usize>() -> Tensor<N>;
    fn inc_rank<const N: usize>(a: Tensor<N>) -> Tensor<N+1>;
    fn tensor_product<const M: usize, const N: usize>(a: Tensor<M>, b: Tensor<N>) -> Tensor<M*N>;

    /* Time to abuse the rules. */
    fn static_assert_is_prime<const N: usize>() {
        /* Since we use unsigned integers, the constant parameter of x is at least 2, the one for y is at least 1. */
        let x = inc_rank(inc_rank(new()));
        let y = inc_rank(new());
        /* Types for x and y can only be inferred without ambiguity if N is prime. */
        let z: Tensor<N> = tensor_product(x, y);
    }
    

    You can also write functions encoding Diophantine equations and all kinds of other non-trivial maths. I see only two reasonable paths forward:

    • Have inference simply give up on using any expression as a constraint if it would have to be inverted to be used. So in your original example, inc_explicit would only allow inference in one direction, and inc_implicit only in the other direction, and that would be the end of that. Maybe this is what you originally intended?

    • Have inference invert an expression if and only if both of the following conditions are met:

      1. The expression involves no more than a single use of a single unknown parameter (so the tensor_product function above could not be used to deduce anything just from knowing the return type parameter).
      2. Only arithmetic operations that can be trivially inverted, given the output and all but one input, are used. This means +, -, and * only. (You could add / if the compiler interprets it as division that never has a remainder, and throws an error if the numerator is not an exact multiple of the denominator. You could also add the bitwise operators ! and ^, but I see no point in doing so.)

      These two conditions should be sufficient to ensure that inverting an expression is trivial, and therefore that the inversion can reasonably inform inference.

    Regardless of the above, arithmetic for inference of constant parameters should always use checked arithmetic. In particular, we want to guard against underflow on an unsigned integer, which could result in a huge constant parameter being inferred in buggy code.

  4. I’m not sure how I feel about the where clause part. To cover the example I gave in point 2 above, I would need to be able to treat Array<1> specially, which does mean that something like this feature needs to be present. What’s not so good is that, inside the where clause, the parser doesn’t know if it’s about to hit a type or an expression. This flexibility is also no good for the coherence check.

    My suggestion would be to require the part in the clause to be of the form const N binop expr, like so:

    impl<T, const N: usize> Array<T, N>
        where const N > 1 {
        /* ... */
    }
    
    /* Also OK, but note that the where clause has to be checked first to avoid any false positive underflow/overflow errors from the function signature. */
    fn decrement<const N: usize>(a: Foo<N>) -> Foo<N-1> where const N != 0;
    

    This syntax doesn’t allow arithmetic on the LHS, and therefore doesn’t allow the center function you suggested, but I think that function was a bit weird anyway. :confused: It’s a bit self-serving, but I feel like my case (and anything involving induction) is a fairly simple and important one, whereas a function that’s only defined for odd array lengths could cause strange problems.

    With this limited form of where clause, coherence checking should be straightforward, because it just boils down to whether two ranges of numbers overlap. But note that the RHS of each where clause cannot be used for inference. Otherwise you could have something like const N == (2*M-1), and if M was free we would have the same coherence issue as in the n % 2 == 1 case.

  5. It’s not clear to me that this proposal actually interacts with variadic generics at all. Presumably if constant value parameters came after type parameters, then they would also come after the ..T (or whatever the syntax is going to be for the tuple parameter in variadics). Otherwise I don’t see how the two connect.

    Likewise I have trouble seeing a direct interaction with higher-kinded types, other than that higher-kinded types could improve the flexibility of constant value parameters in the same way as they do with lifetime and type parameters.

  6. I think that there should be a default type for integer constant parameters where none is explicitly specified. It should probably be unsigned (presumably usize or u64), since the vast majority of use cases I’ve seen only need unsigned integers, and since the compiler should be checking for underflow. It’s a pain to type const N: usize for every case.
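To make the reduction use case from point 2 concrete, here is a sketch of the signatures it calls for, written in the hypothetical const-parameter syntax of this thread (everything here is illustrative):

struct Array<T, const N: usize> { /* ... */ }

// One generic signature instead of Array3 -> Array2, Array5 -> Array4, ...
fn reduce<T, const N: usize>(a: Array<T, N>, axis: usize) -> Array<T, N-1>
    where const N > 1;

// The base case specializes to a scalar:
fn reduce_scalar<T>(a: Array<T, 1>) -> T;

// Generic indexing with one usize per dimension:
fn index<T, const N: usize>(a: &Array<T, N>, idx: [usize; N]) -> &T;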

Yes, this was my intention, even though I did not cover the other direction explicitly.

From my point of view, it is fairly important to allow for more types than just integers. The use case I am thinking about most are rational numbers, as they are very useful to encode some physical unit systems in the type system. There may be plenty of other use cases, but generally I find compounds of integers with custom arithmetic most relevant to this proposal.

The prime number example pretty much shows that it's not a good idea to have the compiler do these kinds of function inversions. Imho this requires a full-fledged dependent type system to do properly, which is unlikely to happen in Rust. Of course, it is possible to limit this to predictable integer arithmetic, but then you're only dealing with one special case. It's certainly the most useful one, but as C++ shows, such a restriction prevents you from doing some useful things.

I'm not saying that we shouldn't start with integers first. That is certainly a nice starting point. But I'd avoid too narrow semantics, so that it would be harder to extend it to more general types later.

btw, I agree that it's not the best idea to use floating point numbers. On the other hand it may allow us to provide very interesting compile-time guarantees for numerical computations. Admittedly this may be beyond the scope of a language like Rust, as the type system is not formulated on an underlying type theory, which makes these things hard to express in the first place.

The use case I am thinking about most are rational numbers, as they are very useful to encode some physical unit systems in the type system.

I did think about this a bit. For such cases, you presumably want the rational to be reduced, which definitely requires CTFE. (If you don't care about reducing values, you only actually need integers, since you can work with types like Foo<Rational<const M, const N>>. But then Foo<Rational<2, 4>> and Foo<Rational<1,2>> are different types, which is not ideal.)

A bigger issue is that, if we aren't limiting where clauses to very narrow operations on primitive types, the coherence problem is probably insoluble. Right now the compiler only allows two impls to coexist if it can be sure that they can never overlap for any type. This check becomes difficult or impossible if you can add arbitrary bounds on arbitrary constant values in a where clause. I seriously think that you either have to limit the extended where clause proposal to a limited set of types and operations that the compiler can handle, or rethink how the coherence rules work entirely (which I know has been brought up before, and may actually happen, but it's way outside the scope of this RFC).

Of course, it is possible to limit this to predictable integer arithmetic, but then you're only dealing with one special case.

I'm not saying that we should give up entirely on having at least some struct/enum types be usable as constant parameters. But I think that realistically we're looking at a huge gap between the narrower and more general case. For the integer-only case, we have pretty clear applications, lots of precedent in other languages, and a clear path forward. If we allow something much more general, there's a lot less precedent, there are much fuzzier possible use cases, and there are a lot of harder issues to figure out about how this works with the rest of the language.

That said, I'm OK with either of the two options I mentioned for inference. I think that having the compiler be able to do some limited inversion for arithmetic expressions would be nice, but not a terribly important feature.

I agree that it's not the best idea to use floating point numbers. On the other hand it may allow us to provide very interesting compile-time guarantees for numerical computations.

I suppose it would. My main feeling about this is that for numerical computations users must be able to count on the compiler doing inlining and constant folding/propagation to a reasonable extent, because otherwise Rust just won't ever be able to deliver a good combination of abstraction and performance. So, assuming that you do have such optimizations triggered reliably, I think it becomes hard to argue that you need to deal with floats on the type level badly enough to justify the extra complexity.

Hmm, this also brings to mind GHC's typechecker / constraint solver plugins. (In this domain I imagine constraint solving would be a simple (if not quick) matter of factorization.)

While this is a good point, I don't think the fallout would have to be that dramatic -- another option is for the coherence check to simply be conservative, and allow an impl where it can prove that it doesn't violate coherence, and reject otherwise. In other words, if a where clause is too sophisticated to analyze, behave as if it weren't there. Rust's coherence checking is already kind of bleeding edge for taking constraints into consideration at all; in the worst case this would fall back to only considering the types in the "head" of the impl, which happens to be the only part GHC ever does.

Out of curiosity, do you know of any systems which actually allow only type-level integers but not other types? Even C++ allows pointers and references (including functions) and enums in addition. And the impedance mismatch between types usable with constexpr functions vs. with templates can get pretty annoying. As far as precedent for type-level ADTs goes, (as far as I'm aware) GHC's DataKinds has been completely uncontroversial and held no major surprises. (In fact it seems to be have been considerably easier than doing type-level naturals, a.k.a. TypeNats!)

That said I know @darinmorrison also has opinions on this, which I believe are somewhere along the line of "it's not worth bothering with the intermediate C++/GHC-style systems and better to move straight to dependent types".


That might be reasonable. I think my main concern would be that coherence checking might turn into this very fuzzy or very complex process, where you can't predict what's going to pass, and/or it depends on which particular compiler version you use. I'd really like to see it spelled out explicitly what is or isn't "too sophisticated to analyze".

I'll admit, I'd forgotten about pointer/reference parameters, in part because I've never felt that they were useful except perhaps in very niche situations. The case I've seen most often is parameterization by pointer-to-member, which of course Rust doesn't even have. (Of course, Rust doesn't have what C++ labels "enums" either, but something far more general.) It's a fair point that C++ does allow other types, but integers seem to get the most use by far.

To answer your broader question, I don't know of any cases that are integer-only right now, but several systems started with dependence on integers and later grew to encompass other types. E.g. I think that Dependent ML only had dependence on integers.

Also, a huge range of systems have had special built-in types that are parameterized by integers. E.g. Rust's [T; N] arrays, and there are the aforementioned Ada range/mod types. Of course that's not the same thing as having a truly general type dependence, but integers do seem to be the "gateway drug" for type dependence.

(Actually, I was mistaken. I just remembered that I do know of at least one language that's still being developed and used, and which has integer-only dependent types. Fortran! But that's hardly a good model. E.g. in Fortran 2003, which was influenced by C++, you can define a type that's parameterized by a compile-time constant (confusingly enough, constant parameters are called "kinds", which is completely unrelated to what the word "kind" means in other languages). But you can't define a function that's parameterized by an integer, so this does not actually save you much of the effort that generic programming is supposed to help with. They've been batting around the idea of parameterizing functions and modules for at least a decade now, but Fortran is not exactly a cutting-edge language, to understate horribly.)

I do like DataKinds, though I think it would take some thought to put a similar system into Rust. I guess that the main reason I wasn't thinking of GHC as a good "precedent" is because there are such big gaps between the two regarding the type system in general (e.g. Rust still lacks HKT).

Anyway, I'm trying to put together my own version of an RFC for at least the integer part. It will be big and preliminary, but I should have something soon.

I feel like this is all too complicated.

I just want to be able to say Array<T, n>, where n is a constant, integral capacity. I don’t care about proving that you never get overfull at compile time, or being able to produce an Array<T, n + 1>. I don’t care about being able to have a Foo<T, const range: Range<T>>. I just want to have associated const integers for a type that I can elevate to a runtime value and do some work with. I want to be able to talk about [T; n] generically. For writing and using sane code, I see no expressive value in being able to talk about lists with n elements in the type system. It only lets you prove properties and maybe avoid bounds checks if you invest a lot of effort into propagating types everywhere.

I’m more than happy to just panic at runtime on capacity exhaustion. At most I need to be able to talk about two collections having the same associated constant, but that doesn’t seem any different from having the same associated type. I’m sympathetic to the n-dimensional array usecase, but I don’t think it’s worth blocking the core functionality of being able to talk about [T; n] properly.
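A minimal sketch of the kind of type this buys us, assuming only integer parameters (hypothetical const syntax, storage details elided):

struct ArrayVec<T, const n: usize> {
    len: usize,
    data: [T; n],
}

impl<T, const n: usize> ArrayVec<T, n> {
    // Elevate the constant to a runtime value:
    fn capacity(&self) -> usize { n }

    // No compile-time fullness proofs; just panic at runtime:
    fn push(&mut self, x: T) {
        assert!(self.len < n, "capacity exhausted");
        self.data[self.len] = x;  // (real storage would need to handle uninitialized slots)
        self.len += 1;
    }
}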

This is all we need to get working code for talking about fixed-sized collections on the stack. Critically, only supporting the primitive integer literals (and maybe bool, I guess, since that’s just a u1), allows us to completely punt on all of the hard problems.

  • Name mangling: it’s the literal value
  • Equality: integer equality has all the nice properties, and doesn’t need any user-defined code
  • CTFE: Don’t need it.
  • Traits: Don’t need it.
  • Non-equality constraints: Don’t need it.
  • Inference: should be pretty straight-forward, assuming the n in [T; n] is understood.
  • Default values: Should be pretty straight-forward. Can’t infer a const, use the default. Also don’t really need it.
  • Equivalent-but-not-equal types: Not a thing.

Any sane solution, as far as I can tell, can be backwards-compatibly bolted on top of this minimal functionality.


I do want to point out that this compiles and runs today, and it needs to continue doing so:

const STRUCT_SIZE: usize = 4;
const BUFFER_WIDTH: usize = 1024;
let x: [u8; STRUCT_SIZE*BUFFER_WIDTH] = [0; STRUCT_SIZE*BUFFER_WIDTH];

As @Gankra suggests, working with arrays generically is probably the most urgent use case here. (I don’t think necessarily that it’s the most important in the long run, but definitely the most immediate motivation from a systems programming perspective.) So that really constrains us to at least support usize constant expressions as parameters in a way that’s highly compatible with usize array sizes.

I still would really like to get arithmetic in here, but I have pared things down significantly to almost the bare minimum needed for working with arrays and the other use cases I’ve wanted to deal with. See this draft:

The only thing in that proposal that is really specific to integer (or at least number) semantics is the way that ranges are used, and even that could potentially be overloaded for other types in the future (e.g. anything that implements Ord, or something more like a type-level operator or macro).

Perhaps I’ve overlooked some things, but my feeling is that CTFE and more types of constant value parameters could be added in the future backwards-compatibly, and leverage the same features as I’ve proposed, even though this proposal only introduces integers and booleans as const parameters at first.

Thanks for writing that up; it looks mostly reasonable to me.

A nit: in the list following “There are several more cases where it would be helpful to use constant values as parameters”, I would refer to “algebraic types” using different words (“linear algebra”?), to avoid confusion w.r.t. the usual meaning of ADTs (product and sum types).

I like the syntax as proposed in the main text; in particular allowing only literals and identifiers as const arguments without explicit disambiguation seems like a good compromise. Using {} to explicitly disambiguate seems fine, but another idea (which I prefer) would be to use the const keyword here (as also occurs in the alternatives). So given struct Foo<const N: u64> { ... }, you could write Foo<99> and Foo<MY_STATIC_U64> as before, while Foo<{1 + 2}> would become Foo<const 1 + 2>. (This could mesh well with potential kind ascription syntax given HKTs, e.g. I could also imagine writing Bar<type Baz> or Bar<type<type> Quux> in the far future.)

WRT compile-time overflow, I think a simpler way to summarize the motivation is simply that (having accepted the integer overflow RFC) we already consider over/underflow to be a program error, and at compile time, it would be crazy not to check for it.

I’m not really comfortable with the where clauses part… while it seems like a reasonable approach, the tradeoffs relative to e.g. allowing arithmetic predicates (or just using that syntax to express the same things, for now) are not clear, and it seems more prudent to put those decisions off until later.

I like your proposal, as I think it makes sense to limit the effect of this feature for the moment. However, I would like to point out several points, where I deem it worthwhile to consider the backwards-compatibility of future extensions in the spirit of my original proposal.

First of all, overflow checking: in principle, I agree that it is a good idea. But then a + b means something different on the type level than it normally does. In my opinion this inconsistency should be avoided even more than overflow. Especially if you want to generalize this to further types, you’d always have to think about how your types would be treated differently on the type level. I’d rather look into some way of marking functions (incl. operators) as safe for use at the type level. One could introduce the concept of pure functions. But then again, overflowing integer arithmetic is pure… maybe add an extra integer type that only allows checked addition?

You should also take into account that the allowed expressions for array sizes currently don’t do overflow checking: e.g. the expression [0; 0us - 1us] is not checked and merely leads to “the type [i32; 18446744073709551615] is too big for the current architecture”, while [0; 0us - (0us - 1us)] on the other hand is perfectly valid. This makes adding overflow checking a breaking change compared to today’s Rust (likely similar to 1.0).

Where clauses: I think your proposal adds a strange special case with the range constraint. I would actually limit this to only the equality constraint, i.e. where N == 3. I’d also keep using == here instead of : to be consistent with general predicates, in case they are added in the future.

Edit: Thinking about that last point a little more: maybe we could allow a limited < and > as well, to provide the equivalent to the range feature?
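In that restricted form, the earlier examples would look something like this (hypothetical syntax; SomeTrait is just a placeholder):

// Equality constraint only:
impl<T, const N: usize> SomeTrait for Array<T, N>
    where N == 1
{ /* base case */ }

// With the limited < and > from the edit:
fn decrement<const N: usize>(a: Foo<N>) -> Foo<N - 1>
    where N > 0;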


First of all, overflow checking: in principle, I agree that it is a good idea. But then a + b means something different on the type level than normally.

According to rfcs/text/0560-integer-overflow.md, a + b overflowing is a program error at the value level as well, and will panic in debug builds.

This is a good point, and should also IMO be considered an error. But I'm not sure what the plan is - cc @nikomatsakis?