Focus of Rust evolution

Yes, I said it because I just thought about this workaround :slight_smile:

1 Like

Let's not forget that Rust is a very young language, and given time, I'm confident it will rise to the task in the number crunching department. So although the road may be long, there's light at the end of the tunnel.

With an RFC similar to "const bounds and methods" (which needs to be significantly rewritten) we can also almost entirely get rid of the ecosystem split between const and non-const.

I think we can have fixed-size iterators using const generics; see: Fixed-size lazy iterators? - #4 by Centril. Also, I'd like to see some form of ArrayVec in the standard library.
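
As a rough illustration (my own sketch, not the design from that thread): with const generics, the element count can live in the type, so the first N items of an iterator can be collected into an array with no heap allocation.

// Sketch only: collect the first N items of an iterator into a fixed-size array.
// With const generics, the length N is part of the type, so no heap allocation
// (and no ArrayVec) is needed for this simple case.
fn collect_array<I, const N: usize>(mut it: I) -> [I::Item; N]
where
    I: Iterator,
    I::Item: Default + Copy,
{
    let mut out = [<I::Item as Default>::default(); N];
    for slot in out.iter_mut() {
        if let Some(v) = it.next() {
            *slot = v;
        }
    }
    out
}

fn main() {
    let arr: [i32; 3] = collect_array(1..);
    assert_eq!(arr, [1, 2, 3]);
}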

There's also "const repeat expressions" which recently got accepted.

I think the solution here should be more of dependent types (we'll get dependent types for const values, let's also get them for runtime values) and refinement types on top of that. Dependent types also give the user an extreme amount of power to program the type system for comparatively little complexity; that is, the power-to-complexity ratio is large.

This BSc thesis discussing a compiler explorer for CakeML might be of interest: http://publications.lib.chalmers.se/records/fulltext/251308/251308.pdf

Instead of special syntax, we could introduce a form of effect polymorphism where you can write x: Arr with Arr: unsafe Index<usize>, and then use unsafe fn index(&self, index: Idx) -> &Self::Output without changing the trait, like so: arr[0] (see the const bounds and methods RFC for a discussion).

2 Likes

Just a quick note that I’d personally be quite interested to see a Numerics WG in 2019, hopefully after a couple more foundational features (like const generics) are available.

12 Likes

Would you be willing to elaborate on that a bit more (for those not fully familiar with "Dependent Types")?

I may be reading too much into your comment, but I think it's useful to clarify something about this:

Const generics lead to a very limited form of dependent types that are very far removed from full dependent typing. Though full dependent types do lead to a leap in expressiveness, I think it's completely untrue that this would lead to "comparatively little complexity", both in their implementation and in their understanding (for users inexperienced with dependent types, which is going to be most users).

I certainly think that increasing the expressiveness of the type system is a good thing, but I think this is significantly understating the complexity involved.

3 Likes

Certainly; but please do consult Dependent type - Wikipedia for a more elaborate review.

If we consider the question "Can X depend on Y?" where X, Y ∈ {values, types}, and categorize languages by the answers, then we get the following questions:

  • Can values depend on values?
    • If yes, we get the simply typed lambda calculus (λ→)
  • Can values depend on types?
    • If yes, we get System F, 2nd order lambda calculus (λ2), or just parametric polymorphism / generics, and we can encode: fn id<T>(x: T) -> T { x }
  • Can types depend on types?
    • If yes, we get λω, or type operators, where you can encode type Foo<X> = Vec<Vec<X>>; and, specific to Rust, you can encode type-level functions such as trait TypeFn<X> { type Y } (see the sketch after this list).
  • Can types depend on values?
    • If yes, then you get (value) "dependent types".
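
To make the third bullet concrete, here is a small runnable sketch (my own example) of a type-level function in today's Rust, where an implementing type acts as a function from an input type X to an output type Y:

// A type-level function: `WrapInVec` maps any type X to Vec<X>.
trait TypeFn<X> {
    type Y;
}

struct WrapInVec;

impl<X> TypeFn<X> for WrapInVec {
    type Y = Vec<X>;
}

fn main() {
    // "Applying" the function at the type level: <WrapInVec as TypeFn<u8>>::Y is Vec<u8>.
    let v: <WrapInVec as TypeFn<u8>>::Y = vec![1u8, 2, 3];
    assert_eq!(v.len(), 3);
}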

So, what (value) dependent types such as in languages like Idris, Agda, Coq, F*, etc. allow is for types to be dependent on values. And so you could encode things such as (with Agda notation):

repeat : {A : Set} -- Quantify the type of elements.
      -> (x : A)   -- The element x : A we wish to repeat.
      -> (n : Nat) -- The number of elements to repeat.
      -- Notice how Vec A n depends on the input n passed!
      -> Vec A n   -- A list of exactly n As.

append : {A : Set}
      -> {n m : Nat}
      -- Concatenating a list of n elements and one with m gives you n + m elements.
      -> Vec A n -> Vec A m -> Vec A (n + m)

index  : {A : Set}
      -> {n : Nat}
      -> (idx : Nat)         -- The index we wish to extract.
      -> {prfLess : idx < n} -- Proof that: idx < n (idx must be bound before the proof can mention it).
      -> Vec A n
      -> A -- We know that idx < n, so we can safely get an A out always.

This roughly corresponds to the following with pseudo-Rust notation:

fn repeat<A>(x: A, n: usize) -> Vec<A, n> {..}

fn append<A, n: usize, m: usize>(a: Vec<A, n>, b: Vec<A, m>) -> Vec<A, {n + m}> {..}

fn index<A, n: usize>(idx: usize, vec: Vec<A, n>) -> A where idx < n {..}
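
As a side note, the first signature already has a reasonably close, runnable approximation when the length is a compile-time constant (a sketch assuming const generics; the n + m in append would additionally need const expressions in types):

// Approximation of `repeat` where the length N is a compile-time constant.
fn repeat<A: Copy, const N: usize>(x: A) -> [A; N] {
    [x; N]
}

fn main() {
    let v: [u8; 4] = repeat(7);
    assert_eq!(v, [7, 7, 7, 7]);
}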

Thus far, we've only considered what are called pi-types, that is, types of the form: Π (x: A) B(x). But there are also so called sigma-types, or dependent sum types of the form Σ(x: A) B(x). With these sigma types, you can encode things such as (pseudo-Rust, so I'm quite heavily making things up):

struct AnyLenVec<A> {
    n: usize,
    v: Vec<A, n> // Notice how we are referring to the field `n`!
}

fn index_gvec<A>(idx: usize, vec: AnyLenVec<A>) -> Option<A> {
    if idx < vec.n {
        Some(index(idx, vec.v))
    } else {
        None
    }
}

Finally, I'd like to leave a few assorted resources on dependent typing:

Yes, I think you are.

I will agree that it is limited, and more so in the RFC than in what const generics could eventually be. You can still build some pretty powerful abstractions with it (cf. the fixed-size iterators). The "only" difference, from a user's perspective, between constant and run-time dependent types is that the former does not allow types to be parameterized by run-time values. So the former allows you to encode Π (const x: A) B x while the latter allows you to encode Π (x: A) B x. This does of course translate into large differences in expressiveness.

I was only comparing the complexity of dependent typing relative to the gain in expressiveness, not the complexity of dependent typing in absolute terms (which of course will be complicated), wherefore I think it is entirely true wrt. the ratio between power and complexity. I was not referring to complexity in understanding at all.

3 Likes

As a quick answer: only a small part of the things discussed here are about what is usually called numerics, and only a part of them are going to be solved by the const generics feature.

1 Like

By the way, tuples in Rust are in very bad shape. They are deeply tied to fn_traits and make things like callback/closure composition very hard, if not impossible.
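
If I understand the complaint correctly, here is a minimal illustration (my own example): composing single-argument callbacks is fine, but there is no way to be generic over the argument tuple, so anything arity-generic needs one hand-written impl per tuple size.

// Single-argument composition works...
fn compose<A, B, C>(f: impl Fn(A) -> B, g: impl Fn(B) -> C) -> impl Fn(A) -> C {
    move |a| g(f(a))
}

fn main() {
    let add_one = |x: i32| x + 1;
    let double = |x: i32| x * 2;
    let h = compose(add_one, double);
    assert_eq!(h(3), 8);
    // ...but a `compose` over callbacks of arbitrary arity cannot be written,
    // because we cannot abstract over the `Fn(Args...)` argument tuple.
}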

Just a reminder that you can already emulate dependent types in Rust :slight_smile::

https://docs.rs/indexing/0.3.2/indexing/

In fact, I think it’s possible to take a much more general approach than that crate. Instead of special-casing the idea of a range that might or might not have been proved to be non-empty, go for arbitrary integer relations:

fn index<A: Slice<i32>, I: Usize>(a: &A, i: I, lt_proof: Lt<I, Len<A>>) { ... }

…Now I really want to write a proof-of-concept of this.
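
For the curious, here is a very rough, non-branded sketch of the proof-token idea in today's Rust (names like LtLen are invented; the indexing crate uses branded lifetimes so the proof is tied to one particular slice, which this simplified version does not do):

// A witness that an index was checked against a slice's length.
struct LtLen(usize);

fn check_lt_len<T>(i: usize, a: &[T]) -> Option<LtLen> {
    if i < a.len() { Some(LtLen(i)) } else { None }
}

fn index_with_proof<T>(a: &[T], proof: LtLen) -> &T {
    // Without branding, the proof could come from a different slice, so we keep
    // the safe (checked) indexing here rather than using get_unchecked.
    &a[proof.0]
}

fn main() {
    let xs = [1, 2, 3];
    if let Some(p) = check_lt_len(2, &xs) {
        assert_eq!(*index_with_proof(&xs, p), 3);
    }
}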

2 Likes

Oh noes :stuck_out_tongue: I think (?) this falls under the "type hackery" of the singletons Hackage package that Haskell folks are using because we don't have full dependent types in Haskell yet ^,- Though the encodings in indexing are quite interesting.

Perhaps not just fixed-size arrays, but slices in general:

The thing that repeatedly annoys me is that you cannot index a slice with anything but a usize. I'm sure this has annoyed everyone at times, but for most uses, usize everywhere is fine. In my specific application, with hundreds of millions of indices, using u32 over usize is an important optimization, but it clutters the code with conversions everywhere (in both directions). It would be much more convenient and tidier if there were an implicit as usize cast for indexing. That's what is always meant, so please save me the effort. This is easily accomplished by adding a few impls for std::slice::SliceIndex.
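
For what it's worth, since SliceIndex is sealed, user code can't add those impls itself today; the tidiest workaround I know of is a newtype that centralizes the cast (a sketch with made-up names):

use std::ops::Index;

// Made-up newtype: lets call sites index with u32 directly while keeping the
// `as usize` conversion in exactly one place.
struct U32Indexed<T>(Vec<T>);

impl<T> Index<u32> for U32Indexed<T> {
    type Output = T;

    fn index(&self, i: u32) -> &T {
        &self.0[i as usize]
    }
}

fn main() {
    let v = U32Indexed(vec![10, 20, 30]);
    let i: u32 = 1;
    assert_eq!(v[i], 20);
}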

3 Likes

The danger (as usual with adding impls) is in inference. Nobody wants to be forced to v[0_usize] every time they index something with a literal, but that's exactly what happens if you impl both Index<usize> and Index<u32>. (And indexing by signed types is controversial.)
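
A self-contained sketch of that inference problem (my own toy type): with both impls present, a bare integer literal no longer picks an index type, so the suffix becomes mandatory.

use std::ops::Index;

// Toy type with both impls, to show why std is reluctant to add Index<u32>.
struct Both(Vec<i32>);

impl Index<usize> for Both {
    type Output = i32;
    fn index(&self, i: usize) -> &i32 { &self.0[i] }
}

impl Index<u32> for Both {
    type Output = i32;
    fn index(&self, i: u32) -> &i32 { &self.0[i as usize] }
}

fn main() {
    let v = Both(vec![1, 2, 3]);
    // With two candidate impls, a bare `v[0]` no longer infers an index type,
    // so every literal needs a suffix:
    let x = v[0_usize];
    let y = v[1_u32];
    assert_eq!(x + y, 3);
}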

2 Likes

Could we make the indexing operator implicitly do a conversion to usize, and leave it up to the portability lint to warn/error/whatever when that’s potentially hazardous?

Some compilers have pragmas which basically state “please vectorize this loop if you can, and throw a compiler error otherwise”. Rust could probably take a similar approach for bounds checks, with some kind of #[elide_bound_checks] loop attribute.

I do like this idea on some level. From what I've heard, though, these sorts of heuristics tend to be very brittle from version to version of the compiler; most of the people I've spoken with who work with tools that try to automate proofs about things like bounds checks have gone out of their way to express their displeasure with anything that causes compilation failure nondeterministically. It seems like it would be a pretty polarizing feature. I wonder if it could be done as some sort of "lint" on the generated code (enabled for a given function or code block, maybe)? I'm pretty sure Rust doesn't support anything like that today, but it feels like other checks that only some people care about tend to end up in third-party lints...

1 Like

Do you mean some kind of #[warn(bounds_checks)] attribute that can be put on blocks (including loops)?

1 Like

Some time ago, I summarized everything I would do if I could change the Rust syntax. Assuming Rust gets full dependent types in the future, I imagined standard primitive types could be ranged and then checked at compile time.

Using a more conventional Rust syntax this would look like this:

Primitives with Range

Current primitive types would have matching ranged types: ISize, I8, … (the old primitive types might be deprecated). All these integer types would be optionally parameterized by any kind of range. The type I32<1..=10> would be a 32-bit signed integer that is checked at compile time to only accept values from 1 to 10. If a side of the range is unbounded, that side's limit is the hard bound of the type. Type inference would automatically narrow the range on immutable variables.

Example:

let a = 10;                  // the actual type is I32<10..=10>
let mut b : I32<1..=10> = 4; // b is limited from 1 to 10
let mut c : I32<1..> = 1;    // c is limited from 1 to I32::max
let mut d = 20;              // the actual type is I32 (same as I32<..>)

Arithmetic

All the usual arithmetic operators would check, at compile time, that there can be no overflow.

The range of the result of an arithmetic operation would be determined by the range of the operands. For instance, an addition between two I32<-5..=5> would return a I32<-10..=10>. If one of the bounds of the result would be over the hard limits of the type, or if the range of a divisor contains 0, there would be a compile time error.

For operations that can't be checked at compile time there would be a whole new set of arithmetic operators: +?, -?, … that return a value with a range truncated to fit the hard limits. These operators would cause an early return (like the ? operator) if there is an overflow or division by zero.

There could be another set of arithmetic operators that panics instead. In my original text, I planned to use +!, -!, … but without the other syntax changes I was proposing, that would conflict with the negation operator. I'll use +§, -§, … instead.

Example (using variables from previous example):

let x = a + b;    // x is a I32<11..=20>
let x = a + c;    // compilation error : x would be I32<11..=I32::max+10>
                  // but the upper bound can't be over I32::max
let x = a +? c;   // Early return on overflow. x is a I32<11..>
let x = a +§ c;   // Panic on overflow. x is a I32<11..>
let x = a / b;    // compilation ok. x is a I32<1..=10>
let x = a / d;    // compilation error : the range of the divisor contains 0
let x = a /? d;   // x is I32<-10..=10>. Early return if d == 0. 

Array indexing

Using bounded integers allows array indexing to be checked at compile time. As with arithmetic, there would be new syntax for runtime-checked indexing: array[?index], which early-returns on overflow, and array[§index], which panics on overflow.

Example (using variables from previous example):

let array = [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5]; // array is a [I32<1..=5>;15]
let x = array[b];       // ok since b is I32<1..=10> so inside array size
let x = array[b+10];    // compilation error: b+10 is I32<11..=20> so outside array size
let x = array[?b+10];   // Early return if overflow
let x = array[§b+10];   // Panic if overflow

8 Likes

Ahh yes, I thought of this shortly after posting. Bummer. Perhaps a separate RFC to annotate some impls as not contributing to inference but still contributing to type checking would resolve this. Does anyone familiar with the compiler know whether inference and type checking are distinct stages, or whether it would be possible to separate which impls are usable where?

I believe any investigation into tweaks to the trait system will only happen after chalkification, to make it easier to understand the consequences of a change as well as to simplify actually implementing it.

Just a quick note: there have been some discussions about HPC / numeric computing in these posts:

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.