Tensors static typing

[EDIT: somehow, the thread hadn't shown me a lot of the more recent posts when I made this post, so apologies for repeating some things others have said]

Interesting example… We have both conciseness issues and performance issues to think about here.

I had a look at this in Godbolt [Edit: forgot to include Godbolt link]. Note that the assembly for foo and foo2 is identical (i.e. the runtime check for the try_into was already optimized out), but both versions still have a runtime check for the ordering of the two indices, which can't be optimized out in the language as-is, because i + 4 can wrap around in release builds.
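The original foo isn't quoted in this excerpt; going by the description above and the [..3].try_into().unwrap() pattern mentioned further down, it presumably had roughly this shape (hypothetical reconstruction; element type and length chosen arbitrarily):

use std::convert::TryInto;

// hypothetical reconstruction, not the actual code from the thread
fn foo(a: &mut [u8], i: usize) -> &mut [u8; 4] {
    // slicing a[i..i + 4] needs a bounds check, and because i + 4 may wrap in
    // release builds the compiler also has to verify that the start is <= the end
    (&mut a[i..i + 4]).try_into().unwrap()
}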

Maybe what you want is a function like (don't mind the awkward implementation):

impl<T> [T] {
  // note: an inherent impl on [T] like this only compiles inside the standard
  // library; in user code it would have to be an extension trait
  pub fn get_array_mut<const LEN: usize>(&mut self, i: usize) -> &mut [T; LEN] {
    // overflow check plus bounds check (the "double-check" mentioned below):
    // the range i..i + LEN must lie within the slice
    if i.checked_add(LEN).unwrap() > self.len() { panic!() }
    unsafe {
      // the sub-slice has exactly LEN elements, so the try_into cannot fail
      match std::slice::from_raw_parts_mut(self.as_mut_ptr().add(i), LEN).try_into() {
        Ok(b) => b,
        Err(_) => std::hint::unreachable_unchecked(),
      }
    }
  }
}

I've used this to implement foo3 in my godbolt link, and it indeed appears to save one cmp instruction, even despite the awkward double-check on the first line.
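For reference, a foo3 on top of that helper is presumably just the following (hypothetical, since the Godbolt link is missing above, and assuming the method were actually reachable on slices, e.g. through an extension trait):

fn foo3(a: &mut [u8], i: usize) -> &mut [u8; 4] {
    // the helper does the single combined bounds check and returns the array reference
    a.get_array_mut::<4>(i)
}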

Once the min_const_generics feature stabilizes, it should be straightforward to implement a clean version of this in the standard library.

I also have PR #76635 open, which would allow writing that as .as_chunks().0[0]; shorter, if not really more obvious.
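For reference, a hedged sketch of how that reads with the current nightly shape of the API (the feature name and exact signature may differ from the PR as originally opened; head_array is a made-up wrapper):

#![feature(slice_as_chunks)]

// illustrative wrapper: view the four elements starting at i as a fixed-size array
fn head_array(a: &[u8], i: usize) -> &[u8; 4] {
    &a[i..].as_chunks::<4>().0[0]
}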

You could consider a PR to make it just a[i+1..].prefix_array_mut().unwrap() or whatever.
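That method doesn't exist today; a rough sketch of what such a helper could look like as an extension trait (PrefixArrayMut and prefix_array_mut are names taken from the suggestion above, not from any existing API):

use std::convert::TryInto;

trait PrefixArrayMut<T> {
    fn prefix_array_mut<const N: usize>(&mut self) -> Option<&mut [T; N]>;
}

impl<T> PrefixArrayMut<T> for [T] {
    fn prefix_array_mut<const N: usize>(&mut self) -> Option<&mut [T; N]> {
        // take the first N elements, if the slice is long enough
        self.get_mut(..N)?.try_into().ok()
    }
}

With that, a too-short slice yields None rather than panicking inside the helper, so the unwrap stays visible at the call site.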

That said, I'm not sure that any of these are much better than the [..3].try_into().unwrap() versions -- they're just moving around the bounds checks and panics.


Yes, but that is compile-time dependent typing, somewhat akin to what Dlang offers.

Compile-time dependent typing is unfortunately no easier to decide than runtime dependent typing when tied to specialization. It doesn't suffice to evaluate the bounds of each trait method when more than one method applies to the input. In that case you need to solve a constraint system, which may have exponential complexity; the user doesn't always see this immediately and wonders why the compiler is so slow.

For the above reason, Dlang doesn't support specialization over value expressions other than simple membership expressions (like trait bounds, for instance).

The other point is: what do you do if your vector may be of length 3 or 4, or if its length depends non-deterministically on external events, requiring compile-time-parametrized vector lengths?

Hi, wouldn't const generics allow you to write the function signatures you desire?
Provided all the sizes are known at compile time…

I don't know how fast this is in practice.

I already came up with an idea to solve the speed problem here. Put in an extra compiler flag that specifies how long the compiler can take to execute the SAT solver; once the time is up, no more SAT solving. Instead, you emit less optimized code that is known to be correct, but possibly slower than fully optimized code.

Another problem we could face is operator overloading, e.g. a heavy product computation on two mathematical structures implemented with a while loop that unintentionally allows looping forever, prolonging compilation time indefinitely.

The main problem is that even if you can see your own macros, others cannot. You compile your lib with calls to a const-generic-constrained function that are all finitely solvable. Another user then calls your lib with a different const value, breaking the compilation. That user may not understand why it doesn't work, and may not be able to view your macro, as it is enclosed in a private module.

Turning to runtime checking doesn't help either in the case of specialization, unless you change the dispatch strategy.

But the compiler still has to execute my macros, even if they are private to my module, correct? That means that the compiler can still emit warnings or errors when the time limit is exceeded, correct? I agree though that even in that case trying to figure out why your own code is causing the compiler to fail is still going to be a real headache...

I'll have to trust you on that; I don't know enough to comment on this one way or another.

Knowing all sizes is too strict in practice. In most cases we want our algorithm to process images with adaptive sizes and a fixed number of channels. Also, it's not possible to define compile-time restrictions like IsOdd or IsPrime for const generics. It is still far from enough to build a truly static API.
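As a rough illustration of that gap: the closest approximation available is a const assertion that only fails at monomorphization time, which is not a real bound in the signature (AssertOdd is a made-up helper for this sketch, not an existing API):

// hypothetical workaround: a post-monomorphization const assertion, not a trait bound
struct AssertOdd<const N: usize>;

impl<const N: usize> AssertOdd<N> {
    const OK: () = assert!(N % 2 == 1, "N must be odd");
}

fn takes_odd_array<const N: usize>(a: [f64; N]) -> f64 {
    // referencing the associated const forces the assertion to be evaluated for this N
    let () = AssertOdd::<N>::OK;
    a[N / 2] // the middle element, which only makes sense for odd N
}

The error only shows up when the function is actually instantiated with an even N; it cannot be expressed as an IsOdd bound that callers see in the signature.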

In contrast to adding static checkers to a dynamically typed language, Rust goes in the reverse direction of gradual typing: it relaxes type checks when some information is missing at compile time and leaves them to runtime checkers. Though dependent typing could be something of a solution, we cannot have it because it violates Rust's design principles, and we don't want type checkers at runtime. Instead, we can combine a Dyn type with Rust's generics. That way we can achieve a weak form of gradual typing. The code below illustrates the basic idea.

use std::ops::Add;
use typenum::UTerm;  // a type-level number standing for zero

struct Dyn(usize);   // a dimension only known at runtime

impl Add<UTerm> for Dyn {
    type Output = Dyn;
    fn add(self, _rhs: UTerm) -> Self::Output {
        // runtime checker goes here; adding a static zero leaves the value unchanged
        self
    }
}
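Building on that snippet, a hypothetical call site, just to show where the runtime boundary sits (the checker for UTerm is trivial because the static side is a known zero):

fn demo(n: usize) -> Dyn {
    // n is only known at runtime; UTerm is the compile-time zero
    Dyn(n) + UTerm
}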

Towards building a static tensor API, the core issues can be arranged into two threads.

  • How to encode complex logic in types. For example, can we statically compute the dimensions of the tensor product of two tensors?
  • How to dispatch between static checkers and runtime checkers (ideally without runtime overhead).

The first issue can be somewhat solved by type operators. Those interested can look at my typ crate to see how it's possible in Rust.
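As a rough, hedged sketch of the first point (using typenum directly rather than typ; Tensor and tensor_product are made-up names for illustration), the dimension of a tensor product is just a type-level multiplication:

use std::marker::PhantomData;
use std::ops::Mul;
use typenum::{Prod, Unsigned};

// illustrative only: a flat tensor whose total dimension is carried in the type
struct Tensor<Dim: Unsigned> {
    data: Vec<f64>,
    _dim: PhantomData<Dim>,
}

// the tensor product of spaces of dimension M and N has dimension M * N,
// which typenum computes at the type level as Prod<M, N>
fn tensor_product<M, N>(_a: &Tensor<M>, _b: &Tensor<N>) -> Tensor<Prod<M, N>>
where
    M: Unsigned + Mul<N>,
    N: Unsigned,
    Prod<M, N>: Unsigned,
{
    Tensor {
        data: vec![0.0; Prod::<M, N>::USIZE],
        _dim: PhantomData,
    }
}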

For the second issue, we could interleave runtime checkers by dispatching UTerm + UTerm and UTerm + Dyn into different impl blocks and inserting checkers accordingly. I made an attempt at this in my type-vec. However, once the types become more complex, writing code this way becomes tedious. I would rather look for an approach where we insert runtime provers only where needed, in the middle of complex type operations.
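A minimal sketch of that dispatch idea, with made-up names (Static, Dyn and SameLength here are illustrative, not taken from type-vec): the static impl compiles to nothing, while the impl involving Dyn carries the runtime check.

use std::marker::PhantomData;
use typenum::Unsigned;

// a dimension fixed at compile time vs. one known only at runtime
struct Static<N: Unsigned>(PhantomData<N>);
struct Dyn(usize);

trait SameLength<Rhs> {
    fn check(&self, rhs: &Rhs);
}

// static vs. static: both sides are the same type-level N, so the trait bound
// itself is the proof and the checker is a no-op
impl<N: Unsigned> SameLength<Static<N>> for Static<N> {
    fn check(&self, _rhs: &Static<N>) {}
}

// dynamic vs. static: the runtime value has to be compared against the constant
impl<N: Unsigned> SameLength<Static<N>> for Dyn {
    fn check(&self, _rhs: &Static<N>) {
        assert_eq!(self.0, N::USIZE, "dimension mismatch");
    }
}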

Also, there is a thread in tch-rs talking about static tensor types.

Interesting

I think that this will be possible someday with const generics, when the logic only involves addition and multiplication.
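For a taste of what that might look like, here is a hedged sketch using the unstable generic_const_exprs feature on nightly (its rules are still in flux, so treat the extra where clause as indicative rather than final):

#![feature(generic_const_exprs)]

// concatenation with a statically computed output length: the "addition" case
fn concat<const N: usize, const M: usize>(a: [f64; N], b: [f64; M]) -> [f64; N + M]
where
    [(); N + M]:,
{
    let mut out = [0.0; N + M];
    out[..N].copy_from_slice(&a);
    out[N..].copy_from_slice(&b);
    out
}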

I also think this will happen someday with const generics, but there are other problems incurred by const generics and dispatching; see this thread.
