In the 1960s a programming language named APL was invented, designed first of all to process tensors (n-dimensional arrays). Its syntax was arcane and baroque. It was dynamically typed, so operations and tensor dimensions were verified at run-time (and often adapted in flexible ways). Later, other people invented derived languages like K that share similar semantics but with a less esoteric syntax (no language-specific glyphs). I think few people tried to create a static type system for languages like APL, and such experiments didn't lead to long-lasting developments. Much later, people invented Python libraries to process tensors, and their usage has become quite common in several scientific and technological fields (today there's NumPy).
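A minimal sketch of what "verified at run-time and often adapted in flexible ways" means in NumPy: shapes are only checked when an operation actually runs, and broadcasting silently adapts compatible shapes:

```python
import numpy as np

a = np.ones((64, 128, 3))
b = np.ones((128, 3))

# Broadcasting adapts the shapes: (128, 3) is stretched to (64, 128, 3).
c = a + b
print(c.shape)  # (64, 128, 3)

# An incompatible shape is only caught when the operation executes:
try:
    a + np.ones((5, 3))
except ValueError:
    print("shape mismatch caught at run-time, not at compile-time")
```

This is exactly the kind of error a static (gradual) shape checker would like to catch before the program runs.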

Later people created libraries, often in C++, to allow the creation and training of neural networks and to run them on CPUs and GPUs. The use of such libraries, from Python too, is now common enough. Meanwhile, dynamic languages like Python, Ruby, JavaScript, PHP, Racket, and others have started to develop ways to add static typing (gradual typing) and to verify such type annotations (in both sound and unsound ways; on these topics I remember many fun articles like https://journal.stuffwithstuff.com/2010/10/29/bootstrapping-a-type-system/ or https://blog.acolyer.org/2016/02/05/is-sound-gradual-typing-dead/ ).
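A tiny sketch of the unsoundness question in Python's gradual typing: annotated code is checked statically (by tools like mypy or Pyre), unannotated code is treated as `Any` and skipped, and CPython itself ignores the annotations entirely at run-time:

```python
def double(x: int) -> int:
    # Annotated: a static checker verifies calls to this function.
    return x * 2

def untyped(x):
    # Unannotated: its result is treated as Any, so checks are skipped.
    return x

# A checker cannot flag this (Any flows into an int parameter),
# and CPython does not enforce annotations, so it simply runs:
result = double(untyped("ab"))
print(result)  # "abab" - string repetition, not integer doubling
```

This hole (an `Any` crossing a typed boundary unchecked) is the "unsound" side discussed in the linked articles.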

So lately people are trying to design static verifiers (again, a form of gradual typing) for the shapes and sizes of the tensors used by such neural network libraries from Python (like Pyre). Some of these people say they do not want to use dependent typing, I am not sure why, but they are close to it.

At first sight it may seem curious that advancements in the static (gradual) typing of such tensors don't come from the world of statically typed languages. But dynamic languages (or flexible, mostly statically typed languages like Swift) get used a LOT for machine learning, so advancements are expected mostly where the real work is done. Another factor is that such dynamic languages are flexible, they can do almost anything, so it's simpler to put a static type system on top of them to restrict their flexibility, catch some bugs, and make the code easier to write and understand. So I think such type systems are simpler to design in languages like Python (though Haskell programmers may disagree), and later some of these ideas may be added to static languages like Swift and Rust (or partially static languages like Julia). I hope some of these ideas will become usable in Rust, eventually.

Another curious thing is that such advancements seem to come from the world of practical language implementations and not from academia, where there are many people who are very expert in advanced type systems.

Some links about such recent explorations, with slides, videos and more:

See also:

Some extracts:

```
# How to type this?
image: Tensor[64, 128, 3]
a = tf.reduce_mean(image, axis=0) # Shape: 128, 3
a = tf.reduce_mean(image, axis=1) # Shape: 64, 3
a = tf.reduce_mean(image, axis=2) # Shape: 64, 128
# Ideally:
def reduce_mean(x: Tensor[AxesSet], axis: int)
    -> Tensor[AxesSet - {axis}]: ...
# (But I think this is not exact: AxesSet needs to be an ordered
# sequence. And this looks close to being dependent typing.)
# For now:
@overload
def reduce_mean(x: Tensor[A0, A1, A2], axis: Literal[0])
    -> Tensor[A1, A2]: ...
@overload
def reduce_mean(x: Tensor[A0, A1, A2], axis: Literal[1])
    -> Tensor[A0, A2]: ...
...
# Variadics:
def add(x: Tensor1D[A], y: float) -> Tensor1D[A]
def add(x: Tensor2D[A,B], y: float) -> Tensor2D[A,B]
def add(x: Tensor3D[A,B,C], y: float) -> Tensor3D[A,B,C]
==>
Shape = ListVariadic("Shape", bound=int)
class Tensor(Generic[Shape]): ...
def add(x: Tensor[Shape], y: float) -> Tensor[Shape]
# Type Arithmetic:
def concat(x: Tensor[A], y: Tensor[B]) -> Tensor[A + B]
def __getitem__(t: Tensor[N], start: A, end: B) -> Tensor[B - A]
def range(start: A, end: B, step: S) -> Tensor[(B - A) // S]
def duplicate(t: Tensor[N]) -> Tensor[N * 2]
def flatten(t: Tensor[Shape]) -> Tensor[Prod[Shape]]
def __len__(t: Tuple[Ts]) -> Length[Ts]
# Convolution:
def __call__(t: Tensor[N, C_in, H_in, W_in]) -> Tensor[
N,
C_out,
    1 + (H_in + 2 * P - D * (K - 1) - 1) // S,
    1 + (W_in + 2 * P - D * (K - 1) - 1) // S,
]: ...
# Reduce operators on variadics:
def __len__(x: Tensor[Ts]) -> Length[Ts]
def flatten(x: Tensor[Ts]) -> Tensor[Prod[Ts]]
def view(x: Tensor[Ts1], a: L[-1], *ts: Ts2)
    -> Tensor[Prod[Ts1] // Prod[Ts2], Ts2]
# Equality:
# Tensor[A+B] and Tensor[B+A] are not the same type
# Any expression with addition and multiplication can be normalized to an ordered list of monomials.
# GCD on multivariate polynomials is not trivial.
(x^2 - 1)/(x + 1) -> ((x + 1)(x - 1))/(x + 1) -> x - 1
# Fully decidable equality would require Gröbner basis.
# For simplicity we limit ourselves to:
(2xy + 4xz)/(6x + 4x^2) -> (y + 2z)/(3 + 2x)
```