A few months ago, I summed up the state of Machine Learning in Rust. Since then, Rust hasn't improved much in the required areas I listed, while the design of Machine Learning / Deep Learning systems has moved further and further into the compiler, driven by leading language designers and compiler engineers, as in the first-class differentiation support of tensorflow-swift (the point of this proposal). Some projects, such as julia/Zygote or tvm, have already implemented this idea, either in a DSL (tvm) or because the language allows such IR manipulation (julia). For the record, this has come up before, but the difference now is that there's an actual implementation. See the Swift differentiable programming mega-proposal.
A bird's-eye view of the matter is as follows:
The central data structure in ML/DL is the multi-dimensional array, aka ndarray, aka tensor. An ndarray can hold fixed data or parameters that we want to fit to the data. A layer has some parameters, may hold some data, and actually behaves like a function (for some reason, it has taken years for people to realize that). So basically we have some ndarrays and some functions that together express a DAG of computations.
To find a good set of parameters, we need to do mathematical optimization. In the case of deep learning, it's a form of gradient descent. To be able to do gradient descent, we need to assume the functions involved are (almost everywhere) differentiable. Once we can calculate derivatives, we iteratively update the parameters, hoping our "model" learns something.
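As a rough illustration of that loop, here is a minimal sketch of gradient descent on a scalar model `w * x + b` with a mean-squared-error loss. The names `loss`, `grad`, and `fit`, the choice of loss, and the use of `f64` scalars instead of tensors are all assumptions made for the example; the gradient is derived by hand, which is exactly the step the proposal would like the compiler to do.

```rust
/// Mean squared error of the scalar model `w * x + b` over the data set.
/// (Hypothetical helper, not part of any existing crate.)
fn loss(w: f64, b: f64, xs: &[f64], ys: &[f64]) -> f64 {
    let n = xs.len() as f64;
    let total: f64 = xs
        .iter()
        .zip(ys)
        .map(|(x, y)| (w * x + b - y).powi(2))
        .sum();
    total / n
}

/// Hand-derived gradient of `loss` with respect to `(w, b)`.
fn grad(w: f64, b: f64, xs: &[f64], ys: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mut dw = 0.0_f64;
    let mut db = 0.0_f64;
    for (x, y) in xs.iter().zip(ys) {
        let err = w * x + b - y;
        dw += 2.0 * err * x / n; // d(err^2)/dw = 2 * err * x
        db += 2.0 * err / n; // d(err^2)/db = 2 * err
    }
    (dw, db)
}

/// Plain gradient descent: repeatedly step against the gradient.
fn fit(xs: &[f64], ys: &[f64], lr: f64, steps: usize) -> (f64, f64) {
    let (mut w, mut b) = (0.0_f64, 0.0_f64);
    for _ in 0..steps {
        let (dw, db) = grad(w, b, xs, ys);
        w -= lr * dw;
        b -= lr * db;
    }
    (w, b)
}
```

Calling `fit(&[0.0, 1.0, 2.0, 3.0], &[1.0, 3.0, 5.0, 7.0], 0.05, 5000)` should recover parameters close to `w = 2` and `b = 1`, since the data were generated from `2 * x + 1`.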
The status quo approach for finding derivatives is Automatic Differentiation. For efficiency, it is done by constructing the reverse DAG of expressions, aka the adjoint DAG, and computing the derivatives backwards, aka backpropagation. The leading idea is to support differentiation at compile time, aka differentiable programming.
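To make the reverse DAG idea concrete, here is a minimal sketch of run-time reverse-mode differentiation over scalars, using a tape that records each operation together with its local partial derivatives. The types `Tape`, `Node`, and `Var` are hypothetical and exist only for this example; a compile-time approach would aim to generate the equivalent backward code without recording anything at run time.

```rust
/// One recorded operation: for each of its two inputs, the input's tape index
/// and the local partial derivative of this node with respect to that input.
struct Node {
    parents: [(usize, f64); 2],
}

/// A tape that records the expression DAG in evaluation order.
struct Tape {
    nodes: Vec<Node>,
}

/// A value together with its position on the tape.
#[derive(Clone, Copy)]
struct Var {
    index: usize,
    value: f64,
}

impl Tape {
    fn new() -> Self {
        Tape { nodes: Vec::new() }
    }

    fn push(&mut self, parents: [(usize, f64); 2], value: f64) -> Var {
        self.nodes.push(Node { parents });
        Var { index: self.nodes.len() - 1, value }
    }

    /// Leaf node: an input or a parameter.
    fn var(&mut self, value: f64) -> Var {
        // A leaf points at itself with zero weights, so it propagates nothing.
        let i = self.nodes.len();
        self.push([(i, 0.0), (i, 0.0)], value)
    }

    fn add(&mut self, a: Var, b: Var) -> Var {
        self.push([(a.index, 1.0), (b.index, 1.0)], a.value + b.value)
    }

    fn mul(&mut self, a: Var, b: Var) -> Var {
        self.push([(a.index, b.value), (b.index, a.value)], a.value * b.value)
    }

    /// Backward sweep: walk the tape in reverse, accumulating adjoints.
    /// Entry `i` of the result is d(output)/d(node i).
    fn grad(&self, output: Var) -> Vec<f64> {
        let mut adjoints = vec![0.0_f64; self.nodes.len()];
        adjoints[output.index] = 1.0;
        for i in (0..self.nodes.len()).rev() {
            let a = adjoints[i];
            for &(parent, weight) in &self.nodes[i].parents {
                adjoints[parent] += weight * a;
            }
        }
        adjoints
    }
}
```

Recording `x * w + b` as `t.mul(x, w)` followed by `t.add(xw, b)` and then calling `t.grad(y)` yields the partial derivatives with respect to every recorded variable in a single backward pass, which is why the reverse direction is preferred when there are many parameters and one scalar output.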
What does it mean in Rust and for Rust?
A sketch would be supporting something like the following (slightly misleading, since we don't have good syntax for it):
#[differentiable]
fn f(x: Tensor<f32>, w: Tensor<f32>, b: Tensor<f32>) -> Tensor<f32> {
    return x * w + b // matrix multiplication and addition
}
where w, b are parameters and x is our data, and the compiler finds df/dw and df/db at x. That expands to
fn df_dw(x: Tensor<f32>, w: Tensor<f32>) -> Tensor<f32> {
    return x // or transpose of x to keep it column oriented
}
Or a better idea for the expansion is a closure-like form:
#[derive(Differentiation)]
struct Linear {
    #[differentiable]
    w: Tensor<f32>,
    #[differentiable]
    b: Option<Tensor<f32>>,
}
impl FnOnce<Args> for Linear { ... }
let f = |x: Tensor<f32>| Linear(x);
and the compiler finds df_dw and df_db, which are
let df_dw = |x| x;
let df_db = |x| I; // the identity_tensor
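For comparison, here is a hand-written sketch of what such a derive might have to generate, reduced to `f32` scalars so that it compiles on stable Rust today. The `LinearGrad` struct and the `call` and `gradient` methods are hypothetical names chosen for this example; nothing like them exists in rustc, and the scalar `Linear` here only stands in for the tensor version above.

```rust
/// Scalar stand-in for the `Linear` layer; the bias is optional as in the sketch.
struct Linear {
    w: f32,
    b: Option<f32>,
}

/// Hypothetical output of the derive: one partial derivative per
/// `#[differentiable]` field.
#[derive(Debug, PartialEq)]
struct LinearGrad {
    w: f32,
    b: Option<f32>,
}

impl Linear {
    /// Forward pass: `x * w + b`, with a missing bias treated as zero.
    fn call(&self, x: f32) -> f32 {
        x * self.w + self.b.unwrap_or(0.0)
    }

    /// The part a compiler or macro would have to synthesize.
    fn gradient(&self, x: f32) -> LinearGrad {
        LinearGrad {
            // d(x * w + b)/dw = x
            w: x,
            // d(x * w + b)/db = 1, the scalar analogue of the identity tensor;
            // there is no derivative when the layer has no bias.
            b: self.b.map(|_| 1.0),
        }
    }
}
```

Grouping the derivatives into a struct that mirrors the layer's fields is only one possible design; it keeps each derivative next to the parameter it belongs to, which is convenient for an update step.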
More importantly, this is not only about neural nets. Such support can close the huge existing gap in Rust for Scientific Computing in general, and in particular in areas where solving differential equations comes into play (finance, weather prediction, etc.), and I believe it will open up much greater possibilities for Rust.
I know this would entail a huge undertaking in rustc, but this would serve at least as a reference if the community wants to do it in the future.
Some other useful references: