Native Differentiable Programming Support for Rust

Is there any plan to support automatic differentiation natively in Rust, à la what Swift is doing? I think it would really accelerate machine learning development in Rust and encourage organizations to build their deep learning frameworks in Rust. Since Rust provides memory safety and essentially aims to be a better C++, it would be a really awesome feature. I would love to know what the dev team thinks about this.

Also, to cater to the deep learning folk, Rust would need first-class GPU support, along with support for other neural accelerators (like TPUs). I recognise that the support for CUDA and OpenCL is wanting (afaik).

I would just like to start this discussion, so that maybe the Rust team would take a long hard look at this. I think it would really drive up the popularity of the language among mainstream developers. People have a tough time coding in C++, and frameworks such as TensorFlow and PyTorch have to deal with the inane bugs of C++ daily.

Differentiable programming is, I think, here to stay, and Rust could be one of the pioneers. Swift is doing something similar, and if you would like to follow their progress, you can find it through this link here


If someone wants to read why the Google people chose Swift for automatic differentiation support, this official post is a great read.


What exactly do you mean by "native" support?

That post says:

We love Rust, but it has a steep learning curve that may exclude data scientists and other non-expert programmers who frequently use TensorFlow

I highly disagree with this sentiment — I think that if someone wants to do serious programming, one has to make an effort in order to learn whatever language is chosen; furthermore, subjectively, I don't find Swift the least bit "easier" than Rust. For example the tooling is a pain, and in particular while the language is touted for being cross-platform, the Linux experience is not exactly smooth.

However, this doesn't change the fact that this particular team at Google found Rust's usability worse than that of Swift, and the problem wasn't (or wasn't primarily) the lack of automatic differentiation.

By native support I mean first-class automatic differentiation support. That means one can calculate the gradients of variables of native types. The lack of this is one big hurdle; having it would make it very easy to code deep learning frameworks in Rust. The second thing would be better GPU access APIs or wrappers around libraries like CUDA (which is a long way off, I imagine).
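To make "gradients of native types" a bit more concrete, here is a minimal library-level sketch of forward-mode automatic differentiation using dual numbers. It is only an illustration of the idea, not how Swift for TensorFlow or any Rust proposal actually does it; the names `Dual`, `variable`, `constant`, `f`, and `derivative_of_f` are made up for this example.

```rust
use std::ops::{Add, Mul};

/// A dual number: a value together with its derivative
/// with respect to a single input variable.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dual {
    val: f64,
    der: f64,
}

impl Dual {
    /// The input we differentiate with respect to (dx/dx = 1).
    fn variable(x: f64) -> Dual {
        Dual { val: x, der: 1.0 }
    }

    /// A constant (derivative 0).
    fn constant(c: f64) -> Dual {
        Dual { val: c, der: 0.0 }
    }
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, rhs: Dual) -> Dual {
        // Sum rule: (u + v)' = u' + v'
        Dual { val: self.val + rhs.val, der: self.der + rhs.der }
    }
}

impl Mul for Dual {
    type Output = Dual;
    fn mul(self, rhs: Dual) -> Dual {
        // Product rule: (u * v)' = u' * v + u * v'
        Dual {
            val: self.val * rhs.val,
            der: self.der * rhs.val + self.val * rhs.der,
        }
    }
}

/// Example function: f(x) = x * x + 3 * x
fn f(x: Dual) -> Dual {
    x * x + Dual::constant(3.0) * x
}

/// Derivative of `f` at `x`, obtained by evaluating `f` on a dual number.
fn derivative_of_f(x: f64) -> f64 {
    f(Dual::variable(x)).der
}

fn main() {
    let y = f(Dual::variable(2.0));
    // f(2) = 4 + 6 = 10 and f'(2) = 2 * 2 + 3 = 7
    println!("f(2) = {}, f'(2) = {}", y.val, y.der);
    assert_eq!(derivative_of_f(2.0), 7.0);
}
```

The catch with this approach is that `f` has to be written against `Dual` rather than plain `f64`, which is exactly the kind of friction that compiler-level support would presumably remove.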

Although the Google team chose Swift, that doesn’t mean what they’re doing can’t be done for Rust. The developers themselves state in talks that their approach to building Swift for TensorFlow can be applied to other languages. They gave a great talk at a recent LLVM conference linked here

I'm sorry but that's just another word. Do you mean it should somehow be built into the language? If so, how exactly? Or as a library? If not why not? I know what it's for; I'm interested in how you think it should be done.

Sure, and I don't believe I asserted anything like that.


A friend of mine who works on CUDA support for LLVM was just telling me about an automatic differentiation intrinsic he was glaring at, last night over dinner.

I suspect that what they're looking for is first-class support for the relevant LLVM intrinsic. I assume the only way this can be surfaced is by an attribute or Rust intrinsic that just emits the obvious LLVM IR.


Thanks, that makes sense.

Not only that, but afaik Chris Lattner himself works in that group. Of course he'd pitch for Swift, his own language. Their opinion can't be more biased than this.


More to the point, it’s easier to enhance a language and compiler that they control.


What a funny "coincidence". I looked at the history of the post and – yes – Chris Lattner is one of the authors. Who would have expected this outcome :stuck_out_tongue:


For those who want to check this out without watching the entire Swift for TensorFlow presentation, the part on automatic differentiation starts here and runs for about 7:30.

I suspect that Rust can achieve much of this through macros, perhaps using some of the techniques of uom to extend the concept space in an orthogonal direction.


One thing that deeply bothers me looking at the Swift implementation:

public protocol VectorNumeric {
    associatedtype ScalarElement
    associatedtype Dimensionality
    init(_ scalar: ScalarElement)
    init(dimensionality: Dimensionality, repeating repeatedValue: ScalarElement)
    func + (lhs: Self, rhs: Self) -> Self
    func - (lhs: Self, rhs: Self) -> Self
    func * (lhs: Self, rhs: Self) -> Self
}

If you want to obtain the gradient of a function with respect to some vector type, it must implement this interface.

I assume the last three are overloaded operators. But elementwise multiplication of two vectors (func *) is not an operation that is recognized in the mathematical formalism of linear algebra. Most vector types in my application deliberately do not implement this operation, or they call it something like mul_diag (for multiplication by a diagonal matrix). In my entire 44kloc codebase, I only call this function once.

I would hope a Rust implementation would not require me to compromise the mathematical integrity of my types in such a manner.


In deep learning, element-wise multiplication is actually performed very often. People use these element-wise operations in various loss functions and in image convolutions. It’s pretty handy for researchers, and I suspect it is needed in constructing the models. As an example, when two layers of different modalities (say image and text) need to be fed into the network together, they are generally concatenated or added element-wise.
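As a rough illustration of what these element-wise operations look like, here is a small sketch on plain slices. The function names and the toy "image" and "text" feature values are invented for the example and are not taken from any framework.

```rust
/// Element-wise sum of two equal-length feature vectors.
fn elementwise_add(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

/// Element-wise product of two equal-length feature vectors.
fn elementwise_mul(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    a.iter().zip(b).map(|(x, y)| x * y).collect()
}

fn main() {
    // Toy feature vectors standing in for two modalities.
    let image_features: [f32; 3] = [0.5, 1.0, -2.0];
    let text_features: [f32; 3] = [2.0, 0.0, 4.0];

    let fused = elementwise_add(&image_features, &text_features);
    let gated = elementwise_mul(&image_features, &text_features);

    assert_eq!(fused, vec![2.5, 1.0, 2.0]);
    assert_eq!(gated, vec![1.0, 0.0, -8.0]);
}
```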


Hey, I’m sorry, I don’t actually have a background in Rust, so I’m unaware of how this would even be implemented. I really like the principles on which Rust was built, such as memory safety without the need for a garbage collector, along with its great performance.

The technical details of which design should be followed to implement this would best be answered by someone else.


Notice that my issue is specifically with multiplication. Addition is fine.

As I see it, when working with real vectors of a fixed size (i.e. if we forget about non-square matrices), linear algebra provides the following core operations:

use std::ops::Mul;

struct V3([f64; 3]);
struct M33([[f64; 3]; 3]);

impl V3 {
    fn zero() -> V3 { ... }
}

impl M33 {
    fn zero() -> M33 { ... }
    fn identity() -> M33 { ... }
}

// Scalar multiplication
impl Mul<f64> for V3 { ... }
impl Mul<f64> for M33 { ... }

// Matrix multiplication
impl Mul<M33> for V3 { ... } // V * M -> V
impl Mul<V3> for M33 { ... } // M * V -> V
impl Mul<M33> for M33 { ... } // M * M -> M

// Inner product (a.T * b)
fn dot(a: V3, b: V3) -> f64 { ... }
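A runnable subset of the list above might look like the following. It fills in only scalar multiplication for `V3`, the `M * V` product, and `dot`; the bodies are one possible implementation written for this illustration, not code from my actual codebase.

```rust
use std::ops::Mul;

#[derive(Clone, Copy, Debug, PartialEq)]
struct V3([f64; 3]);

#[derive(Clone, Copy, Debug, PartialEq)]
struct M33([[f64; 3]; 3]);

impl M33 {
    fn identity() -> M33 {
        M33([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    }
}

// Scalar multiplication: V * s -> V
impl Mul<f64> for V3 {
    type Output = V3;
    fn mul(self, s: f64) -> V3 {
        V3([self.0[0] * s, self.0[1] * s, self.0[2] * s])
    }
}

// Matrix-vector multiplication: M * V -> V
impl Mul<V3> for M33 {
    type Output = V3;
    fn mul(self, v: V3) -> V3 {
        let mut out = [0.0_f64; 3];
        for i in 0..3 {
            for j in 0..3 {
                out[i] += self.0[i][j] * v.0[j];
            }
        }
        V3(out)
    }
}

// Inner product (a.T * b)
fn dot(a: V3, b: V3) -> f64 {
    a.0.iter().zip(b.0.iter()).map(|(x, y)| x * y).sum()
}

fn main() {
    let v = V3([1.0, 2.0, 3.0]);
    assert_eq!(M33::identity() * v, v);
    assert_eq!(v * 2.0, V3([2.0, 4.0, 6.0]));
    assert_eq!(dot(v, v), 14.0);
    // `v * v` does not compile: there is deliberately no `Mul<V3> for V3`.
}
```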

Note there are some catches when trying to generalize beyond this. For instance, in physics, we often work with complex vectors and define the inner product as a.conjugate().T * b. Physicists often treat this like an atomic operation, and in many contexts it is basically a “code smell” to see one operation performed without the other. (i.e. it’s something that usually needs justification).
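To show why the conjugate matters, here is a small sketch of the complex inner product. Rust's standard library has no complex type, so the `Complex` struct and the `inner` and `naive_product` functions below are minimal stand-ins written just for this example.

```rust
use std::ops::{Add, Mul};

#[derive(Clone, Copy, Debug, PartialEq)]
struct Complex {
    re: f64,
    im: f64,
}

impl Complex {
    fn conj(self) -> Complex {
        Complex { re: self.re, im: -self.im }
    }
}

impl Add for Complex {
    type Output = Complex;
    fn add(self, rhs: Complex) -> Complex {
        Complex { re: self.re + rhs.re, im: self.im + rhs.im }
    }
}

impl Mul for Complex {
    type Output = Complex;
    fn mul(self, rhs: Complex) -> Complex {
        Complex {
            re: self.re * rhs.re - self.im * rhs.im,
            im: self.re * rhs.im + self.im * rhs.re,
        }
    }
}

/// Inner product a.conjugate().T * b: conjugate the left operand, then sum.
fn inner(a: &[Complex; 3], b: &[Complex; 3]) -> Complex {
    let mut acc = Complex { re: 0.0, im: 0.0 };
    for (x, y) in a.iter().zip(b.iter()) {
        acc = acc + x.conj() * *y;
    }
    acc
}

/// The same sum without the conjugate (a.T * b), for comparison.
fn naive_product(a: &[Complex; 3], b: &[Complex; 3]) -> Complex {
    let mut acc = Complex { re: 0.0, im: 0.0 };
    for (x, y) in a.iter().zip(b.iter()) {
        acc = acc + *x * *y;
    }
    acc
}

fn main() {
    let zero = Complex { re: 0.0, im: 0.0 };
    let i = Complex { re: 0.0, im: 1.0 };
    let a = [i, zero, zero];

    // With the conjugate, the product of a vector with itself is a squared norm.
    assert_eq!(inner(&a, &a), Complex { re: 1.0, im: 0.0 });
    // Without it, the same vector gives -1, which cannot be a squared length.
    assert_eq!(naive_product(&a, &a), Complex { re: -1.0, im: 0.0 });
}
```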


Oh okay, thanks anyway.

Fair enough.

Basic Rust supports FOUR different kinds of signed and unsigned integer addition that are usually invoked via method syntax or macros: checked, overflowing, saturating, and wrapping. That’s in addition to the default + sigil for the Add trait on signed and unsigned integers, which panics on overflow in debug builds and acts like wrapping addition (i.e., ignoring overflow) in release builds.
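For reference, the four variants are available as standard methods on the integer types. The helper `add_four_ways` below is just a wrapper invented for this example so the results can be seen side by side.

```rust
/// Returns the results of the checked, overflowing, saturating,
/// and wrapping additions of `a` and `b`, in that order.
fn add_four_ways(a: u8, b: u8) -> (Option<u8>, (u8, bool), u8, u8) {
    (
        a.checked_add(b),     // None on overflow
        a.overflowing_add(b), // wrapped result plus an overflow flag
        a.saturating_add(b),  // clamps at u8::MAX
        a.wrapping_add(b),    // wraps around modulo 256
    )
}

fn main() {
    // 250 + 10 = 260 does not fit in a u8 (maximum 255).
    let (checked, overflowing, saturating, wrapping) = add_four_ways(250, 10);
    assert_eq!(checked, None);
    assert_eq!(overflowing, (4, true));
    assert_eq!(saturating, 255);
    assert_eq!(wrapping, 4);

    // 250 + 5 = 255 fits, so all four agree on the value.
    assert_eq!(add_four_ways(250, 5), (Some(255), (255, false), 255, 255));
}
```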

There are many different types of products in mathematics, any of which could lay claim to the * sigil. The component-wise multiplication of equal-length vectors used in deep learning is yet another. Rust’s method syntax provides an unambiguous way to discriminate among these product algorithms. Macros can be used to override the default sigil meanings in a region of code, just as @lambda-fairy did years ago for Rust’s Wrapping type.
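A rough sketch of both ideas follows. It does not use the macro mentioned above; instead it uses the standard `std::num::Wrapping` wrapper, which changes what `+` means by changing the type. The `V3` type and its `dot` and `hadamard` methods are hypothetical names chosen for this example.

```rust
use std::num::Wrapping;

#[derive(Clone, Copy, Debug, PartialEq)]
struct V3([f64; 3]);

impl V3 {
    /// Inner product: V . V -> scalar
    fn dot(self, rhs: V3) -> f64 {
        self.0.iter().zip(rhs.0.iter()).map(|(x, y)| x * y).sum()
    }

    /// Component-wise product: V, V -> V
    fn hadamard(self, rhs: V3) -> V3 {
        V3([
            self.0[0] * rhs.0[0],
            self.0[1] * rhs.0[1],
            self.0[2] * rhs.0[2],
        ])
    }
}

/// `+` on `Wrapping<u8>` always wraps, in debug and release builds alike.
fn wrapping_sum(a: u8, b: u8) -> u8 {
    (Wrapping(a) + Wrapping(b)).0
}

fn main() {
    let a = V3([1.0, 2.0, 3.0]);
    let b = V3([4.0, 5.0, 6.0]);

    // The method name, not the `*` sigil, says which product is meant.
    assert_eq!(a.dot(b), 32.0);
    assert_eq!(a.hadamard(b), V3([4.0, 10.0, 18.0]));

    // 250 + 10 wraps around to 4 instead of panicking.
    assert_eq!(wrapping_sum(250, 10), 4);
}
```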


Ehm, Hadamard product, no?


In fairness, the Hadamard Product is usually conceptualized for 2-dimensional matrices, but obviously it generalizes to N-dimensional ones, for which the deep-learning vector product is the 1-dimensional case.