This still should be at least prototyped as a library if you have a concrete idea based on those research papers. To me adding this as a language feature is dubious at best, as it targets a very narrow field and it's not clear that the benefits outweigh the complexity. Meanwhile Rust has excellent metaprogramming capabilities that allow for easy and sophisticated construction of EDSLs with any desirable properties domain experts need.
My memory of those previous discussions is that "why shouldn't this be a crate?" never actually got a meaningful response, and the blog posts I've read about Swift's implementation never mentioned anything that would require this to be a core language feature in Rust, or even benefit from it being a core language feature. I don't know Swift that well, but I got the pretty strong impression that it just doesn't have metaprogramming as powerful as Rust's. Even one of the posts you linked to is literally arguing that there's no reason this couldn't be done in a crate.
Someone just needs to actually write a crate for this.
Building it could lead to heavy adoption by the machine learning community. Since Swift relies on ARC (automatic reference counting) for memory management, it's less attractive for production use than C++.
If Rust were interested in supporting this, it would be a significant advantage: over the short to long term, the deep learning community could switch to Rust instead of C++.
Additionally, the benefit of (any) differentiation support has still not been adequately explained to people who don't immediately understand said benefit. Not having used it, and not having spent the time to actually digest any research papers on it, I still have no idea what "automatic differentiation support for Rust" even means, let alone how it would be beneficial.
(And as far as I do understand, it's basically only useful for machine learning applications, which does not do a good job of arguing that it should be a first class feature in a general programming language that still has an embarrassing backlog of accepted features to flesh out and add/stabilize.)
Something like clad (as explained in the linked papers in the opening of this thread) should be able to be implemented as a crate.
With that settled, the real question is if there's anybody who wants to work on it. I'd be surprised if there were many, given how the previous thread died out. But maybe it's just an issue of organization.
Do we have any good spaces for trying to kickstart collaborative rust projects? I'd wager a guess that people skilled in macros don't have much overlap with people who really grok the auto-differentiation.
Allow me to write a full explanation on why this should be done via an internal compiler support instead of a crate. This is from my observation seeing the development of Theano, Pytorch, Tensorflow and the evolution of compiler graph -> source-to-source code transformation.
Give me two days at max. Please don't close the thread.
I think it would be extremely helpful to have some kind of working, end-to-end example before writing up a proposal for a compiler change. This could be something very simple - e.g. a macro_rules! definition that only supports a few hardcoded types of functions. However, it should be a runnable example that demonstrates what the expected input and output will look like.
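To make the suggestion concrete, here is a minimal sketch of what such a macro_rules! prototype could look like. Everything here is invented for illustration: a tiny hardcoded expression grammar (the variable `x`, numeric literals, sums, and products), an `eval_at!` macro to evaluate an expression at a point, and a `deriv_at!` macro that applies the sum and product rules recursively. A real crate would of course need a proc macro and a far richer grammar.

```rust
// eval_at!(x_value; expr) evaluates an expression in the tiny grammar.
// Supported forms: `x`, numeric literals, `(a + b)`, `(a * b)`.
macro_rules! eval_at {
    ($x:expr; x) => { $x };
    ($x:expr; $c:literal) => { ($c as f64) };
    ($x:expr; ($a:tt + $b:tt)) => { (eval_at!($x; $a) + eval_at!($x; $b)) };
    ($x:expr; ($a:tt * $b:tt)) => { (eval_at!($x; $a) * eval_at!($x; $b)) };
}

// deriv_at!(x_value; expr) evaluates the derivative of the expression
// at the given point, by structural recursion over the token trees.
macro_rules! deriv_at {
    // d/dx x = 1, d/dx c = 0
    ($x:expr; x) => { 1.0 };
    ($x:expr; $c:literal) => { 0.0 };
    // Sum rule: (a + b)' = a' + b'
    ($x:expr; ($a:tt + $b:tt)) => { (deriv_at!($x; $a) + deriv_at!($x; $b)) };
    // Product rule: (a * b)' = a' * b + a * b'
    ($x:expr; ($a:tt * $b:tt)) => {
        (deriv_at!($x; $a) * eval_at!($x; $b) + eval_at!($x; $a) * deriv_at!($x; $b))
    };
}

fn main() {
    // d/dx (x * x) = 2x, so at x = 3 the derivative is 6.
    let d = deriv_at!(3.0; (x * x));
    println!("d/dx (x*x) at x = 3: {}", d);
}
```

This is exactly the kind of "few hardcoded types of functions" prototype described above: it shows the expected input (an expression the user writes once) and output (ordinary Rust arithmetic computing the derivative), all resolved at expansion time with no compiler changes.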
Please do, I'd be very interested to hear why support for AD needs direct language support in Rust rather than using eg a proc macro.
Because as I currently see it, AD requires only local analysis, i.e. if you have the definition of an eligible function, then that is all you need in order to compute its derivative.
Contrast that with, e.g., type checking, which in general cannot be done locally because other type definitions generally need to be incorporated into the analysis.
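The locality argument can be illustrated without any macro at all: forward-mode AD via dual numbers is a pure library technique where each operation propagates a derivative alongside its value, using only information available at that operation. The sketch below (types and names invented for illustration) differentiates f(x) = x² + x.

```rust
use std::ops::{Add, Mul};

// A dual number carries a value together with its derivative; each
// arithmetic operation updates both, purely locally.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dual {
    val: f64,
    der: f64,
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, rhs: Dual) -> Dual {
        Dual { val: self.val + rhs.val, der: self.der + rhs.der }
    }
}

impl Mul for Dual {
    type Output = Dual;
    fn mul(self, rhs: Dual) -> Dual {
        // Product rule, applied at this single operation.
        Dual {
            val: self.val * rhs.val,
            der: self.der * rhs.val + self.val * rhs.der,
        }
    }
}

// f(x) = x^2 + x. Nothing outside this definition is needed to
// differentiate it - the derivative falls out of the operations themselves.
fn f(x: Dual) -> Dual {
    x * x + x
}

fn main() {
    // Seed der = 1.0 to differentiate with respect to x.
    let y = f(Dual { val: 3.0, der: 1.0 });
    println!("f(3) = {}, f'(3) = {}", y.val, y.der); // 12 and 7
}
```

Note that this only gives one directional derivative per evaluation; it doesn't by itself settle whether reverse mode (gradients) also stays this local, but it shows the analysis needs no whole-program information.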
Separate from the implementation details (which relate to whether it could just be a crate), it would be nice to see a proposed API. Knowing what is going to be accomplished would help with understanding the trade-offs. I've implemented a form of automatic differentiation, and there are huge differences between what different people mean by it. In particular, there is a huge difference between computing the derivative with respect to a few scalar variables versus the gradient with respect to a few huge arrays.
Edit: To that I might add the challenge of computing and storing the derivative of a huge vector with respect to another huge vector, which simply isn't feasible because of memory limitations (the full Jacobian is the product of the two dimensions in size).
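For the "gradient with respect to huge arrays" case, the usual answer is reverse-mode AD: record the computation on a tape, then make one backward pass that yields the gradient of a scalar output with respect to every input at once, without ever materializing a full Jacobian. Below is a minimal sketch of such a tape; the `Tape`/`Var`/`grad` names are invented here for illustration, not a proposed API.

```rust
use std::cell::RefCell;
use std::ops::{Add, Mul};

// Each tape node records its (up to two) parents and the partial
// derivative of the node with respect to each parent.
struct Node {
    parents: [usize; 2],
    partials: [f64; 2],
}

struct Tape {
    nodes: RefCell<Vec<Node>>,
}

#[derive(Clone, Copy)]
struct Var<'t> {
    tape: &'t Tape,
    index: usize,
    value: f64,
}

impl Tape {
    fn new() -> Self {
        Tape { nodes: RefCell::new(Vec::new()) }
    }
    fn var(&self, value: f64) -> Var<'_> {
        // Input variables have no parents; zero partials make them inert.
        let index = self.push([0, 0], [0.0, 0.0]);
        Var { tape: self, index, value }
    }
    fn push(&self, parents: [usize; 2], partials: [f64; 2]) -> usize {
        let mut nodes = self.nodes.borrow_mut();
        nodes.push(Node { parents, partials });
        nodes.len() - 1
    }
}

impl<'t> Add for Var<'t> {
    type Output = Var<'t>;
    fn add(self, rhs: Var<'t>) -> Var<'t> {
        let index = self.tape.push([self.index, rhs.index], [1.0, 1.0]);
        Var { tape: self.tape, index, value: self.value + rhs.value }
    }
}

impl<'t> Mul for Var<'t> {
    type Output = Var<'t>;
    fn mul(self, rhs: Var<'t>) -> Var<'t> {
        // d(a*b)/da = b, d(a*b)/db = a
        let index = self.tape.push([self.index, rhs.index], [rhs.value, self.value]);
        Var { tape: self.tape, index, value: self.value * rhs.value }
    }
}

impl Var<'_> {
    // Backward pass: seed the output adjoint with 1.0 and propagate
    // adjoints to every node in one sweep over the tape.
    fn grad(&self) -> Vec<f64> {
        let nodes = self.tape.nodes.borrow();
        let mut adjoints = vec![0.0; nodes.len()];
        adjoints[self.index] = 1.0;
        for i in (0..nodes.len()).rev() {
            let adj = adjoints[i];
            for k in 0..2 {
                adjoints[nodes[i].parents[k]] += nodes[i].partials[k] * adj;
            }
        }
        adjoints
    }
}

fn main() {
    let tape = Tape::new();
    let x = tape.var(3.0);
    let y = tape.var(4.0);
    let z = x * y + x; // z = x*y + x
    let g = z.grad();
    // dz/dx = y + 1 = 5, dz/dy = x = 3
    println!("dz/dx = {}, dz/dy = {}", g[x.index], g[y.index]);
}
```

The memory cost here is proportional to the number of operations recorded, which is why reverse mode scales to gradients over huge inputs while a dense vector-to-vector Jacobian does not.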