Automatic differentiation

Getting your computer to tell you the gradient of a function, without resorting to finite difference approximation.

There seems to be a lot to know here: infinitesimal/Taylor-series formulations, computational complexity, reverse mode (a.k.a. backpropagation) versus forward mode, and so on. But for many common cases you can ignore most of this.
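Forward mode is the easiest to see concretely: carry a derivative alongside every value and propagate both through the arithmetic. Below is a minimal sketch using a hand-rolled dual-number class; the `Dual` class and the function `f` are illustrative inventions, not part of any particular library.

```python
# Forward-mode AD via dual numbers: a number a + b*eps with eps**2 == 0,
# where b carries the derivative of a with respect to the chosen input.
class Dual:
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Sum rule: derivatives add.
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a + a'eps)(b + b'eps) = ab + (a'b + ab')eps.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def f(x):
    return 3 * x * x + 2 * x + 1   # f'(x) = 6x + 2


x = Dual(2.0, 1.0)   # seed the input's derivative with 1
y = f(x)
print(y.value, y.deriv)   # prints 17.0 14.0, i.e. f(2) and f'(2) = 6*2 + 2
```

No symbolic manipulation and no step-size fiddling: the derivative falls out of ordinary arithmetic, exact to machine precision.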

There is a beautiful explanation of the basics by Sanjeev Arora and Tengyu Ma.

You might want gradients for optimisation, whether batch or SGD, especially in neural networks, matrix factorisations, variational approximations and so on. This is not news these days, but it took a stunningly long time to become common; see, e.g., Justin Domke, Automatic Differentiation: The most criminally underused tool in the potential machine learning toolbox?.
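For that optimisation use case, reverse mode is the one that matters: a single backward sweep gives the gradient with respect to all parameters for roughly the cost of one extra function evaluation. A toy sketch follows; the `Var` class, the one-point least-squares loss and the step size are made up for illustration, and a real implementation would traverse the graph in topological order rather than recursing naively.

```python
# Toy reverse-mode (backpropagation-style) AD, then one gradient-descent step.
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs of (parent Var, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        # Chain rule, output to input: accumulate d(output)/d(this node),
        # then pass the scaled seed down to each parent.
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)


# Least-squares loss for a one-parameter model y = w * x, single data point.
w = Var(0.5)
x, y = 3.0, 6.0
residual = w * x + (-y)
loss = residual * residual
loss.backward()                   # fills w.grad with d(loss)/dw = 2*(w*x - y)*x
w_new = w.value - 0.1 * w.grad    # one gradient-descent step
print(loss.value, w.grad, w_new)  # prints 20.25 -27.0 3.2
```

The naive recursion here is correct but can revisit shared subexpressions; production libraries record a tape and sweep it once in reverse, which is where the "one backward pass per gradient" cost claim comes from.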

See also symbolic mathematical calculators.


