
Automatic differentiation

Getting your computer to tell you the gradient of a function, without resorting to finite difference approximation.
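To see why you might want to avoid finite differences, here is a small sketch using only the standard library. The function `central_difference` and the step sizes are my own choices for illustration. It compares a symmetric difference quotient for sin against the known derivative cos: a large step suffers truncation error, a tiny step suffers floating-point cancellation, and you have to tune the step somewhere in between.

```python
import math

def central_difference(f, x, h):
    """Approximate f'(x) with a symmetric finite difference of step h."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x = 1.0
exact = math.cos(x)  # d/dx sin(x) = cos(x)

# Absolute error of the approximation for a large, a moderate and a tiny step.
errors = {h: abs(central_difference(math.sin, x, h) - exact)
          for h in (1e-1, 1e-5, 1e-13)}

for h, err in errors.items():
    print(f"h={h:g}  abs error={err:.3e}")
```

The moderate step should give the smallest error of the three; automatic differentiation sidesteps the step-size choice entirely.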

There seems to be a lot of stuff to know here: infinitesimal/Taylor series formulations, closely related dual number formulations, and even fancier hyperdual formulations; reverse-mode (a.k.a. backpropagation) versus forward-mode; and the computational complexity of all the above. But for special cases you can ignore most of this.
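The dual number formulation is the easiest one to sketch. The following is a minimal illustration, not any particular library's implementation; the names `Dual`, `_lift`, `sin` and `derivative` are made up for this example. Each number carries its value plus the coefficient of an infinitesimal eps with eps**2 == 0, and arithmetic on that coefficient reproduces the usual derivative rules, which gives forward-mode differentiation.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """A number val + tan*eps with eps**2 == 0; `tan` carries the derivative."""
    val: float
    tan: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)
    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (a + a'eps)(b + b'eps) = ab + (a'b + ab')eps
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)
    __rmul__ = __mul__

def _lift(x):
    """Promote a plain number to a Dual with zero derivative part."""
    return x if isinstance(x, Dual) else Dual(float(x))

def sin(x):
    """Sine extended to dual numbers via the chain rule."""
    x = _lift(x)
    return Dual(math.sin(x.val), math.cos(x.val) * x.tan)

def derivative(f, x):
    """Forward-mode derivative of a scalar function f at the point x."""
    return f(Dual(x, 1.0)).tan

def f(x):
    return x * x * sin(x) + 3 * x

fprime = derivative(f, 2.0)  # analytically: 2x sin(x) + x**2 cos(x) + 3
```

One forward pass gives one directional derivative, so this is cheap for functions of few inputs and expensive for gradients with respect to many inputs, which is where reverse mode comes in.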

There is a beautiful explanation of the basics by Sanjeev Arora and Tengyu Ma.

You might want to do this for optimisation, batch or SGD, especially in neural networks, matrix factorisations, variational approximation etc. This is not news these days, but it took a stunningly long time to become common; see, e.g., Justin Domke, Automatic Differentiation: The most criminally underused tool in the potential machine learning toolbox?.

See also symbolic mathematical calculators.


autograd can automatically differentiate native Python and Numpy code. It can handle a large subset of Python’s features, including loops, ifs, recursion and closures, and it can even take derivatives of derivatives of derivatives. It uses reverse-mode differentiation (a.k.a. backpropagation), which means it can efficiently take gradients of scalar-valued functions with respect to array-valued arguments. The main intended application is gradient-based optimization.
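To make the reverse-mode idea concrete, here is a toy sketch in plain Python. It is not how any real library is implemented, and `Var` and `grad` are hypothetical names used only here. Each operation records its inputs and local partial derivatives; a single backward sweep over that record then yields the gradient of a scalar output with respect to every input, even when the function uses ordinary loops and branches.

```python
class Var:
    """A scalar that records how it was computed, for reverse-mode AD."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))
    __rmul__ = __mul__

def grad(f):
    """Return a function computing the gradient of scalar f at a list of floats."""
    def gradient(xs):
        inputs = [Var(x) for x in xs]
        out = f(inputs)
        # Order the recorded graph so that parents come before their consumers.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)
        visit(out)
        # Backward sweep: push each node's adjoint to its parents.
        out.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad
        return [v.grad for v in inputs]
    return gradient

def f(xs):
    total = 0.0
    for x in xs:             # ordinary Python loop
        if x.value > 0:      # ordinary Python branch on the current value
            total = total + x * x
        else:
            total = total + 3.0 * x
    return total

g = grad(f)([1.0, -2.0, 4.0])  # -> [2.0, 3.0, 8.0]
```

The cost of the backward sweep is proportional to the number of recorded operations, regardless of how many inputs there are, which is why reverse mode suits scalar losses over large parameter arrays.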

This is the most Pythonic of the choices here: not as fast as TensorFlow, but simple to use, and it can differentiate more general things than TensorFlow can.

autograd-forward mixes in forward-mode differentiation to calculate Jacobian-vector products and Hessian-vector products for scalar-valued loss functions, which is useful for classic optimization. AFAICT there are no guarantees about computational efficiency for these, but in practice it is often pretty good.
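A Hessian-vector product is just the directional derivative of the gradient, so one forward-mode pass through a gradient computation gives it without ever forming the Hessian. The sketch below shows the idea in plain Python under heavy assumptions: the gradient `grad_f` is written by hand as a stand-in for what a reverse-mode tool would produce, and `Dual` and `hvp` are names invented for this example, not the autograd-forward API.

```python
class Dual:
    """Value plus a first-order perturbation: val + tan*eps, with eps**2 == 0."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)
    __rmul__ = __mul__

def grad_f(x):
    """Hand-written gradient of f(x0, x1) = x0**2 * x1 + x1**3."""
    x0, x1 = x
    return [2 * x0 * x1, x0 * x0 + 3 * x1 * x1]

def hvp(grad, x, v):
    """Directional derivative of grad at x along v, i.e. the product H(x) v."""
    perturbed = [Dual(xi, vi) for xi, vi in zip(x, v)]
    return [g.tan for g in grad(perturbed)]

# The Hessian at (1, 2) is [[4, 2], [2, 12]]; its first column is [4, 2].
hv = hvp(grad_f, [1.0, 2.0], [1.0, 0.0])
```

Applying the same perturbation trick to a vector-valued function itself, rather than to a gradient, gives a Jacobian-vector product.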

Another neural-net-style thing like TensorFlow, but with dynamic graph construction as in autograd.

AlgoPy allows you to differentiate functions implemented as computer programs by using Algorithmic Differentiation (AD) techniques in the forward and reverse mode. The forward mode propagates univariate Taylor polynomials of arbitrary order. Hence it is also possible to use AlgoPy to evaluate higher-order derivative tensors.

A speciality of AlgoPy is that it can differentiate functions that contain matrix operations such as +, -, *, /, dot, solve, qr, eigh and cholesky.
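Propagating univariate Taylor polynomials generalises the dual number trick to higher orders: instead of tracking one derivative coefficient, you track a truncated power series and define arithmetic on it. Here is a rough scalar sketch of the idea. It is not AlgoPy's API; `Taylor`, `_lift` and `derivatives` are hypothetical names, and only addition and multiplication are supported.

```python
import math

class Taylor:
    """Truncated Taylor polynomial c[0] + c[1]*t + ... + c[D-1]*t**(D-1)."""
    def __init__(self, coeffs):
        self.c = list(coeffs)

    def __add__(self, other):
        other = _lift(other, len(self.c))
        return Taylor(a + b for a, b in zip(self.c, other.c))
    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other, len(self.c))
        # Cauchy product, truncated at the same order as the inputs.
        return Taylor(sum(self.c[j] * other.c[k - j] for j in range(k + 1))
                      for k in range(len(self.c)))
    __rmul__ = __mul__

def _lift(x, n):
    """Promote a plain number to a constant polynomial with n coefficients."""
    return x if isinstance(x, Taylor) else Taylor([float(x)] + [0.0] * (n - 1))

def derivatives(f, x0, order):
    """Return [f(x0), f'(x0), ..., f^(order)(x0)], for order >= 1."""
    seed = Taylor([x0, 1.0] + [0.0] * (order - 1))  # the path x(t) = x0 + t
    out = f(seed)
    # The k-th Taylor coefficient of f(x0 + t) is f^(k)(x0) / k!.
    return [math.factorial(k) * ck for k, ck in enumerate(out.c)]

def f(x):
    return x * x * x + 2 * x

ds = derivatives(f, 2.0, 3)  # -> [12.0, 14.0, 12.0, 6.0]
```

A single pass yields all derivatives up to the chosen order along one direction; higher-order derivative tensors can then be assembled from several such directional passes.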

Looks sophisticated, and indeed supports differentiation in an elegant way; but not so actively maintained, and the source code is hard to find.

CasADi is a symbolic framework for numeric optimization implementing automatic differentiation in forward and reverse modes on sparse matrix-valued computational graphs. It supports self-contained C-code generation and interfaces state-of-the-art codes such as SUNDIALS, IPOPT etc. It can be used from C++, Python or Matlab.

[…]CasADi is an open-source tool, written in self-contained C++ code, depending only on the C++ Standard Library. It is developed by Joel Andersson and Joris Gillis at the Optimization in Engineering Center, OPTEC of the K.U. Leuven under supervision of Moritz Diehl. CasADi is distributed under the LGPL license, meaning the code can be used royalty-free even in commercial applications.

Documentation is minimal; probably should read the source or the published papers to understand how well this will fit your needs and, e.g. which arithmetic operations it supports.

It might be worth it for such features as graceful support for 100-fold nonlinear composition, for example. But the price you pay is having to learn a weird DSL in order to use it.

FYI, there is an interesting discussion of its workings in the TensorFlow Jacobians ticket request.