Getting your computer to tell you the gradient of a function, without resorting to finite difference approximation. I am mostly interested here in automatic forward- or reverse-mode differentiation, which is not, as such, a symbolic technique, although symbolic differentiation gets an incidental look-in.
Infinitesimal/Taylor-series formulations, the closely related dual number formulations, and even fancier hyperdual formulations. Reverse mode, a.k.a. backpropagation, versus forward mode, etc. Computational complexity of all the above.
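To make the dual-number formulation concrete, here is a toy forward-mode implementation in pure Python (illustrative only; the class and helper names are mine, not any library's API):

```python
import math

class Dual:
    """Dual number a + b*eps with eps**2 == 0; the eps coefficient
    carries the derivative through ordinary arithmetic (forward mode)."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b  # value, derivative

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # the product rule falls out of (a + b eps)(c + d eps) with eps**2 = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)
    __rmul__ = __mul__

def sin(x):
    # chain rule for an elementary function
    return Dual(math.sin(x.a), math.cos(x.a) * x.b) if isinstance(x, Dual) else math.sin(x)

def deriv(f, x):
    """Evaluate f at x + eps with seed derivative 1; the eps coefficient is f'(x)."""
    return f(Dual(x, 1.0)).b

# d/dx [x * sin(x)] = sin(x) + x*cos(x)
print(deriv(lambda t: t * sin(t), 2.0))  # ≈ 0.0770
```

One forward pass gives the derivative with respect to one input; for a gradient over n inputs you need n passes, which is why reverse mode wins for scalar-valued losses with many parameters.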
There is a beautiful explanation of the basics by Sanjeev Arora and Tengyu Ma.
You might want to do this for optimisation, whether batch or SGD, especially in neural networks, matrix factorisations, variational approximations etc. This is not news these days, but it took a stunningly long time to become common; see, e.g., Justin Domke, who claimed automatic differentiation to be "the most criminally underused tool in the machine learning toolbox".
See also symbolic mathematical calculators.
Software

Autograd (Python)

can automatically differentiate native Python and Numpy code. It can handle a large subset of Python's features, including loops, ifs, recursion and closures, and it can even take derivatives of derivatives of derivatives. It uses reverse-mode differentiation (a.k.a. backpropagation), which means it can efficiently take gradients of scalar-valued functions with respect to array-valued arguments. The main intended application is gradient-based optimization.
This is the most pythonic of the choices here; not as fast as Tensorflow, but simple to use, and it can differentiate more general things than Tensorflow can.
autograd-forward will mingle forward-mode differentiation in, to calculate Jacobian-vector products and Hessian-vector products for scalar-valued loss functions, which is useful for classic optimization. AFAICT there are no guarantees about computational efficiency for these, but in practice it is often pretty good.

Pytorch (Python, C++)

Another neural-net-style framework like Tensorflow, but with dynamic graph construction as in autograd.
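A minimal sketch of what "dynamic graph construction" buys you in Pytorch (the function here is my own toy example; note the Python-level loop whose trip count depends on the data):

```python
import torch

def f(x):
    y = x
    while y.norm() < 10.0:  # data-dependent control flow, traced on the fly
        y = 2.0 * y
    return y.sum()

x = torch.tensor([1.0, 2.0], requires_grad=True)
f(x).backward()  # reverse-mode sweep over the graph recorded by this call
print(x.grad)    # here the loop doubled x three times, so the gradient is [8., 8.]
```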

AlgoPy (Python)

allows you to differentiate functions implemented as computer programs by using Algorithmic Differentiation (AD) techniques in the forward and reverse modes. The forward mode propagates univariate Taylor polynomials of arbitrary order, so it is also possible to use AlgoPy to evaluate higher-order derivative tensors.
A speciality of AlgoPy is the ability to differentiate functions that contain matrix operations such as +, -, *, /, dot, solve, qr, eigh and cholesky.
Looks sophisticated, and indeed supports differentiation elegantly, but it is not so actively maintained, and the source code is hard to find.
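The Taylor-polynomial idea underlying the forward mode can be sketched in a few lines of plain Python (this is the underlying arithmetic only, not AlgoPy's actual API; the function names are mine):

```python
import math

def taylor_mul(a, b):
    """Cauchy product of two truncated Taylor series (lists of coefficients)."""
    n = len(a)
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]

def taylor_exp(a):
    """exp of a truncated Taylor series, via the ODE recurrence e' = a' * e."""
    n = len(a)
    e = [math.exp(a[0])] + [0.0] * (n - 1)
    for k in range(1, n):
        e[k] = sum(i * a[i] * e[k - i] for i in range(1, k + 1)) / k
    return e

x = [1.0, 1.0, 0.0, 0.0]          # Taylor series of x at x0 = 1, i.e. 1 + t
y = taylor_exp(taylor_mul(x, x))  # series of exp(x**2) around x0 = 1
# y[k] = f^(k)(1) / k!, so the second derivative is 2 * y[2]
print(2 * y[2])  # ≈ 6e ≈ 16.31, since f''(x) = (2 + 4x**2) exp(x**2)
```

Propagating degree-d polynomials instead of dual numbers is how one pass yields derivatives up to order d.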
CasADi (Python, C++, MATLAB)
a symbolic framework for numeric optimization implementing automatic differentiation in forward and reverse modes on sparse matrix-valued computational graphs. It supports self-contained C-code generation and interfaces to state-of-the-art codes such as SUNDIALS, IPOPT etc. It can be used from C++, Python or Matlab.
[…] CasADi is an open-source tool, written in self-contained C++ code, depending only on the C++ Standard Library. It is developed by Joel Andersson and Joris Gillis at the Optimization in Engineering Center, OPTEC, of the K.U. Leuven under the supervision of Moritz Diehl. CasADi is distributed under the LGPL license, meaning the code can be used royalty-free even in commercial applications.
Documentation is minimal; you should probably read the source or the published papers to understand how well it will fit your needs and, e.g., which arithmetic operations it supports.
It might be worth it for such features as graceful support for 100-fold nonlinear composition, but the price you pay is a weird DSL that you must learn in order to use it.
ADOL-C is a popular C++ differentiation library with Python bindings. Looks clunky from Python but tenable from C++.
Stan is famous for Monte Carlo, but it also does deterministic optimisation using automatic differentiation. This is a luxurious option, but it is computationally expensive and ugly to invoke purely for the gradients unless you are using their inference loop, so it does not count as a general-purpose autodiff library.
ad, which is built on uncertainties (and is therefore pure Python), also does it.
ceres-solver (C++), the Google least-squares solver, is pretty good at this, although it is mostly focussed on least-squares solutions to things.
Theano (Python) supported autodiff as a basic feature and had a massive user base, although it is now discontinued in favour of…
Tensorflow (Python, C++, Go, Java), which is the same deal, with a massive user base plus the backing of Google.
FYI there is an interesting discussion of its workings in the Tensorflow Jacobians feature-request ticket.
Symbolic math packages such as Sympy, MAPLE and Mathematica can all do actual symbolic differentiation, which is different again, but sometimes leads to the same thing. I haven't tried Sympy or MAPLE, but Mathematica's support for matrix calculus is weak.
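For contrast with the automatic methods above, the symbolic approach in Sympy returns a closed-form expression rather than a numeric gradient (the expression here is my own toy example):

```python
import sympy as sp

x = sp.symbols("x")
expr = sp.sin(x) * sp.exp(x)
d = sp.diff(expr, x)  # closed-form derivative: exp(x)*(sin(x) + cos(x))
print(d)

# the symbolic result can then be compiled into a fast numeric function
f = sp.lambdify(x, d, "math")
print(f(0.0))  # 1.0
```

This "sometimes leads to the same thing" because differentiating the expression graph is mechanically similar to AD; the difference is that Sympy manipulates and simplifies the formula itself, which can blow up for large programs.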
autodiff (usually referred to as audi, for the sake of clarity) offers light automatic differentiation for MATLAB.
juliadiff has implemented forward- and reverse-mode autodiff, plus various other less-commonly-seen flavours such as dual numbers and hyperdual numbers, which are discussed at julia.