The Living Thing / Notebooks :

Numerical libraries

There are too many numerical libraries.

Here are some I am sifting through, skewed towards python-compatible ones.

General numerics

To mention: the LAPACK/Fortran ecology. (Scott Locklin’s summary will do, actually)

Most dense linear algebra work presently happens in something called LAPACK. If you want to calculate eigenvectors in Matlab, Incanter/Clojure, Python, R, Julia and what have you: it’s almost certainly getting piped through LAPACK. If you’re working in C, C++ or Fortran, you’re still probably doing in via LAPACK. If you’re doing linear algebra in Java, you are a lost soul, and you should seriously consider killing yourself, but the serious people I know who do this, they allocate blobs of memory and run LAPACK on the blobs. The same thing was true 25 years ago. In fact, people have basically been running their numerics code on LAPACK for over 40 years when it was called EISPACK and LINPACK. Quite a lot of it conforms with… Fortran 66; a language invented 50 years ago which is specifically formatted for punched cards. That is an astounding fact.

LAPACK is a sort of layercake on top of something called the BLAS (“Basic Linear Algebra Subroutines”). I think BLAS were originally designed as a useful abstraction so your Eigendecomposition functions have more elementary operations in common with your QR decomposition functions. This made LAPACK better than its ancestors from a code management and complexity point of view. It also made LAPACK faster, as it allowed one to use machine optimized BLAS when they are available, and BLAS are the type of thing a Fortran compiler could tune pretty well by itself, especially on old-timey architectures without complex cache latency. While most of these functions seem boring, they are in fact extremely interesting. Just to give a hint: there is a function in there for doing matrix-multiply via the Winograd version of the Strassen algorithm, which multiplies matrices in O(N^2.8) instead of O(N^3). It’s the GEMMS family of functions if you want to go look at it. Very few people end up using GEMMS, because you have to be a numerical linear algebraist to understand when it is useful enough to justify the O(N^0.2) speedup. For example, there are exactly no Lapack functions which call GEMMS. But GEMMS is there, implemented for you, just waiting for the moment when you might need to solve … I dunno, a generalized Sylvester equation where Strassen’s algorithm applies.

There are so many – Intel MKL, Boost uBLAS etc. Where does GSL fit in?

One of the remarkable features is that they are all faster than all the other ones. Or rather, the easiest to find benchmarks for each tend to favour what they do best. In my inexpert understanding this means that Blaze does simple operations over very big arrays fastest, Armadillo does fused arrays fast-ish, ArrayFire does… everything pretty fast AFAICT. Maybe it has crappy licensing? IDK. Eigen seems to be not especially fast but must be easy to plug in to you existing code because it’s gotten everywhere. So has Armadillo, FWIW. Also what you intend to run it on is important – not everything supports all platforms, SIMD, threads etc.


Intel’s Matrix Kernel library is supposed to get peak performance out of various intel CPUs. I’ve used it via python and julia. In neither case has it been worthwhile; It is gigantic. It possibly speeds things up marginally, but not noticably, and it tends to cause all your gigantically, MKL-bloated scripts to crash all the time. Maybe if you needed to squeeze an extra few percentage points out of some reasonably refined codebase? But not scripting-happy for me so far.


Elemental is a hyped new entrant, supporting the newer things that modern matrix manipulation necessitates, including built-in sparse/compressive support, NNMF etc. It has lots of very fancy features in addition to basic matrix manipulations.

Elemental is an open-source library for distributed-memory dense and sparse-direct linear algebra and optimization which builds on top of BLAS, LAPACK, and MPI using modern C++ and additionally exposes interfaces to C and Python (with a Julia interface beginning development).

[…]Elemental supports a wide collection of sequential and distributed-memory operations, including support for dense and sparse-direct linear algebra, Linear, Quadratic and Second-Order Cone Programming, and lattice reduction. Furthermore, it supports such functionality for real and complex single-precision, double-precision, “double-double”, “quad-double”, quad-precision, and arbitrary-precision floating-point arithmetic.

The C++11 API is by far the most complete, but a large percentage of the library is also exposed to C and Python interfaces. Please see the README for an up-to-date list of unique functionality.

The development of Elemental has led to a number of research articles and is incorporated into a variety of scientific (e.g., libSkylark, PETSc, and CVXPY) and industrial projects (e.g., within Finite Element companies).

The putative homepage is out of date, but there is an intimidating list of features and dependencies in the source code repo.

Notably, the Python API is very complete.


Armadillo is a popular C++ wrapper around the LAPACK tools, providing lazy automagic graph optimisation of operations, or at least, that’s what some earnest guy told me at a seminar. Super popular for less horrible matrix stuff in R.


Arrayfire is a CPU/OpenCL/GPU array operations library, open-sourced by a DARPA contractor, who are still spruiking:

The ArrayFire library is our flagship product. It is a general-purpose computational tool that simplifies the process of developing software that targets parallel and massively-parallel architectures including CPUs, GPUs, and other hardware acceleration devices. ArrayFire provides software developers with a high-level abstraction of data which resides on the accelerator, the af::array object (or C-style struct). Developers write code which performs operations on ArrayFire arrays which, in turn, are automatically translated into near-optimal kernels that execute on the computational device. ArrayFire includes support for supports CPUs from all major vendors (Intel, AMD, Arm), GPUs from the dominant manufacturers (NVIDIA, AMD, and Qualcomm), as well as a variety of other accelerator devices. ArrayFire has been used successfully on devices ranging from low-power mobile phones to high-power GPU-enabled supercomputers.

It has python and julia backends. Remarkably the julia one is asynchronous, which is nifty.


Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms. Notable for being used in various other things e.g. Tensorflow’s CPU backend and Stan the Monte Carlo thingy.


Blaze is another bloody one, which seems to have threaded computation in particular as its USP, and not caring about backward-compatibility (which is both plus and minus.)

Blaze is an open-source, high-performance C++ math library for dense and sparse arithmetic. With its state-of-the-art Smart Expression Template implementation Blaze combines the elegance and ease of use of a domain-specific language with HPC-grade performance, making it one of the most intuitive and fastest C++ math libraries available.

The Blaze library offers …

Automatic differentiation

See automatic differentiation


“Solvers” for large systems of equations are not necessarily distinct from optimisation. But there are differences in general assumptions to finding an optimal vector solution to an equation which is presumably smaller than your data, and finite element methods and PDE wrangling, where the solution might be much larger than your data.

See solving all my problems.

GPU computation

Tensorflow again, numba, theano again, Arrayfire etc… See GPU computation.