For my reference.

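I write the singular value decomposition of a \(d_1\times d_2\) matrix \(\mathbf{B}\) as

\[
\mathbf{B} = \mathbf{Q}_1 \boldsymbol{\Sigma} \mathbf{Q}_2^{*},
\]

where \(\mathbf{Q}_1\) and \(\mathbf{Q}_2\) are unitary matrices, \(\boldsymbol{\Sigma}\) is a matrix with non-negative diagonal entries and zeros elsewhere, and \(^{*}\) denotes the conjugate transpose. The factors \(\mathbf{Q}_1,\,\boldsymbol{\Sigma},\,\mathbf{Q}_2\) have respective dimensions \(d_1\times d_1,\,d_1\times d_2,\,d_2\times d_2\).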

The diagonal entries of \(\boldsymbol{\Sigma}\), written
\(\sigma_i(\mathbf{B})\), are the *singular values* of \(\mathbf{B}\).
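
A minimal NumPy sketch of the decomposition, to pin down the shapes. The matrix `B`, the seed, and the dimensions are arbitrary choices for illustration. Note that `np.linalg.svd` returns the third factor already conjugate-transposed, and the singular values as a vector rather than as a \(d_1\times d_2\) matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2 = 4, 3  # arbitrary illustrative dimensions
B = rng.standard_normal((d1, d2)) + 1j * rng.standard_normal((d1, d2))

# full_matrices=True gives square unitary factors:
# Q1 is d1 x d1, Q2h (the conjugate transpose of Q2) is d2 x d2.
Q1, s, Q2h = np.linalg.svd(B, full_matrices=True)

# Embed the vector of singular values in a d1 x d2 matrix Sigma.
Sigma = np.zeros((d1, d2))
Sigma[: len(s), : len(s)] = np.diag(s)

# Reassemble B from its three factors.
B_rebuilt = Q1 @ Sigma @ Q2h

print(Q1.shape, Sigma.shape, Q2h.shape)
print(np.allclose(B, B_rebuilt))
```

NumPy orders the singular values in `s` from largest to smallest.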

For a Hermitian matrix \(\mathbf{H}\) we may write an eigenvalue decomposition

\[
\mathbf{H} = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}^{*}
\]

for unitary \(\mathbf{Q}\) and diagonal matrix \(\boldsymbol{\Lambda}\) whose entries \(\lambda_i(\mathbf{H})\) are the (real) eigenvalues.
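
As a sketch, the Hermitian eigendecomposition can be computed with `np.linalg.eigh`. The test matrix here is an arbitrary one, made Hermitian by symmetrising a random complex matrix. The last lines check the standard fact that the singular values of a Hermitian matrix are the absolute values of its eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = (A + A.conj().T) / 2  # Hermitian by construction

# eigh assumes a Hermitian input; it returns real eigenvalues in
# ascending order and a unitary matrix of eigenvectors.
lam, Q = np.linalg.eigh(H)

# Reassemble H from its eigenvalues and eigenvectors.
H_rebuilt = Q @ np.diag(lam) @ Q.conj().T
print(np.allclose(H, H_rebuilt))

# Singular values of H, for comparison with |lambda_i|.
sv = np.linalg.svd(H, compute_uv=False)
print(np.allclose(np.sort(np.abs(lam))[::-1], sv))
```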

## Spectral norm

TBD.

## Frobenius norm

Coincides with the \(\ell_2\) norm when the matrix happens to be a column vector.

We can define this in terms of the entries \(b_{jk}\) of \(\mathbf{B}\):

\[
\|\mathbf{B}\|_F^2 = \sum_{j=1}^{d_1}\sum_{k=1}^{d_2} |b_{jk}|^2
\]

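Equivalently, if \(\mathbf{B}\) is square (and in fact for any shape, since \(\mathbf{B}\mathbf{B}^{*}\) is always square),

\[
\|\mathbf{B}\|_F^2 = \operatorname{tr}\left(\mathbf{B}\mathbf{B}^{*}\right)
\]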

If we have the SVD, we might instead use

\[
\|\mathbf{B}\|_F^2 = \sum_{i=1}^{\min(d_1,d_2)} \sigma_i(\mathbf{B})^2
\]
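
The three characterisations of the Frobenius norm (entrywise sum of squared moduli, trace of \(\mathbf{B}\mathbf{B}^{*}\), and sum of squared singular values) can be checked against each other numerically. This is only a sanity-check sketch on an arbitrary random matrix, chosen non-square on purpose.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

# 1. Entrywise: square root of the sum of squared moduli.
fro_entries = np.sqrt(np.sum(np.abs(B) ** 2))

# 2. Trace form: tr(B B^*) is real and non-negative.
fro_trace = np.sqrt(np.trace(B @ B.conj().T).real)

# 3. SVD form: square root of the sum of squared singular values.
sigma = np.linalg.svd(B, compute_uv=False)
fro_svd = np.sqrt(np.sum(sigma ** 2))

# NumPy's built-in Frobenius norm, for reference.
fro_numpy = np.linalg.norm(B, "fro")

print(fro_entries, fro_trace, fro_svd, fro_numpy)
```

All four numbers should agree up to floating-point error.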

## Schatten 1-norm

## Bregman divergence

TBD. Relation to exponential family and maximum likelihood.

Mark Reid: Meet the Bregman divergences:

> If you have some abstract way of measuring the “distance” between any two points and, for any choice of distribution over points the mean point minimises the average distance to all the others, then your distance measure must be a Bregman divergence.
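
The quote states the converse direction (mean-minimising implies Bregman). The sketch below only illustrates the easier forward fact, using the usual definition \(D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y), x - y\rangle\) for a convex generator \(\phi\). The choice of negative entropy as generator (which gives the generalised KL divergence), the helper names, and the random data are all assumptions made for illustration.

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad_phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# Illustrative generator: negative entropy on the positive orthant,
# whose Bregman divergence is the generalised KL divergence.
def neg_entropy(x):
    return np.sum(x * np.log(x) - x)

def grad_neg_entropy(x):
    return np.log(x)

rng = np.random.default_rng(3)
points = rng.uniform(0.5, 2.0, size=(200, 3))  # arbitrary positive data
mean = points.mean(axis=0)

def average_divergence(c):
    """Average of D_phi(x_i, c) over the sample, as a function of c."""
    return np.mean([bregman(neg_entropy, grad_neg_entropy, x, c) for x in points])

at_mean = average_divergence(mean)

# Random multiplicative perturbations of the mean (these stay positive).
perturbed = [
    average_divergence(mean * np.exp(0.1 * rng.standard_normal(3)))
    for _ in range(20)
]

print(at_mean, min(perturbed))
```

The average divergence at the sample mean should be no larger than at any of the perturbed candidates, even though this divergence is not symmetric and is not the squared Euclidean distance.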