
Compressed sensing / compressed sampling

The fanciest ways of counting zero

Stand by for higgledy-piggledy notes on the theme of exploiting sparsity to recover signals from few non-local measurements, given that we know they are nearly sparse, in a sense that will be made clear soon. This is another twist on classic sampling theory.

Sparse regression is closely related, but with a stochastic process angle.

See also matrix factorisations, random projections, optimisation, model selection, multiple testing, random linear algebra, concentration inequalities, restricted isometry properties.

Basic Compressed Sensing

I’ll follow the intro of CENR11, which tries to unify many variants.

We attempt to recover a signal \(x\in \mathbb{R}^d\) from \(m\ll d\) measurements \(y_k\) of the form

$$ y_k =\langle a_k, x\rangle + z_k,\, 1\leq k \leq m, $$

or, as a matrix equation,

$$ y = Ax + z $$

where \(A\) is the \(m \times d\) matrix whose rows are the stacked measurement vectors \(a_k\), and the \(z\) terms denote i.i.d. measurement noise.

Now, if \(x\) is a sparse vector, and \(A\) satisfies an appropriate restricted isometry property, then we can construct an estimate \(\hat{x}\) with small error by minimising

$$ \hat{x}=\argmin_{\dot{x}} \|\dot{x}\|_1 \text{ subject to } \|A\dot{x}-y\|_2 \leq \varepsilon, $$

where \(\varepsilon \geq \|z\|_2\) bounds the noise level.
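
A minimal sketch of that recovery problem, using cvxpy as the convex solver; the sizes, seed and noise level here are arbitrary choices of mine:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
d, m, s = 200, 60, 5                       # ambient dimension, measurements, sparsity
x = np.zeros(d)
x[rng.choice(d, s, replace=False)] = rng.normal(size=s)  # an s-sparse signal
A = rng.normal(size=(m, d)) / np.sqrt(m)   # random Gaussian measurement matrix
eps = 0.01
z = rng.normal(size=m)
z *= eps / np.linalg.norm(z)               # noise with norm exactly eps
y = A @ x + z

x_hat = cp.Variable(d)
problem = cp.Problem(cp.Minimize(cp.norm1(x_hat)),
                     [cp.norm2(A @ x_hat - y) <= eps])
problem.solve()
print(np.linalg.norm(x_hat.value - x))     # small if recovery succeeded
```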

In the lecture notes on restricted isometry properties, Candès and Tao talk not about vectors \(x\in \mathbb{R}^d\) but about functions \(f:G \to \mathbb{C}\) on Abelian groups like \(G=\mathbb{Z}/d\mathbb{Z}\), which is convenient for some phrasing: there, saying my signal is \(s\)-sparse means that its support \(\operatorname{supp} \tilde{f}=S\subset G\) satisfies \(|S|=s\).

In the finite-dimensional vector framing, we can talk about the best \(s\)-sparse approximation \(x_s\) to a non-sparse vector \(x\):

$$ x_s = \argmin_{\|\dot{x}\|_0\leq s} \|x-\dot{x}\|_2 $$

i.e. \(x_s\) keeps the \(s\) largest-magnitude coefficients of \(x\) and zeroes the rest.
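
In code, this best \(s\)-sparse approximation is just hard thresholding; a sketch:

```python
import numpy as np

def best_s_sparse(x, s):
    """Keep the s largest-magnitude entries of x, zero the rest."""
    x_s = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]   # indices of the s largest |x_i|
    x_s[keep] = x[keep]
    return x_s
```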

The basic results yield attractive convex problems. There are also greedy optimisation versions, formulated as above but no longer convex; instead one talks about Orthogonal Matching Pursuit, Iterative Thresholding and some other methods whose details I do not yet know, and which I think pop up in wavelets and sparse coding.
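
For concreteness, here is a minimal version of Orthogonal Matching Pursuit as I understand the textbook algorithm, assuming the sparsity \(s\) is known:

```python
import numpy as np

def omp(A, y, s):
    """Orthogonal Matching Pursuit: greedily add the column most
    correlated with the residual, then re-fit by least squares
    on the support selected so far."""
    d = A.shape[1]
    support, residual = [], y.copy()
    for _ in range(s):
        j = int(np.argmax(np.abs(A.T @ residual)))  # most correlated column
        support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(d)
    x_hat[support] = coef
    return x_hat
```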

For all of these, the results tend to be of the form:

with data \(y\), the difference between my estimate \(\hat{x}\) and the oracle estimate \(\hat{x}_\text{oracle}\) is bounded by something-or-other, where the oracle estimate is the one where you know ahead of time the support \(S=\operatorname{supp}(x)\).

Candès gives an example result

$$ \|\hat{x}-x\|_2 \leq C_0\frac{\|x-x_s\|_1}{\sqrt{s}} + C_1\varepsilon $$

conditional upon

$$ \delta_{2s}(A) \lt \sqrt{2} -1 $$

where this \(\delta_s(\cdot)\) gives the restricted isometry constant of a matrix, defined as the smallest constant such that \((1-\delta_s(A))\|x\|_2^2\leq \|Ax\|_2^2\leq (1+\delta_s(A))\|x\|_2^2\) for all \(s\)-sparse \(x\). That is, the measurement matrix does not change the norm of sparse signals “much”, and in particular, does not null them when \(\delta_s \lt 1.\)

Apparently this is not the strongest bound out there, but in any bound of that form the constants look frustrating.

Measuring the restricted isometry constant of a given measurement matrix is presumably hard, although I haven’t tried yet. But generating random matrices that have a certain RIC with high probability is easy; that’s a neat trick in this area.
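
To make the combinatorial difficulty concrete, here is a brute-force computation of \(\delta_s\), feasible only for toy sizes since it enumerates every support; the Gaussian matrix with \(N(0, 1/m)\) entries is the standard easy-to-generate example:

```python
import numpy as np
from itertools import combinations

def restricted_isometry_constant(A, s):
    """Brute-force delta_s: the extreme squared singular values of every
    m x s column submatrix bound ||Ax||^2 / ||x||^2 over s-sparse x.
    Exponential in d, hence toy sizes only."""
    delta = 0.0
    for S in combinations(range(A.shape[1]), s):
        sv = np.linalg.svd(A[:, list(S)], compute_uv=False)
        delta = max(delta, sv[0]**2 - 1, 1 - sv[-1]**2)
    return delta

rng = np.random.default_rng(0)
m, d, s = 15, 30, 2
A = rng.normal(size=(m, d)) / np.sqrt(m)   # i.i.d. N(0, 1/m) entries
print(restricted_isometry_constant(A, s))
```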

Redundant compressed sensing

TBD. For now see restricted isometry properties.

Introductory texts

…Using random projections

Classic. Notes to come.

Locality-Sensitive Hashing (LSH) is an algorithm for solving approximate or exact near-neighbour search in high-dimensional spaces.
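
A sketch of the simplest random-projection LSH family I know, sign-of-random-hyperplane hashing (SimHash); the parameters are arbitrary:

```python
import numpy as np

def simhash(X, n_bits=16, seed=0):
    """Random-hyperplane LSH: each random projection contributes one sign
    bit, so points that are nearby in angle tend to share many bits and
    hence land in the same hash bucket."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(X.shape[1], n_bits))  # hyperplane normals
    return (X @ planes > 0).astype(np.uint8)

# Candidate near neighbours of a query are the points whose signature
# matches the query's signature exactly (or in most bits).
```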

…Using deterministic projections

Surely this is close to quasi-Monte Carlo?

I blogged about constructing harmonic frames using difference sets. We proved that such harmonic frames are equiangular tight frames, thereby having minimal coherence between columns. I concluded the entry by conjecturing that incoherent harmonic frames are as good for compressed sensing as harmonic frames whose rows were randomly drawn from the discrete Fourier transform (DFT) matrix.
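
As a toy instance of that construction (my own example): rows of the \(7\times 7\) DFT matrix indexed by the classic \((7,3,1)\) quadratic-residue difference set \(\{1,2,4\}\) give a harmonic frame whose column coherence meets the Welch lower bound exactly:

```python
import numpy as np

d, S = 7, [1, 2, 4]                 # (7, 3, 1) difference set mod 7
n = np.arange(d)
A = np.exp(-2j * np.pi * np.outer(S, n) / d) / np.sqrt(len(S))  # unit-norm columns
G = np.abs(A.conj().T @ A)          # |inner products| between columns
coherence = G[~np.eye(d, dtype=bool)].max()
welch = np.sqrt((d - len(S)) / (len(S) * (d - 1)))  # Welch lower bound
print(coherence, welch)             # equal: an equiangular tight frame
```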

Recent work of Yves Meyer might be relevant:

These papers are interesting because their approach to compressed sensing is very different. Specifically, their sparse vectors are actually functions of compact support with sufficiently small Lebesgue measure. As such, concepts like conditioning are replaced with that of stable sampling, and the results must be interpreted in the context of functional analysis. The papers demonstrate that sampling frequencies according to a (deterministic) simple quasicrystal will uniquely determine sufficiently sparse functions, and furthermore, the sparsest function in the preimage can be recovered by L1-minimization provided it’s nonnegative.

That phase transition

How well can you recover a matrix from a given number of measurements? In the obvious metrics there is a sudden jump in recovery quality as the number of measurements increases, for a fixed rank. This looks a lot like a physical phase transition. Hmm.
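
A crude way to see it in simulation, swapped to the sparse-vector setting with OMP as the recovery algorithm (both arbitrary choices of mine): sweep the number of measurements and watch the empirical recovery rate jump.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
d, s, trials = 100, 5, 50
for m in range(10, 61, 10):
    successes = 0
    for _ in range(trials):
        x = np.zeros(d)
        x[rng.choice(d, s, replace=False)] = rng.normal(size=s)
        A = rng.normal(size=(m, d)) / np.sqrt(m)
        fit = OrthogonalMatchingPursuit(n_nonzero_coefs=s,
                                        fit_intercept=False).fit(A, A @ x)
        successes += np.linalg.norm(fit.coef_ - x) < 1e-6
    print(m, successes / trials)    # recovery rate jumps sharply with m
```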

See statistical mechanics of statistics.

Weird things to be classified

csgm (BJPD17), compressed sensing using generative models, tries to recover a signal which is… constrained to the manifold swept out by the latent variables of… a generative model? Or something like that.
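
A sketch of the BJPD17 recipe as I read it, with a toy untrained net standing in for the generative model: search latent space for a code whose decoded image explains the measurements.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
k, d, m = 4, 50, 20                          # latent dim, signal dim, measurements
W1, W2 = rng.normal(size=(32, k)), rng.normal(size=(d, 32))
G = lambda z: W2 @ np.tanh(W1 @ z)           # toy stand-in for a trained generator
A = rng.normal(size=(m, d)) / np.sqrt(m)

z_true = rng.normal(size=k)
y = A @ G(z_true)                            # noiseless measurements of x = G(z_true)

# BJPD17 minimise ||A G(z) - y||^2 over the latent code z; they use
# gradient descent with random restarts, here it is plain BFGS for brevity.
z_hat = minimize(lambda z: np.sum((A @ G(z) - y) ** 2),
                 rng.normal(size=k)).x
print(np.linalg.norm(G(z_hat) - G(z_true)))  # small if the search succeeded
```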

Sparse FFT.