Restricted isometry properties, a.k.a. uniform uncertainty principles (CaTa05, CaRT06), mutual incoherence (DoET06), irrepresentability conditions (ZhYu06)…

This is mostly notes while I learn some definitions; expect no actual thoughts.

Recoverability conditions, as seen in sparse regression, sparse basis dictionaries, function approximation, compressed sensing etc. Uncertainty principles for a sparse world.

Terry Tao mentions the various related conditions for the compressed sensing problem, and which types of random matrices satisfy them. The chatty lecture notes on uniform uncertainty look nice.

## Restricted Isometry

The compressed sensing formulation.

The restricted isometry constant of a matrix \(A\), is the smallest constant \(\delta_s\) \((1-\delta_s(A))\|x\|_2^2\leq \|Ax\|_2^2\leq (1+\delta_s(A))\|x\|_2^2\) for all \(s\)-sparse \(x\). That is, the measurement matrix does not change the norm of sparse signals “too much”, and in particular, does not null them when \(\delta_s \lt 1.\)

## Irrepresentability

The set up is a little different for regression-type problems, which is where “representability” comes from. Here we care also about the *design*, roughly, the dependence of covariates we actually observe, and the noise distribution.

ZhYu06 present an abstract condition called strong irrepresentability, which guarantees asymptotic sign consistency of selection. See also MeBü06, who call this *neighborhood stability*, which is even less catchy.

More recently MeYu09 extend this (and explain the original irrepresenatability more clearly IMO):

Here we examine the behavior of the Lasso estimators if the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the l2-norm sense for fixed designs under conditions on (a) the number \(s_n\) of nonzero components of the vector \(\beta_n\) and (b) the minimal singular values of design matrices that are induced by selecting small subsets of variables. Furthermore, a rate of convergence result is obtained on the l2 error with an appropriate choice of the smoothing parameter.

They do a good gob of uniting prediction-error and model-selection consistency approaches. In fact, I will base *everything* off MeYu09, since not only is the prose more lucid, it gives the background to the design assumptions and relaxation of coherence.

TBC.

## Incoherence

A Basis-Pursuit noise-free setting.

We can think of the atoms in our dictionary as columns in a matrix \(\Phi\), so that \(\Phi\) is \(n\) by \(m\) and \(m \gt n.\). A representation of \(y\in\mathbb{R}^n\) can be thought of as a vector \(\alpha\in\mathbb{R}^m\) satisfying \(y=\Phi\alpha.\)

The concept of mutual coherence of the dictionary […] is defined, assuming that the columns of are normalized to unit \(\ell^2\)-norm, in terms of the Gram matrix \(G=\Phi^T\Phi\). With \(G(k,j)\) denoting entries of this matrix, the mutual coherence is

\[ M(\Phi) = \max_{1\leq k, j\leq m, k\neq j} |G(k,j)| \]

A dictionary is

incoherentif \(M\) is small.

## Frame theory

Something I see mentioned in the various conditions above, but don’t really understand.

Morgenshtern and Bölcskei (MoBö11):

Hilbert spaces [1, Def. 3.1-1] and the associated concept of orthonormal bases are of fundamental importance in signal processing, communications, control, and information theory. However, linear independence and orthonormality of the basis elements impose constraints that often make it difficult to have the basis elements satisfy additional desirable properties. This calls for a theory of signal decompositions that is flexible enough to accommodate decompositions into possibly nonorthogonal and redundant signal sets. The theory of frames provides such a tool. This chapter is an introduction to the theory of frames, which was developed by Duffin and Schaeffer [DuSc52] and popularized mostly through [Daub90, Daub92, HeWa89, Youn01]. Meanwhile frame theory, in particular the aspect of redundancy in signal expansions, has found numerous applications such as, e.g., denoising, code division multiple access (CDMA), orthogonal frequency division multiplexing (OFDM) systems, coding theory, quantum information theory, analog-to-digital (A/D) converters, and compressive sensing [DoEl03, Dono06, CaTa06]. A more extensive list of relevant references can be found in [KoCh08]. For a comprehensive treatment of frame theory we refer to the excellent textbook [Chri16].

## Refs

Restricted isometry properties, a.k.a. uniform uncertainty principles (CaTa05, CaRT06), mutual incoherence (DoET06), irrepresentability conditions (ZhYu06)…

This is mostly notes while I learn some definitions; expect no actual thoughts.

Recoverability conditions, as seen in sparse regression, sparse basis dictionaries, function approximation, compressed sensing etc. Uncertainty principles for a sparse world.

Terry Tao mentions the various related conditions for the compressed sensing problem, and which types of random matrices satisfy them. The chatty lecture notes on uniform uncertainty look nice.

## Restricted Isometry

The compressed sensing formulation.

The restricted isometry constant of a matrix \(A\), is the smallest constant \(\delta_s\) \((1-\delta_s(A))\|x\|_2^2\leq \|Ax\|_2^2\leq (1+\delta_s(A))\|x\|_2^2\) for all \(s\)-sparse \(x\). That is, the measurement matrix does not change the norm of sparse signals “too much”, and in particular, does not null them when \(\delta_s \lt 1.\)

## Irrepresentability

The set up is a little different for regression-type problems, which is where “representability” comes from. Here we care also about the *design*, roughly, the dependence of covariates we actually observe, and the noise distribution.

ZhYu06 present an abstract condition called strong irrepresentability, which guarantees asymptotic sign consistency of selection. See also MeBü06, who call this *neighborhood stability*, which is even less catchy.

More recently MeYu09 extend this (and explain the original irrepresenatability more clearly IMO):

Here we examine the behavior of the Lasso estimators if the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the l2-norm sense for fixed designs under conditions on (a) the number \(s_n\) of nonzero components of the vector \(\beta_n\) and (b) the minimal singular values of design matrices that are induced by selecting small subsets of variables. Furthermore, a rate of convergence result is obtained on the l2 error with an appropriate choice of the smoothing parameter.

They do a good gob of uniting prediction-error and model-selection consistency approaches. In fact, I will base *everything* off MeYu09, since not only is the prose more lucid, it gives the background to the design assumptions and relaxation of coherence.

TBC.

## Incoherence

A Basis-Pursuit noise-free setting.

We can think of the atoms in our dictionary as columns in a matrix \(\Phi\), so that )) is \(n\( by \(m\) and \(m \gt n.\). A representation of \(y\in\mathbb{R}^n\) can be thought of as a vector \(\alpha\in\mathbb{R}^m\) satisfying \(y=\Phi\alpha.\)

The concept of mutual coherence of the dictionary […] is defined, assuming that the columns of are normalized to unit \(\ell^2\)-norm, in terms of the Gram matrix \(G=\Phi^T\Phi\). With \(G(k,j)\) denoting entries of this matrix, the mutual coherence is

\[ M(\Phi) = \max_{1\leq k, j\leq m, k\neq j} |G(k,j)| \]

A dictionary is

incoherentif \(M\) is small.

## Frame theory

Something I see mentioned in the various conditions above, but don’t really understand.

Morgenshtern and Bölcskei (MoBö11):

Hilbert spaces [1, Def. 3.1-1] and the associated concept of orthonormal bases are of fundamental importance in signal processing, communications, control, and information theory. However, linear independence and orthonormality of the basis elements impose constraints that often make it difficult to have the basis elements satisfy additional desirable properties. This calls for a theory of signal decompositions that is flexible enough to accommodate decompositions into possibly nonorthogonal and redundant signal sets. The theory of frames provides such a tool. This chapter is an introduction to the theory of frames, which was developed by Duffin and Schaeffer [DuSc52] and popularized mostly through [Daub90, Daub92, HeWa89, Youn01]. Meanwhile frame theory, in particular the aspect of redundancy in signal expansions, has found numerous applications such as, e.g., denoising, code division multiple access (CDMA), orthogonal frequency division multiplexing (OFDM) systems, coding theory, quantum information theory, analog-to-digital (A/D) converters, and compressive sensing [DoEl03, Dono06, CaTa06]. A more extensive list of relevant references can be found in [KoCh08]. For a comprehensive treatment of frame theory we refer to the excellent textbook [Chri16].

## Refs

- DuSc52: (1952) A Class of Nonharmonic Fourier Series.
*Transactions of the American Mathematical Society*, 72(2), 341–366. DOI - MoBö11: (2011, April 21)
*A Short Course on Frame Theory* - BDDW08: (2008) A Simple Proof of the Restricted Isometry Property for Random Matrices.
*Constructive Approximation*, 28(3), 253–263. DOI - KoCh08: (2008) An Introduction to Frames.
*Foundations and Trends® in Signal Processing*, 2(1), 1–94. DOI - Chri16: (2016)
*An Introduction to Frames and Riesz Bases*. Cham: Springer International Publishing DOI - Youn01: (2001)
*An Introduction to Non-Harmonic Fourier Series, Revised Edition, 93*. Academic Press - BCDD08: (2008) Approximation and learning by greedy algorithms.
*The Annals of Statistics*, 36(1), 64–94. DOI - Dono06: (2006) Compressed sensing.
*IEEE Transactions on Information Theory*, 52(4), 1289–1306. DOI - CENR11: (2011) Compressed sensing with coherent and redundant dictionaries.
*Applied and Computational Harmonic Analysis*, 31(1), 59–73. DOI - DuBS17: (2017) Computationally Efficient Robust Estimation of Sparse Functionals. In ICML.
- HeWa89: (1989) Continuous and Discrete Wavelet Transforms.
*SIAM Review*, 31(4), 628–666. DOI - CaTa05: (2005) Decoding by linear programming.
*IEEE Transactions on Information Theory*, 51(12), 4203–4215. DOI - FoSr11: (2011) Fast-rate and optimistic-rate error bounds for L1-regularized regression.
*ArXiv:1108.0373 [Math, Stat]*. - Cand99: (1999) Harmonic Analysis of Neural Networks.
*Applied and Computational Harmonic Analysis*, 6(2), 197–218. DOI - RWRY11: (2011) High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence.
*Electronic Journal of Statistics*, 5, 935–980. DOI - MeBü06: (2006) High-dimensional graphs and variable selection with the lasso.
*The Annals of Statistics*, 34(3), 1436–1462. DOI - DDFG10: (2010) Iteratively reweighted least squares minimization for sparse recovery.
*Communications on Pure and Applied Mathematics*, 63(1), 1–38. DOI - MeYu09: (2009) Lasso-type recovery of sparse representations for high-dimensional data.
*The Annals of Statistics*, 37(1), 246–270. DOI - BCDH10: (2010) Model-based compressive sensing.
*IEEE Transactions on Information Theory*, 56(4), 1982–2001. DOI - Zhan10: (2010) Nearly unbiased variable selection under minimax concave penalty.
*The Annals of Statistics*, 38(2), 894–942. DOI - CaTa06: (2006) Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?
*IEEE Transactions on Information Theory*, 52(12), 5406–5425. DOI - ZhYu06: (2006) On model selection consistency of Lasso.
*Journal of Machine Learning Research*, 7(Nov), 2541–2563. - CaXZ08: (2008) On Recovery of Sparse Signals via ℓ1 Minimization.
*ArXiv:0805.0149 [Cs]*. - DoEl03: (2003) Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization.
*Proceedings of the National Academy of Sciences*, 100(5), 2197–2202. DOI - FGLE12: (2012) Quantum Tomography via Compressed Sensing: Error Bounds, Sample Complexity, and Efficient Estimators.
*New Journal of Physics*, 14(9), 095022. DOI - RaWY10: (2010) Restricted eigenvalue properties for correlated Gaussian designs.
*2*, 11(Aug), 2241–2259. - SoMe17: (2017) RIPML: A Restricted Isometry Property based Approach to Multilabel Learning.
*ArXiv:1702.05181 [Cs, Stat]*. - HeBa12: (2012) Signal Recovery on Incoherent Manifolds.
*IEEE Transactions on Information Theory*, 58(12), 7204–7214. DOI - KrKL14: (2014) Sparse matrices in frame theory.
*Computational Statistics*, 29(3–4), 547–568. DOI - Mixo12: (2012, April 26) Sparse Signal Processing with Frame Theory
- RiGr14: (2014) Sparse Signal Recovery with Exponential-Family Noise. In Compressed Sensing & Sparse Filtering (pp. 77–93). Springer Berlin Heidelberg DOI
- DoET06: (2006) Stable recovery of sparse overcomplete representations in the presence of noise.
*IEEE Transactions on Information Theory*, 52(1), 6–18. DOI - CaRT06: (2006) Stable signal recovery from incomplete and inaccurate measurements.
*Communications on Pure and Applied Mathematics*, 59(8), 1207–1223. DOI - Daub92: (1992)
*Ten lectures on wavelets*. Philadelphia, Pa: Society for Industrial and Applied Mathematics (SIAM, 3600 Market Street, Floor 6, Philadelphia, PA 19104) - WuHC13: (2013) The Exact Support Recovery of Sparse Signals With Noise via Orthogonal Matching Pursuit.
*IEEE Signal Processing Letters*, 20(4), 403–406. DOI - CaTa08: (2008) The uniform uncertainty principle and compressed sensing.
- Daub90: (1990) The wavelet transform, time-frequency localization and signal analysis.
*IEEE Transactions on Information Theory*, 36(5), 961–1005. DOI - NeVe09: (2009) Uniform Uncertainty Principle and Signal Recovery via Regularized Orthogonal Matching Pursuit.
*Foundations of Computational Mathematics*, 9(3), 317–334. DOI - JuQM17: (2017) When is Network Lasso Accurate?
*ArXiv:1704.02107 [Stat]*. - Geer14: (2014) Worst possible sub-directions in high-dimensional models. In arXiv:1403.7023 [math, stat] (Vol. 131).