Restricted isometry properties

Plus incoherence, irrepresentability, and other uncertainty bounds for a sparse world, and maybe frame theory, what’s that now?

June 12, 2017 — March 9, 2020

functional analysis

Hilbert space

linear algebra

probability

signal processing

sparser than thou

Restricted isometry properties, a.k.a. uniform uncertainty principles (E. Candès and Tao 2005; E. J. Candès, Romberg, and Tao 2006), mutual incoherence (David L. Donoho 2006; D. L. Donoho, Elad, and Temlyakov 2006), irrepresentability conditions (Zhao and Yu 2006)…

This is mostly notes while I learn some definitions; expect no actual thoughts.

Recoverability conditions, as seen in sparse regression, sparse basis dictionaries, function approximation, compressed sensing etc. If you squint right you could imagine these uncertainty principles for a sparse world, or a the foundations of a particular type of sampling theory.

Terry Tao mentions the various related conditions for the compressed sensing problem, and which types of random matrices satisfy them.

1 Restricted Isometry

The compressed sensing formulation.

The chatty lecture notes on uniform uncertainty look fun.

The restricted isometry constant of a matrix \(A\), is the smallest constant \(\delta_s\) \((1-\delta_s(A))\|x\|_2^2\leq \|Ax\|_2^2\leq (1+\delta_s(A))\|x\|_2^2\) for all \(s\)-sparse \(x\). That is, the measurement matrix does not change the norm of sparse signals “too much”, and in particular, does not null them when \(\delta_s < 1.\)

2 Irrepresentability

The set up is a little different for regression-type problems, which is where “representability” comes from. Here we care also about the design, roughly, the dependence of covariates we actually observe, and the noise distribution.

Zhao and Yu (2006) present an abstract condition called strong irrepresentability, which guarantees asymptotic sign consistency of selection. See also Meinshausen and Bühlmann (2006), who call this neighborhood stability, which is even less catchy.

More recently Meinshausen and Yu (2009) extend this (and explain the original irrepresentability more clearly IMO):

Here we examine the behavior of the Lasso estimators if the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the l2-norm sense for fixed designs under conditions on (a) the number \(s_n\) of nonzero components of the vector \(\beta_n\) and (b) the minimal singular values of design matrices that are induced by selecting small subsets of variables. Furthermore, a rate of convergence result is obtained on the l2 error with an appropriate choice of the smoothing parameter.

They do a good gob of uniting prediction-error and model-selection consistency approaches. In fact, I will base everything off Meinshausen and Yu (2009), since not only is the prose lucid, it gives the background to the design assumptions and relaxation of coherence.

TBC.

3 Incoherence

A Basis-Pursuit noise-free setting.

D. L. Donoho, Elad, and Temlyakov (2006):

We can think of the atoms in our dictionary as columns in a matrix \(\Phi\), so that \(\Phi\) is \(n\) by \(m\) and \(m > n.\). A representation of \(y\in\mathbb{R}^n\) can be thought of as a vector \(\alpha\in\mathbb{R}^m\) satisfying \(y=\Phi\alpha.\)

The concept of mutual coherence of the dictionary […] is defined, assuming that the columns of are normalized to unit \(\ell^2\)-norm, in terms of the Gram matrix \(G=\Phi^T\Phi\). With \(G(k,j)\) denoting entries of this matrix, the mutual coherence is

\[ M(\Phi) = \max_{1\leq k, j\leq m, k\neq j} |G(k,j)| \]

A dictionary is incoherent if \(M\) is small.

4 Frame theory

See frames.

5 References

Adcock, Hansen, and Roman. 2015. “The Quest for Optimal Sampling: Computationally Efficient, Structure-Exploiting Measurements for Compressed Sensing.” In Compressed Sensing and Its Applications: MATHEON Workshop 2013. Applied and Numerical Harmonic Analysis.

Baraniuk, Richard G., Cevher, Duarte, et al. 2010. “Model-Based Compressive Sensing.” IEEE Transactions on Information Theory.

Baraniuk, Richard, Davenport, DeVore, et al. 2008. “A Simple Proof of the Restricted Isometry Property for Random Matrices.” Constructive Approximation.

Barron, Cohen, Dahmen, et al. 2008. “Approximation and Learning by Greedy Algorithms.” The Annals of Statistics.

Cai, Xu, and Zhang. 2008. “On Recovery of Sparse Signals via ℓ1 Minimization.” arXiv:0805.0149 [Cs].

Candès, Emmanuel J. 1999. “Harmonic Analysis of Neural Networks.” Applied and Computational Harmonic Analysis.

Candès, Emmanuel J., Eldar, Needell, et al. 2011. “Compressed Sensing with Coherent and Redundant Dictionaries.” Applied and Computational Harmonic Analysis.

Candès, Emmanuel J., Romberg, and Tao. 2006. “Stable Signal Recovery from Incomplete and Inaccurate Measurements.” Communications on Pure and Applied Mathematics.

Candès, Emmanuel, and Tao. 2005. “Decoding by Linear Programming.” IEEE Transactions on Information Theory.

Candès, Emmanuel J., and Tao. 2006. “Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?” IEEE Transactions on Information Theory.

———. 2008. “The Uniform Uncertainty Principle and Compressed Sensing.”

Christensen. 2016. An Introduction to Frames and Riesz Bases. Applied and Numerical Harmonic Analysis.

Daubechies, I. 1990. “The Wavelet Transform, Time-Frequency Localization and Signal Analysis.” IEEE Transactions on Information Theory.

Daubechies, Ingrid. 1992. Ten lectures on wavelets.

Daubechies, Ingrid, DeVore, Fornasier, et al. 2010. “Iteratively Reweighted Least Squares Minimization for Sparse Recovery.” Communications on Pure and Applied Mathematics.

Donoho, David L. 2006. “Compressed Sensing.” IEEE Transactions on Information Theory.

Donoho, David L., and Elad. 2003. “Optimally Sparse Representation in General (Nonorthogonal) Dictionaries via ℓ1 Minimization.” Proceedings of the National Academy of Sciences.

Donoho, D. L., Elad, and Temlyakov. 2006. “Stable Recovery of Sparse Overcomplete Representations in the Presence of Noise.” IEEE Transactions on Information Theory.

Duffin, and Schaeffer. 1952. “A Class of Nonharmonic Fourier Series.” Transactions of the American Mathematical Society.

Flammia, Gross, Liu, et al. 2012. “Quantum Tomography via Compressed Sensing: Error Bounds, Sample Complexity, and Efficient Estimators.” New Journal of Physics.

Foygel, and Srebro. 2011. “Fast-Rate and Optimistic-Rate Error Bounds for L1-Regularized Regression.” arXiv:1108.0373 [Math, Stat].

Hegde, and Baraniuk. 2012. “Signal Recovery on Incoherent Manifolds.” IEEE Transactions on Information Theory.

Heil, and Walnut. 1989. “Continuous and Discrete Wavelet Transforms.” SIAM Review.

Jung, Quang, and Mara. 2017. “When Is Network Lasso Accurate?” arXiv:1704.02107 [Stat].

Kovačević, and Chebira. 2008. An Introduction to Frames.

Krahmer, Kutyniok, and Lemvig. 2014. “Sparse Matrices in Frame Theory.” Computational Statistics.

Lederer, and Vogt. 2020. “Estimating the Lasso’s Effective Noise.” arXiv:2004.11554 [Stat].

Meinshausen, and Bühlmann. 2006. “High-Dimensional Graphs and Variable Selection with the Lasso.” The Annals of Statistics.

Meinshausen, and Yu. 2009. “Lasso-Type Recovery of Sparse Representations for High-Dimensional Data.” The Annals of Statistics.

Mixon. 2012. “Sparse Signal Processing with Frame Theory.”

Morgenshtern, and Bölcskei. 2011. “A Short Course on Frame Theory.”

Needell, and Vershynin. 2009. “Uniform Uncertainty Principle and Signal Recovery via Regularized Orthogonal Matching Pursuit.” Foundations of Computational Mathematics.

Raskutti, Wainwright, and Yu. 2010. “Restricted Eigenvalue Properties for Correlated Gaussian Designs.” 2.

Ravikumar, Wainwright, Raskutti, et al. 2011. “High-Dimensional Covariance Estimation by Minimizing ℓ1-Penalized Log-Determinant Divergence.” Electronic Journal of Statistics.

Rish, and Grabarnik. 2014. “Sparse Signal Recovery with Exponential-Family Noise.” In Compressed Sensing & Sparse Filtering. Signals and Communication Technology.

Soni, and Mehdad. 2017. “RIPML: A Restricted Isometry Property Based Approach to Multilabel Learning.” arXiv:1702.05181 [Cs, Stat].

van de Geer. 2014. “Worst Possible Sub-Directions in High-Dimensional Models.” In arXiv:1403.7023 [Math, Stat].

Wu, Huang, and Chen. 2013. “The Exact Support Recovery of Sparse Signals With Noise via Orthogonal Matching Pursuit.” IEEE Signal Processing Letters.

Young. 2001. An Introduction to Non-Harmonic Fourier Series, Revised Edition, 93.

Zhang. 2010. “Nearly Unbiased Variable Selection Under Minimax Concave Penalty.” The Annals of Statistics.

Zhao, and Yu. 2006. “On Model Selection Consistency of Lasso.” Journal of Machine Learning Research.