Stand by for higgledypiggledy notes on the theme of exploiting sparsity to recover signals from few nonlocal measurements, given that we know they are nearly sparse, in a sense that will be made clear soon. This is another twist on classic sampling theory.
Sparse regression is closely related, but with a stochastic process angle.
See also matrix factorisations, random projections, optimisation, model selection, multiple testing, random linear algebra, concentration inequalities, restricted isometry properties.
Basic Compressed Sensing
I’ll follow the intro of CENR11, which tries to unify many variants.
We attempt to recover a signal \(x_k\in \mathbb{R}^d\) from \(m\ll n\) measurements \(y_k\) of the form
or, as a matrix equation,
where \(A\) is the \(m \times d\) stacked measurement matrices, and the \(z\) terms denote i.i.d. measurement noise.
Now, if \(x\) is a sparse vector, and \(A\) satisfies an appropriate restricted isometry property, then we can construct an estimate \(\hat{x}\) with small error by minimising
where \(\varepsilon\gt \z\_2^2.\)
In the lecture notes on restricted isometry properties, Candès and Tao talk about not vectors \(x\in \mathbb{R}^d\) but functions \(f:G \mapsto \mathbb{C}\) on Abelian groups like \(G=\mathbb{Z}/d\mathbb{Z},\) which is convenient for some phrasing, since then when I say say my signal is \(s\)sparse, which means that its support \(\operatorname{supp} \tilde{f}=S\subset G\) where \(S=s\).
In the finitedimensional vector framing, we can talk about best sparse approximations \(x_s\) to nonsparse vectors, \(x\).
where all the coefficients apart from the \(s\) largest are zeroed.
The basic results are find attractive convex problems. There are also greedy optimisation versions, which are formulated as above, but no longer necessarily a convex optimisation; instead, we talk about Orthogonal Matching Pursuit, Iterative Thresholding and some other stuff the details of which I do not yet know, which I think pops up in wavelets and sparse coding.
For all of these the results tend to be something like
with data \(y,\) the difference between my estimate of \(\hat{x}\) and \(\hat{x}_\text{oracle}\) is bounded by somethingorother where the oracle estimate is the one where you know ahead of time the set \(S=\operatorname{supp}(x)\).
Candés gives an example result
conditional upon
where this \(\delta_s(\cdot)\) gives the restricted isometry constant of a matrix, defined as the smallest constant such that \((1\delta_s(A))\x\_2^2\leq \Ax\_2^2\leq (1+\delta_s(A))\x\_2^2\) for all \(s\)sparse \(x\). That is, the measurement matrix does not change the norm of sparse signals “much”, and in particular, does not null them when \(\delta_s \lt 1.\)
This is not the strongest bound out there apparently, but for any of that form, those constants look frustrating.
Measuring the restricted isometry constant of a given measurement matrix is presumably hard, although I haven’t tried yet. But generating random matrices that have a certain RIC with high probability is easy; that’s a neat trick in this area.
Redundant compressed sensing
TBD. For now see restricted isometry principles.
Introductory texts

Aside: see the rather good practical python notebook in numerical tours.

Terry Tao’s exposition is great as usual. See, e.g.

The uniform uncertainty principle and compressed sensing:
[…]we can at least give an informal geometric argument as to why \(\ell^1\) minimisation is more likely to recover a sparse solution than \(\ell^2\) minimisation. The set of all \(f\) whose Fourier coefficients match the observed data \(c_\xi\) forms an affine subspace of the space of all functions. The \(\ell^2\) minimiser can then be viewed geometrically by taking l^2 balls (i.e. Euclidean balls) centred at the origin, and gradually increasing the radius of the ball until the first point of contact with the affine subspace. In general, there is no reason to expect this point of contact to be sparse (i.e. to lie on a highcodimension coordinate subspace). If however we replace \(\ell^2\) with \(\ell^1\), then the Euclidean balls are replaced by octahedra, which are much “pointier” (especially in high dimensions) and whose corners lie on coordinate subspaces. So the point of first contact is now much more likely to be sparse. The idea of using \(\ell^1\) as a “convex relaxation” of \(\ell^0\) is a powerful one in applied mathematics; see for instance Trop06 on the topic.

Hegde, Baraniuk, Davenport and Duarte have an open source textbook

Gabriel Peyre’s Compressed Sensing of Images

Igor Carron’s Sunday Morning Insight: A Quick Panorama of Sensing from Direct Imaging to Machine Learning

and his intro to compressive sensing page
…Using random projections
Classic. Notes to come.
LocalitySensitive Hashing (LSH) is an algorithm for solving the approximate or exact Near Neighbor Search in high dimensional spaces.

John Myles White explains why the JohnsonLindenstrauss lemma is worth knowing

Bob Durrant: The Unreasonable Effectiveness of Random Projections in Computer Science
…Using deterministic projections
Surely this is close to quasi monte carlo?
 Dustin G. Mixon Achieving the Welch bound with difference sets
I blogged about constructing harmonic frames using difference sets. We proved that such harmonic frames are equiangular tight frames, thereby having minimal coherence between columns. I concluded the entry by conjecturing that incoherent harmonic frames are as good for compressed sensing as harmonic frames whose rows were randomly drawn from the discrete Fourier transform (DFT) matrix
recent work of Yves Meyer might be relevant:
A variant on the compressed sensing of Emmanuel Candes, Basarab Matei and Yves Meyer
Simple quasicrystals are sets of stable sampling, Basarab Matei and Yves Meyer
These papers are interesting because their approach to compressed sensing is very different. Specifically, their sparse vectors are actually functions of compact support with sufficiently small Lebesgue measure. As such, concepts like conditioning are replaced with that of stable sampling, and the results must be interpreted in the context of functional analysis. The papers demonstrate that sampling frequencies according to a (deterministic) simple quasicrystal will uniquely determine sufficiently sparse functions, and furthermore, the sparsest function in the preimage can be recovered by L1minimization provided it’s nonnegative.
That phase transition
How well can you recover a matrix from a certain number of measurements? In obvious metrics there is a sudden jump in how well you do with increasing measurements for a given rank. This looks a lot like a physical phase transition. Hmm.
See statistical mechanics of statistics.
Weird things to be classified
csgm, (BJPD17) compressed sensing using generative models, tries to find a model which is sparse with respect to… some manifold of the latent variables of… a generative model? or something?
Refs
 DaHV12: (2012) A concentration theorem for projections. ArXiv Preprint ArXiv:1206.6813.
 PaRa15: (2015) A robust sublinear time RFFAST algorithm for computing a sparse DFT. ArXiv:1501.00320 [Cs, Math].
 BDDW08: (2008) A Simple Proof of the Restricted Isometry Property for Random Matrices. Constructive Approximation, 28(3), 253–263. DOI
 MaMe00: (n.d.) A variant on the compressed sensing of Emmanuel Candes.
 Chre08: (2008) An Alternating l1 approach to the compressed sensing problem. ArXiv:0809.0660 [Stat].
 DaGu03: (2003) An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms, 22(1), 60–65. DOI
 CaWa08: (2008) An Introduction To Compressive Sampling. IEEE Signal Processing Magazine, 25(2), 21–30. DOI
 DaDD04: (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57(11), 1413–1457. DOI
 RuPE13: (2013) Analysis KSVD: A DictionaryLearning Algorithm for the Analysis Sparse Model. IEEE Transactions on Signal Processing, 61(3), 661–677. DOI
 DiFr84: (1984) Asymptotics of Graphical Projection Pursuit. The Annals of Statistics, 12(3), 793–815.
 BaSB10: (2010) Bayesian compressive sensing via belief propagation. IEEE Transactions on Signal Processing, 58(1), 269–280. DOI
 Dono06: (2006) Compressed sensing. IEEE Transactions on Information Theory, 52(4), 1289–1306. DOI
 OkLa08: (2008) Compressed sensing of GaussMarkov random field with wireless sensor networks. In 5th IEEE Sensor Array and Multichannel Signal Processing Workshop, 2008. SAM 2008 (pp. 257–260). DOI
 BJPD17: (2017) Compressed Sensing using Generative Models. In International Conference on Machine Learning (pp. 537–546).
 CENR11: (2011) Compressed sensing with coherent and redundant dictionaries. Applied and Computational Harmonic Analysis, 31(1), 59–73. DOI
 ScRa12: (2012) Compressive phase retrieval via generalized approximate message passing. In 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton) (pp. 815–822). DOI
 Bara07: (2007) Compressive sensing. IEEE Signal Processing Magazine, 24(4).
 GrSi15: (2015) Compressive sensing in medical imaging. Applied Optics, 54(8), C23–C44.
 Carm14: (2014) Compressive System Identification. In Compressed Sensing & Sparse Filtering (pp. 281–324). Springer Berlin Heidelberg DOI
 Carm13: (2013) Compressive system identification: sequential methods and entropy bounds. Digital Signal Processing, 23(3), 751–770. DOI
 BeDY16: (2016) Computational biology in the 21st century: scaling with compressive algorithms. Communications of the ACM, 59(8), 72–80. DOI
 TrWr10: (2010) Computational Methods for Sparse Solution of Linear Inverse Problems. Proceedings of the IEEE, 98(6), 948–958. DOI
 HaJN15: (2015) Conditional gradient algorithms for normregularized smooth convex optimization. Mathematical Programming, 152(1–2), 75–112. DOI
 ChZh13: (2013) Convergence of the reweighted ℓ. Computational Optimization and Applications, 59(1–2), 47–61. DOI
 LiOs09: (2009) Coordinate descent optimization for ℓ 1 minimization with application to compressed sensing; a greedy algorithm. Inverse Problems and Imaging, 3(3), 487–503.
 NeTr08: (2008) CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. ArXiv:0803.2392 [Cs, Math].
 Achl03: (2003) Databasefriendly random projections: JohnsonLindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4), 671–687. DOI
 CaTa05: (2005) Decoding by linear programming. IEEE Transactions on Information Theory, 51(12), 4203–4215. DOI
 GiSB16: (2016) Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy? IEEE Transactions on Signal Processing, 64(13), 3444–3457. DOI
 HRLV10: (2010) Distributed Sampling of Signals Linked by Sparse Filtering: Theory and Applications. IEEE Transactions on Signal Processing, 58(3), 1095–1109. DOI
 RaBr15: (2015) Efficient Blind Compressed Sensing Using Sparsifying Transforms with Convergence Guarantees and Application to MRI. ArXiv:1501.02923 [Cs, Stat].
 CaRe09: (2009) Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 717–772. DOI
 Dasg00: (2000) Experiments with Random Projection. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (pp. 143–151). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
 PeEE10: (2010) Exploiting Statistical Dependencies in Sparse Representations for Signal Recovery. IEEE Transactions on Signal Processing, 60(5), 2286–2303. DOI
 AzKS15: (2015) Extreme Compressive Sampling for Covariance Estimation. ArXiv:1506.00898 [Cs, Math, Stat].
 FoSr11: (2011) Fastrate and optimisticrate error bounds for L1regularized regression. ArXiv:1108.0373 [Math, Stat].
 MiEl10: (2010) From Theory to Practice: SubNyquist Sampling of Sparse Wideband Analog Signals. IEEE Journal of Selected Topics in Signal Processing, 4(2), 375–391. DOI
 Mont12: (2012) Graphical models concepts in compressed sensing. Compressed Sensing: Theory and Applications, 394–438.
 KiHa16: (2016) Greedy algorithms for nonnegativityconstrained simultaneous sparse recovery. Signal Processing, 125, 274–289. DOI
 CaDa11: (2011) How well can we estimate a sparse vector? ArXiv:1104.5246 [Cs, Math, Stat].
 Romb08: (2008) Imaging via Compressive Sampling. IEEE Signal Processing Magazine, 25(2), 14–20. DOI
 WiNa16: (2016) Iterative Reweighted l1 and l2 Methods for Finding Sparse Solution. Microsoft Research.
 ChYi08: (2008) Iteratively reweighted algorithms for compressive sensing. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008 (pp. 3869–3872). DOI
 DDFG10: (2010) Iteratively reweighted least squares minimization for sparse recovery. Communications on Pure and Applied Mathematics, 63(1), 1–38. DOI
 Trop06: (2006) Just relax: convex programming methods for identifying sparse signals in noise. IEEE Transactions on Information Theory, 52(3), 1030–1051. DOI
 SFJJ15: (2015) L1Regularized Distributed Optimization: A CommunicationEfficient PrimalDual Framework. ArXiv:1512.04011 [Cs].
 FDKV07: (2007) Learning the structure of manifolds using random projections. In Advances in Neural Information Processing Systems (pp. 473–480).
 MoBa17: (2017) Learning to Invert: Signal Recovery via Deep Convolutional Networks. In ICASSP.
 Cand14: (2014) Mathematics of sparsity (and few other things). ICM 2014 Proceedings, to Appear.
 SaBB06: (2006) Measurements vs Bits: Compressed Sensing meets Information Theory. In in Proc. Allerton Conf. on Comm., Control, and Computing.
 DoMM10: (2010) Message passing algorithms for compressed sensing: I motivation and construction. In 2010 IEEE Information Theory Workshop (ITW) (pp. 1–5). DOI
 DoMM09a: (2009a) Message passing algorithms for compressed sensing: II analysis and validation. In 2010 IEEE Information Theory Workshop (ITW) (pp. 1–5). DOI
 DoMM09b: (2009b) Messagepassing algorithms for compressed sensing. Proceedings of the National Academy of Sciences, 106(45), 18914–18919. DOI
 BCDH10: (2010) Modelbased compressive sensing. IEEE Transactions on Information Theory, 56(4), 1982–2001. DOI
 OlFi96: (1996) Natural image statistics and efficient coding. Network (Bristol, England), 7(2), 333–339. DOI
 HIKP12: (2012) Nearly Optimal Sparse Fourier Transform. In Proceedings of the Fortyfourth Annual ACM Symposium on Theory of Computing (pp. 563–578). New York, NY, USA: ACM DOI
 CaTa06: (2006) NearOptimal Signal Recovery From Random Projections: Universal Encoding Strategies? IEEE Transactions on Information Theory, 52(12), 5406–5425. DOI
 Kabá14: (2014) New Bounds on Compressive Linear Least Squares Regression. In Journal of Machine Learning Research (pp. 448–456).
 YNGD12: (2012) Noise aware analysis operator learning for approximately cosparse signals. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5409–5412). DOI
 Devo98: (1998) Nonlinear approximation. Acta Numerica, 7, 51–150. DOI
 Hoye04: (2004) Nonnegative Matrix Factorization with Sparseness Constraints. Journal of Machine Learning Research, 5(9), 1457–1469.
 HaLi93: (1993) On almost Linearity of Low Dimensional Projections from High Dimensional Data. The Annals of Statistics, 21(2), 867–889.
 CaXZ08: (2008) On Recovery of Sparse Signals via ℓ1 Minimization. ArXiv:0805.0149 [Cs].
 BrEZ08a: (2008a) On the Uniqueness of Nonnegative Sparse Solutions to Underdetermined Systems of Equations. IEEE Transactions on Information Theory, 54(11), 4813–4820. DOI
 BoSc16: (2016) OnsagerCorrected Deep Networks for Sparse Linear Inverse Problems. ArXiv:1612.01183 [Cs, Math].
 DoEl03: (2003) Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proceedings of the National Academy of Sciences, 100(5), 2197–2202. DOI
 BJMO12: (2012) Optimization with SparsityInducing Penalties. Foundations and Trends® in Machine Learning, 4(1), 1–106. DOI
 WeMZ16: (2016) Overcoming The Limitations of Phase Transition by Higher Order Analysis of Regularization Techniques. ArXiv:1603.07377 [Cs, Math, Stat].
 RoZh07: (2007) Piecewise linear regularized solution paths. The Annals of Statistics, 35(3), 1012–1030. DOI
 FGLE12: (2012) Quantum Tomography via Compressed Sensing: Error Bounds, Sample Complexity, and Efficient Estimators. New Journal of Physics, 14(9), 095022. DOI
 BiMa01: (2001) Random Projection in Dimensionality Reduction: Applications to Image and Text Data. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 245–250). New York, NY, USA: ACM DOI
 LaGG16: (2016) Random projections of random manifolds. ArXiv:1607.04331 [Cs, qBio, Stat].
 ZLZX17: (2017) Randomization or Condensation?: LinearCost Matrix Sketching Via Cascaded Compression Sampling. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 615–623). New York, NY, USA: ACM DOI
 WeVe12: (2012) Rate Distortion Behavior of Sparse Sources. IEEE Transactions on Information Theory, 58(8), 4969–4992. DOI
 Jagg13: (2013) Revisiting FrankWolfe: ProjectionFree Sparse Convex Optimization. In Journal of Machine Learning Research (pp. 427–435).
 CaRT06a: (2006a) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2), 489–509. DOI
 CaZh15: (2015) ROP: Matrix recovery via rankone projections. The Annals of Statistics, 43(1), 102–138. DOI
 SoXP15: (2015) Sequential Information Guided Sensing. ArXiv:1509.00130 [Cs, Math, Stat].
 HeBa12: (2012) Signal Recovery on Incoherent Manifolds. IEEE Transactions on Information Theory, 58(12), 7204–7214. DOI
 HIKP12: (2012) Simple and Practical Algorithm for Sparse Fourier Transform. In Proceedings of the TwentyThird Annual ACMSIAM Symposium on Discrete Algorithms (pp. 1183–1194). Kyoto, Japan: Society for Industrial and Applied Mathematics
 MaMe10: (2010) Simple quasicrystals are sets of stable sampling. Complex Variables and Elliptic Equations, 55(8–10), 947–964. DOI
 Bara08: (2008) Singlepixel imaging via compressive sampling. IEEE Signal Processing Magazine, 25(2), 83–91. DOI
 Chen12: (2012) Smoothing methods for nonsmooth, nonconvex minimization. Mathematical Programming, 134(1), 71–99. DOI
 OlFi04: (2004) Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14(4), 481–487. DOI
 RiGr15: (2015) Sparse modeling: theory, algorithms, and applications. Boca Raton, FL: CRC Press, Taylor & Francis Group
 BrEZ08b: (2008b) Sparse nonnegative solution of a linear system of equations is unique. In 3rd International Symposium on Communications, Control and Signal Processing, 2008. ISCCSP 2008 (pp. 762–767). DOI
 CDHB09: (2009) Sparse Signal Recovery Using Markov Random Fields. In Advances in Neural Information Processing Systems (pp. 257–264). Curran Associates, Inc.
 RiGr14: (2014) Sparse Signal Recovery with ExponentialFamily Noise. In Compressed Sensing & Sparse Filtering (pp. 77–93). Springer Berlin Heidelberg DOI
 RaBr15: (2015) Sparsifying Transform Learning With Efficient Optimal Updates and Convergence Guarantees. IEEE Transactions on Signal Processing, 63(9), 2389–2404. DOI
 DuBa13: (2013) Spectral compressive sensing. Applied and Computational Harmonic Analysis, 35(1), 111–129. DOI
 DoET06: (2006) Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Transactions on Information Theory, 52(1), 6–18. DOI
 CaRT06b: (2006b) Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8), 1207–1223. DOI
 ShTe11: (2011) Stochastic Methods for L1regularized Loss Minimization. Journal of Machine Learning Research, 12, 1865–1892.
 YaXu15: (2015) Streaming Sparse Principal Component Analysis. In Journal of Machine Learning Research (pp. 494–503).
 BaMo11: (2011) The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Transactions on Information Theory, 57(2), 764–785. DOI
 WuHC13: (2013) The Exact Support Recovery of Sparse Signals With Noise via Orthogonal Matching Pursuit. IEEE Signal Processing Letters, 20(4), 403–406. DOI
 CaTa08: (2008) The uniform uncertainty principle and compressed sensing.
 Blan13: (2013) Toward deterministic compressed sensing. Proceedings of the National Academy of Sciences, 110(4), 1146–1147. DOI
 LiHC06: (2006) Very Sparse Random Projections. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 287–296). New York, NY, USA: ACM DOI
 Vett99: (1999) Wavelets: approximation and compression–a review. In AeroSense’99 (Vol. 3723, pp. 28–31). International Society for Optics and Photonics DOI
 BiCh13: (2013) WorstCase Complexity of Smoothing Quadratic Regularization Methods for NonLipschitzian Optimization. SIAM Journal on Optimization, 23(3), 1718–1741. DOI