The Living Thing / Notebooks :

Sparse coding

How to quickly make big things out of short lists of small things.

some sparse basis functions

Daniel LaCombe’s learned sparse basis example.

Linear expansion with dictionaries of basis functions, with respect to which you wish your representation to be sparse; i.e. in the statistical case, basis-sparse regression. But even outside statistics, you wish simply to approximate some data compactly. My focus here is on the noisy-observation case, although the same results are recycled enough throughout the field.

Note that there are two ways you can get your representation to be sparse;

  1. you know that your signal happens to be compressible, in the sense that under some transform its coefficient vector is mostly zeros, even in a plain old orthogonal basis expansion.
  2. you are using a redundant dictionary such that you won’t need most of it to represent even a dense signal.

I should break these two notions apart here. For now, I’m especially interested in adaptive bases.

This is merely a bunch of links to important articles at the moment; I should do a little exposition one day.

Decomposition of stuff by matching pursuit, wavelets, curvelets, chirplets, framelets, shearlets, camelhairbrushlets, content-specific basis dictionaries, designed or learned. Mammals visual cortexes seem to use something like this, if you squint right at the evidence.

To discuss:


Baraniuk’s lab has a comprehensive, but not usefully annotated, selection of articles in this field, which I include more to demonstrate the virtue of a literature review by showing the pathology of its absence, rather than as a useful starting point.

Classics: Wavelet bases

Very popoular practical intro is Torrence and Comp.


Learnable codings

Adaptive dictionaries!

I want to generalise or extend this idea, ideally in some shift-invariant way (see below.)

Oldhausen and Field (OlFi96a) kicked this area off by arguing sparse coding tricks are revealing of what the brain does.

For a walk through of one version of this, see Theano example of dictionary learning by Daniel LaCombe, who bases his version on NCBK11, HyHH09 and HLLB15.

See MBPS09 for some a summary of methods to 2009 in basis learning.

Question: how do you do this in a big data / offline setting?

TRANSFORM LEARNING: Sparse Representations at Scale:

Analytical sparsifying transforms such as Wavelets and DCT have been widely used in compression standards. Recently, the data-driven learning of sparse models such as the synthesis dictionary model have become popular especially in applications such as denoising, inpainting, compressed sensing, etc. Our group’s research at the University of Illinois focuses on the data-driven adaptation of the alternative sparsifying transform model, which offers numerous advantages over the synthesis dictionary model.

We have proposed several methods for batch learning of square or overcomplete sparsifying transforms from data. We have also investigated specific structures for these transforms such as double sparsity, union-of-transforms, and filter bank structures, which enable their efficient learning or usage. Apart from batch transform learning, our group has investigated methods for online learning of sparsifying transforms, which are particularly useful for big data or real-time applications.


Shift-invariant codings

I would like to find a general way of doing this in a phase/shift-robust fashion, although doing this naively can be computationally expensive outside of certain convenient bases.

(Sorry, that’s not very clear; I need to return to this section to polish it up.)

It can be better if you know you have have constraints, say, a convex combination (e.g. mixtures) or positive definite bases. (e.g. in kernel methods)

One method is “Shift Invariant Sparse coding”, ( BlDa04) and there are various versions out there. (GRKN07 etc) One way is to include multiple shifted copies of your atoms, another is to actually shift them in a separate optimisation stage. Both these get annoying in the time domain for various reasons.

Affine tight framelets (DHRS03) and their presumably less-computationally-tractable, more flexible cousins, shearlets also sound interesting here. For reasons I do not yet understand I am told they can naturally be used on sundry graphs and manifolds, not just lattices, is traditional in DSP. I saw Xiaosheng Zhuang present these (see, e.g. HaZZ16 and WaZh16, where WaZh16 demonstrates a Fast Framelet Transform which is supposedly as computationally as cheap as the FFT.)

I have some ideas I call learning gamelan which relate to this.

Optimisation procedures



This boils down to clever optimisation to make the calculations tractable.

To read

Arora, S., Ge, R., Ma, T., & Moitra, A. (2015) Simple, Efficient, and Neural Algorithms for Sparse Coding. ArXiv:1503.00778.
Bach, F. R., & Jordan, M. I.(2006) Learning spectral clustering, with application to speech separation. Journal of Machine Learning Research, 7(Oct), 1963–2001.
Baraniuk, R. G., Cevher, V., Duarte, M. F., & Hegde, C. (2010) Model-based compressive sensing. IEEE Transactions on Information Theory, 56(4), 1982–2001. DOI.
Barthélemy, Q., Larue, A., Mayoue, A., Mercier, D., & Mars, J. I.(2012) Shift & 2D rotation invariant sparse coding for multivariate signals. IEEE Transactions on Signal Processing, 60(4), 1597–1611.
Bertin, K., Pennec, E. L., & Rivoirard, V. (2011) Adaptive Dantzig density estimation. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 47(1), 43–74. DOI.
Blumensath, T., & Davies, M. (2004) On Shift-Invariant Sparse Coding. In C. G. Puntonet & A. Prieto (Eds.), Independent Component Analysis and Blind Signal Separation (Vol. 3195, pp. 1205–1212). Berlin, Heidelberg: Springer Berlin Heidelberg
Blumensath, T., & Davies, M. (2006) Sparse and shift-Invariant representations of music. IEEE Transactions on Audio, Speech and Language Processing, 14(1), 50–57. DOI.
Bora, A., Jalal, A., Price, E., & Dimakis, A. G.(2017) Compressed Sensing using Generative Models. ArXiv:1703.03208 [Cs, Math, Stat].
Boyes, G. (2011) Dictionary-Based Analysis/Synthesis and Structured Representations of Musical Audio. . McGill University
Cai, J.-F., Chan, R. H., & Shen, Z. (2008) A framelet-based image inpainting algorithm. Applied and Computational Harmonic Analysis, 24(2), 131–149. DOI.
Carabias-Orti, J. J., Virtanen, T., Vera-Candeas, P., Ruiz-Reyes, N., & Canadas-Quesada, F. J.(2011) Musical Instrument Sound Multi-Excitation Model for Non-Negative Spectrogram Factorization. IEEE Journal of Selected Topics in Signal Processing, 5(6), 1144–1158. DOI.
Chen, S., & Donoho, D. L.(1994) Basis pursuit. In 1994 Conference Record of the Twenty-Eighth Asilomar Conference on Signals, Systems and Computers, 1994 (Vol. 1, pp. 41–44 vol.1). DOI.
Daubechies, I. (1988) Orthonormal bases of compactly supported wavelets. Communications on Pure and Applied Mathematics, 41(7), 909–996. DOI.
Daubechies, I., Defrise, M., & De Mol, C. (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57(11), 1413–1457. DOI.
Daubechies, I., Han, B., Ron, A., & Shen, Z. (2003) Framelets: MRA-based constructions of wavelet frames. Applied and Computational Harmonic Analysis, 14(1), 1–46. DOI.
Davis, G. M.(1998) A wavelet-based analysis of fractal image compression. IEEE Transactions on Image Processing, 7(2), 141–154. DOI.
Davis, G. M., Mallat, S. G., & Zhang, Z. (1994) Adaptive time-frequency decompositions. Optical Engineering, 33(7), 2183–2191. DOI.
DeVore, R. A.(1998) Nonlinear approximation. Acta Numerica, 7, 51–150. DOI.
Dong, B. (2015) Sparse representation on graphs by tight wavelet frames and applications. Applied and Computational Harmonic Analysis. DOI.
Donoho, D. L., & Johnstone, I. M.(1995) Adapting to Unknown Smoothness via Wavelet Shrinkage. Journal of the American Statistical Association, 90(432), 1200–1224. DOI.
Donoho, D. L., Johnstone, I. M., Kerkyacharian, G., & Picard, D. (1995) Wavelet Shrinkage: Asymptopia?. Journal of the Royal Statistical Society. Series B (Methodological), 57(2), 301–369.
Du, P., Kibbe, W. A., & Lin, S. M.(2006) Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics, 22(17), 2059–2065. DOI.
Ekanadham, C., Tranchina, D., & Simoncelli, E. P.(2011) Recovery of Sparse Translation-Invariant Signals With Continuous Basis Pursuit. IEEE Transactions on Signal Processing, 59(10), 4735–4744. DOI.
Fan, J., & Li, R. (2001) Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association, 96(456), 1348–1360. DOI.
Garg, S., Rish, I., Cecchi, G., & Lozano, A. (2017) Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World. In arXiv:1701.06106 [cs, stat].
Giné, E., & Nickl, R. (2009) Uniform limit theorems for wavelet density estimators. The Annals of Probability, 37(4), 1605–1646. DOI.
Giryes, R., Sapiro, G., & Bronstein, A. M.(2016) Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?. IEEE Transactions on Signal Processing, 64(13), 3444–3457. DOI.
Goodwin, M. M.(1997) Adaptive Signal Models: Theory, Algorithms, and Audio Applications (phdthesis). . University of California, Berkeley, Berkeley, CA
Goodwin, M. M.(2001) Multiscale overlap-add sinusoidal modeling using matching pursuit and refinements. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
Goodwin, M. M., & Vetterli, M. (1999) Matching pursuit and atomic signal models based on recursive filter banks. IEEE Trans. Signal Process., 47(7), 1890–1902. DOI.
Goodwin, M., & Vetterli, M. (1997) Atomic decompositions of audio signals. In 1997 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, 1997 (p. 4 pp.-). DOI.
Gregor, K., & LeCun, Y. (2010) Learning fast approximations of sparse coding. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 399–406).
Gregor, K., & LeCun, Y. (2011) Efficient Learning of Sparse Invariant Representations. ArXiv:1105.5307 [Cs].
Grosse, R., Raina, R., Kwong, H., & Ng, A. Y.(2007) Shift-Invariant Sparse Coding for Audio Classification. In The Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007) (Vol. 9, p. 8).
Gupta, P., & Pensky, M. (2016) Solution of linear ill-posed problems using random dictionaries. ArXiv:1605.07913 [Math, Stat].
Hahn, W. E., Lewkowitz, S., Lacombe Jr, D. C., & Barenholtz, E. (2015) Deep learning human actions from video via sparse filtering and locally competitive algorithms.
Han, B., Zhao, Z., & Zhuang, X. (2016) Directional tensor product complex tight framelets with low redundancy. Applied and Computational Harmonic Analysis, 41(2), 603–637. DOI.
Han, B., & Zhuang, X. (2015) Smooth affine shear tight frames with MRA structure. Applied and Computational Harmonic Analysis, 39(2), 300–338. DOI.
Han, K., & Shin, H. (n.d.) Functional Linear Regression for Functional Response via Sparse Basis Selection.
Harte, C., Sandler, M., & Gasser, M. (2006) Detecting Harmonic Change in Musical Audio. In Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia (pp. 21–26). New York, NY, USA: ACM DOI.
Henaff, M., Jarrett, K., Kavukcuoglu, K., & LeCun, Y. (2011) Unsupervised learning of sparse features for scalable audio classification. In ISMIR.
Huang, C., Cheang, G. L. H., & Barron, A. R.(2008) Risk of penalized least squares, greedy selection and l1 penalization for flexible function libraries.
Hyvärinen, A., & Hoyer, P. (2000) Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces. Neural Computation, 12(7), 1705–1720. DOI.
Hyvärinen, A., Hurri, J., & Hoyer, P. O.(2009) Natural Image Statistics: A Probabilistic Approach to Early Computational Vision. (Vol. 39). Springer Science & Business Media
Jafari, M. G., & Plumbley, M. D.(2011) Fast Dictionary Learning for Sparse Representations of Speech Signals. IEEE Journal of Selected Topics in Signal Processing, 5(5), 1025–1031. DOI.
Jaillet, F., Gribonval, R., Plumbley, M. D., & Zayyani, H. (2010) An L1 criterion for dictionary learning by subspace identification. In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 5482–5485). DOI.
Jung, A. (2013) An RKHS Approach to Estimation with Sparsity Constraints. In Advances in Neural Information Processing Systems 29.
Koch, P., & Corso, J. J.(2016) Sparse Factorization Layers for Neural Networks with Limited Supervision. ArXiv:1612.04468 [Cs, Stat].
Koppel, A., Warnell, G., Stump, E., & Ribeiro, A. (2016) Parsimonious Online Learning with Kernels via Sparse Projections in Function Space. ArXiv:1612.04111 [Cs, Stat].
Lee, H., Battle, A., Raina, R., & Ng, A. Y.(2007) Efficient sparse coding algorithms. Advances in Neural Information Processing Systems, 19, 801.
Lewicki, M. S., & Sejnowski, T. J.(1999) Coding time-varying signals using sparse, shift-invariant representations. In NIPS (Vol. 11, pp. 730–736). Denver, CO: MIT Press
Lewicki, M. S., & Sejnowski, T. J.(2000) Learning Overcomplete Representations. Neural Computation, 12(2), 337–365. DOI.
Liu, T., Tao, D., & Xu, D. (2016) Dimensionality-Dependent Generalization Bounds for $k$-Dimensional Coding Schemes. ArXiv:1601.00238 [Cs, Stat].
Mailhé, B., Gribonval, R., Vandergheynst, P., & Bimbot, F. (2011) Fast orthogonal sparse approximation algorithms over local dictionaries. Signal Processing, 91(12), 2822–2835. DOI.
Mairal, J., Bach, F., & Ponce, J. (2014) Sparse Modeling for Image and Vision Processing. Foundations and Trends® in Comput Graph. Vis., 8(2–3), 85–283. DOI.
Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2009) Online Dictionary Learning for Sparse Coding. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 689–696). New York, NY, USA: ACM DOI.
Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2010) Online learning for matrix factorization and sparse coding. The Journal of Machine Learning Research, 11, 19–60.
Mallat, S. G.(1989) Multiresolution approximations and wavelet orthonormal bases of L²(R). Transactions of the American Mathematical Society, 315(1), 69–87. DOI.
Mallat, S. G., & Zhang, Z. (1993) Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12), 3397–3415.
Mallat, S., & Zhang, Z. (1992) Adaptive time-frequency decomposition with matching pursuits. In Time-Frequency and Time-Scale Analysis, 1992., Proceedings of the IEEE-SP International Symposium (pp. 7–10). DOI.
Marcus, G., Marblestone, A., & Dean, T. (2014) The atoms of neural computation. Science (New York, N.Y.), 346(6209), 551–552. DOI.
Mlynarski, W. (2013) Sparse, complex-valued representations of natural sounds learned with phase and amplitude continuity priors. ArXiv Preprint ArXiv:1312.4695.
Mondal, D., & Percival, D. B.(2010) M-estimation of wavelet variance. Annals of the Institute of Statistical Mathematics, 64(1), 27–53. DOI.
Mørup, M., Schmidt, M. N., & Hansen, L. K.(2007) Shift invariant sparse coding of image and music data. Journal of Machine Learning Research.
Ngiam, J., Chen, Z., Bhaskar, S. A., Koh, P. W., & Ng, A. Y.(2011) Sparse Filtering. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 24 (pp. 1125–1133). Curran Associates, Inc.
Olshausen, B. A., & Field, D. J.(1996a) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609. DOI.
Olshausen, B. A., & Field, D. J.(1996b) Natural image statistics and efficient coding. Network (Bristol, England), 7(2), 333–339. DOI.
Olshausen, B. A., & Field, D. J.(2004) Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14(4), 481–487. DOI.
Opsomer, J., Wang, Y., & Yang, Y. (2001) Nonparametric Regression with Correlated Errors. Statistical Science, 16(2), 134-153.
Oyallon, E., Belilovsky, E., & Zagoruyko, S. (2017) Scaling the Scattering Transform: Deep Hybrid Networks. ArXiv Preprint ArXiv:1703.08961.
Pfister, L., & Bresler, Y. (2017) Automatic parameter tuning for image denoising with learned sparsifying transforms. . Presented at the ICASSP
Plumbley, M. D., Abdallah, S. A., Blumensath, T., & Davies, M. E.(2006) Sparse representations of polyphonic music. Signal Processing, 86(3), 417–431. DOI.
Qian, S., & Chen, D. (1994) Signal representation using adaptive normalized Gaussian functions. Signal Processing, 36(1), 1–11. DOI.
Ravishankar, S., & Bresler, Y. (2015) Efficient Blind Compressed Sensing Using Sparsifying Transforms with Convergence Guarantees and Application to MRI. ArXiv:1501.02923 [Cs, Stat].
Rubinstein, R., Bruckstein, A. M., & Elad, M. (2010) Dictionaries for Sparse Representation Modeling. Proceedings of the IEEE, 98(6), 1045–1057. DOI.
Rubinstein, R., Zibulevsky, M., & Elad, M. (2008) Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit. CS Technion, 40.
Shen, Z. (2010) Wavelet frames and image restorations. In Scopus (pp. 2834–2863). World Scientific
Simoncelli, E. P., & Olshausen, B. A.(2001) Natural Image Statistics and Neural Representation. Annual Review of Neuroscience, 24(1), 1193–1216. DOI.
Smith, E. C., & Lewicki, M. S.(2006) Efficient auditory coding. Nature, 439(7079), 978–982. DOI.
Soh, Y. S., & Chandrasekaran, V. (2017) A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers. ArXiv:1701.01207 [Cs, Math, Stat].
Torrence, C., & Compo, G. P.(1998) A Practical Guide to Wavelet Analysis. Bulletin of the American Meteorological Society, 79(1), 61–78.
Tošić, I., & Frossard, P. (2011) Dictionary learning: What is the right representation for my signal?. IEEE Signal Processing Magazine, 28(2), 27–38. DOI.
Tsaig, Y., & Donoho, D. L.(2006a) Breakdown of equivalence between the minimal -norm solution and the sparsest solution. Signal Processing, 86(3), 533–548. DOI.
Tsaig, Y., & Donoho, D. L.(2006b) Extensions of compressed sensing. Signal Processing, 86(3), 549–571. DOI.
Vainsencher, D., Mannor, S., & Bruckstein, A. M.(2011) The Sample Complexity of Dictionary Learning. Journal of Machine Learning Research, 12(Nov), 3259–3281.
Vetterli, M. (1999) Wavelets: approximation and compression–a review. In AeroSense’99 (Vol. 3723, pp. 28–31). International Society for Optics and Photonics DOI.
Wang, Y. G., & Zhu, H. (2017) Localized Tight Frames and Fast Framelet Transforms on the Simplex. ArXiv:1701.01595 [Cs, Math].
Wang, Y. G., & Zhuang, X. (2016) Tight framelets and fast framelet transforms on manifolds. ArXiv:1608.04026 [Math].
Wang, Y.-X., Smola, A., & Tibshirani, R. J.(2014) The Falling Factorial Basis and Its Statistical Applications. ArXiv:1405.0558 [Stat].
Weidmann, C., & Vetterli, M. (2012) Rate Distortion Behavior of Sparse Sources. IEEE Transactions on Information Theory, 58(8), 4969–4992. DOI.
Yaghoobi, M., Nam, S., Gribonval, R., & Davies, M. E.(2013) Constrained Overcomplete Analysis Operator Learning for Cosparse Signal Modelling. IEEE Transactions on Signal Processing, 61(9), 2341–2355. DOI.
Zhuang, X. (2015) Smooth affine shear tight frames: digitization and applications. In SPIE Optical Engineering+ Applications (Vol. 9597, pp. 959709–959709). International Society for Optics and Photonics
Zhuang, X. (2016) Digital Affine Shear Transforms: Fast Realization and Applications in Image/Video Processing. SIAM Journal on Imaging Sciences, 9(3), 1437–1466. DOI.