Linear expansion with dictionaries of basis functions, with respect to which you wish your representation to be sparse; i.e. in the statistical case, basis-sparse regression. But even outside statistics, you may wish simply to approximate some data compactly. My focus here is on the noisy-observation case, although the same results get recycled throughout the field.
Note that there are two ways you can get your representation to be sparse:

you know that your signal happens to be compressible, in the sense that under some transform its coefficient vector is mostly zeros, even in a plain old orthogonal basis expansion.

you are using a redundant dictionary such that you won't need most of it to represent even a dense signal.
I should break these two notions apart here. For now, I'm especially interested in adaptive bases.
This is merely a bunch of links to important articles at the moment; I should do a little exposition one day.
Decomposition of stuff by matching pursuit, wavelets, curvelets, chirplets, framelets, shearlets, camelhairbrushlets, content-specific basis dictionaries, designed or learned. Mammalian visual cortices seem to use something like this, if you squint right at the evidence.
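Since matching pursuit heads that list, here is a minimal sketch of the greedy idea (in the spirit of MaZh93), assuming a dictionary stored as the columns of a matrix with unit-norm atoms. The function name, the random Gaussian dictionary and the three-atom test signal are illustrative assumptions of mine, not taken from any of the linked articles.

```python
import numpy as np

def matching_pursuit(y, D, n_iter=10):
    """Greedy sparse approximation of y over the columns (atoms) of D.

    Assumes every column of D has unit Euclidean norm.
    Returns the coefficient vector and the residual norm after each step.
    """
    residual = np.array(y, dtype=float)       # working copy of the signal
    coef = np.zeros(D.shape[1])
    history = [np.linalg.norm(residual)]
    for _ in range(n_iter):
        corr = D.T @ residual                 # inner product with every atom
        k = int(np.argmax(np.abs(corr)))      # best-matching atom
        coef[k] += corr[k]                    # accumulate its coefficient
        residual -= corr[k] * D[:, k]         # remove its contribution
        history.append(np.linalg.norm(residual))
    return coef, history

# Hypothetical test problem: 3 active atoms in a redundant random dictionary.
rng = np.random.default_rng(0)
n, m = 64, 256                                # signal length, number of atoms
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)                # normalise atoms
x_true = np.zeros(m)
x_true[[3, 100, 200]] = [2.0, -1.5, 1.0]
y = D @ x_true + 0.01 * rng.standard_normal(n)

coef, history = matching_pursuit(y, D, n_iter=20)
print("nonzero coefficients:", np.count_nonzero(coef))
print("relative residual:", history[-1] / history[0])
```

With unit-norm atoms the residual norm cannot increase from one step to the next, which is the main thing this sketch is meant to show. Plain matching pursuit may reselect the same atom; orthogonalised variants avoid that at extra cost.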
To discuss:

connection to mixture models.

Sampling complexity versus approximation complexity

I am especially interested in approaches where we learn the transform or the basis dictionary unsupervised.
Resources
Baraniuk's lab has a comprehensive, but not usefully annotated, selection of articles in this field, which I include more to demonstrate the virtue of a literature review by showing the pathology of its absence, rather than as a useful starting point.
Classics: Wavelet bases
A very popular practical intro is Torrence and Compo (ToCo98).
TBD
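Until there is proper exposition here, a minimal sketch of the simplest classical case may help: the orthonormal Haar wavelet transform, written by hand in NumPy, followed by hard thresholding of the coefficients. The function names, the piecewise-constant test signal, the assumed-known noise level `sigma`, and the use of the universal threshold sigma * sqrt(2 log n) from the wavelet shrinkage literature (cf. DJKP95) are all illustrative assumptions. In practice you would use one of the wavelet toolkits listed under Implementations.

```python
import numpy as np

def haar_forward(x):
    """Orthonormal Haar transform of a signal whose length is a power of two."""
    approx = np.asarray(x, dtype=float)
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))    # detail coefficients
        approx = (even + odd) / np.sqrt(2)           # coarser approximation
    return np.concatenate([approx] + details[::-1])  # coarse-to-fine order

def haar_inverse(coefs):
    """Invert haar_forward."""
    coefs = np.asarray(coefs, dtype=float)
    approx, pos = coefs[:1], 1
    while pos < coefs.size:
        detail = coefs[pos:pos + approx.size]
        pos += approx.size
        up = np.empty(2 * approx.size)
        up[0::2] = (approx + detail) / np.sqrt(2)
        up[1::2] = (approx - detail) / np.sqrt(2)
        approx = up
    return approx

rng = np.random.default_rng(0)
n, sigma = 256, 0.2                                # assumed: sigma is known
x_clean = np.where(np.arange(n) < 96, 1.0, 3.0)    # piecewise constant
x_noisy = x_clean + sigma * rng.standard_normal(n)

coefs = haar_forward(x_noisy)
threshold = sigma * np.sqrt(2 * np.log(n))         # universal threshold
coefs_kept = np.where(np.abs(coefs) > threshold, coefs, 0.0)  # hard threshold
x_denoised = haar_inverse(coefs_kept)

err_noisy = np.linalg.norm(x_noisy - x_clean)
err_denoised = np.linalg.norm(x_denoised - x_clean)
print("coefficients kept:", np.count_nonzero(coefs_kept), "of", n)
print("error before / after:", err_noisy, err_denoised)
```

This is the "compressible in a plain old orthogonal basis" route to sparsity: the clean signal has only a handful of nonzero Haar coefficients, so discarding small coefficients removes mostly noise.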
Learnable codings
Adaptive dictionaries!
I want to generalise or extend this idea, ideally in some shift-invariant way (see below).
Olshausen and Field (OlFi96a) kicked this area off by arguing that sparse coding tricks are revealing of what the brain does.
For a walk-through of one version of this, see Theano example of dictionary learning by Daniel LaCombe, who bases his version on NCBK11, HyHH09 and HLLB15.
See MBPS09 for a summary of methods to 2009 in basis learning.
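To make the idea of an adaptive dictionary concrete, here is a minimal batch sketch that alternates between sparse coding with a fixed dictionary and a least-squares dictionary update. This is not the algorithm of MBPS09 or of the Theano example above; it is a generic alternating scheme, with the sparse coding step done by iterative soft thresholding (cf. DaDD04). All function names, the penalty `lam`, the restart of dead atoms and the synthetic data are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code(Y, D, lam, n_iter=100):
    """ISTA for min_X 0.5*||Y - D X||_F^2 + lam*||X||_1, started from zero."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2        # 1 / Lipschitz constant
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ X - Y)
        X = soft_threshold(X - step * grad, step * lam)
    return X

def learn_dictionary(Y, n_atoms, lam=0.1, n_outer=20, seed=0):
    """Alternate sparse coding with a least-squares dictionary update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_outer):
        X = sparse_code(Y, D, lam)
        # Least-squares update of D for fixed X, lightly regularised
        # so that unused atoms do not make the system singular.
        G = X @ X.T + 1e-6 * np.eye(n_atoms)
        D = np.linalg.solve(G, X @ Y.T).T
        # Restart unused (near-zero) atoms at random, then renormalise.
        dead = np.linalg.norm(D, axis=0) < 1e-8
        D[:, dead] = rng.standard_normal((Y.shape[0], int(dead.sum())))
        D /= np.linalg.norm(D, axis=0)
    return D, sparse_code(Y, D, lam)

# Synthetic data: each sample mixes 3 atoms of a hidden dictionary.
rng = np.random.default_rng(1)
n, n_atoms, n_samples = 16, 24, 500
D_true = rng.standard_normal((n, n_atoms))
D_true /= np.linalg.norm(D_true, axis=0)
X_true = np.zeros((n_atoms, n_samples))
for j in range(n_samples):
    idx = rng.choice(n_atoms, size=3, replace=False)
    X_true[idx, j] = rng.standard_normal(3)
Y = D_true @ X_true + 0.01 * rng.standard_normal((n, n_samples))

D_hat, X_hat = learn_dictionary(Y, n_atoms)
rel_err = np.linalg.norm(Y - D_hat @ X_hat) / np.linalg.norm(Y)
print("relative reconstruction error:", rel_err)
print("fraction of nonzero codes:", np.mean(X_hat != 0))
```

Every pass touches the whole data matrix `Y`, which is exactly what stops being feasible at scale.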
Question: how do you do this in a big data / offline setting?
TRANSFORM LEARNING: Sparse Representations at Scale:
Analytical sparsifying transforms such as Wavelets and DCT have been widely used in compression standards. Recently, the data-driven learning of sparse models such as the synthesis dictionary model have become popular especially in applications such as denoising, inpainting, compressed sensing, etc. Our group’s research at the University of Illinois focuses on the data-driven adaptation of the alternative sparsifying transform model, which offers numerous advantages over the synthesis dictionary model.
We have proposed several methods for batch learning of square or overcomplete sparsifying transforms from data. We have also investigated specific structures for these transforms such as double sparsity, union-of-transforms, and filter bank structures, which enable their efficient learning or usage. Apart from batch transform learning, our group has investigated methods for online learning of sparsifying transforms, which are particularly useful for big data or real-time applications.
Huh.
Shift-invariant codings
I would like to find a general way of doing this in a phase/shift-robust fashion, although doing this naively can be computationally expensive outside of certain convenient bases.
(Sorry, that's not very clear; I need to return to this section to polish it up.)
It can be better if you know you have constraints, say, a convex combination (e.g. mixtures) or positive definite bases (e.g. in kernel methods).
One method is “Shift Invariant Sparse coding” (BlDa04), and there are various versions out there (GRKN07 etc.). One way is to include multiple shifted copies of your atoms; another is to actually shift them in a separate optimisation stage. Both of these get annoying in the time domain for various reasons.
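A minimal sketch of the "shifted copies" option, assuming circular shifts and a single atom: the explicit dictionary of all shifts costs O(n^2) to correlate against a signal, whereas the same correlations come out of an FFT in O(n log n) without ever building the dictionary. The Gaussian-windowed sinusoid atom, the delay of 37 samples and the noise level are hypothetical choices for illustration, and this covers only the correlation step, not a full shift-invariant sparse coder such as BlDa04.

```python
import numpy as np

n = 128
t = np.arange(n)
# Hypothetical atom: a Gaussian-windowed sinusoid, normalised to unit norm.
atom = np.exp(-0.5 * ((t - 20) / 5.0) ** 2) * np.cos(2 * np.pi * 0.2 * t)
atom /= np.linalg.norm(atom)

# Option 1: explicit dictionary holding every circular shift of the atom.
D_shifts = np.stack([np.roll(atom, s) for s in range(n)], axis=1)

# Test signal: the atom delayed by 37 samples, scaled, plus a little noise.
rng = np.random.default_rng(1)
x = 2.0 * np.roll(atom, 37) + 0.01 * rng.standard_normal(n)

corr_explicit = D_shifts.T @ x          # correlation with each shifted copy

# Option 2: the same correlations via the FFT (circular cross-correlation).
corr_fft = np.fft.irfft(np.fft.rfft(x) * np.conj(np.fft.rfft(atom)), n=n)

best_shift = int(np.argmax(np.abs(corr_fft)))
print("estimated shift:", best_shift)
print("estimated amplitude:", corr_fft[best_shift])
```

The FFT trick is one reason shift-invariant codes are cheaper in "convenient" bases. With many atoms and non-circular boundaries, the bookkeeping gets messier than this sketch suggests.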
Affine tight framelets (DHRS03) and their presumably less-computationally-tractable, more flexible cousins, shearlets, also sound interesting here. For reasons I do not yet understand, I am told they can naturally be used on sundry graphs and manifolds, not just lattices, as is traditional in DSP. I saw Xiaosheng Zhuang present these (see, e.g., HaZZ16 and WaZh16, where WaZh16 demonstrates a Fast Framelet Transform which is supposedly as computationally cheap as the FFT).
I have some ideas I call learning gamelan which relate to this.
Optimisation procedures
TBD.
Implementations
This boils down to clever optimisation to make the calculations tractable.

the wavelet toolkits.

scipy's wavelet transform has no frills and little coherent explanation, but it goes

pywavelets does various fancy wavelets and seems to be a standard for python.

Matlab's Wavelet toolbox seems to be the reference.

scikit-learn dictionary learning version here

also pydbm

Fancy easy GPU wavelet implementation, PyTorchWavelets.


SParse Optimization Research COde (SPORCO) is an open-source Python package for solving optimization problems with sparsity-inducing regularization, consisting primarily of sparse coding and dictionary learning, for both standard and convolutional forms of sparse representation. In the current version, all optimization problems are solved within the Alternating Direction Method of Multipliers (ADMM) framework. SPORCO was developed for applications in signal and image processing, but is also expected to be useful for problems in computer vision, statistics, and machine learning.

Sparse-filtering: Unsupervised feature learning based on sparse filtering
This implements the method described in Jiquan Ngiam, Pang Wei Koh, Zhenghao Chen, Sonia Bhaskar, Andrew Y. Ng: Sparse Filtering. NIPS 2011: 1125–1133 (NCBK11), and is based on the Matlab code provided in the supplementary material.

spams does a variety of sparse codings, although none of them accept pluggable models. Nonetheless it does some neat things fast. (See optimisation.)
Refs
 CaLy15: Peter G. Casazza, Richard G. Lynch (2015) A brief introduction to Hilbert space frame theory and its applications. In Finite Frame Theory: A Complete Introduction to Overcompleteness.
 CaCS08: Jian-Feng Cai, Raymond H. Chan, Zuowei Shen (2008) A framelet-based image inpainting algorithm. Applied and Computational Harmonic Analysis, 24(2), 131–149. DOI
 SoCh17: Yong Sheng Soh, Venkat Chandrasekaran (2017) A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers. ArXiv:1701.01207 [Cs, Math, Stat].
 ToCo98: Christopher Torrence, Gilbert P Compo (1998) A Practical Guide to Wavelet Analysis. Bulletin of the American Meteorological Society, 79(1), 61–78.
 Davi98: Geoffrey M Davis (1998) A wavelet-based analysis of fractal image compression. IEEE Transactions on Image Processing, 7(2), 141–154. DOI
 DoJo95: David L. Donoho, Iain M. Johnstone (1995) Adapting to Unknown Smoothness via Wavelet Shrinkage. Journal of the American Statistical Association, 90(432), 1200–1224. DOI
 BePR11: K. Bertin, E. Le Pennec, V. Rivoirard (2011) Adaptive Dantzig density estimation. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 47(1), 43–74. DOI
 DaMA97: G. Davis, S. Mallat, M. Avellaneda (1997) Adaptive greedy approximations. Constructive Approximation, 13(1), 57–98. DOI
 MaZh92: S. Mallat, Z. Zhang (1992) Adaptive time-frequency decomposition with matching pursuits. In Time-Frequency and Time-Scale Analysis, 1992., Proceedings of the IEEE-SP International Symposium (pp. 7–10). DOI
 DaMZ94a: Geoffrey M. Davis, Stephane G. Mallat, Zhifeng Zhang (1994a) Adaptive time-frequency decompositions. Optical Engineering, 33(7), 2183–2191. DOI
 DaMZ94b: Geoffrey M. Davis, Stephane G. Mallat, Zhifeng Zhang (1994b) Adaptive time-frequency decompositions with matching pursuit. In Wavelet Applications (Vol. 2242, pp. 402–414). International Society for Optics and Photonics DOI
 DaDD04: I. Daubechies, M. Defrise, C. De Mol (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57(11), 1413–1457. DOI
 JGPZ10: F. Jaillet, R. Gribonval, M. D. Plumbley, H. Zayyani (2010) An L1 criterion for dictionary learning by subspace identification. In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 5482–5485). DOI
 Jung13: Alexander Jung (2013) An RKHS Approach to Estimation with Sparsity Constraints. In Advances in Neural Information Processing Systems 29.
 BCDD08: Andrew R. Barron, Albert Cohen, Wolfgang Dahmen, Ronald A. DeVore (2008) Approximation and learning by greedy algorithms. The Annals of Statistics, 36(1), 64–94. DOI
 GoVe97: M. Goodwin, M. Vetterli (1997) Atomic decompositions of audio signals. In 1997 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, 1997. DOI
 PfBr17: Luke Pfister, Yoram Bresler (2017) Automatic parameter tuning for image denoising with learned sparsifying transforms.
 ChDo94: Shaobing Chen, David L. Donoho (1994) Basis pursuit. In 1994 Conference Record of the Twenty-Eighth Asilomar Conference on Signals, Systems and Computers, 1994 (Vol. 1, pp. 41–44 vol.1). DOI
 TsDo06a: Yaakov Tsaig, David L. Donoho (2006a) Breakdown of equivalence between the minimal norm solution and the sparsest solution. Signal Processing, 86(3), 533–548. DOI
 LeSe99: M S Lewicki, T J Sejnowski (1999) Coding time-varying signals using sparse, shift-invariant representations. In NIPS (Vol. 11, pp. 730–736). Denver, CO: MIT Press
 BJPD17: Ashish Bora, Ajil Jalal, Eric Price, Alexandros G. Dimakis (2017) Compressed Sensing using Generative Models. In International Conference on Machine Learning (pp. 537–546).
 TrWr10: J. A. Tropp, S. J. Wright (2010) Computational Methods for Sparse Solution of Linear Inverse Problems. Proceedings of the IEEE, 98(6), 948–958. DOI
 DuBS17: Simon S. Du, Sivaraman Balakrishnan, Aarti Singh (2017) Computationally Efficient Robust Estimation of Sparse Functionals. In ICML.
 YNGD13: M. Yaghoobi, Sangnam Nam, R. Gribonval, M.E. Davies (2013) Constrained Overcomplete Analysis Operator Learning for Cosparse Signal Modelling. IEEE Transactions on Signal Processing, 61(9), 2341–2355. DOI
 HLLB15: William Edward Hahn, Stephanie Lewkowitz, Daniel C. Lacombe, Elan Barenholtz (2015) Deep learning human actions from video via sparse filtering and locally competitive algorithms. Multimedia Tools and Applications, 74(22), 10097–10110. DOI
 GiSB16: R. Giryes, G. Sapiro, A. M. Bronstein (2016) Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy? IEEE Transactions on Signal Processing, 64(13), 3444–3457. DOI
 HaSG06: Christopher Harte, Mark Sandler, Martin Gasser (2006) Detecting Harmonic Change in Musical Audio. In Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia (pp. 21–26). New York, NY, USA: ACM DOI
 RuBE10: Ron Rubinstein, A.M. Bruckstein, Michael Elad (2010) Dictionaries for Sparse Representation Modeling. Proceedings of the IEEE, 98(6), 1045–1057. DOI
 ToFr11: Ivana Tošić, Pascal Frossard (2011) Dictionary learning: What is the right representation for my signal? IEEE Signal Processing Magazine, 28(2), 27–38. DOI
 Boye11: Graham Boyes (2011) Dictionary-based analysis/synthesis and structured representations of musical audio. McGill University
 Zhua16: Xiaosheng Zhuang (2016) Digital Affine Shear Transforms: Fast Realization and Applications in Image/Video Processing. SIAM Journal on Imaging Sciences, 9(3), 1437–1466. DOI
 LiTX16: Tongliang Liu, Dacheng Tao, Dong Xu (2016) Dimensionality-Dependent Generalization Bounds for k-Dimensional Coding Schemes. ArXiv:1601.00238 [Cs, Stat].
 HaZZ16: Bin Han, Zhenpeng Zhao, Xiaosheng Zhuang (2016) Directional tensor product complex tight framelets with low redundancy. Applied and Computational Harmonic Analysis, 41(2), 603–637. DOI
 LeBW96: Wee Sun Lee, Peter L. Bartlett, Robert C. Williamson (1996) Efficient agnostic learning of neural networks with bounded fan-in. IEEE Transactions on Information Theory, 42(6), 2118–2132. DOI
 SmLe06: Evan C. Smith, Michael S. Lewicki (2006) Efficient auditory coding. Nature, 439(7079), 978–982. DOI
 RaBr15: Saiprasad Ravishankar, Yoram Bresler (2015) Efficient Blind Compressed Sensing Using Sparsifying Transforms with Convergence Guarantees and Application to MRI. ArXiv:1501.02923 [Cs, Stat].
 RuZE08: Ron Rubinstein, Michael Zibulevsky, Michael Elad (2008) Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit (p. 40). CS Technion
 GrLe11: Karol Gregor, Yann LeCun (2011) Efficient Learning of Sparse Invariant Representations. ArXiv:1105.5307 [Cs].
 LBRN07: Honglak Lee, Alexis Battle, Rajat Raina, Andrew Y. Ng (2007) Efficient sparse coding algorithms. Advances in Neural Information Processing Systems, 19, 801.
 HyHo00: Aapo Hyvärinen, Patrik Hoyer (2000) Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces. Neural Computation, 12(7), 1705–1720. DOI
 OlFi96a: Bruno A. Olshausen, David J. Field (1996a) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609. DOI
 TsDo06b: Yaakov Tsaig, David L. Donoho (2006b) Extensions of compressed sensing. Signal Processing, 86(3), 549–571. DOI
 JaPl11: M. G. Jafari, M. D. Plumbley (2011) Fast Dictionary Learning for Sparse Representations of Speech Signals. IEEE Journal of Selected Topics in Signal Processing, 5(5), 1025–1031. DOI
 MGVB11: Boris Mailhé, Rémi Gribonval, Pierre Vandergheynst, Frédéric Bimbot (2011) Fast orthogonal sparse approximation algorithms over local dictionaries. Signal Processing, 91(12), 2822–2835. DOI
 DHRS03: Ingrid Daubechies, Bin Han, Amos Ron, Zuowei Shen (2003) Framelets: MRA-based constructions of wavelet frames. Applied and Computational Harmonic Analysis, 14(1), 1–46. DOI
 HaSh00: Kyunghee Han, Hyejin Shin (n.d.) Functional Linear Regression for Functional Response via Sparse Basis Selection.
 YuLZ14: Xiaotong Yuan, Ping Li, Tong Zhang (2014) Gradient Hard Thresholding Pursuit for Sparsity-Constrained Optimization. In Proceedings of the 31st International Conference on Machine Learning - Volume 32 (pp. 127–135). Beijing, China: JMLR.org
 DuKL06: Pan Du, Warren A. Kibbe, Simon M. Lin (2006) Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics, 22(17), 2059–2065. DOI
 AhEB06: M. Aharon, M. Elad, A. Bruckstein (2006) K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Transactions on Signal Processing, 54(11), 4311–4322. DOI
 GrLe10: Karol Gregor, Yann LeCun (2010) Learning fast approximations of sparse coding. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 399–406).
 LeSe00: Michael S. Lewicki, Terrence J. Sejnowski (2000) Learning Overcomplete Representations. Neural Computation, 12(2), 337–365. DOI
 BaJo06: Francis R. Bach, Michael I. Jordan (2006) Learning spectral clustering, with application to speech separation. Journal of Machine Learning Research, 7(Oct), 1963–2001.
 WaZh17: Yu Guang Wang, Houying Zhu (2017) Localized Tight Frames and Fast Framelet Transforms on the Simplex. ArXiv:1701.01595 [Cs, Math].
 GoVe99: M M Goodwin, M Vetterli (1999) Matching pursuit and atomic signal models based on recursive filter banks. IEEE Transactions on Signal Processing, 47(7), 1890–1902. DOI
 MaZh93: Stéphane G. Mallat, Zhifeng Zhang (1993) Matching pursuits with timefrequency dictionaries. IEEE Transactions on Signal Processing, 41(12), 3397–3415.
 MoPe10: Debashis Mondal, Donald B. Percival (2010) M-estimation of wavelet variance. Annals of the Institute of Statistical Mathematics, 64(1), 27–53. DOI
 BCDH10: Richard G. Baraniuk, Volkan Cevher, Marco F. Duarte, Chinmay Hegde (2010) Model-based compressive sensing. IEEE Transactions on Information Theory, 56(4), 1982–2001. DOI
 Mall89: Stephane G. Mallat (1989) Multiresolution approximations and wavelet orthonormal bases of L²(R). Transactions of the American Mathematical Society, 315(1), 69–87. DOI
 Good01: M M Goodwin (2001) Multiscale overlap-add sinusoidal modeling using matching pursuit and refinements. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
 CVVR11: J. J. Carabias-Orti, T. Virtanen, P. Vera-Candeas, N. Ruiz-Reyes, F. J. Canadas-Quesada (2011) Musical Instrument Sound Multi-Excitation Model for Non-Negative Spectrogram Factorization. IEEE Journal of Selected Topics in Signal Processing, 5(6), 1144–1158. DOI
 HyHH09: Aapo Hyvärinen, Jarmo Hurri, Patrick O. Hoyer (2009) Natural Image Statistics: A Probabilistic Approach to Early Computational Vision (Vol. 39). Springer Science & Business Media
 OlFi96b: B. A. Olshausen, D. J. Field (1996b) Natural image statistics and efficient coding. Network (Bristol, England), 7(2), 333–339. DOI
 SiOl01: Eero P Simoncelli, Bruno A Olshausen (2001) Natural Image Statistics and Neural Representation. Annual Review of Neuroscience, 24(1), 1193–1216. DOI
 GRCL17: Sahil Garg, Irina Rish, Guillermo Cecchi, Aurelie Lozano (2017) Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World. In arXiv:1701.06106 [cs, stat].
 Devo98: Ronald A. DeVore (1998) Nonlinear approximation. Acta Numerica, 7, 51–150. DOI
 OpWY01: Jean Opsomer, Yuedong Wang, Yuhong Yang (2001) Nonparametric Regression with Correlated Errors. Statistical Science, 16(2), 134–153.
 BlDa04: Thomas Blumensath, Mike Davies (2004) On Shift-Invariant Sparse Coding. In Independent Component Analysis and Blind Signal Separation (Vol. 3195, pp. 1205–1212). Berlin, Heidelberg: Springer Berlin Heidelberg
 MBPS09: Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro (2009) Online Dictionary Learning for Sparse Coding. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 689–696). New York, NY, USA: ACM DOI
 MBPS10: Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro (2010) Online learning for matrix factorization and sparse coding. The Journal of Machine Learning Research, 11, 19–60.
 Daub88: Ingrid Daubechies (1988) Orthonormal bases of compactly supported wavelets. Communications on Pure and Applied Mathematics, 41(7), 909–996. DOI
 YaDD09: M. Yaghoobi, L. Daudet, M. E. Davies (2009) Parametric dictionary design for sparse coding. IEEE Transactions on Signal Processing, 57(12), 4800–4810. DOI
 KWSR16: Alec Koppel, Garrett Warnell, Ethan Stump, Alejandro Ribeiro (2016) Parsimonious Online Learning with Kernels via Sparse Projections in Function Space. ArXiv:1612.04111 [Cs, Stat].
 WeVe12: Claudio Weidmann, Martin Vetterli (2012) Rate Distortion Behavior of Sparse Sources. IEEE Transactions on Information Theory, 58(8), 4969–4992. DOI
 EkTS11: C. Ekanadham, D. Tranchina, E. P. Simoncelli (2011) Recovery of Sparse Translation-Invariant Signals With Continuous Basis Pursuit. IEEE Transactions on Signal Processing, 59(10), 4735–4744. DOI
 HuCB08: Cong Huang, G. L. H. Cheang, Andrew R. Barron (2008) Risk of penalized least squares, greedy selection and l1 penalization for flexible function libraries.
 OyBZ17: Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko (2017) Scaling the Scattering Transform: Deep Hybrid Networks. ArXiv Preprint ArXiv:1703.08961.
 BLMM12: Quentin Barthélemy, Anthony Larue, Aurélien Mayoue, David Mercier, Jérôme I. Mars (2012) Shift & 2D rotation invariant sparse coding for multivariate signals. IEEE Transactions on Signal Processing, 60(4), 1597–1611.
 MøSH07: Morten Mørup, Mikkel N. Schmidt, Lars K. Hansen (2007) Shift invariant sparse coding of image and music data. Journal of Machine Learning Research.
 GRKN07: Roger Grosse, Rajat Raina, Helen Kwong, Andrew Y. Ng (2007) Shift-Invariant Sparse Coding for Audio Classification. In The Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007) (Vol. 9, p. 8).
 QiCh94: Shie Qian, Dapang Chen (1994) Signal representation using adaptive normalized Gaussian functions. Signal Processing, 36(1), 1–11. DOI
 AGMM15: Sanjeev Arora, Rong Ge, Tengyu Ma, Ankur Moitra (2015) Simple, efficient, and neural algorithms for sparse coding. In Proceedings of The 28th Conference on Learning Theory (Vol. 40, pp. 113–149). Paris, France: PMLR
 HaZh15: Bin Han, Xiaosheng Zhuang (2015) Smooth affine shear tight frames with MRA structure. Applied and Computational Harmonic Analysis, 39(2), 300–338. DOI
 GuPe16: Pawan Gupta, Marianna Pensky (2016) Solution of linear ill-posed problems using random dictionaries. ArXiv:1605.07913 [Math, Stat].
 BlDa06: Thomas Blumensath, Mike Davies (2006) Sparse and shift-invariant representations of music. IEEE Transactions on Audio, Speech and Language Processing, 14(1), 50–57. DOI
 OlFi04: Bruno A Olshausen, David J Field (2004) Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14(4), 481–487. DOI
 Mlyn13: Wiktor Mlynarski (2013) Sparse, complex-valued representations of natural sounds learned with phase and amplitude continuity priors. ArXiv Preprint ArXiv:1312.4695.
 KoCo16: Parker Koch, Jason J. Corso (2016) Sparse Factorization Layers for Neural Networks with Limited Supervision. ArXiv:1612.04468 [Cs, Stat].
 NCBK11: Jiquan Ngiam, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, Andrew Y. Ng (2011) Sparse filtering. In Advances in Neural Information Processing Systems 24 (pp. 1125–1133). Curran Associates, Inc.
 MaBP14: Julien Mairal, Francis Bach, Jean Ponce (2014) Sparse modeling for image and vision processing. Foundations and Trends® in Comput Graph. Vis., 8(2–3), 85–283. DOI
 Dong15: Bin Dong (2015) Sparse representation on graphs by tight wavelet frames and applications. Applied and Computational Harmonic Analysis. DOI
 PABD06: Mark D. Plumbley, Samer A. Abdallah, Thomas Blumensath, Michael E. Davies (2006) Sparse representations of polyphonic music. Signal Processing, 86(3), 417–431. DOI
 Wohl17: Brendt Wohlberg (2017) SPORCO: A Python package for standard and convolutional sparse representations.
 MaMD14: Gary Marcus, Adam Marblestone, Thomas Dean (2014) The atoms of neural computation. Science, 346(6209), 551–552. DOI
 WaST14: Yu-Xiang Wang, Alex Smola, Ryan J. Tibshirani (2014) The Falling Factorial Basis and Its Statistical Applications. ArXiv:1405.0558 [Stat].
 VaMB11: Daniel Vainsencher, Shie Mannor, Alfred M. Bruckstein (2011) The Sample Complexity of Dictionary Learning. Journal of Machine Learning Research, 12(Nov), 3259–3281.
 WaZh16: Yu Guang Wang, Xiaosheng Zhuang (2016) Tight framelets and fast framelet transforms on manifolds. ArXiv:1608.04026 [Math].
 GiNi09: Evarist Giné, Richard Nickl (2009) Uniform limit theorems for wavelet density estimators. The Annals of Probability, 37(4), 1605–1646. DOI
 HJKL11: Mikael Henaff, Kevin Jarrett, Koray Kavukcuoglu, Yann LeCun (2011) Unsupervised learning of sparse features for scalable audio classification. In ISMIR.
 FaLi01: Jianqing Fan, Runze Li (2001) Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association, 96(456), 1348–1360. DOI
 Gray84: R. Gray (1984) Vector quantization. IEEE ASSP Magazine, 1(2), 4–29. DOI
 GeGr12: Allen Gersho, Robert M. Gray (2012) Vector Quantization and Signal Compression. Springer Science & Business Media
 Shen10: Z. Shen (2010) Wavelet frames and image restorations. In Scopus (pp. 2834–2863). World Scientific
 DJKP95: David L. Donoho, Iain M. Johnstone, Gerard Kerkyacharian, Dominique Picard (1995) Wavelet Shrinkage: Asymptopia? Journal of the Royal Statistical Society. Series B (Methodological), 57(2), 301–369.
 Vett99: Martin Vetterli (1999) Wavelets: approximation and compression–a review. In AeroSense’99 (Vol. 3723, pp. 28–31). International Society for Optics and Photonics DOI