# Sparse coding

### How to quickly make big things out of short lists of small things.

Linear expansion with dictionaries of basis functions, with respect to which you wish your representation to be sparse; i.e. in the statistical case, basis-sparse regression. But even outside statistics, you wish simply to approximate some data compactly. My focus here is on the noisy-observation case, although the same results are recycled enough throughout the field.

Note that there are two ways you can get your representation to be sparse;

• you know that your signal happens to be compressible, in the sense that under some transform its coefficient vector is mostly zeros, even in a plain old orthogonal basis expansion.

• you are using a redundant dictionary such that you won’t need most of it to represent even a dense signal.

I should break these two notions apart here. For now, I’m especially interested in adaptive bases.

This is merely a bunch of links to important articles at the moment; I should do a little exposition one day.

Decomposition of stuff by matching pursuit, wavelets, curvelets, chirplets, framelets, shearlets, camelhairbrushlets, content-specific basis dictionaries, designed or learned. Mammals visual cortexes seem to use something like this, if you squint right at the evidence.

To discuss:

• connection to mixture models.

• Sampling complexity versus approximation complexity

• am especially interested in approaches where we learn the transform or the basis dictionary unsupervised

## Resources

Baraniuk’s lab has a comprehensive, but not usefully annotated, selection of articles in this field, which I include more to demonstrate the virtue of a literature review by showing the pathology of its absence, rather than as a useful starting point.

## Classics: Wavelet bases

Very popoular practical intro is Torrence and Comp.

TBD

## Learnable codings

I want to generalise or extend this idea, ideally in some shift-invariant way (see below.)

Oldhausen and Field (OlFi96a) kicked this area off by arguing sparse coding tricks are revealing of what the brain does.

For a walk through of one version of this, see Theano example of dictionary learning by Daniel LaCombe, who bases his version on NCBK11, HyHH09 and HLLB15.

See MBPS09 for some a summary of methods to 2009 in basis learning.

Question: how do you do this in a big data / offline setting?

Analytical sparsifying transforms such as Wavelets and DCT have been widely used in compression standards. Recently, the data-driven learning of sparse models such as the synthesis dictionary model have become popular especially in applications such as denoising, inpainting, compressed sensing, etc. Our group’s research at the University of Illinois focuses on the data-driven adaptation of the alternative sparsifying transform model, which offers numerous advantages over the synthesis dictionary model.

We have proposed several methods for batch learning of square or overcomplete sparsifying transforms from data. We have also investigated specific structures for these transforms such as double sparsity, union-of-transforms, and filter bank structures, which enable their efficient learning or usage. Apart from batch transform learning, our group has investigated methods for online learning of sparsifying transforms, which are particularly useful for big data or real-time applications.

Huh.

### Shift-invariant codings

I would like to find a general way of doing this in a phase/shift-robust fashion, although doing this naively can be computationally expensive outside of certain convenient bases.

(Sorry, that’s not very clear; I need to return to this section to polish it up.)

It can be better if you know you have have constraints, say, a convex combination (e.g. mixtures) or positive definite bases. (e.g. in kernel methods)

One method is “Shift Invariant Sparse coding”, ( BlDa04) and there are various versions out there. (GRKN07 etc) One way is to include multiple shifted copies of your atoms, another is to actually shift them in a separate optimisation stage. Both these get annoying in the time domain for various reasons.

Affine tight framelets (DHRS03) and their presumably less-computationally-tractable, more flexible cousins, shearlets also sound interesting here. For reasons I do not yet understand I am told they can naturally be used on sundry graphs and manifolds, not just lattices, is traditional in DSP. I saw Xiaosheng Zhuang present these (see, e.g. HaZZ16 and WaZh16, where WaZh16 demonstrates a Fast Framelet Transform which is supposedly as computationally as cheap as the FFT.)

I have some ideas I call learning gamelan which relate to this.

TBD.

## Implementations

This boils down to clever optimisation to make the calculations tractable.

• the wavelet toolkits.

• scipy’s wavelet transform has no frills and little coherent explanation, but it goes

• pywavelets does various fancy wavelets and seems to be a standard for python.

• Matlab’s Wavelet toolbox seems to be the reference.

• scikit-learn dictionary learning version here

• also pydbm

• Fancy easy GPU wavelet implementation, PyTorchWavelets.

• SPORCO

SParse Optimization Research COde (SPORCO) is an open-source Python package for solving optimization problems with sparsity-inducing regularization, consisting primarily of sparse coding and dictionary learning, for both standard and convolutional forms of sparse representation. In the current version, all optimization problems are solved within the Alternating Direction Method of Multipliers (ADMM) framework. SPORCO was developed for applications in signal and image processing, but is also expected to be useful for problems in computer vision, statistics, and machine learning.

• Sparse-filtering: Unsupervised feature learning based on sparse-filtering

This implements the method described Jiquan Ngiam, Pang Wei Koh, Zhenghao Chen, Sonia Bhaskar, Andrew Y. Ng: Sparse Filtering. NIPS 2011: 1125-1133 and is based on the Matlab code provided in the supplementary material

• spams does a variety of sparse codings, although non of them accepting pluggable models. Nonetheless it does some neat things fast. (see optimisation)