Dimensionality reduction

Wherein I teach myself, amongst other things, how a sparse PCA works, and work out where to file multidimensional scaling.

Should I rename this to “feature construction”? Some of the same techniques, but we drop the assumption that we wish to decrease the number of dimensions. Or even “clustering”? The overlap is considerable.

See also matrix factorisations and random features, high-dimensional statistics, and random projections and their role in compressed sensing, etc.
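
Since the blurb above promises sparse PCA, here is a minimal sketch of what that looks like in practice. The choice of scikit-learn's SparsePCA is mine, for convenience; any ℓ1-penalised component analysis would illustrate the same point, namely that the loadings come out mostly zero, so each component touches only a few input coordinates.

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
# Synthetic data: 20 features, of which the first 3 share a common latent factor.
X = rng.standard_normal((300, 20))
X[:, :3] += 3.0 * rng.standard_normal((300, 1))

spca = SparsePCA(n_components=2, alpha=1.0, random_state=0)
codes = spca.fit_transform(X)   # (300, 2) low-dimensional representation
print(spca.components_)         # (2, 20) loadings, mostly zeros
```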

• Dimensionality reduction 101: linear algebra, hidden variables and generative models

• Diffusion maps: another way of mapping graphs to $\mathbb{R}^d$

• Principal component analysis (PCA): one of the most versatile tools for unsupervised learning, with applications ranging from dimensionality reduction to exploratory data analysis and visualisation. Much effort has gone into encouraging meaningful representations through regularisation (e.g. non-negativity or sparsity), but the underlying linearity assumption still limits what PCA can represent. Additive Component Analysis (ACA) is a nonlinear extension aimed at exactly this. Inspired by multivariate nonparametric regression with additive models, ACA fits a smooth manifold to the data by learning an explicit mapping from a low-dimensional latent space to the input space, which makes applications like denoising trivial. Via the local tangent space of the learned manifold, it can also serve as a drop-in replacement in many algorithms that use a linear component analysis as a subroutine. Unlike many other nonlinear dimensionality-reduction techniques, it scales to large datasets, since it needs neither pairwise similarities nor stored training data at test time. Multiple ACA layers can be composed and learned jointly with essentially the same procedure for extra representational power, a step towards nonparametric deep learning; the authors report improved robustness, reconstruction performance and interpretability on a range of datasets. See the toy sketch after this list.

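To fix the idea, a toy sketch of the additive-map step behind ACA, under my own simplifying assumptions: the latent coordinates are initialised by plain PCA and held fixed, each output coordinate is modelled as a sum of univariate polynomial functions of the latents, and the fit is a single ridge-regularised least-squares solve. The actual ACA algorithm alternates between updating the latent coordinates and the smooth functions and uses a richer smooth basis; none of the names below come from the paper.

```python
import numpy as np

def poly_basis(z, degree=3):
    """Univariate polynomial features of one latent coordinate (stand-in for a smooth basis)."""
    return np.stack([z ** d for d in range(1, degree + 1)], axis=1)   # (n, degree)

def design(Z, degree=3):
    """Additive design matrix: an intercept plus a separate basis per latent dimension."""
    n, k = Z.shape
    blocks = [np.ones((n, 1))] + [poly_basis(Z[:, j], degree) for j in range(k)]
    return np.hstack(blocks)                                          # (n, 1 + k*degree)

def fit_additive_map(Z, X, degree=3, ridge=1e-3):
    """Ridge least squares for X ≈ sum_j f_j(Z[:, j]), one smooth f_j per latent dimension."""
    B = design(Z, degree)
    W = np.linalg.solve(B.T @ B + ridge * np.eye(B.shape[1]), B.T @ X)
    return W

# Toy data: a noisy parabola embedded in the plane, which linear PCA cannot flatten.
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([t, t ** 2]) + 0.05 * rng.standard_normal((200, 2))

# One-dimensional latent coordinates initialised by ordinary PCA (held fixed in this sketch).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:1].T                                                     # (200, 1)

W = fit_additive_map(Z, X)
X_hat = design(Z) @ W
print("mean squared reconstruction error:", np.mean((X - X_hat) ** 2))
```

The point of the additive structure is that each learned univariate function can be inspected on its own, which is where the interpretability claim comes from.
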
Special case: t-SNE