Related: "clustering"? Also related: the notion of similarity as seen in kernel tricks, which induces a metric; matrix factorisations and random features; high-dimensional statistics; and random projections and their role in compressed sensing, etc.
misc
Dimensionality reduction 101: linear algebra, hidden variables and generative models
Diffusion maps: another way of mapping graphs to \(\mathbb{R}^d\).
PCA and cousins
Kernel PCA, linear algebra and probabilistic formulations.
The linear algebra version of PCA puts us in the world of matrix factorisations.
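As a concrete sketch of that view (on made-up data): PCA is a truncated SVD of the centred data matrix, i.e. a low-rank matrix factorisation, and the rank-\(k\) reconstruction is least-squares optimal by Eckart–Young.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))     # hypothetical data: 100 observations, 5 features
Xc = X - X.mean(axis=0)           # centre each feature

# PCA as matrix factorisation: truncated SVD of the centred data.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
scores = U[:, :k] * s[:k]         # low-dimensional coordinates (principal scores)
components = Vt[:k]               # principal directions (loadings)

# Rank-k reconstruction; optimal in least squares by Eckart-Young.
X_hat = scores @ components
```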
Nonlinear versions

From the abstract of Murdock and De la Torre (2017): Principal component analysis (PCA) is one of the most versatile tools for unsupervised learning with applications ranging from dimensionality reduction to exploratory data analysis and visualization. While much effort has been devoted to encouraging meaningful representations through regularization (e.g. nonnegativity or sparsity), underlying linearity assumptions can limit their effectiveness. To address this issue, we propose Additive Component Analysis (ACA), a novel nonlinear extension of PCA. Inspired by multivariate nonparametric regression with additive models, ACA fits a smooth manifold to data by learning an explicit mapping from a low-dimensional latent space to the input space, which trivially enables applications like denoising. Furthermore, ACA can be used as a drop-in replacement in many algorithms that use linear component analysis methods as a subroutine via the local tangent space of the learned manifold. Unlike many other nonlinear dimensionality reduction techniques, ACA can be efficiently applied to large datasets since it does not require computing pairwise similarities or storing training data during testing. Multiple ACA layers can also be composed and learned jointly with essentially the same procedure for improved representational power, demonstrating the encouraging potential of nonparametric deep learning. We evaluate ACA on a variety of datasets, showing improved robustness, reconstruction performance, and interpretability.
Autoencoder and word2vec
"The nonlinear PCA" interpretation, which I just heard from Junbin Gao.
\[L(x, x') = \|x - x'\|^2 = \|x - \sigma(U\sigma(W^\top x + b) + b')\|^2\]
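A minimal numpy sketch of that reconstruction loss, with random weights standing in for trained ones (everything here is made up for illustration; \(\sigma\) is the logistic sigmoid):

```python
import numpy as np

rng = np.random.default_rng(1)
d, h = 8, 3                             # input dimension and hidden (bottleneck) dimension

# Hypothetical random weights; in practice these are learned, e.g. by SGD.
W = rng.normal(size=(d, h)) * 0.1       # encoder weights
U = rng.normal(size=(h, d)) * 0.1       # decoder weights
b = np.zeros(h)
b_prime = np.zeros(d)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reconstruction_loss(x):
    """Squared-error autoencoder loss ||x - sigma(U sigma(W^T x + b) + b')||^2."""
    z = sigmoid(x @ W + b)              # encode into the hidden layer
    x_prime = sigmoid(z @ U + b_prime)  # decode back to input space
    return np.sum((x - x_prime) ** 2)

x = rng.normal(size=d)
loss = reconstruction_loss(x)
```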
Locality Preserving projections
Try to preserve the nearness of points that are connected on some (weighted) graph.
\[\sum_{i,j}(y_i - y_j)^2 w_{i,j}\]
So we seek an optimal projection vector.
(requirement for sparse similarity matrix?)
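A sketch of the whole pipeline under an assumed heat-kernel similarity graph (dense here, though in practice one usually sparsifies to nearest neighbours, which relates to the sparsity question above): the objective equals \(2 y^\top L y\) with Laplacian \(L = D - W\), and for a linear projection \(y = Xa\) the optimal \(a\) solves a generalised eigenproblem.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))               # hypothetical data: 30 points in R^4

# Assumed similarity graph: dense heat-kernel weights (usually sparsified in practice).
sq_dists = np.sum((X[:, None] - X[None]) ** 2, axis=-1)
W = np.exp(-sq_dists)
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))                 # degree matrix
L = D - W                                  # graph Laplacian

# For y = X a, the objective sum_ij (y_i - y_j)^2 w_ij equals 2 a^T (X^T L X) a.
# Minimise it subject to a^T (X^T D X) a = 1: generalised eigenproblem A a = lambda B a.
A = X.T @ L @ X
B = X.T @ D @ X

# Reduce to an ordinary symmetric eigenproblem via the Cholesky factor of B.
C = np.linalg.cholesky(B)
M = np.linalg.solve(C, np.linalg.solve(C, A).T).T   # C^{-1} A C^{-T}
eigvals, V = np.linalg.eigh(M)
a = np.linalg.solve(C.T, V[:, 0])          # projection vector for the smallest eigenvalue
y = X @ a                                  # one-dimensional embedding
```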
Multidimensional scaling
TBD.
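In the meantime, a minimal sketch of classical (metric) MDS on made-up data, assuming we observe only the squared pairwise distances: double-centre them to recover a Gram matrix, then embed via its top eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 5))          # ground-truth coordinates; only distances are "observed"
D2 = np.sum((X[:, None] - X[None]) ** 2, axis=-1)   # squared pairwise distances

n = D2.shape[0]
J = np.eye(n) - np.ones((n, n)) / n   # centring matrix
B = -0.5 * J @ D2 @ J                 # Gram matrix recovered from distances alone

eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1]     # largest eigenvalues first
k = 2
# Coordinates: top-k eigenvectors scaled by the root eigenvalues (clipped at zero).
Y = eigvecs[:, order[:k]] * np.sqrt(np.maximum(eigvals[order[:k]], 0.0))
```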
Topological data analysis
Start with the distances between points and try to find a lower-dimensional manifold which preserves them. Local MDS? TBD.
Random projection
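The simplest version of the idea (Johnson–Lindenstrauss flavour), sketched on made-up data: project through a random Gaussian matrix, with no training at all, and pairwise distances are approximately preserved.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, k = 500, 1000, 300              # many points, high ambient dimension, target dimension
X = rng.normal(size=(n, d))

# Gaussian random projection: a data-independent random matrix, scaled so that
# squared norms are preserved in expectation.
R = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ R

# Compare one pairwise distance before and after projection.
i, j = 0, 1
ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
```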
Stochastic neighbour embedding
Probabilistically preserving closeness.
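A toy sketch of the objective on made-up data, using a fixed kernel bandwidth rather than the per-point, perplexity-calibrated bandwidths of the actual algorithm: define neighbour-pick probabilities in both the input space and the embedding, and score the embedding by KL divergence, which SNE then minimises by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(10, 3))      # hypothetical high-dimensional points
Y = rng.normal(size=(10, 2))      # a candidate low-dimensional embedding

def neighbour_probs(Z, sigma2=1.0):
    """p_{j|i}: probability that point i picks j as its neighbour (Gaussian kernel)."""
    sq = np.sum((Z[:, None] - Z[None]) ** 2, axis=-1)
    P = np.exp(-sq / (2.0 * sigma2))
    np.fill_diagonal(P, 0.0)      # a point never picks itself
    return P / P.sum(axis=1, keepdims=True)

P = neighbour_probs(X)            # closeness in the input space
Q = neighbour_probs(Y)            # closeness in the embedding

# SNE minimises sum_i KL(P_i || Q_i) over the embedding coordinates Y.
mask = P > 0
kl = np.sum(P[mask] * np.log(P[mask] / Q[mask]))
```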
Refs
Cook, R. Dennis. 2018. "Principal Components, Sufficient Dimension Reduction, and Envelopes." Annual Review of Statistics and Its Application 5 (1): 533–59. https://doi.org/10.1146/annurev-statistics-031017-100257.
Globerson, Amir, and Sam T. Roweis. 2006. "Metric Learning by Collapsing Classes." In Advances in Neural Information Processing Systems, 451–58. NIPS'05. Cambridge, MA, USA: MIT Press. http://papers.nips.cc/paper/2947-metric-learning-by-collapsing-classes.pdf.
Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. 2014. "Unsupervised Learning of Spatiotemporally Coherent Metrics," December. http://arxiv.org/abs/1412.6056.
Hadsell, R., S. Chopra, and Y. LeCun. 2006. "Dimensionality Reduction by Learning an Invariant Mapping." In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2:1735–42. https://doi.org/10.1109/CVPR.2006.100.
Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. 2006. "Reducing the Dimensionality of Data with Neural Networks." Science 313 (5786): 504–7. https://doi.org/10.1126/science.1127647.
Hinton, Geoffrey, and Sam Roweis. 2002. "Stochastic Neighbor Embedding." In Proceedings of the 15th International Conference on Neural Information Processing Systems, 857–64. NIPS'02. Cambridge, MA, USA: MIT Press. http://papers.nips.cc/paper/2276-stochastic-neighbor-embedding.pdf.
Lawrence, Neil. 2005. "Probabilistic Non-Linear Principal Component Analysis with Gaussian Process Latent Variable Models." Journal of Machine Learning Research 6 (Nov): 1783–1816. http://www.jmlr.org/papers/v6/lawrence05a.html.
Lopez-Paz, David, Suvrit Sra, Alex Smola, Zoubin Ghahramani, and Bernhard Schölkopf. 2014. "Randomized Nonlinear Component Analysis," February. http://arxiv.org/abs/1402.0119.
Maaten, Laurens van der, and Geoffrey Hinton. 2008. "Visualizing Data Using t-SNE." Journal of Machine Learning Research 9 (Nov): 2579–2605. http://www.jmlr.org/papers/v9/vandermaaten08a.html.
Murdock, Calvin, and Fernando De la Torre. 2017. "Additive Component Analysis." In Conference on Computer Vision and Pattern Recognition (CVPR). http://www.calvinmurdock.com/content/uploads/publications/cvpr2017aca.pdf.
Oymak, Samet, and Joel A. Tropp. 2015. "Universality Laws for Randomized Dimension Reduction, with Applications," November. http://arxiv.org/abs/1511.09433.
Peluffo-Ordóñez, Diego H., John A. Lee, and Michel Verleysen. 2014. "Short Review of Dimensionality Reduction Methods Based on Stochastic Neighbour Embedding." In Advances in Self-Organizing Maps and Learning Vector Quantization, 65–74. Springer. http://link.springer.com/chapter/10.1007/978-3-319-07695-9_6.
Salakhutdinov, Ruslan, and Geoff Hinton. 2007. "Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure." In PMLR, 412–19. http://proceedings.mlr.press/v2/salakhutdinov07a.html.
Smola, Alex J., Robert C. Williamson, Sebastian Mika, and Bernhard Schölkopf. 1999. "Regularized Principal Manifolds." In Computational Learning Theory, edited by Paul Fischer and Hans Ulrich Simon, 214–29. Lecture Notes in Computer Science 1572. Springer Berlin Heidelberg. http://link.springer.com/chapter/10.1007/3-540-49097-3_17.
Sohn, Kihyuk, and Honglak Lee. 2012. "Learning Invariant Representations with Local Transformations." In Proceedings of the 29th International Conference on Machine Learning (ICML-12), 1311–8. http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2012Sohn_659.pdf.
Sorzano, C. O. S., J. Vargas, and A. Pascual Montano. 2014. "A Survey of Dimensionality Reduction Techniques," March. http://arxiv.org/abs/1403.2877.
Wang, Boyue, Yongli Hu, Junbin Gao, Yanfeng Sun, Haoran Chen, and Baocai Yin. 2017. "Locality Preserving Projections for Grassmann Manifold." In Proceedings of IJCAI, 2017. http://arxiv.org/abs/1704.08458.
Wasserman, Larry. 2018. "Topological Data Analysis." Annual Review of Statistics and Its Application 5 (1): 501–32. https://doi.org/10.1146/annurev-statistics-031017-100045.
Weinberger, Kilian, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. 2009. "Feature Hashing for Large Scale Multitask Learning." In Proceedings of the 26th Annual International Conference on Machine Learning, 1113–20. ICML '09. New York, NY, USA: ACM. https://doi.org/10.1145/1553374.1553516.