
Dimensionality reduction

Wherein I teach myself, amongst other things, how a *sparse* PCA works, and decide where to file multidimensional scaling

Usefulness: 🔧
Novelty: 💡
Uncertainty: 🤪 🤪 🤪
Incompleteness: 🚧 🚧 🚧

Related: “clustering”; the notion of similarity as seen in kernel tricks; inducing a differential metric; matrix factorisations and random features; high-dimensional statistics, which discusses random projections and their role in compressed sensing, etc.

Misc

PCA and cousins

Kernel PCA, linear algebra and probabilistic formulations.

The linear algebra version of PCA puts us in the world of matrix factorisations.
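As a minimal sketch of that view (my own toy example, not any particular library's API): centre the data matrix, take a truncated SVD, and the right singular vectors are the principal directions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))        # toy data: 200 observations, 10 features

Xc = X - X.mean(axis=0)               # centre each column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                                 # number of components to keep
components = Vt[:k]                   # principal directions, one per row
scores = Xc @ components.T            # low-dimensional coordinates (200 x 2)
explained_var = s[:k] ** 2 / (len(Xc) - 1)
```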

Non-linear versions

Autoencoder and word2vec

The “nonlinear PCA” interpretation, which I just heard from Junbin Gao:

\[L(x, x') = \|x-x'\|^2 = \|x-\sigma'(U\sigma(W^T x+b) + b')\|^2\]
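In code, that reconstruction loss for a single-hidden-layer autoencoder looks roughly like this (a sketch only; the dimensions, initialisation and choice of sigmoid nonlinearity are placeholders of my own):

```python
import numpy as np

def sigma(z):
    """Elementwise nonlinearity; a logistic sigmoid stands in here."""
    return 1.0 / (1.0 + np.exp(-z))

d, k = 10, 3                                # input and code dimensions (made up)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(d, k))      # encoder weights
b = np.zeros(k)                             # encoder bias
U = rng.normal(scale=0.1, size=(d, k))      # decoder weights
b2 = np.zeros(d)                            # decoder bias

def reconstruction_loss(x):
    """L(x, x') = ||x - sigma'(U sigma(W^T x + b) + b')||^2."""
    code = sigma(W.T @ x + b)               # encode
    x_hat = sigma(U @ code + b2)            # decode
    return np.sum((x - x_hat) ** 2)

x = rng.normal(size=d)
print(reconstruction_loss(x))
```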

Locality Preserving projections

Try to preserve the nearness of points that are connected on some (weighted) graph, i.e. minimise

\[\sum_{i,j}(y_i-y_j)^2 w_{i,j}\]

where each y_i is the image of x_i under the projection we are solving for; so we seek an optimal projection vector.

(requirement for sparse similarity matrix?)
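The standard recipe: with linear projections, the objective above reduces to a generalised eigenproblem involving the graph Laplacian. A sketch, where the heat-kernel weights and the k-nearest-neighbour sparsification are arbitrary choices of mine:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))               # 100 points in 5 dimensions

# Heat-kernel similarity, sparsified to k nearest neighbours.
D2 = squareform(pdist(X, "sqeuclidean"))
W = np.exp(-D2 / D2.mean())
k = 10
for i in range(len(X)):
    W[i, np.argsort(D2[i])[k + 1:]] = 0.0   # keep self + k nearest
W = np.maximum(W, W.T)                      # symmetrise

Dg = np.diag(W.sum(axis=1))                 # degree matrix
L = Dg - W                                  # graph Laplacian

# Generalised eigenproblem X^T L X a = lambda X^T D X a;
# eigenvectors with the smallest eigenvalues give the projection.
vals, vecs = eigh(X.T @ L @ X, X.T @ Dg @ X)
A = vecs[:, :2]                             # two projection vectors
Y = X @ A                                   # embedded coordinates
```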

Multidimensional scaling

TBD.

Topological data analysis

Start with the distances between points and try to find a lower-dimensional manifold which preserves their distances. Local MDS? TBD.
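The basic distance-preserving step here (classical MDS) is easy to sketch: double-centre the squared-distance matrix and eigendecompose it. A toy example of my own; in the real setting only the distances are given:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))                # toy data to generate distances from

D2 = squareform(pdist(X, "sqeuclidean"))    # squared pairwise distances
n = len(D2)
J = np.eye(n) - np.ones((n, n)) / n         # centring matrix
B = -0.5 * J @ D2 @ J                       # double-centred Gram matrix

vals, vecs = np.linalg.eigh(B)              # eigenvalues in ascending order
top = np.argsort(vals)[::-1][:2]            # take the two largest
Y = vecs[:, top] * np.sqrt(vals[top])       # 2-D embedding preserving distances
```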

Random projection

See random embeddings.
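The simplest instance, a Gaussian random projection, is just multiplication by a random matrix; by Johnson–Lindenstrauss-type arguments pairwise distances are approximately preserved. A sketch with arbitrary dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 500))            # data in 500 dimensions

k = 50                                      # target dimension
R = rng.normal(size=(500, k)) / np.sqrt(k)  # Gaussian random projection matrix
Y = X @ R                                   # pairwise distances roughly preserved
```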

Stochastic neighbour embedding

Probabilistically preserving closeness.
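Concretely, SNE turns distances into conditional probabilities that point i would pick point j as its neighbour, and matches these between the high- and low-dimensional spaces by minimising a sum of KL divergences (Hinton and Roweis 2002):

\[p_{j|i} = \frac{\exp(-\|x_i-x_j\|^2/2\sigma_i^2)}{\sum_{k\neq i}\exp(-\|x_i-x_k\|^2/2\sigma_i^2)},
\qquad
C = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}},\]

where the q_{j|i} are the analogous probabilities computed in the low-dimensional embedding.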

t-SNE
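
Same idea, but with a Student-t kernel in the low-dimensional space to mitigate crowding (van der Maaten and Hinton 2008). A minimal usage sketch via scikit-learn's implementation, assuming it is installed and with illustrative parameters:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))              # toy high-dimensional data

# perplexity loosely controls the effective number of neighbours per point
Y = TSNE(n_components=2, perplexity=30.0, init="pca").fit_transform(X)
print(Y.shape)                              # (500, 2)
```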

Refs

Cook, R. Dennis. 2018. “Principal Components, Sufficient Dimension Reduction, and Envelopes.” Annual Review of Statistics and Its Application 5 (1): 533–59. https://doi.org/10.1146/annurev-statistics-031017-100257.

Globerson, Amir, and Sam T. Roweis. 2006. “Metric Learning by Collapsing Classes.” In Advances in Neural Information Processing Systems, 451–58. NIPS’05. Cambridge, MA, USA: MIT Press. http://papers.nips.cc/paper/2947-metric-learning-by-collapsing-classes.pdf.

Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. 2014. “Unsupervised Learning of Spatiotemporally Coherent Metrics,” December. http://arxiv.org/abs/1412.6056.

Hadsell, R., S. Chopra, and Y. LeCun. 2006. “Dimensionality Reduction by Learning an Invariant Mapping.” In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2:1735–42. https://doi.org/10.1109/CVPR.2006.100.

Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. 2006. “Reducing the Dimensionality of Data with Neural Networks.” Science 313 (5786): 504–7. https://doi.org/10.1126/science.1127647.

Hinton, Geoffrey, and Sam Roweis. 2002. “Stochastic Neighbor Embedding.” In Proceedings of the 15th International Conference on Neural Information Processing Systems, 857–64. NIPS’02. Cambridge, MA, USA: MIT Press. http://papers.nips.cc/paper/2276-stochastic-neighbor-embedding.pdf.

Lawrence, Neil. 2005. “Probabilistic Non-Linear Principal Component Analysis with Gaussian Process Latent Variable Models.” Journal of Machine Learning Research 6 (Nov): 1783–1816. http://www.jmlr.org/papers/v6/lawrence05a.html.

Lopez-Paz, David, Suvrit Sra, Alex Smola, Zoubin Ghahramani, and Bernhard Schölkopf. 2014. “Randomized Nonlinear Component Analysis,” February. http://arxiv.org/abs/1402.0119.

Maaten, Laurens van der, and Geoffrey Hinton. 2008. “Visualizing Data Using T-SNE.” Journal of Machine Learning Research 9 (Nov): 2579–2605. http://www.jmlr.org/papers/v9/vandermaaten08a.html.

Murdock, Calvin, and Fernando De la Torre. 2017. “Additive Component Analysis.” In Conference on Computer Vision and Pattern Recognition (CVPR). http://www.calvinmurdock.com/content/uploads/publications/cvpr2017aca.pdf.

Oymak, Samet, and Joel A. Tropp. 2015. “Universality Laws for Randomized Dimension Reduction, with Applications,” November. http://arxiv.org/abs/1511.09433.

Peluffo-Ordóñez, Diego H., John A. Lee, and Michel Verleysen. 2014. “Short Review of Dimensionality Reduction Methods Based on Stochastic Neighbour Embedding.” In Advances in Self-Organizing Maps and Learning Vector Quantization, 65–74. Springer. http://link.springer.com/chapter/10.1007/978-3-319-07695-9_6.

Salakhutdinov, Ruslan, and Geoff Hinton. 2007. “Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure.” In PMLR, 412–19. http://proceedings.mlr.press/v2/salakhutdinov07a.html.

Smola, Alex J., Robert C. Williamson, Sebastian Mika, and Bernhard Schölkopf. 1999. “Regularized Principal Manifolds.” In Computational Learning Theory, edited by Paul Fischer and Hans Ulrich Simon, 214–29. Lecture Notes in Computer Science 1572. Springer Berlin Heidelberg. http://link.springer.com/chapter/10.1007/3-540-49097-3_17.

Sohn, Kihyuk, and Honglak Lee. 2012. “Learning Invariant Representations with Local Transformations.” In Proceedings of the 29th International Conference on Machine Learning (ICML-12), 1311–8. http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2012Sohn_659.pdf.

Sorzano, C. O. S., J. Vargas, and A. Pascual Montano. 2014. “A Survey of Dimensionality Reduction Techniques,” March. http://arxiv.org/abs/1403.2877.

Wang, Boyue, Yongli Hu, Junbin Gao, Yanfeng Sun, Haoran Chen, and Baocai Yin. 2017. “Locality Preserving Projections for Grassmann Manifold.” In Proceedings of IJCAI 2017. http://arxiv.org/abs/1704.08458.

Wasserman, Larry. 2018. “Topological Data Analysis.” Annual Review of Statistics and Its Application 5 (1): 501–32. https://doi.org/10.1146/annurev-statistics-031017-100045.

Weinberger, Kilian, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. 2009. “Feature Hashing for Large Scale Multitask Learning.” In Proceedings of the 26th Annual International Conference on Machine Learning, 1113–20. ICML ’09. New York, NY, USA: ACM. https://doi.org/10.1145/1553374.1553516.