Random embeddings and hashing

December 6, 2016 — December 1, 2020

feature construction
functional analysis
linear algebra
probabilistic algorithms
probability
sparser than thou
statistics
Figure 1: Separation of inputs by random projection

See also matrix factorisations, for some extra ideas on why random projections have a role in motivating compressed sensing, randomised regressions etc.

Occasionally we might use non-linear projections to increase the dimensionality of our data in the hope of making a non-linear regression approximately linear, which dates back to (Cover 1965).

Cover’s Theorem (Cover 1965):

It was shown that, for a random set of linear inequalities in \(d\) unknowns, the expected number of extreme inequalities, which are necessary and sufficient to imply the entire set, tends to \(2d\) as the number of consistent inequalities tends to infinity, thus bounding the expected necessary storage capacity for linear decision algorithms in separable problems. The results, even those dealing with randomly positioned points, have been combinatorial in nature, and have been essentially independent of the configuration of the set of points in the space.

I am especially interested in random embeddings for kernel approximation.

Over at compressed sensing we mention some other useful such as the Johnson-Lindenstrauss lemma, and these ideas are closely related, in the probabilistic setting, to concentration inequalities.

1 References

Achlioptas. 2003. Database-Friendly Random Projections: Johnson-Lindenstrauss with Binary Coins.” Journal of Computer and System Sciences, Special Issue on PODS 2001,.
Ailon, and Chazelle. 2009. The Fast Johnson–Lindenstrauss Transform and Approximate Nearest Neighbors.” SIAM Journal on Computing.
Alaoui, and Mahoney. 2014. Fast Randomized Kernel Methods With Statistical Guarantees.” arXiv:1411.0306 [Cs, Stat].
Andoni, A., and Indyk. 2006. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions.” In 47th Annual IEEE Symposium on Foundations of Computer Science, 2006. FOCS ’06.
Andoni, Alexandr, Indyk, Nguyen, et al. 2013. Beyond Locality-Sensitive Hashing.” arXiv:1306.1547 [Cs].
Andoni, Alexandr, and Razenshteyn. 2015. Optimal Data-Dependent Hashing for Approximate Near Neighbors.” arXiv:1501.01062 [Cs].
Auvolat, and Vincent. 2015. Clustering Is Efficient for Approximate Maximum Inner Product Search.” arXiv:1507.05910 [Cs, Stat].
Bach. 2015. On the Equivalence Between Kernel Quadrature Rules and Random Feature Expansions.”
Baraniuk, Davenport, DeVore, et al. 2008. A Simple Proof of the Restricted Isometry Property for Random Matrices.” Constructive Approximation.
Bingham, and Mannila. 2001. Random Projection in Dimensionality Reduction: Applications to Image and Text Data.” In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’01.
Brault, d’Alché-Buc, and Heinonen. 2016. Random Fourier Features for Operator-Valued Kernels.” In Proceedings of The 8th Asian Conference on Machine Learning.
Candès, and Tao. 2006. Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies? IEEE Transactions on Information Theory.
Casey, Rhodes, and Slaney. 2008. Analysis of Minimum Distances in High-Dimensional Musical Spaces.” IEEE Transactions on Audio, Speech, and Language Processing.
Celentano, Misiakiewicz, and Montanari. 2021. Minimum Complexity Interpolation in Random Features Models.”
Choromanski, Rowland, and Weller. 2017. The Unreasonable Effectiveness of Random Orthogonal Embeddings.” arXiv:1703.00864 [Stat].
Coleman, Baraniuk, and Shrivastava. 2020. Sub-Linear Memory Sketches for Near Neighbor Search on Streaming Data.” arXiv:1902.06687 [Cs, Eess, Stat].
Cover. 1965. Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition.” IEEE Transactions on Electronic Computers.
Dasgupta. 2000. Experiments with Random Projection.” In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence. UAI’00.
Dasgupta, and Gupta. 2003. An Elementary Proof of a Theorem of Johnson and Lindenstrauss.” Random Structures & Algorithms.
Datar, Immorlica, Indyk, et al. 2004. Locality-Sensitive Hashing Scheme Based on P-Stable Distributions.” In Proceedings of the Twentieth Annual Symposium on Computational Geometry. SCG ’04.
Dezfouli, and Bonilla. 2015. Scalable Inference for Gaussian Process Models with Black-Box Likelihoods.” In Advances in Neural Information Processing Systems 28. NIPS’15.
Duarte, and Baraniuk. 2013. Spectral Compressive Sensing.” Applied and Computational Harmonic Analysis.
Eftekhari, Yap, Wakin, et al. 2016. Stabilizing Embedology: Geometry-Preserving Delay-Coordinate Maps.” arXiv:1609.06347 [Nlin, Stat].
Fodor. 2002. A Survey of Dimension Reduction Techniques.”
Freund, Dasgupta, Kabra, et al. 2007. Learning the Structure of Manifolds Using Random Projections.” In Advances in Neural Information Processing Systems.
Geurts, Ernst, and Wehenkel. 2006. Extremely Randomized Trees.” Machine Learning.
Ghojogh, Ghodsi, Karray, et al. 2021. Johnson-Lindenstrauss Lemma, Linear and Nonlinear Random Projections, Random Fourier Features, and Random Kitchen Sinks: Tutorial and Survey.” arXiv:2108.04172 [Cs, Math, Stat].
Gionis, Indyky, and Motwaniz. 1999. Similarity Search in High Dimensions via Hashing.” In.
Giryes, Sapiro, and Bronstein. 2016. Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy? IEEE Transactions on Signal Processing.
Gorban, Tyukin, and Romanenko. 2016. The Blessing of Dimensionality: Separation Theorems in the Thermodynamic Limit.” arXiv:1610.00494 [Cs, Stat].
Gottwald, and Reich. 2020. Supervised Learning from Noisy Observations: Combining Machine-Learning Techniques with Data Assimilation.” arXiv:2007.07383 [Physics, Stat].
Hall, and Li. 1993. On Almost Linearity of Low Dimensional Projections from High Dimensional Data.” The Annals of Statistics.
Heusser, Ziman, Owen, et al. 2017. HyperTools: A Python Toolbox for Visualizing and Manipulating High-Dimensional Data.” arXiv:1701.08290 [Stat].
Kammonen, Kiessling, Plecháč, et al. 2020. Adaptive Random Fourier Features with Metropolis Sampling.” arXiv:2007.10683 [Cs, Math].
Kane, and Nelson. 2014. Sparser Johnson-Lindenstrauss Transforms.” Journal of the ACM.
Kar, and Karnick. 2012. Random Feature Maps for Dot Product Kernels.” In Artificial Intelligence and Statistics.
Koltchinskii, and Giné. 2000. Random Matrix Approximation of Spectra of Integral Operators.” Bernoulli.
Koppel, Warnell, Stump, et al. 2016. Parsimonious Online Learning with Kernels via Sparse Projections in Function Space.” arXiv:1612.04111 [Cs, Stat].
Krummenacher, McWilliams, Kilcher, et al. 2016. Scalable Adaptive Stochastic Optimization Using Random Projections.” In Advances in Neural Information Processing Systems 29.
Kulis, and Grauman. 2012. Kernelized Locality-Sensitive Hashing.” IEEE Transactions on Pattern Analysis and Machine Intelligence.
Landweber, Lazar, and Patel. 2016. On Fiber Diameters of Continuous Maps.” American Mathematical Monthly.
Li, Ping, Hastie, and Church. 2006. Very Sparse Random Projections.” In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’06.
Li, Zhu, Ton, Oglic, et al. 2019. “Towards a Unified Analysis of Random Fourier Features.” In.
McWilliams, Balduzzi, and Buhmann. 2013. Correlated Random Features for Fast Semi-Supervised Learning.” In Advances in Neural Information Processing Systems 26.
Moosmann, Triggs, and Jurie. 2006. Fast Discriminative Visual Codebooks Using Randomized Clustering Forests.” In Advances in Neural Information Processing Systems.
Oveneke, Aliosha-Perez, Zhao, et al. 2016. Efficient Convolutional Auto-Encoding via Random Convexification and Frequency-Domain Minimization.” In Advances in Neural Information Processing Systems 29.
Oymak, and Tropp. 2015. Universality Laws for Randomized Dimension Reduction, with Applications.” arXiv:1511.09433 [Cs, Math, Stat].
Rahimi, and Recht. 2007. Random Features for Large-Scale Kernel Machines.” In Advances in Neural Information Processing Systems.
———. 2008. Uniform Approximation of Functions with Random Bases.” In 2008 46th Annual Allerton Conference on Communication, Control, and Computing.
Saul. 2023. A Geometrical Connection Between Sparse and Low-Rank Matrices and Its Application to Manifold Learning.” Transactions on Machine Learning Research.
Scardapane, and Wang. 2017. Randomness in Neural Networks: An Overview.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery.
Shi, Sun, and Zhu. 2018. A Spectral Approach to Gradient Estimation for Implicit Distributions.” In.
Sinha, and Duchi. 2016. Learning Kernels with Random Features.” In Advances in Neural Information Processing Systems 29.
Sterge, and Sriperumbudur. 2021. Statistical Optimality and Computational Efficiency of Nyström Kernel PCA.” arXiv:2105.08875 [Cs, Math, Stat].
Tang, Athreya, Sussman, et al. 2014. A Nonparametric Two-Sample Hypothesis Testing Problem for Random Dot Product Graphs.” arXiv:1409.2344 [Math, Stat].
Weinberger, Dasgupta, Langford, et al. 2009. Feature Hashing for Large Scale Multitask Learning.” In Proceedings of the 26th Annual International Conference on Machine Learning. ICML ’09.
Zhang, Wang, Cai, et al. 2010. Self-Taught Hashing for Fast Similarity Search.” In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR ’10.