See also matrix factorisations, and discuss random projections and their role in motivating compressed sensing etc.

Cover’s Theorem (Cove65)

It was shown that, for a random set of linear inequalities in \(d\) unknowns, the expected number of extreme inequalities, which are necessary and sufficient to imply the entire set, tends to \(2d\) as the number of consistent inequalities tends to infinity, thus bounding the expected necessary storage capacity for linear decision algorithms in separable problems. The results, even those dealing with randomly positioned points, have been combinatorial in nature, and have been essentially independent of the configuration of the set of points in the space.

I am especially interested in random embeddings for kernel approximation.

Over at compressed sensing we mention some other random projection results, such as the Johnson-Lindenstrauss lemma, and these ideas are closely related, in the probabilistic setting, to concentration inequalities.

Landweber etc al (LaLP16) have an example of a converse result about your continuous random embeddings. (TBD)

## Refs

- Achl03
- Achlioptas, D. (2003) Database-friendly random projections: Johnson-Lindenstrauss with binary coins.
*Journal of Computer and System Sciences*, 66(4), 671–687. DOI. - AiCh09
- Ailon, N., & Chazelle, B. (2009) The Fast Johnson–Lindenstrauss Transform and Approximate Nearest Neighbors.
*SIAM Journal on Computing*, 39(1), 302–322. DOI. - AlMa14
- Alaoui, A. E., & Mahoney, M. W.(2014) Fast Randomized Kernel Methods With Statistical Guarantees.
*arXiv:1411.0306 [Cs, Stat]*. - AnIn06
- Andoni, A., & Indyk, P. (2006) Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. In 47th Annual IEEE Symposium on Foundations of Computer Science, 2006. FOCS ’06 (Vol. 51, pp. 459–468). DOI.
- AINR13
- Andoni, Alexandr, Indyk, P., Nguyen, H. L., & Razenshteyn, I. (2013) Beyond Locality-Sensitive Hashing.
*arXiv:1306.1547 [Cs]*. - AnRa15
- Andoni, Alexandr, & Razenshteyn, I. (2015) Optimal Data-Dependent Hashing for Approximate Near Neighbors.
*arXiv:1501.01062 [Cs]*. - AuVi15
- Auvolat, A., & Vincent, P. (2015) Clustering is Efficient for Approximate Maximum Inner Product Search.
*arXiv:1507.05910 [Cs, Stat]*. - Bach15
- Bach, F. (2015) On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions.
*arXiv Preprint arXiv:1502.06800*. - BDDW08
- Baraniuk, R., Davenport, M., DeVore, R., & Wakin, M. (2008) A Simple Proof of the Restricted Isometry Property for Random Matrices.
*Constructive Approximation*, 28(3), 253–263. DOI. - BiMa01
- Bingham, E., & Mannila, H. (2001) Random Projection in Dimensionality Reduction: Applications to Image and Text Data. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 245–250). New York, NY, USA: ACM DOI.
- CaTa06
- Candès, E. J., & Tao, T. (2006) Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?.
*IEEE Transactions on Information Theory*, 52(12), 5406–5425. DOI. - CaRS08
- Casey, M., Rhodes, C., & Slaney, M. (2008) Analysis of Minimum Distances in High-Dimensional Musical Spaces.
*IEEE Transactions on Audio, Speech, and Language Processing*, 16(5), 1015–1028. DOI. - Cove65
- Cover, T. M.(1965) Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition.
*IEEE Transactions on Electronic Computers*, EC-14(3), 326–334. DOI. - Dasg00
- Dasgupta, S. (2000) Experiments with Random Projection. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (pp. 143–151). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
- DaGu03
- Dasgupta, S., & Gupta, A. (2003) An elementary proof of a theorem of Johnson and Lindenstrauss.
*Random Structures & Algorithms*, 22(1), 60–65. DOI. - DIIM04
- Datar, M., Immorlica, N., Indyk, P., & Mirrokni, V. S.(2004) Locality-sensitive Hashing Scheme Based on P-stable Distributions. In Proceedings of the Twentieth Annual Symposium on Computational Geometry (pp. 253–262). New York, NY, USA: ACM DOI.
- DuBa13
- Duarte, M. F., & Baraniuk, R. G.(2013) Spectral compressive sensing.
*Applied and Computational Harmonic Analysis*, 35(1), 111–129. DOI. - EYWR16
- Eftekhari, A., Yap, H. L., Wakin, M. B., & Rozell, C. J.(2016) Stabilizing Embedology: Geometry-Preserving Delay-Coordinate Maps.
*arXiv:1609.06347 [Nlin, Stat]*. - Fodo02
- Fodor, I. (2002) A Survey of Dimension Reduction Techniques.
- FDKV07
- Freund, Y., Dasgupta, S., Kabra, M., & Verma, N. (2007) Learning the structure of manifolds using random projections. In Advances in Neural Information Processing Systems (pp. 473–480).
- GeEW06
- Geurts, P., Ernst, D., & Wehenkel, L. (2006) Extremely randomized trees.
*Machine Learning*, 63(1), 3–42. DOI. - GiIM99
- Gionis, A., Indyky, P., & Motwaniz, R. (1999) Similarity Search in High Dimensions via Hashing. . Presented at the VLDB Conference
- GiSB16
- Giryes, R., Sapiro, G., & Bronstein, A. M.(2016) Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?.
*IEEE Transactions on Signal Processing*, 64(13), 3444–3457. DOI. - GoTR16
- Gorban, A. N., Tyukin, I. Y., & Romanenko, I. (2016) The Blessing of Dimensionality: Separation Theorems in the Thermodynamic Limit.
*arXiv:1610.00494 [Cs, Stat]*. - HaLi93
- Hall, P., & Li, K.-C. (1993) On almost Linearity of Low Dimensional Projections from High Dimensional Data.
*The Annals of Statistics*, 21(2), 867–889. - HZOM17
- Heusser, A. C., Ziman, K., Owen, L. L. W., & Manning, J. R.(2017) HyperTools: A Python toolbox for visualizing and manipulating high-dimensional data.
*arXiv:1701.08290 [Stat]*. - Huan15
- Huang, P.-S. (2015) Shallow and deep learning for audio and natural language processing. . University of Illinois at Urbana-Champaign
- KaNe14
- Kane, D. M., & Nelson, J. (2014) Sparser Johnson-Lindenstrauss Transforms.
*Journal of the ACM*, 61(1), 1–23. DOI. - KWSR16
- Koppel, A., Warnell, G., Stump, E., & Ribeiro, A. (2016) Parsimonious Online Learning with Kernels via Sparse Projections in Function Space.
*arXiv:1612.04111 [Cs, Stat]*. - KMKB16
- Krummenacher, G., McWilliams, B., Kilcher, Y., Buhmann, J. M., & Meinshausen, N. (2016) Scalable Adaptive Stochastic Optimization Using Random Projections. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 1750–1758). Curran Associates, Inc.
- LaLP16
- Landweber, P. S., Lazar, E. A., & Patel, N. (2016) On Fiber Diameters of Continuous Maps.
*American Mathematical Monthly*, 123(4), 392–397. DOI. - LiHC06
- Li, P., Hastie, T. J., & Church, K. W.(2006) Very Sparse Random Projections. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 287–296). New York, NY, USA: ACM DOI.
- McBB13
- McWilliams, B., Balduzzi, D., & Buhmann, J. M.(2013) Correlated random features for fast semi-supervised learning. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 26 (Vol. 1050, pp. 440–448). Curran Associates, Inc.
- Meno07
- Menon, A. K.(2007) Random projections and applications to dimensionality reduction. . The University of Sydney Australia 1
- MoTJ06
- Moosmann, F., Triggs, B., & Jurie, F. (2006) Fast Discriminative Visual Codebooks using Randomized Clustering Forests. In Advances in Neural Information Processing Systems (pp. 985–992).
- OAZJ16
- Oveneke, M. C., Aliosha-Perez, M., Zhao, Y., Jiang, D., & Sahli, H. (2016) Efficient Convolutional Auto-Encoding via Random Convexification and Frequency-Domain Minimization. In Advances in Neural Information Processing Systems 29.
- OyTr15
- Oymak, S., & Tropp, J. A.(2015) Universality laws for randomized dimension reduction, with applications.
*arXiv:1511.09433 [Cs, Math, Stat]*. - ScWa17
- Scardapane, S., & Wang, D. (2017) Randomness in neural networks: an overview.
*Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery*, 7(2), n/a-n/a. DOI. - TASL14
- Tang, M., Athreya, A., Sussman, D. L., Lyzinski, V., & Priebe, C. E.(2014) A nonparametric two-sample hypothesis testing problem for random dot product graphs.
*arXiv:1409.2344 [Math, Stat]*. - WDLS09
- Weinberger, K., Dasgupta, A., Langford, J., Smola, A., & Attenberg, J. (2009) Feature Hashing for Large Scale Multitask Learning. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 1113–1120). New York, NY, USA: ACM DOI.
- ZWCL10
- Zhang, D., Wang, J., Cai, D., & Lu, J. (2010) Self-taught Hashing for Fast Similarity Search. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 18–25). New York, NY, USA: ACM DOI.