The Living Thing / Notebooks : Random embeddings and hashing

See also matrix factorisations, high-dimensional statistics and discuss random projections and their role in motivating compressed sensing etc.

Cover’s Theorem (Cove65)

It was shown that, for a random set of linear inequalities in \(d\) unknowns, the expected number of extreme inequalities, which are necessary and sufficient to imply the entire set, tends to \(2d\) as the number of consistent inequalities tends to infinity, thus bounding the expected necessary storage capacity for linear decision algorithms in separable problems. The results, even those dealing with randomly positioned points, have been combinatorial in nature, and have been essentially independent of the configuration of the set of points in the space.

I am especially interested in random embeddings for random kernel approximation.

Over at compressed sensing we mention some other random projection results, such as the Johnson-Lindenstrauss lemma, and these ideas are closely related, in the probabilistic setting, to concentration inequalities.

Landweber etc al (LaLP16) have an example of a converse result about your continuous random embeddings. (TBD)

Refs

Achl03
Achlioptas, D. (2003) Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4), 671–687. DOI.
AiCh09
Ailon, N., & Chazelle, B. (2009) The Fast Johnson–Lindenstrauss Transform and Approximate Nearest Neighbors. SIAM Journal on Computing, 39(1), 302–322. DOI.
AlMa14
Alaoui, A. E., & Mahoney, M. W.(2014) Fast Randomized Kernel Methods With Statistical Guarantees. arXiv:1411.0306 [Cs, Stat].
AnIn06
Andoni, A., & Indyk, P. (2006) Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. In 47th Annual IEEE Symposium on Foundations of Computer Science, 2006. FOCS ’06 (Vol. 51, pp. 459–468). DOI.
AINR13
Andoni, Alexandr, Indyk, P., Nguyen, H. L., & Razenshteyn, I. (2013) Beyond Locality-Sensitive Hashing. arXiv:1306.1547 [Cs].
AnRa15
Andoni, Alexandr, & Razenshteyn, I. (2015) Optimal Data-Dependent Hashing for Approximate Near Neighbors. arXiv:1501.01062 [Cs].
AuVi15
Auvolat, A., & Vincent, P. (2015) Clustering is Efficient for Approximate Maximum Inner Product Search. arXiv:1507.05910 [Cs, Stat].
Bach15
Bach, F. (2015) On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions. arXiv Preprint arXiv:1502.06800.
BDDW08
Baraniuk, R., Davenport, M., DeVore, R., & Wakin, M. (2008) A Simple Proof of the Restricted Isometry Property for Random Matrices. Constructive Approximation, 28(3), 253–263. DOI.
BiMa01
Bingham, E., & Mannila, H. (2001) Random Projection in Dimensionality Reduction: Applications to Image and Text Data. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 245–250). New York, NY, USA: ACM DOI.
CaTa06
Candès, E. J., & Tao, T. (2006) Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?. IEEE Transactions on Information Theory, 52(12), 5406–5425. DOI.
CaRS08
Casey, M., Rhodes, C., & Slaney, M. (2008) Analysis of Minimum Distances in High-Dimensional Musical Spaces. IEEE Transactions on Audio, Speech, and Language Processing, 16(5), 1015–1028. DOI.
Cove65
Cover, T. M.(1965) Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition. IEEE Transactions on Electronic Computers, EC-14(3), 326–334. DOI.
Dasg00
Dasgupta, S. (2000) Experiments with Random Projection. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (pp. 143–151). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
DaGu03
Dasgupta, S., & Gupta, A. (2003) An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms, 22(1), 60–65. DOI.
DIIM04
Datar, M., Immorlica, N., Indyk, P., & Mirrokni, V. S.(2004) Locality-sensitive Hashing Scheme Based on P-stable Distributions. In Proceedings of the Twentieth Annual Symposium on Computational Geometry (pp. 253–262). New York, NY, USA: ACM DOI.
DuBa13
Duarte, M. F., & Baraniuk, R. G.(2013) Spectral compressive sensing. Applied and Computational Harmonic Analysis, 35(1), 111–129. DOI.
EYWR16
Eftekhari, A., Yap, H. L., Wakin, M. B., & Rozell, C. J.(2016) Stabilizing Embedology: Geometry-Preserving Delay-Coordinate Maps. arXiv:1609.06347 [Nlin, Stat].
Fodo02
Fodor, I. (2002) A Survey of Dimension Reduction Techniques.
FDKV07
Freund, Y., Dasgupta, S., Kabra, M., & Verma, N. (2007) Learning the structure of manifolds using random projections. In Advances in Neural Information Processing Systems (pp. 473–480).
GeEW06
Geurts, P., Ernst, D., & Wehenkel, L. (2006) Extremely randomized trees. Machine Learning, 63(1), 3–42. DOI.
GiIM99
Gionis, A., Indyky, P., & Motwaniz, R. (1999) Similarity Search in High Dimensions via Hashing. . Presented at the VLDB Conference
GiSB16
Giryes, R., Sapiro, G., & Bronstein, A. M.(2016) Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?. IEEE Transactions on Signal Processing, 64(13), 3444–3457. DOI.
GoTR16
Gorban, A. N., Tyukin, I. Y., & Romanenko, I. (2016) The Blessing of Dimensionality: Separation Theorems in the Thermodynamic Limit. arXiv:1610.00494 [Cs, Stat].
HaLi93
Hall, P., & Li, K.-C. (1993) On almost Linearity of Low Dimensional Projections from High Dimensional Data. The Annals of Statistics, 21(2), 867–889.
HZOM17
Heusser, A. C., Ziman, K., Owen, L. L. W., & Manning, J. R.(2017) HyperTools: A Python toolbox for visualizing and manipulating high-dimensional data. arXiv:1701.08290 [Stat].
Huan15
Huang, P.-S. (2015) Shallow and deep learning for audio and natural language processing. . University of Illinois at Urbana-Champaign
KaNe14
Kane, D. M., & Nelson, J. (2014) Sparser Johnson-Lindenstrauss Transforms. Journal of the ACM, 61(1), 1–23. DOI.
KWSR16
Koppel, A., Warnell, G., Stump, E., & Ribeiro, A. (2016) Parsimonious Online Learning with Kernels via Sparse Projections in Function Space. arXiv:1612.04111 [Cs, Stat].
KMKB16
Krummenacher, G., McWilliams, B., Kilcher, Y., Buhmann, J. M., & Meinshausen, N. (2016) Scalable Adaptive Stochastic Optimization Using Random Projections. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 1750–1758). Curran Associates, Inc.
LaLP16
Landweber, P. S., Lazar, E. A., & Patel, N. (2016) On Fiber Diameters of Continuous Maps. American Mathematical Monthly, 123(4), 392–397. DOI.
LiHC06
Li, P., Hastie, T. J., & Church, K. W.(2006) Very Sparse Random Projections. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 287–296). New York, NY, USA: ACM DOI.
McBB13
McWilliams, B., Balduzzi, D., & Buhmann, J. M.(2013) Correlated random features for fast semi-supervised learning. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 26 (Vol. 1050, pp. 440–448). Curran Associates, Inc.
Meno07
Menon, A. K.(2007) Random projections and applications to dimensionality reduction. . The University of Sydney Australia 1
MoTJ06
Moosmann, F., Triggs, B., & Jurie, F. (2006) Fast Discriminative Visual Codebooks using Randomized Clustering Forests. In Advances in Neural Information Processing Systems (pp. 985–992).
OAZJ16
Oveneke, M. C., Aliosha-Perez, M., Zhao, Y., Jiang, D., & Sahli, H. (2016) Efficient Convolutional Auto-Encoding via Random Convexification and Frequency-Domain Minimization. In Advances in Neural Information Processing Systems 29.
OyTr15
Oymak, S., & Tropp, J. A.(2015) Universality laws for randomized dimension reduction, with applications. arXiv:1511.09433 [Cs, Math, Stat].
ScWa17
Scardapane, S., & Wang, D. (2017) Randomness in neural networks: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 7(2), n/a-n/a. DOI.
TASL14
Tang, M., Athreya, A., Sussman, D. L., Lyzinski, V., & Priebe, C. E.(2014) A nonparametric two-sample hypothesis testing problem for random dot product graphs. arXiv:1409.2344 [Math, Stat].
WDLS09
Weinberger, K., Dasgupta, A., Langford, J., Smola, A., & Attenberg, J. (2009) Feature Hashing for Large Scale Multitask Learning. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 1113–1120). New York, NY, USA: ACM DOI.
ZWCL10
Zhang, D., Wang, J., Cai, D., & Lu, J. (2010) Self-taught Hashing for Fast Similarity Search. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 18–25). New York, NY, USA: ACM DOI.