The Living Thing / Notebooks : Randomised regression

Tackling your regression by using random projections of the predictors.

Usually this means using those projections to reduce the dimensionality of a high-dimensional regression. In that case it is not far from compressed sensing, except in how we handle noise. In the linear-model case this is of course randomised linear algebra, and may amount to a randomised matrix factorisation.
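Here is a minimal numpy sketch of the simplest version of that idea, compressed least squares in the spirit of ThHM17: hit the p predictors with a Gaussian projection down to k dimensions, solve the small regression, and lift the coefficients back. The Gaussian projection, the dimensions and the toy data are all illustrative assumptions on my part, not anything canonical from the references.

```python
import numpy as np

def compressed_least_squares(X, y, k, rng=None):
    """Regress y on a k-dimensional Gaussian random projection of the predictors.

    Returns coefficients in the original p-dimensional space,
    beta_hat = Phi @ gamma_hat, where gamma_hat solves the projected problem.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    # Projection with variance 1/k so that norms are roughly preserved.
    Phi = rng.normal(scale=1.0 / np.sqrt(k), size=(p, k))
    X_low = X @ Phi                                 # n x k compressed design matrix
    gamma_hat, *_ = np.linalg.lstsq(X_low, y, rcond=None)
    return Phi @ gamma_hat                          # lift back to original coordinates

# Toy usage: a sparse signal in a wide design matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2000))
beta = np.zeros(2000)
beta[:10] = 1.0
y = X @ beta + 0.1 * rng.normal(size=500)
beta_hat = compressed_least_squares(X, y, k=200, rng=1)
print(np.mean((X @ beta_hat - X @ beta) ** 2))      # in-sample prediction error
```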

Occasionally we might instead use non-linear projections to increase the dimensionality of our data, in the hope of making a non-linear regression approximately linear, an idea that dates back to Cover (Cove65).
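For concreteness, a hedged sketch of that dimension-raising trick in the random-feature style surveyed in ScWa17: sample a fixed random non-linear map once, then do an ordinary (here ridge-penalised) linear fit on the lifted features. The tanh non-linearity, the ridge penalty and the sizes are my arbitrary choices for illustration.

```python
import numpy as np

def random_feature_regression(X, y, n_features=500, ridge=1e-3, rng=None):
    """Lift X through a fixed random non-linear map, then fit a linear ridge model.

    The hidden weights W, b are sampled once and never trained; the only
    fitting is the linear solve on the lifted features.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    W = rng.normal(size=(p, n_features))
    b = rng.uniform(-np.pi, np.pi, size=n_features)
    Z = np.tanh(X @ W + b)                           # n x n_features non-linear features
    A = Z.T @ Z + ridge * np.eye(n_features)
    w = np.linalg.solve(A, Z.T @ y)                  # ridge solution in feature space
    return lambda Xnew: np.tanh(Xnew @ W + b) @ w

# Toy usage: a non-linear target that a linear model in X cannot fit.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + 0.05 * rng.normal(size=400)
predict = random_feature_regression(X, y, rng=1)
print(np.mean((predict(X) - y) ** 2))                # training MSE
```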

I am especially interested in seeing how this might be useful for dependent data, especially time series.

Brian McWilliams, Gabriel Krummenacher and Mario Lučić, Randomized Linear Regression: A brief overview and recent results. Gabriel implemented some of the algorithms mentioned, e.g.

Martin Wainwright, Statistics meets Optimization: Randomization and approximation for high-dimensional problems.

In the modern era of high-dimensional data, the interface between mathematical statistics and optimization has become an increasingly vibrant area of research. In this course, we provide some vignettes into this interface, including the following topics:

  1. Dimensionality reduction via random projection. The naive idea of projecting high-dimensional data to a randomly chosen low-dimensional space is remarkably effective. We discuss the classical Johnson-Lindenstrauss lemma, as well as various modern variants that provide computationally-efficient embeddings with strong guarantees.
  2. When is it possible to quickly obtain approximate solutions of large-scale convex programs? In practice, methods based on randomized projection can work very well, and arguments based on convex analysis and concentration of measure provide a rigorous underpinning to these observations.
  3. Optimization problems with some form of nonconvexity arise frequently in statistical settings — for instance, in problems with latent variables, combinatorial constraints, or rank constraints. Nonconvex programs are known to be intractable in a complexity-theoretic sense, but the random ensembles arising in statistics are not adversarially constructed. Under what conditions is it possible to make rigorous guarantees about the behavior of simple iterative algorithms for such problems? We develop some general theory for addressing these questions, exploiting tools from both optimization theory and empirical process theory.
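Item 1 of that outline, the Johnson-Lindenstrauss phenomenon, is easy to sanity-check numerically; the following throwaway numpy snippet (all dimensions chosen arbitrarily) projects some random points and confirms that pairwise squared distances are roughly preserved.

```python
import numpy as np

# Empirical check of the Johnson-Lindenstrauss effect: pairwise distances
# survive a Gaussian projection from d dimensions down to k << d.
rng = np.random.default_rng(0)
n, d, k = 200, 10_000, 400
X = rng.normal(size=(n, d))
Phi = rng.normal(scale=1.0 / np.sqrt(k), size=(d, k))
Y = X @ Phi

def pairwise_sq_dists(A):
    sq = np.sum(A ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * A @ A.T

# Ratio of projected to original squared distances (diagonal padded to avoid 0/0).
ratio = pairwise_sq_dists(Y) / (pairwise_sq_dists(X) + np.eye(n))
off_diag = ratio[~np.eye(n, dtype=bool)]
print(off_diag.min(), off_diag.max())   # distance ratios concentrate around 1
```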

Refs

Cove65
Cover, T. M. (1965) Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition. IEEE Transactions on Electronic Computers, EC-14(3), 326–334. DOI.
DLFU13
Dhillon, P., Lu, Y., Foster, D. P., & Ungar, L. (2013) New subsampling algorithms for fast least squares regression. In Advances in Neural Information Processing Systems (pp. 360–368). Curran Associates, Inc.
HeMM16
Heinze, C., McWilliams, B., & Meinshausen, N. (2016) DUAL-LOCO: Distributing Statistical Estimation Using Random Projections. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (pp. 875–883).
HMMK14
Heinze, C., McWilliams, B., Meinshausen, N., & Krummenacher, G. (2014) LOCO: Distributing Ridge Regression with Random Projections. arXiv:1406.3469 [Stat].
KMKB16
Krummenacher, G., McWilliams, B., Kilcher, Y., Buhmann, J. M., & Meinshausen, N. (2016) Scalable Adaptive Stochastic Optimization Using Random Projections. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 1750–1758). Curran Associates, Inc.
McBB13
McWilliams, B., Balduzzi, D., & Buhmann, J. M. (2013) Correlated random features for fast semi-supervised learning. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 26 (Vol. 1050, pp. 440–448). Curran Associates, Inc.
MKLB14
McWilliams, B., Krummenacher, G., Lucic, M., & Buhmann, J. M. (2014) Fast and Robust Least Squares Estimation in Corrupted Linear Models. arXiv:1406.3175 [Stat].
ScWa17
Scardapane, S., & Wang, D. (2017) Randomness in neural networks: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 7(2). DOI.
ThHM17
Thanei, G.-A., Heinze, C., & Meinshausen, N. (2017) Random Projections For Large-Scale Regression. arXiv:1701.05325 [Math, Stat].
WaZM17
Wang, H., Zhu, R., & Ma, P. (2017) Optimal Subsampling for Large Sample Logistic Regression. arXiv:1702.01166 [Stat].
ZhWG17
Zhang, X., Wang, L., & Gu, Q. (2017) Stochastic Variance-reduced Gradient Descent for Low-rank Matrix Recovery from Linear Measurements. arXiv:1701.00481 [Stat].