The Living Thing / Notebooks : Measure concentration inequalities (particularly probability)

Welcome to the probability inequality mines!

When something in your process (measurement, estimation) means that you can be pretty sure that a whole bunch of your stuff is damn likely to be somewhere in particular.

This is basic workhorse stuff in univariate probability, and turns out to be yet more essential in multivariate matrix probability, as seen in matrix factorisation, compressive sensing, PAC-bounds and suchlike.

Overviews include

Foundational but impenetrable things I won’t read: Tagrand’s opus that is commonly credit as kicking off the modern fad. (Tala95)

Finite sample bounds

These are everywhere in statistics. Special attention will be given here to finite-sample inequalities. Asymptotic normality is so last season. These days we care about finite sample performance, and asymptotic results don’t help us there. Apparently I can construct useful bounds using concetration inequalities? Ahlswede-Winterfeld bounds?

Basic inequalities

the classics

Markov

TBD

Chebychev

TBD

Hoeffding

TBD

Chernoff

TBD

Kolmogorov

TBD.

Gaussian

For the Gaussian distribution. Filed there.

Martingale

TBD.

Empirical process theory

Large deviation inequalities, empirical process inequalities, Talagrand chaining method. Berry-Esseen bound.

Matrix Chernoff bounds

Nikhil Srivastava’s Discrepancy, Graphs, and the Kadison-Singer Problem has a wonderful explanation of Ahlswede-Winter-style matrix Chernoff Bounds. Gros11 is also readable, and gives quantum-mechanical results (i.e. the matrices are complex-valued).

Trop15 summarises:

In recent years, random matrices have come to play a major role in computational mathematics, but most of the classical areas of random matrix theory remain the province of experts. Over the last decade, with the advent of matrix concentration inequalities, research has advanced to the point where we can conquer many (formerly) challenging problems with aChernoff page or two of arithmetic.

To read

Bach13
Bach, F. R.(2013) Sharp analysis of low-rank kernel matrix approximations. In COLT (Vol. 30, pp. 185–209).
BaBM99
Barron, A., Birgé, L., & Massart, P. (1999) Risk bounds for model selection via penalization. Probability Theory and Related Fields, 113(3), 301–413.
BeHK12
Berend, D., Harremoës, P., & Kontorovich, A. (2012) Minimum KL-divergence on complements of $L_1$ balls. arXiv:1206.6544 [Cs, Math].
BePo05
Bertsimas, D., & Popescu, I. (2005) Optimal Inequalities in Probability Theory: A Convex Optimization Approach. SIAM Journal on Optimization, 15(3), 780–804. DOI.
BoBL04a
Boucheron, S., Bousquet, O., & Lugosi, G. (2004a) Concentration inequalities. In O. Bousquet, U. von Luxburg, & G. Rätsch (Eds.), Advanced Lectures in Machine Learning.
BoLM03
Boucheron, S., Lugosi, G., & Massart, P. (2003) Concentration inequalities using the entropy method. , 31(3), 1583–1614. DOI.
BoLM13
Boucheron, S., Lugosi, G., & Massart, P. (2013) Concentration inequalities: a nonasymptotic theory of independence. (1st ed.). Oxford: Oxford University Press
BoBL04b
Bousquet, O., Boucheron, S., & Lugosi, G. (2004b) Introduction to Statistical Learning Theory. In O. Bousquet, U. von Luxburg, & G. Rätsch (Eds.), Advanced Lectures on Machine Learning (pp. 169–207). Springer Berlin Heidelberg
BüGe11
Bühlmann, P., & van de Geer, S. (2011) Statistics for High-Dimensional Data: Methods, Theory and Applications. (2011 edition.). Heidelberg ; New York: Springer
CaRe09
Candès, E. J., & Recht, B. (2009) Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 717–772. DOI.
CaRT06
Candès, E. J., Romberg, J., & Tao, T. (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2), 489–509. DOI.
DSBN13
Dasarathy, G., Shah, P., Bhaskar, B. N., & Nowak, R. (2013) Sketching Sparse Matrices. arXiv:1303.6544 [Cs, Math].
Dasg08
DasGupta, A. (2008) Asymptotic Theory of Statistics and Probability. . New York: Springer New York
Dasg00
Dasgupta, S. (2000) Experiments with Random Projection. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (pp. 143–151). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
DaHV12
Dasgupta, S., Hsu, D., & Verma, N. (2012) A concentration theorem for projections. arXiv Preprint arXiv:1206.6813.
DeHW11
Del Moral, P., Hu, P., & Wu, L. (2011) On the concentration properties of Interacting particle processes. Foundations and Trends® in Machine Learning, 3(3–4), 225–389. DOI.
FGLE12
Flammia, S. T., Gross, D., Liu, Y.-K., & Eisert, J. (2012) Quantum Tomography via Compressed Sensing: Error Bounds, Sample Complexity, and Efficient Estimators. New Journal of Physics, 14(9), 95022. DOI.
GhMc09
Ghofrani, S., & McLernon, D. C.(2009) Auto-Wigner–Ville distribution via non-adaptive and adaptive signal decomposition. Signal Processing, 89(8), 1540–1549. DOI.
Gros11
Gross, D. (2011) Recovering Low-Rank Matrices From Few Coefficients in Any Basis. IEEE Transactions on Information Theory, 57(3), 1548–1566. DOI.
GLFB10
Gross, D., Liu, Y.-K., Flammia, S. T., Becker, S., & Eisert, J. (2010) Quantum state tomography via compressed sensing. Physical Review Letters, 105(15). DOI.
HaRR15
Hansen, N. R., Reynaud-Bouret, P., & Rivoirard, V. (2015) Lasso and probabilistic inequalities for multivariate point processes. Bernoulli, 21(1), 83–143. DOI.
Horn79
Horn, M. (1979) Some inequalities for the expectation of a product of functions of a random variable and for the multivariate distribution function at a random point. Biometrical Journal, 21(3), 243–245. DOI.
Houd02
Houdré, C. (2002) Remarks on deviation inequalities for functions of infinitely divisible random vectors. The Annals of Probability, 30(3), 1223–1237. DOI.
HoPr02
Houdré, C., & Privault, N. (2002) Concentration and deviation inequalities in infinite dimensions via covariance representations. Bernoulli, 8(6), 697–720.
IsMc16
Isaev, M., & McKay, B. D.(2016) On a bound of Hoeffding in the complex case. Electronic Communications in Probability, 21(0). DOI.
Karo13
Karoui, N. E.(2013) Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators : rigorous results. arXiv:1311.2445 [Math, Stat].
Kenn15
Kennedy, E. H.(2015) Semiparametric theory and empirical processes in causal inference. arXiv Preprint arXiv:1510.04740.
LaGG16
Lahiri, S., Gao, P., & Ganguli, S. (2016) Random projections of random manifolds. arXiv:1607.04331 [Cs, Q-Bio, Stat].
LeGe14
Lederer, J., & van de Geer, S. (2014) New concentration inequalities for suprema of empirical processes. Bernoulli, 20(4), 2020–2038. DOI.
Ligg10
Liggett, T. M.(2010) Stochastic models for large interacting systems and related correlation inequalities. Proceedings of the National Academy of Sciences of the United States of America, 107(38), 16413–16419. DOI.
Mass00
Massart, P. (2000) Some applications of concentration inequalities to statistics. In Annales de la Faculté des sciences de Toulouse: Mathématiques (Vol. 9, pp. 245–303).
Mass07
Massart, P. (2007) Concentration inequalities and model selection: Ecole d’Eté de Probabilités de Saint-Flour XXXIII - 2003. . Berlin ; New York: Springer-Verlag
MaMi11
Maugis, C., & Michel, B. (2011) A non asymptotic penalized criterion for Gaussian mixture model selection. ESAIM: Probability and Statistics, 15, 41–68. DOI.
RaSa12
Raginsky, M., & Sason, I. (2012) Concentration of Measure Inequalities in Information Theory, Communications and Coding. arXiv:1212.4663 [Cs, Math].
RaRe09
Rahimi, A., & Recht, B. (2009) Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning. In Advances in neural information processing systems (pp. 1313–1320). Curran Associates, Inc.
ReRo07
Reynaud-Bouret, P., & Roy, E. (2007) Some non asymptotic tail estimates for Hawkes processes. Bulletin of the Belgian Mathematical Society - Simon Stevin, 13(5), 883–896.
Tala95
Talagrand, M. (1995) Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de L’IHÉS, 81(1), 73–205. DOI.
Tala96
Talagrand, M. (1996) A new look at independence. The Annals of Probability, 24(1), 1–34.
Trop15
Tropp, J. A.(2015) An Introduction to Matrix Concentration Inequalities. arXiv:1501.01571 [Cs, Math, Stat].
Geer95
van de Geer, S. (1995) Exponential Inequalities for Martingales, with Application to Maximum Likelihood Estimation for Counting Processes. The Annals of Statistics, 23(5), 1779–1801. DOI.
Geer14
van de Geer, S. (2014) Statistical Theory for High-Dimensional Models. arXiv:1409.8557 [Math, Stat].
GeLe11
van de Geer, S., & Lederer, J. (2011) The Lasso, correlated design, and improved oracle inequalities. arXiv:1107.0189 [Stat].
WuZh16
Wu, X., & Zhang, J. (2016) Distribution-dependent concentration inequalities for tighter generalization bounds. arXiv:1607.05506 [Stat].