
Measure concentration inequalities

On being 80% sure you are only 20% wrong

Welcome to the probability inequality mines!

This is for when something in your process (measurement, estimation) means you can be pretty sure that a whole bunch of your stuff is damn likely to be somewhere in particular.

This is basic workhorse stuff in univariate probability, and turns out to be yet more essential in multivariate matrix probability, as seen in matrix factorisation, compressive sensing, PAC-bounds and suchlike.

Background

Overviews include

The inequalities discussed in these notes bound tail probabilities of general functions of independent random variables.

The taxonomy is interesting:

Several methods have been known to prove such inequalities, including martingale methods (see Milman and Schechtman [1] and the surveys of McDiarmid [2, 3]), information-theoretic methods (see Ahlswede, Gács, and Körner [4], Marton [5, 6, 7], Dembo [8], Massart [9] and Rio [10]), Talagrand’s induction method [11, 12, 13] (see also Luczak and McDiarmid [14], McDiarmid [15] and Panchenko [16, 17, 18]), the decoupling method surveyed by de la Peña and Giné [19], and the so-called “entropy method”, based on logarithmic Sobolev inequalities, developed by Ledoux [20, 21], see also Bobkov and Ledoux [22], Massart [23], Rio [10], Klein [24], Boucheron, Lugosi, and Massart [25, 26], Bousquet [27, 28], and Boucheron, Bousquet, Lugosi, and Massart [29].

(actioned in his Combinatorial statistics notes)

Foundational but impenetrable things I won’t read right now: Talagrand’s opus (Tala95), which is commonly credited with kicking off the modern fad, especially via the chaining method.

Finite sample bounds

These are everywhere in statistics, and get special attention in these notes. Asymptotic normality is so last season; these days we care about finite-sample performance, and asymptotic results don’t help us there. Apparently I can construct useful bounds using concentration inequalities? A minimal sketch of that idea is below. One suggested keyword to disambiguate: Ahlswede-Winter bounds?
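As a minimal sketch of the kind of thing I mean (my own illustration; the function name is made up), Hoeffding's inequality for bounded i.i.d. samples turns directly into a finite-sample confidence interval for the mean:

```python
import numpy as np

def hoeffding_halfwidth(n, delta, a=0.0, b=1.0):
    """Half-width of a two-sided (1 - delta) confidence interval for the mean
    of n i.i.d. samples taking values in [a, b], from Hoeffding's inequality
    P(|sample mean - true mean| >= t) <= 2 exp(-2 n t^2 / (b - a)^2)."""
    return (b - a) * np.sqrt(np.log(2.0 / delta) / (2.0 * n))

# Example: 1000 observations in [0, 1], 95% confidence.
x = np.random.default_rng(0).uniform(size=1000)
t = hoeffding_halfwidth(len(x), delta=0.05)
print(f"mean is in [{x.mean() - t:.3f}, {x.mean() + t:.3f}] with probability >= 0.95")
```

No asymptotics anywhere: the guarantee holds at every sample size, at the price of assuming boundedness.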

Basic inequalities

The classics.

Markov

TBD
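For reference, the standard statement presumably goes here: if \(X \geq 0\) and \(t > 0\), then

$$ P(X \geq t) \leq \frac{\operatorname{E} X}{t}. $$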

Chebyshev

TBD
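For reference, the standard statement: if \(X\) has finite variance and \(t > 0\), then

$$ P(|X - \operatorname{E} X| \geq t) \leq \frac{\operatorname{Var} X}{t^2}, $$

which is just Markov applied to \((X - \operatorname{E} X)^2\).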

Hoeffding

TBD
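For reference, the usual form: if \(X_1, \ldots, X_n\) are independent with \(X_i \in [a_i, b_i]\) almost surely and \(S_n = \sum_{i=1}^n X_i\), then for \(t > 0\)

$$ P(S_n - \operatorname{E} S_n \geq t) \leq \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right). $$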

Chernoff

TBD
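For reference, the generic Chernoff recipe: for any \(\lambda > 0\),

$$ P(X \geq t) = P\left(e^{\lambda X} \geq e^{\lambda t}\right) \leq e^{-\lambda t} \operatorname{E} e^{\lambda X}, $$

and one then optimises over \(\lambda\); many of the bounds on this page come from this trick combined with a suitable estimate of the moment generating function \(\operatorname{E} e^{\lambda X}\).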

Kolmogorov

TBD.
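Presumably Kolmogorov's maximal inequality is what is intended here: if \(X_1, \ldots, X_n\) are independent with mean zero and finite variance, and \(S_k = \sum_{i=1}^k X_i\), then for \(t > 0\)

$$ P\left(\max_{1 \leq k \leq n} |S_k| \geq t\right) \leq \frac{\operatorname{Var} S_n}{t^2}. $$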

Gaussian

For the Gaussian distribution. Filed there.

Martingale type

TBD.
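Presumably Azuma-Hoeffding and friends go here: if \((M_k)_{k=0}^n\) is a martingale with bounded increments \(|M_k - M_{k-1}| \leq c_k\) almost surely, then for \(t > 0\)

$$ P(M_n - M_0 \geq t) \leq \exp\left(-\frac{t^2}{2 \sum_{k=1}^n c_k^2}\right), $$

from which McDiarmid-type bounded-difference inequalities follow.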

Khintchine

Let us copy from Wikipedia:

Heuristically: if we pick \(N\) complex numbers \(x_1,\dots,x_N \in \mathbb{C}\), and add them together, each multiplied by jointly independent random signs \(\pm 1\), then the expected value of the sum’s magnitude is close to \(\sqrt{|x_1|^{2}+ \cdots + |x_N|^{2}}\).

Let \(\{\varepsilon_n\}_{n=1}^N\) be i.i.d. random variables with \(P(\varepsilon_n=\pm1)=\frac12\) for \(n=1,\ldots, N\), i.e., a sequence with Rademacher distribution. Let \(0<p<\infty\) and let \(x_1,\ldots,x_N \in \mathbb{C}\). Then

$$ A_p \left( \sum_{n=1}^N |x_n|^2 \right)^{1/2} \leq \left(\operatorname{E} \left|\sum_{n=1}^N \varepsilon_n x_n\right|^p \right)^{1/p} \leq B_p \left(\sum_{n=1}^N |x_n|^2\right)^{1/2} $$

for some constants \(A_p,B_p>0\). It is a simple matter to see that \(A_p = 1\) when \(p \ge 2\), and \(B_p = 1\) when \(0 < p \le 2\).
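A quick numerical sanity check of the above (my own illustration, not from the Wikipedia article): for fixed coefficients, the Monte Carlo estimate of \(\left(\operatorname{E}\left|\sum_n \varepsilon_n x_n\right|^p\right)^{1/p}\) should stay within constant factors of the \(\ell^2\) norm across \(p\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary fixed complex coefficients x_1, ..., x_N for illustration.
N = 50
x = rng.normal(size=N) + 1j * rng.normal(size=N)
l2 = np.sqrt(np.sum(np.abs(x) ** 2))

# Monte Carlo over Rademacher sign vectors eps in {-1, +1}^N.
trials = 100_000
eps = rng.choice([-1.0, 1.0], size=(trials, N))
magnitudes = np.abs(eps @ x)

for p in (1, 2, 4):
    moment = np.mean(magnitudes ** p) ** (1.0 / p)
    print(f"p={p}: (E|S|^p)^(1/p) ~ {moment:.3f}, l2 norm = {l2:.3f}")
```

For \(p = 2\) the two quantities should agree up to Monte Carlo error, since the signs are orthonormal in \(L^2\); for \(p=1\) the moment sits a little below the \(\ell^2\) norm, and for \(p=4\) a little above.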

Empirical process theory

Large deviation inequalities, empirical process inequalities, Talagrand’s chaining method, the Berry-Esseen bound.

Matrix Chernoff bounds

Nikhil Srivastava’s Discrepancy, Graphs, and the Kadison-Singer Problem has an interesting example of bounds via discrepancy theory (and only indirectly probability). Gros11 is also readable, and gives quantum-mechanical results (i.e. the matrices are complex-valued).

Trop15 summarises:

In recent years, random matrices have come to play a major role in computational mathematics, but most of the classical areas of random matrix theory remain the province of experts. Over the last decade, with the advent of matrix concentration inequalities, research has advanced to the point where we can conquer many (formerly) challenging problems with a page or two of arithmetic.
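To make that “page or two of arithmetic” concrete, here is a small simulation (my own sketch, not from Trop15) checking a Bernstein-type matrix tail bound against the empirical tail of a random matrix series; the constants are the ones I recall for the Hermitian case and should be checked against Trop15.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sum of independent, mean-zero, bounded Hermitian random matrices:
# X_i = eps_i * A_i with fixed symmetric A_i and Rademacher signs eps_i.
d, n = 10, 200
A = rng.normal(size=(n, d, d))
A = (A + A.transpose(0, 2, 1)) / 2                 # symmetrise each A_i
L = np.abs(np.linalg.eigvalsh(A)).max()            # uniform bound on ||X_i||
v = np.linalg.norm(sum(a @ a for a in A), 2)       # variance proxy ||sum_i E[X_i^2]||

def bernstein_tail(t):
    # Matrix Bernstein-type tail (Hermitian case), as I recall it:
    # P(||sum_i X_i|| >= t) <= 2 d exp(-t^2 / (2 (v + L t / 3)))
    return 2 * d * np.exp(-t**2 / (2 * (v + L * t / 3)))

# Empirical tail of ||sum_i eps_i A_i|| over many sign draws.
trials = 2000
norms = np.array([
    np.linalg.norm(np.einsum("i,ijk->jk", rng.choice([-1.0, 1.0], size=n), A), 2)
    for _ in range(trials)
])

for c in (2.0, 2.5, 3.0):
    t = c * np.sqrt(v)
    print(f"t={t:7.1f}  empirical tail={np.mean(norms >= t):.4f}  bound={bernstein_tail(t):.4f}")
```

Expect the empirical tail to fall well below the bound at these thresholds: the bound is valid but not tight at moderate deviations, and the dimensional factor \(2d\) is the usual price of controlling all directions at once.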

To read

Refs