Welcome to the probability inequality mines!
Concentration inequalities apply when something in your process (measurement, estimation) means that you can be pretty sure that a whole bunch of your stuff is damn likely to be somewhere in particular.
This is basic workhorse stuff in univariate probability, and turns out to be yet more essential in multivariate matrix probability, as seen in matrix factorisation, compressive sensing, PAC-bounds and suchlike.
Background
Overviews include
 this super simple intro to chaining and controlling maxima by Thomas Lumley
 Dasgupta, Asymptotic Theory of Statistics and Probability (Dasg08) is very easy, and despite its name introduces some nice basic nonasymptotic inequalities
 Raginsky and Sason, Concentration of Measure Inequalities in Information Theory, Communications and Coding (RaSa12)
 Tropp, An Introduction to Matrix Concentration Inequalities (Trop15) high-dimensional data! free!
 Boucheron, Bousquet & Lugosi, Concentration inequalities (BoBL04a) (Clear and brisk but missing some newer stuff)
 Massart, Concentration inequalities and model selection (Mass07). Clear and focussed, but very quick; depressingly, by being applied it also demonstrates the limitations of these techniques. Mass00 is an earlier draft.
 Boucheron, Lugosi & Massart, Concentration inequalities: a nonasymptotic theory of independence (BoLM13). Haven't read it yet.

Lugosi's Concentration-of-measure lecture notes:
The inequalities discussed in these notes bound tail probabilities of general functions of independent random variables.
The taxonomy is interesting:
Several methods have been known to prove such inequalities, including martingale methods (see Milman and Schechtman [1] and the surveys of McDiarmid [2, 3]), information-theoretic methods (see Ahlswede, Gács, and Körner [4], Marton [5, 6, 7], Dembo [8], Massart [9] and Rio [10]), Talagrand’s induction method [11, 12, 13] (see also Luczak and McDiarmid [14], McDiarmid [15] and Panchenko [16, 17, 18]), the decoupling method surveyed by de la Peña and Giné [19], and the so-called “entropy method”, based on logarithmic Sobolev inequalities, developed by Ledoux [20, 21], see also Bobkov and Ledoux [22], Massart [23], Rio [10], Klein [24], Boucheron, Lugosi, and Massart [25, 26], Bousquet [27, 28], and Boucheron, Bousquet, Lugosi, and Massart [29].
(He puts these methods to work in his Combinatorial statistics notes.)
Foundational but impenetrable things I won't read right now: Talagrand's opus that is commonly credited with kicking off the modern fad especially with the chaining method. (Tala95)
Finite sample bounds
These are everywhere in statistics. Special attention will be given here to finite-sample inequalities. Asymptotic normality is so last season. These days we care about finite-sample performance, and asymptotic results don't help us there. Apparently I can construct useful bounds using concentration inequalities? One suggested keyword to disambiguate: Ahlswede-Winter bounds?
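A minimal sketch of what such a finite-sample bound can look like, assuming i.i.d. samples bounded in a known interval and using Hoeffding's inequality (my choice of illustration, not something these notes have worked through). Inverting the tail bound \(P(|\bar{X} - \mu| \ge t) \le 2\exp(-2nt^2/(b-a)^2)\) at level \(\delta\) gives an interval half-width that holds for every \(n\), with no appeal to asymptotics. The names `hoeffding_radius` and `coverage`, and the Bernoulli test case, are hypothetical.

```python
import math
import random


def hoeffding_radius(n, delta, lo=0.0, hi=1.0):
    """Half-width t with P(|mean - mu| >= t) <= delta for n i.i.d. samples in [lo, hi]."""
    return (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def coverage(n, delta, p=0.3, trials=500, seed=0):
    """Fraction of simulated Bernoulli(p) experiments whose interval contains p."""
    rng = random.Random(seed)
    radius = hoeffding_radius(n, delta)
    hits = 0
    for _ in range(trials):
        # Sample mean of n Bernoulli(p) draws (assumed test distribution).
        mean = sum(rng.random() < p for _ in range(n)) / n
        hits += abs(mean - p) <= radius
    return hits / trials


print(f"radius = {hoeffding_radius(n=1000, delta=0.05):.4f}")
print(f"empirical coverage = {coverage(n=1000, delta=0.05):.3f}")
```

The guarantee only promises coverage of at least \(1 - \delta\); in simulations like this one the interval tends to be conservative, which is the usual price of a distribution-free bound.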
Basic inequalities
the classics
Markov
TBD
Chebychev
TBD
Hoeffding
TBD
Chernoff
TBD
Kolmogorov
TBD.
Gaussian
For the Gaussian distribution. Filed there.
Martingale type
TBD.
Khintchine
Let us copy from Wikipedia:
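Heuristically: if we pick \(N\) complex numbers \(x_1, \dots, x_N \in \mathbb{C}\), and add them together, each multiplied by jointly independent random signs \(\pm 1\), then the expected value of the sum's magnitude is close to \(\sqrt{|x_1|^2 + \cdots + |x_N|^2}\).
Let \(\{\varepsilon_n\}_{n=1}^N\) be i.i.d. random variables with \(P(\varepsilon_n = \pm 1) = \frac{1}{2}\) for \(n = 1, \dots, N\), i.e., a sequence with Rademacher distribution. Let \(0 < p < \infty\) and let \(x_1, \dots, x_N \in \mathbb{C}\). Then
\[ A_p \left( \sum_{n=1}^N |x_n|^2 \right)^{1/2} \le \left( \operatorname{E} \left| \sum_{n=1}^N \varepsilon_n x_n \right|^p \right)^{1/p} \le B_p \left( \sum_{n=1}^N |x_n|^2 \right)^{1/2} \]
for some constants \(A_p, B_p > 0\) depending only on \(p\). It is a simple matter to see that \(A_p = 1\) when \(p \ge 2\), and \(B_p = 1\) when \(0 < p \le 2\).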
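A quick numerical sanity check of Khintchine's inequality, as a sketch only: for a handful of complex coefficients, the \(p\)-th moment of the Rademacher sum can be computed exactly by enumerating every sign pattern, and then compared with the Euclidean norm of the coefficients. The function names and the coefficient vector below are my own illustrative choices.

```python
import itertools
import math


def rademacher_moment(x, p):
    """Exact (E|sum_n eps_n x_n|^p)^(1/p), enumerating all 2^N sign patterns."""
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=len(x)):
        s = sum(e * xi for e, xi in zip(signs, x))
        total += abs(s) ** p
    return (total / 2 ** len(x)) ** (1.0 / p)


def l2_norm(x):
    """Euclidean norm (sum_n |x_n|^2)^(1/2)."""
    return math.sqrt(sum(abs(xi) ** 2 for xi in x))


x = [3 + 4j, 1 - 2j, 0.5, -2j, 1.5 + 0.5j]  # arbitrary illustrative coefficients
norm = l2_norm(x)
for p in (0.5, 1, 2, 4, 8):
    ratio = rademacher_moment(x, p) / norm
    print(f"p={p}: moment / l2 norm = {ratio:.4f}")
```

The ratio should equal 1 at \(p = 2\), sit at or below 1 for \(p \le 2\), and at or above 1 for \(p \ge 2\); how far it strays from 1 is what the constants in the inequality control.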
Empirical process theory
Large deviation inequalities, empirical process inequalities, Talagrand chaining method. Berry-Esseen bound.
Matrix Chernoff bounds
Nikhil Srivastava's Discrepancy, Graphs, and the Kadison-Singer Problem has an interesting example of bounds via discrepancy theory (and only indirectly probability). Gros11 is also readable, and gives quantum-mechanical results (i.e. the matrices are complex-valued).
Trop15 summarises:
In recent years, random matrices have come to play a major role in computational mathematics, but most of the classical areas of random matrix theory remain the province of experts. Over the last decade, with the advent of matrix concentration inequalities, research has advanced to the point where we can conquer many (formerly) challenging problems with a page or two of arithmetic.
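For concreteness, here is the upper-tail form of the matrix Chernoff bound as I recall it from Trop15. Treat it as a sketch and check the monograph for the exact statement, along with its companion lower-tail and expectation bounds. The notation is my own choice: \(X_k\) are independent random Hermitian \(d \times d\) matrices with \(0 \le \lambda_{\min}(X_k)\) and \(\lambda_{\max}(X_k) \le L\), \(Y = \sum_k X_k\), and \(\mu_{\max} = \lambda_{\max}(\operatorname{E} Y)\).

```latex
\[
  P\left\{ \lambda_{\max}(Y) \ge (1 + \varepsilon)\, \mu_{\max} \right\}
  \le d \left[ \frac{e^{\varepsilon}}{(1 + \varepsilon)^{1 + \varepsilon}} \right]^{\mu_{\max} / L}
  \quad \text{for } \varepsilon \ge 0 .
\]
```

Apart from the dimensional factor \(d\) out front, this has the same shape as the scalar Chernoff bound for sums of bounded nonnegative random variables.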
To read
Refs
 DaHV12: (2012) A concentration theorem for projections. ArXiv Preprint ArXiv:1206.6813.
 Tala96: (1996) A new look at independence. The Annals of Probability, 24(1), 1–34.
 MaMi11: (2011) A non asymptotic penalized criterion for Gaussian mixture model selection. ESAIM: Probability and Statistics, 15, 41–68. DOI
 Trop15: (2015) An Introduction to Matrix Concentration Inequalities. ArXiv:1501.01571 [Cs, Math, Stat].
 Dasg08: (2008) Asymptotic Theory of Statistics and Probability. New York: Springer New York
 HoPr02: (2002) Concentration and deviation inequalities in infinite dimensions via covariance representations. Bernoulli, 8(6), 697–720.
 BoBL04a: (2004) Concentration inequalities. In Advanced Lectures on Machine Learning.
 BoLM13: (2013) Concentration inequalities: a nonasymptotic theory of independence. Oxford: Oxford University Press
 Mass07: (2007) Concentration inequalities and model selection: Ecole d’Eté de Probabilités de Saint-Flour XXXIII - 2003. Berlin ; New York: Springer-Verlag
 Krol16: (2016) Concentration inequalities for Poisson point processes with application to adaptive intensity estimation. ArXiv:1612.07901 [Math, Stat].
 BoLM03: (2003) Concentration inequalities using the entropy method. The Annals of Probability, 31(3), 1583–1614. DOI
 Tala95: (1995) Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de L’IHÉS, 81(1), 73–205. DOI
 RaSa12: (2012) Concentration of Measure Inequalities in Information Theory, Communications and Coding. Foundations and Trends in Communications and Information Theory.
 WuZh16: (2016) Distribution-dependent concentration inequalities for tighter generalization bounds. ArXiv:1607.05506 [Stat].
 CaRe09: (2009) Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 717–772. DOI
 Dasg00: (2000) Experiments with Random Projection. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (pp. 143–151). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
 Geer95: (1995) Exponential Inequalities for Martingales, with Application to Maximum Likelihood Estimation for Counting Processes. The Annals of Statistics, 23(5), 1779–1801. DOI
 KuMo16: (2016) Generalization Bounds for Nonstationary Mixing Processes. In Machine Learning Journal.
 KuMo14: (2014) Generalization Bounds for Time Series Prediction with Nonstationary Processes. In Algorithmic Learning Theory (pp. 260–274). Bled, Slovenia: Springer International Publishing DOI
 BoBL04b: (2004) Introduction to Statistical Learning Theory. In Advanced Lectures on Machine Learning (pp. 169–207). Springer Berlin Heidelberg
 HaRR15: (2015) Lasso and probabilistic inequalities for multivariate point processes. Bernoulli, 21(1), 83–143. DOI
 KuMo15: (2015) Learning Theory and Algorithms for Forecasting Non-Stationary Time Series. In Advances in Neural Information Processing Systems (pp. 541–549). Curran Associates, Inc.
 BeHK12: (2012) Minimum KL-divergence on complements of balls. ArXiv:1206.6544 [Cs, Math].
 LeGe14: (2014) New concentration inequalities for suprema of empirical processes. Bernoulli, 20(4), 2020–2038. DOI
 Geer02: (2002) On Hoeffding’s inequality for dependent random variables. In Empirical Process Techniques for Dependent Data. Birkhäuser
 DeHW11: (2011) On the concentration properties of Interacting particle processes. Foundations and Trends® in Machine Learning, 3(3–4), 225–389. DOI
 BePo05: (2005) Optimal Inequalities in Probability Theory: A Convex Optimization Approach. SIAM Journal on Optimization, 15(3), 780–804. DOI
 Kolt11: (2011) Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Springer Berlin Heidelberg DOI
 GLFB10: (2010) Quantum state tomography via compressed sensing. Physical Review Letters, 105(15). DOI
 FGLE12: (2012) Quantum Tomography via Compressed Sensing: Error Bounds, Sample Complexity, and Efficient Estimators. New Journal of Physics, 14(9), 095022. DOI
 LaGG16: (2016) Random projections of random manifolds. ArXiv:1607.04331 [Cs, q-Bio, Stat].
 Gros11: (2011) Recovering Low-Rank Matrices From Few Coefficients in Any Basis. IEEE Transactions on Information Theory, 57(3), 1548–1566. DOI
 Houd02: (2002) Remarks on deviation inequalities for functions of infinitely divisible random vectors. The Annals of Probability, 30(3), 1223–1237. DOI
 BaBM99: (1999) Risk bounds for model selection via penalization. Probability Theory and Related Fields, 113(3), 301–413.
 CaRT06: (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2), 489–509. DOI
 Kenn15: (2015) Semiparametric theory and empirical processes in causal inference. ArXiv Preprint ArXiv:1510.04740.
 Bach13: (2013) Sharp analysis of low-rank kernel matrix approximations. In COLT (Vol. 30, pp. 185–209).
 DSBN13: (2013) Sketching Sparse Matrices. ArXiv:1303.6544 [Cs, Math].
 Mass00: (2000) Some applications of concentration inequalities to statistics. In Annales de la Faculté des sciences de Toulouse: Mathématiques (Vol. 9, pp. 245–303).
 Horn79: (1979) Some inequalities for the expectation of a product of functions of a random variable and for the multivariate distribution function at a random point. Biometrical Journal, 21(3), 243–245. DOI
 ReRo07: (2007) Some non asymptotic tail estimates for Hawkes processes. Bulletin of the Belgian Mathematical Society - Simon Stevin, 13(5), 883–896.
 Geer14: (2014) Statistical Theory for High-Dimensional Models. ArXiv:1409.8557 [Math, Stat].
 BüGe11: (2011) Statistics for High-Dimensional Data: Methods, Theory and Applications. Heidelberg ; New York: Springer
 Ligg10: (2010) Stochastic models for large interacting systems and related correlation inequalities. Proceedings of the National Academy of Sciences of the United States of America, 107(38), 16413–16419. DOI
 GeLe11: (2011) The Lasso, correlated design, and improved oracle inequalities. ArXiv:1107.0189 [Stat].
 AuNe11: (2011) The multiplicative property characterizes \(\ell_p\) and \(L_p\) norms. Confluentes Mathematici, 03(04), 637–647. DOI
 BeLT17: (2017) Towards the study of least squares estimators with convex penalty. ArXiv:1701.09120 [Math, Stat].
 GiNi09: (2009) Uniform limit theorems for wavelet density estimators. The Annals of Probability, 37(4), 1605–1646. DOI
 RaRe09: (2009) Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning. In Advances in neural information processing systems (pp. 1313–1320). Curran Associates, Inc.