Techniques to improve the failure modes of your estimates. Surprisingly rarely used despite being fairly straightforward.

This is more-or-less a frequentist project.

Bayesians seem to claim to achieve robustness largely by choosing heavy-tailed priors where they might have chosen light-tailed ones, e.g. Laplacian priors instead of Gaussian ones. Such priors may have arbitrary parameters, but not *more arbitrary than usual* in Bayesian statistics and therefore do not attract so much need to rationalise away the guilt.

## TODO

relation to penalized regression.

connection with Lasso.

Beran’s Hellinger-ball contamination model, which I also don’t yet understand.

Breakdown point explanation

glm connection.

## Corruption models

(Adversarial) total variation \(\epsilon\)-corruption.

Random (mixture) corruption

other?
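
A minimal sketch of the random (mixture) corruption model, assuming the usual reading: each observation comes from the nominal distribution with probability \(1-\epsilon\) and from a contaminating distribution otherwise. The two Gaussian components and the function name are illustrative assumptions, not anything fixed by the model.

```python
import random
import statistics


def sample_mixture_corrupted(n, eps, rng):
    """Draw n points from (1 - eps) * N(0, 1) + eps * N(50, 1).

    Both component distributions are illustrative assumptions.
    """
    return [
        rng.gauss(50.0, 1.0) if rng.random() < eps else rng.gauss(0.0, 1.0)
        for _ in range(n)
    ]


rng = random.Random(0)
data = sample_mixture_corrupted(2000, 0.1, rng)
mean_est = statistics.fmean(data)      # dragged towards the contamination
median_est = statistics.median(data)   # stays close to the nominal centre 0
print(mean_est, median_est)
```

In the adversarial version the corrupted fraction is chosen by an opponent who sees the data, rather than drawn at random, so a simulation like this one only covers the gentler case.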

## M-estimation with robust loss

The one that I, at least, would think of when considering robust estimation.

In M-estimation, instead of hunting a maximum of the likelihood function as you do in maximum likelihood, or a minimum of the sum of squared residuals as you do in least-squares estimation, you minimise a specifically chosen loss function of those residuals. You may select an objective function more robust to deviations between your model and reality. Credited to Huber (Hube64).
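
As a concrete case, here is a sketch of a Huber M-estimate of location, computed by iteratively reweighted averaging. The function name, the default threshold `delta=1.345` and the toy data are assumptions for illustration. To keep it short, `delta` is taken in the raw units of the data with no scale estimate.

```python
import statistics


def huber_location(xs, delta=1.345, tol=1e-9, max_iter=200):
    """Huber M-estimate of location by iteratively reweighted averaging.

    Minimises sum(rho(x - mu)), where rho is quadratic for |r| <= delta
    and linear beyond it. delta is in the units of the data (assumption:
    no separate scale estimate is used in this sketch).
    """
    mu = statistics.median(xs)  # robust starting point
    for _ in range(max_iter):
        # Weight 1 in the quadratic zone, delta / |r| in the linear tails.
        weights = [
            1.0 if abs(x - mu) <= delta else delta / abs(x - mu) for x in xs
        ]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


data = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0]  # one gross outlier
print(statistics.fmean(data))  # the mean is pulled far towards 55
print(huber_location(data))    # the Huber estimate stays close to 10
```

The outlier still has some influence on the Huber estimate, but that influence is capped at `delta` however far away the point sits.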

See M-estimation for the details.

Aside: AFAICT, the definition of M-estimation includes the possibility that you *could* in principle select a *less*-robust loss function than least sum-of-squares or negative log likelihood, but I have not seen this in the literature. Generally, some robustified approach is presumed.

For M-estimation as robust estimation, various complications ensue, such as the difference between noise in your predictors and noise in your responses, whether the “true” model is included in your class, and which of these difficulties you have resolved or not.

Loosely speaking, no, you haven’t solved problems of noise in your predictors, only the problem of noise in your responses.

And the cost is that you now have a loss function with some extra arbitrary parameters, which is anathema to frequentists, who like to claim to be less arbitrary than Bayesians. You then have to justify why you chose that loss function and its particular parameterisation. There are various procedures to choose these parameters, based on scale estimation.
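
One common convention, sketched here as an assumption rather than as the only procedure: estimate the scale robustly with the median absolute deviation (MAD), then set the Huber threshold to a fixed multiple of it. The factor 1.4826 makes the MAD consistent for the standard deviation under a Gaussian model, and the multiplier 1.345 is the value usually quoted for 95% Gaussian efficiency. The function names are hypothetical.

```python
import statistics


def mad_scale(xs):
    """MAD rescaled to estimate the standard deviation under a Gaussian."""
    centre = statistics.median(xs)
    return 1.4826 * statistics.median([abs(x - centre) for x in xs])


def huber_threshold(xs, k=1.345):
    """Huber tuning constant expressed in the units of the data."""
    return k * mad_scale(xs)


data = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0]  # one gross outlier
print(statistics.stdev(data))  # classical scale, inflated by the outlier
print(mad_scale(data))         # robust scale, driven by the bulk of the data
print(huber_threshold(data))
```

The arbitrariness has not gone away; it has moved into the choice of `k` and of the scale estimator.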

## MM-estimation

TBD. Don’t know

## Median-based estimators

Rousseeuw and Yohai’s school. (RoYo84)

Many permutations on the theme here, but it rapidly gets complex. The only members of these families I have looked into are the near-trivial cases of Least Median of Squares and Least Trimmed Squares estimation. (Rous84)
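
A rough sketch of both objectives for a straight-line fit. Least Median of Squares minimises the median squared residual, and Least Trimmed Squares minimises the sum of the `h` smallest squared residuals. Searching only over lines through pairs of data points is a simplifying assumption that approximates the true minimiser, and costs \(O(n^3)\) here. All names and the toy data are hypothetical.

```python
import itertools
import statistics


def line_through(p, q):
    """Slope and intercept of the line through two points."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def lms_objective(points, slope, intercept):
    """Median of the squared residuals."""
    return statistics.median(
        [(y - (slope * x + intercept)) ** 2 for x, y in points]
    )


def lts_objective(points, slope, intercept, h):
    """Sum of the h smallest squared residuals."""
    squared = sorted((y - (slope * x + intercept)) ** 2 for x, y in points)
    return sum(squared[:h])


def best_elemental_fit(points, objective):
    """Pick the candidate line (through two data points) minimising objective."""
    candidates = (
        line_through(p, q)
        for p, q in itertools.combinations(points, 2)
        if p[0] != q[0]
    )
    return min(candidates, key=lambda line: objective(points, *line))


# Seven points on y = 2x + 1, plus three gross outliers.
points = [(x, 2 * x + 1) for x in range(7)] + [(7, -20), (8, -25), (9, -30)]
lms_fit = best_elemental_fit(points, lms_objective)
lts_fit = best_elemental_fit(
    points, lambda pts, s, b: lts_objective(pts, s, b, h=6)
)
print(lms_fit, lts_fit)
```

Both fits recover the line through the seven clean points, because each objective simply ignores the worst residuals.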

More broadly we should also consider S-estimators, which do something with… robust estimation of scale and using this to do robust estimation of location? TBD.

Theil-Sen-(Oja) estimators: Something about medians of inferred regression slopes. TBD.
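
For the one-predictor case, a minimal sketch of the Theil-Sen idea: take the median of the slopes of the lines through every pair of points. Taking the intercept as the median of `y - slope * x` is one common convention and an assumption here. The function name and toy data are hypothetical.

```python
import itertools
import statistics


def theil_sen(points):
    """Median-of-pairwise-slopes line fit for (x, y) pairs."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in itertools.combinations(points, 2)
        if x1 != x2
    ]
    slope = statistics.median(slopes)
    intercept = statistics.median([y - slope * x for x, y in points])
    return slope, intercept


# Eight points on y = 2x + 1, plus two gross outliers.
points = [(x, 2 * x + 1) for x in range(8)] + [(8, -20), (9, -30)]
print(theil_sen(points))
```

Every outlier contaminates all the pairs it takes part in, so this tolerates a smaller fraction of bad points than the median of a single sample does.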

Tukey median, and why no-one uses it, what with it being NP-hard.

## Others

RANSAC – some kind of randomised outlier detection estimator. TBD.
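
A bare-bones sketch of the RANSAC idea for a line fit: repeatedly fit to a random minimal subset (two points), count how many points lie within a tolerance of that candidate, keep the largest such consensus set, then refit to it by ordinary least squares. The function names, the tolerance and the trial count are all illustrative assumptions.

```python
import random


def least_squares(points):
    """Ordinary least-squares slope and intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


def ransac_line(points, n_trials, threshold, rng):
    """Return the least-squares fit to the largest consensus set found."""
    best_inliers = []
    for _ in range(n_trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical candidate, skip
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [
            (x, y)
            for x, y in points
            if abs(y - (slope * x + intercept)) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return least_squares(best_inliers), best_inliers


# Seven points on y = 2x + 1, plus three gross outliers.
points = [(x, 2 * x + 1) for x in range(7)] + [(7, -20), (8, -25), (9, -30)]
fit, inliers = ransac_line(points, n_trials=100, threshold=0.5, rng=random.Random(0))
print(fit, len(inliers))
```

The tolerance and the number of trials are exactly the kind of extra tuning parameters that need justifying.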

## Refs

- MaRo97: M. Markatou, E. Ronchetti (1997) 3 Robust inference: The approach based on influence functions. In Handbook of Statistics (Vol. 15, pp. 49–75). Elsevier
- ThCl13: Kukatharmini Tharmaratnam, Gerda Claeskens (2013) A comparison of robust versions of the AIC based on M-, S- and MM-estimators. *Statistics*, 47(1), 216–235.
- BuNo95: P. Burman, D. Nolan (1995) A general Akaike-type criterion for model selection in robust regression. *Biometrika*, 82(4), 877–886.
- MoNS13: Elchanan Mossel, Joe Neeman, Allan Sly (2013) A Proof Of The Block Model Threshold Conjecture. *ArXiv:1311.4115 [Cs, Math]*.
- Thei92: Henri Theil (1992) A Rank-Invariant Method of Linear and Polynomial Regression Analysis. In Henri Theil’s Contributions to Economics and Econometrics (pp. 345–381). Springer Netherlands
- GoNu90: Grigori K. Golubev, Michael Nussbaum (1990) A Risk Bound in Sobolev Class Regression. *The Annals of Statistics*, 18(2), 758–778.
- YaXu13: Wenzhuo Yang, Huan Xu (2013) A Unified Robust Regression Model for Lasso-like Algorithms. In ICML (3) (pp. 585–593).
- CzRo10: Veronika Czellar, Elvezio Ronchetti (2010) Accurate and robust tests for indirect inference. *Biometrika*, 97(3), 621–630.
- KoKi03: Sadanori Konishi, Genshiro Kitagawa (2003) Asymptotic theory for information criteria in model selection—functional approach. *Journal of Statistical Planning and Inference*, 114(1–2), 45–61.
- MoNS16: Elchanan Mossel, Joe Neeman, Allan Sly (2016) Belief propagation, robust reconstruction and optimal recovery of block models. *The Annals of Applied Probability*, 26(4), 2211–2256.
- JaGe16: Jana Janková, Sara van de Geer (2016) Confidence regions for high-dimensional generalized linear models under sparsity. *ArXiv:1610.01353 [Math, Stat]*.
- Oja83: Hannu Oja (1983) Descriptive statistics for multivariate distributions. *Statistics & Probability Letters*, 1(6), 327–332.
- Bera81: Rudolf Beran (1981) Efficient robust estimates in parametric models. *Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete*, 55(1), 91–108.
- GhBa16: Abhik Ghosh, Ayanendranath Basu (2016) General Model Adequacy Tests and Robust Statistical Inference Based on A New Family of Divergences. *ArXiv:1611.05224 [Math, Stat]*.
- KoKi96: Sadanori Konishi, Genshiro Kitagawa (1996) Generalised information criteria in model selection. *Biometrika*, 83(4), 875–890.
- DoMo13: David L. Donoho, Andrea Montanari (2013) High Dimensional Robust M-Estimation: Asymptotic Variance via Approximate Message Passing. *ArXiv:1310.7320 [Cs, Math, Stat]*.
- KoKi08: Sadanori Konishi, G. Kitagawa (2008) *Information criteria and statistical modeling*. New York: Springer
- MaKP98: J. H. Manton, V. Krishnamurthy, H. V. Poor (1998) James-Stein state filtering algorithms. *IEEE Transactions on Signal Processing*, 46(9), 2431–2447.
- BoKG10: Howard D. Bondell, Arun Krishna, Sujit K. Ghosh (2010) Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models. *Biometrics*, 66(4), 1069–1077.
- MKRL86: Desire L. Massart, Leonard Kaufman, Peter J. Rousseeuw, Annick Leroy (1986) Least median of squares: a robust method for outlier and model error detection in regression and calibration. *Analytica Chimica Acta*, 187, 171–179.
- Rous84: Peter J. Rousseeuw (1984) Least Median of Squares Regression. *Journal of the American Statistical Association*, 79(388), 871–880.
- Roya86: Richard M. Royall (1986) Model Robust Confidence Intervals Using Maximum Likelihood Estimators. *International Statistical Review / Revue Internationale de Statistique*, 54(2), 221–226.
- Barn83: O. Barndorff-Nielsen (1983) On a formula for the distribution of the maximum likelihood estimator. *Biometrika*, 70(2), 343–365.
- Qian96: Guoqi Qian (1996) On model selection in robust linear regression
- QiKü98: Guoqi Qian, Hans R. Künsch (1998) On model selection via stochastic complexity in robust linear regression. *Journal of Statistical Planning and Inference*, 75(1), 91–116.
- LuGF12: W. Lu, Y. Goldberg, J. P. Fine (2012) On the robustness of the adaptive lasso to model misspecification. *Biometrika*, 99(3), 717–731.
- Bick75: P. J. Bickel (1975) One-Step Huber Estimates in the Linear Model. *Journal of the American Statistical Association*, 70(350), 428–434.
- Wedd74: R. W. M. Wedderburn (1974) Quasi-likelihood functions, generalized linear models, and the Gauss—Newton method. *Biometrika*, 61(3), 439–447.
- MaZa02: Ricardo A. Maronna, Ruben H. Zamar (2002) Robust Estimates of Location and Dispersion for High-Dimensional Datasets. *Technometrics*, 44(4), 307–317.
- Bera82: Rudolf Beran (1982) Robust Estimation in Models for Independent Non-Identically Distributed Data. *The Annals of Statistics*, 10(2), 415–428.
- Hube64: Peter J. Huber (1964) Robust Estimation of a Location Parameter. *The Annals of Mathematical Statistics*, 35(1), 73–101.
- MaYo14: Ricardo A. Maronna, Víctor J. Yohai (2014) Robust Estimation of Multivariate Location and Scatter. In Wiley StatsRef: Statistics Reference Online. John Wiley & Sons, Ltd
- GeRo03: Marc G Genton, Elvezio Ronchetti (2003) Robust Indirect Inference. *Journal of the American Statistical Association*, 98(461), 67–76.
- Ronc97: Elvezio Ronchetti (1997) Robust inference by influence functions. *Journal of Statistical Planning and Inference*, 57(1), 59–72.
- CaRo01: Eva Cantoni, Elvezio Ronchetti (2001) Robust Inference for Generalized Linear Models. *Journal of the American Statistical Association*, 96(455), 1022–1030.
- RoTr01: Elvezio Ronchetti, Fabio Trojani (2001) Robust inference with GMM estimators. *Journal of Econometrics*, 101(1), 37–69.
- Maro76: Ricardo Antonio Maronna (1976) Robust M-Estimators of Multivariate Location and Scatter. *The Annals of Statistics*, 4(1), 51–67.
- Mach93: José A.F. Machado (1993) Robust Model Selection and M-Estimation. *Econometric Theory*, 9(03), 478–493.
- Ronc85: Elvezio Ronchetti (1985) Robust model selection in regression. *Statistics & Probability Letters*, 3(1), 21–23.
- Tsou06: Tsung-Shan Tsou (2006) Robust Poisson regression. *Journal of Statistical Planning and Inference*, 136(9), 3173–3186.
- XuCM10: H. Xu, C. Caramanis, S. Mannor (2010) Robust Regression and Lasso. *IEEE Transactions on Information Theory*, 56(7), 3561–3574.
- RoLe87: Peter J. Rousseeuw, Annick M. Leroy (1987) *Robust regression and outlier detection*. New York: Wiley
- RoYo84: P. Rousseeuw, V. Yohai (1984) Robust Regression by Means of S-Estimators. In Robust and Nonlinear Time Series Analysis (pp. 256–272). Springer US
- Ronc00: E. Ronchetti (2000) Robust Regression Methods and Model Selection. In Data Segmentation and Model Selection for Computer Vision (pp. 31–40). Springer New York
- Hube09: Peter J. Huber (2009) *Robust statistics*. Hoboken, N.J: Wiley
- Bühl14: Peter Bühlmann (2014) Robust Statistics. In Selected Works of Peter J. Bickel (pp. 51–98). Springer New York
- HRRS11: Frank R. Hampel, Elvezio M. Ronchetti, Peter J. Rousseeuw, Werner A. Stahel (2011) *Robust Statistics: The Approach Based on Influence Functions*. John Wiley & Sons
- MaMY06: Ricardo A. Maronna, Douglas Martin, Víctor J. Yohai (2006) *Robust statistics: theory and methods*. Chichester: Wiley
- QiHa96: Guoqi Qian, R. K. Hans (1996) Some notes on Rissanen’s stochastic complexity
- Cox83: D. R. Cox (1983) Some remarks on overdispersion. *Biometrika*, 70(1), 269–274.
- KMMN13: Florent Krzakala, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, Pan Zhang (2013) Spectral redemption in clustering sparse networks. *Proceedings of the National Academy of Sciences*, 110(52), 20935–20940.
- DuGN16: John Duchi, Peter Glynn, Hongseok Namkoong (2016) Statistics of Robust Optimization: A Generalized Empirical Likelihood Approach. *ArXiv:1610.03425 [Stat]*.
- DoLi88: David L. Donoho, Richard C. Liu (1988) The “Automatic” Robustness of Minimum Distance Functionals. *The Annals of Statistics*, 16(2), 552–586.
- MaYo95: Ricardo A. Maronna, Víctor J. Yohai (1995) The Behavior of the Stahel-Donoho Robust Multivariate Estimator. *Journal of the American Statistical Association*, 90(429), 330–341.
- Stig10: Stephen M. Stigler (2010) The Changing History of Robustness. *The American Statistician*, 64(4), 277–281.
- Hamp74: Frank R. Hampel (1974) The Influence Curve and its Role in Robust Estimation. *Journal of the American Statistical Association*, 69(346), 383–393.
- DoHu83: David L. Donoho, Peter J. Huber (1983) The notion of breakdown point. *A Festschrift for Erich L. Lehmann*, 157–184.