Techniques to improve the failure modes of your estimates. Surprisingly rarely used despite being fairly straightforward.
This is more-or-less a frequentist project.
Bayesians seem to claim to achieve robustness largely by choosing heavy-tailed priors where they might have chosen light-tailed ones, e.g. Laplacian priors instead of Gaussian ones. Such priors may have arbitrary parameters, but not more arbitrary than usual in Bayesian statistics, and therefore do not attract so much need to rationalise away the guilt.
TODO

relation to penalized regression.

connection with Lasso.

Beran’s Hellinger-ball contamination model, which I also don’t yet understand.

Breakdown point explanation

glm connection.
Corruption models

(Adversarial) total variation \(\epsilon\)-corruption.

Random (mixture) corruption

other?
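To make the random (mixture) corruption model concrete, here is a minimal sketch in Python. It assumes one particular instance that is not from any of the references: a fraction `eps` of the observations is replaced by draws from a far-away Gaussian. The function name `sample_mixture_corrupted` and all the numbers are made up for illustration. The sample mean gets dragged toward the contaminating component while the sample median barely moves.

```python
import numpy as np

def sample_mixture_corrupted(n, eps, rng, clean_loc=0.0, outlier_loc=50.0):
    """Draw n points from (1 - eps) * N(clean_loc, 1) + eps * N(outlier_loc, 1).

    Illustrative assumption: both components are unit-variance Gaussians.
    """
    is_outlier = rng.random(n) < eps          # each point is corrupted with prob. eps
    x = rng.normal(clean_loc, 1.0, size=n)    # clean draws
    x[is_outlier] = rng.normal(outlier_loc, 1.0, size=is_outlier.sum())
    return x

rng = np.random.default_rng(0)
x = sample_mixture_corrupted(n=2000, eps=0.1, rng=rng)

mean_est = x.mean()          # pulled toward outlier_loc, roughly eps * outlier_loc
median_est = np.median(x)    # stays close to clean_loc
print(f"mean={mean_est:.2f}, median={median_est:.2f}")
```

The adversarial version differs in that the corrupted points may be chosen after looking at the clean sample, which this sketch does not attempt to model.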
M-estimation with robust loss
The one that I, at least, would think of when considering robust estimation.
In M-estimation, instead of hunting a maximum of the likelihood function as you do in maximum likelihood, or a minimum of the sum of squared residuals as you do in least-squares estimation, you minimise a specifically chosen loss function of those residuals. You may select an objective function more robust to deviations between your model and reality. Credited to Huber (Hube64).
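As a rough illustration, here is a sketch of a Huber-type M-estimate of location, computed by iteratively reweighted averaging. Several things here are my assumptions rather than anything fixed by the definition: the tuning constant `c=1.345` (a conventional default), the use of the rescaled median absolute deviation as a fixed scale estimate, and the function name `huber_location`.

```python
import numpy as np

def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging.

    Assumptions for this sketch: scale is fixed at the rescaled MAD,
    and c = 1.345 is used as the threshold between quadratic and linear loss.
    """
    x = np.asarray(x, dtype=float)
    mu = np.median(x)                              # robust starting point
    scale = 1.4826 * np.median(np.abs(x - mu))     # MAD, consistent for Gaussian sd
    if scale == 0.0:
        return mu                                  # degenerate sample, nothing to do
    for _ in range(max_iter):
        r = np.abs(x - mu) / scale                 # standardised absolute residuals
        # Huber weights: 1 inside [-c, c], c / |r| outside
        w = np.minimum(1.0, c / np.maximum(r, 1e-12))
        mu_new = np.sum(w * x) / np.sum(w)
        converged = abs(mu_new - mu) < tol
        mu = mu_new
        if converged:
            break
    return mu

data = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 55.0])  # one gross outlier
huber_est = huber_location(data)
print(f"mean={data.mean():.2f}, huber={huber_est:.2f}")
```

With the squared-error loss every weight would be 1 and the iteration would return the sample mean; the Huber loss downweights the large residual instead.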
See M-estimation for the details.
Aside: AFAICT, the definition of M-estimation includes the possibility that you could in principle select a less-robust loss function than least sum-of-squares or negative log likelihood, but I have not seen this in the literature. Generally, some robustified approach is presumed.
For M-estimation as robust estimation, various complications ensue, such as the difference between noise in your predictors and noise in your responses, whether the “true” model is included in your class, and which of these difficulties you have resolved or not.
Loosely speaking, no, you haven’t solved problems of noise in your predictors, only the problem of noise in your responses.
And the cost is that you now have a loss function with some extra arbitrary parameters, which is anathema to frequentists, who like to claim to be less arbitrary than Bayesians. You then have to justify why you chose that loss function and its particular parameterisation. There are various procedures to choose these parameters, based on scale estimation.
MM-estimation
TBD. Don’t know
Median-based estimators
Rousseeuw and Yohai’s school. (RoYo84)
Many permutations on the theme here, but it rapidly gets complex. The only members of these families I have looked into are the near-trivial cases of the Least Median of Squares and Least Trimmed Squares estimators. (Rous84)
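For a feel of Least Median of Squares, here is a sketch for a straight-line fit. The exact optimisation is awkward, so this uses a common approximation that is my choice here, not something taken from the references: try lines through random pairs of points and keep the one with the smallest median squared residual. The name `lms_line` and the toy data (noiseless, with 6 of 20 responses corrupted) are invented for illustration.

```python
import numpy as np

def lms_line(x, y, n_trials=500, rng=None):
    """Approximate Least Median of Squares line fit by random two-point subsets."""
    rng = np.random.default_rng() if rng is None else rng
    best_crit, best_intercept, best_slope = np.inf, 0.0, 0.0
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue                               # vertical line, skip
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        # LMS criterion: median (not sum) of squared residuals
        crit = np.median((y - (intercept + slope * x)) ** 2)
        if crit < best_crit:
            best_crit, best_intercept, best_slope = crit, intercept, slope
    return best_intercept, best_slope

x = np.arange(20, dtype=float)
y = 2.0 + 0.5 * x
y[14:] = 100.0                                     # corrupt 6 of the 20 responses

lms_intercept, lms_slope = lms_line(x, y, rng=np.random.default_rng(1))
ols_slope, ols_intercept = np.polyfit(x, y, deg=1) # ordinary least squares, for contrast
print(f"LMS slope={lms_slope:.2f}, OLS slope={ols_slope:.2f}")
```

Because the criterion is a median, the fit only needs a bit more than half of the points to agree with the line, which is where the high breakdown point comes from.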
More broadly we should also consider S-estimators, which do something with… robust estimation of scale and using this to do robust estimation of location? TBD.
Theil-Sen(-Oja) estimators: Something about medians of inferred regression slopes. TBD.
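In the simplest one-predictor case, the Theil-Sen slope is the median of the slopes through all pairs of points. A minimal sketch follows; the intercept rule used here (median of `y - slope * x`) is one common convention that I am assuming, and the function name `theil_sen` and the toy data are made up.

```python
import numpy as np
from itertools import combinations

def theil_sen(x, y):
    """Theil-Sen line fit: median of all pairwise slopes.

    Assumption for this sketch: intercept is the median of y - slope * x.
    Cost is O(n^2) pairs, fine for small samples.
    """
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[i] != x[j]                       # skip pairs with undefined slope
    ]
    slope = np.median(slopes)
    intercept = np.median(y - slope * x)
    return intercept, slope

x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x
y[9] = 100.0                                  # a single wild response

ts_intercept, ts_slope = theil_sen(x, y)
print(f"intercept={ts_intercept:.2f}, slope={ts_slope:.2f}")
```

Only 9 of the 45 pairwise slopes involve the corrupted point, so the median slope is untouched by it.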
Tukey median, and why no-one uses it, what with it being NP-hard.
Others
RANSAC: some kind of randomised outlier detection estimator. TBD.
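As I understand the basic recipe, RANSAC repeatedly fits a model to a minimal random subset, counts how many points fall within some tolerance of that model (the consensus set), keeps the largest consensus set, and refits on it. Here is a sketch for a straight line; the tolerance `threshold`, the trial count, the function name `ransac_line` and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def ransac_line(x, y, threshold=1.0, n_trials=200, rng=None):
    """Basic RANSAC for a line: largest consensus set, then least-squares refit."""
    rng = np.random.default_rng() if rng is None else rng
    best_inliers = None
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)   # minimal subset: 2 points
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        inliers = np.abs(y - (intercept + slope * x)) < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # np.polyfit returns coefficients from highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(x[best_inliers], y[best_inliers], deg=1)
    return intercept, slope, best_inliers

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 30)
y = 3.0 - x + rng.normal(0.0, 0.05, size=x.size)
outlier_idx = np.array([2, 5, 9, 13, 17, 21, 25, 28])
y[outlier_idx] += 20.0                                     # gross outliers

r_intercept, r_slope, r_inliers = ransac_line(x, y, rng=rng)
print(f"intercept={r_intercept:.2f}, slope={r_slope:.2f}, inliers={r_inliers.sum()}")
```

The returned boolean mask doubles as a crude outlier detector, at the price of yet another arbitrary tuning parameter in `threshold`.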
Refs
 MaRo97: M. Markatou, E. Ronchetti (1997) 3 Robust inference: The approach based on influence functions. In Handbook of Statistics (Vol. 15, pp. 49–75). Elsevier DOI
 ThCl13: Kukatharmini Tharmaratnam, Gerda Claeskens (2013) A comparison of robust versions of the AIC based on M-, S- and MM-estimators. Statistics, 47(1), 216–235. DOI
 BuNo95: P. Burman, D. Nolan (1995) A general Akaike-type criterion for model selection in robust regression. Biometrika, 82(4), 877–886. DOI
 MoNS13: Elchanan Mossel, Joe Neeman, Allan Sly (2013) A Proof Of The Block Model Threshold Conjecture. ArXiv:1311.4115 [Cs, Math].
 Thei92: Henri Theil (1992) A Rank-Invariant Method of Linear and Polynomial Regression Analysis. In Henri Theil’s Contributions to Economics and Econometrics (pp. 345–381). Springer Netherlands DOI
 GoNu90: Grigori K. Golubev, Michael Nussbaum (1990) A Risk Bound in Sobolev Class Regression. The Annals of Statistics, 18(2), 758–778. DOI
 YaXu13: Wenzhuo Yang, Huan Xu (2013) A Unified Robust Regression Model for Lasso-like Algorithms. In ICML (3) (pp. 585–593).
 CzRo10: Veronika Czellar, Elvezio Ronchetti (2010) Accurate and robust tests for indirect inference. Biometrika, 97(3), 621–630. DOI
 KoKi03: Sadanori Konishi, Genshiro Kitagawa (2003) Asymptotic theory for information criteria in model selection—functional approach. Journal of Statistical Planning and Inference, 114(1–2), 45–61. DOI
 MoNS16: Elchanan Mossel, Joe Neeman, Allan Sly (2016) Belief propagation, robust reconstruction and optimal recovery of block models. The Annals of Applied Probability, 26(4), 2211–2256. DOI
 JaGe16: Jana Janková, Sara van de Geer (2016) Confidence regions for high-dimensional generalized linear models under sparsity. ArXiv:1610.01353 [Math, Stat].
 Oja83: Hannu Oja (1983) Descriptive statistics for multivariate distributions. Statistics & Probability Letters, 1(6), 327–332. DOI
 Bera81: Rudolf Beran (1981) Efficient robust estimates in parametric models. Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete, 55(1), 91–108. DOI
 GhBa16: Abhik Ghosh, Ayanendranath Basu (2016) General Model Adequacy Tests and Robust Statistical Inference Based on A New Family of Divergences. ArXiv:1611.05224 [Math, Stat].
 KoKi96: Sadanori Konishi, Genshiro Kitagawa (1996) Generalised information criteria in model selection. Biometrika, 83(4), 875–890. DOI
 DoMo13: David L. Donoho, Andrea Montanari (2013) High Dimensional Robust M-Estimation: Asymptotic Variance via Approximate Message Passing. ArXiv:1310.7320 [Cs, Math, Stat].
 KoKi08: Sadanori Konishi, G. Kitagawa (2008) Information criteria and statistical modeling. New York: Springer
 MaKP98: J. H. Manton, V. Krishnamurthy, H. V. Poor (1998) James-Stein state filtering algorithms. IEEE Transactions on Signal Processing, 46(9), 2431–2447. DOI
 BoKG10: Howard D. Bondell, Arun Krishna, Sujit K. Ghosh (2010) Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models. Biometrics, 66(4), 1069–1077. DOI
 MKRL86: Desire L. Massart, Leonard Kaufman, Peter J. Rousseeuw, Annick Leroy (1986) Least median of squares: a robust method for outlier and model error detection in regression and calibration. Analytica Chimica Acta, 187, 171–179. DOI
 Rous84: Peter J. Rousseeuw (1984) Least Median of Squares Regression. Journal of the American Statistical Association, 79(388), 871–880. DOI
 Roya86: Richard M. Royall (1986) Model Robust Confidence Intervals Using Maximum Likelihood Estimators. International Statistical Review / Revue Internationale de Statistique, 54(2), 221–226. DOI
 Barn83: O. Barndorff-Nielsen (1983) On a formula for the distribution of the maximum likelihood estimator. Biometrika, 70(2), 343–365. DOI
 Qian96: Guoqi Qian (1996) On model selection in robust linear regression
 QiKü98: Guoqi Qian, Hans R. Künsch (1998) On model selection via stochastic complexity in robust linear regression. Journal of Statistical Planning and Inference, 75(1), 91–116. DOI
 LuGF12: W. Lu, Y. Goldberg, J. P. Fine (2012) On the robustness of the adaptive lasso to model misspecification. Biometrika, 99(3), 717–731. DOI
 Bick75: P. J. Bickel (1975) One-Step Huber Estimates in the Linear Model. Journal of the American Statistical Association, 70(350), 428–434. DOI
 Wedd74: R. W. M. Wedderburn (1974) Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61(3), 439–447. DOI
 MaZa02: Ricardo A. Maronna, Ruben H. Zamar (2002) Robust Estimates of Location and Dispersion for High-Dimensional Datasets. Technometrics, 44(4), 307–317.
 Bera82: Rudolf Beran (1982) Robust Estimation in Models for Independent Non-Identically Distributed Data. The Annals of Statistics, 10(2), 415–428. DOI
 Hube64: Peter J. Huber (1964) Robust Estimation of a Location Parameter. The Annals of Mathematical Statistics, 35(1), 73–101. DOI
 MaYo14: Ricardo A. Maronna, Víctor J. Yohai (2014) Robust Estimation of Multivariate Location and Scatter. In Wiley StatsRef: Statistics Reference Online. John Wiley & Sons, Ltd
 GeRo03: Marc G Genton, Elvezio Ronchetti (2003) Robust Indirect Inference. Journal of the American Statistical Association, 98(461), 67–76. DOI
 Ronc97: Elvezio Ronchetti (1997) Robust inference by influence functions. Journal of Statistical Planning and Inference, 57(1), 59–72. DOI
 CaRo01: Eva Cantoni, Elvezio Ronchetti (2001) Robust Inference for Generalized Linear Models. Journal of the American Statistical Association, 96(455), 1022–1030. DOI
 RoTr01: Elvezio Ronchetti, Fabio Trojani (2001) Robust inference with GMM estimators. Journal of Econometrics, 101(1), 37–69. DOI
 Maro76: Ricardo Antonio Maronna (1976) Robust M-Estimators of Multivariate Location and Scatter. The Annals of Statistics, 4(1), 51–67.
 Mach93: José A.F. Machado (1993) Robust Model Selection and M-Estimation. Econometric Theory, 9(03), 478–493. DOI
 Ronc85: Elvezio Ronchetti (1985) Robust model selection in regression. Statistics & Probability Letters, 3(1), 21–23. DOI
 Tsou06: Tsung-Shan Tsou (2006) Robust Poisson regression. Journal of Statistical Planning and Inference, 136(9), 3173–3186. DOI
 XuCM10: H. Xu, C. Caramanis, S. Mannor (2010) Robust Regression and Lasso. IEEE Transactions on Information Theory, 56(7), 3561–3574. DOI
 RoLe87: Peter J. Rousseeuw, Annick M. Leroy (1987) Robust regression and outlier detection. New York: Wiley
 RoYo84: P. Rousseeuw, V. Yohai (1984) Robust Regression by Means of S-Estimators. In Robust and Nonlinear Time Series Analysis (pp. 256–272). Springer US DOI
 Ronc00: E. Ronchetti (2000) Robust Regression Methods and Model Selection. In Data Segmentation and Model Selection for Computer Vision (pp. 31–40). Springer New York DOI
 Hube09: Peter J. Huber (2009) Robust statistics. Hoboken, N.J: Wiley
 Bühl14: Peter Bühlmann (2014) Robust Statistics. In Selected Works of Peter J. Bickel (pp. 51–98). Springer New York
 HRRS11: Frank R. Hampel, Elvezio M. Ronchetti, Peter J. Rousseeuw, Werner A. Stahel (2011) Robust Statistics: The Approach Based on Influence Functions. John Wiley & Sons
 MaMY06: Ricardo A. Maronna, Douglas Martin, Víctor J. Yohai (2006) Robust statistics: theory and methods. Chichester: Wiley
 QiHa96: Guoqi Qian, R. K. Hans (1996) Some notes on Rissanen’s stochastic complexity
 Cox83: D. R. Cox (1983) Some remarks on overdispersion. Biometrika, 70(1), 269–274. DOI
 KMMN13: Florent Krzakala, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, Pan Zhang (2013) Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences, 110(52), 20935–20940. DOI
 DuGN16: John Duchi, Peter Glynn, Hongseok Namkoong (2016) Statistics of Robust Optimization: A Generalized Empirical Likelihood Approach. ArXiv:1610.03425 [Stat].
 DoLi88: David L. Donoho, Richard C. Liu (1988) The “Automatic” Robustness of Minimum Distance Functionals. The Annals of Statistics, 16(2), 552–586. DOI
 MaYo95: Ricardo A. Maronna, Víctor J. Yohai (1995) The Behavior of the Stahel-Donoho Robust Multivariate Estimator. Journal of the American Statistical Association, 90(429), 330–341. DOI
 Stig10: Stephen M. Stigler (2010) The Changing History of Robustness. The American Statistician, 64(4), 277–281. DOI
 Hamp74: Frank R. Hampel (1974) The Influence Curve and its Role in Robust Estimation. Journal of the American Statistical Association, 69(346), 383–393. DOI
 DoHu83: David L. Donoho, Peter J. Huber (1983) The notion of breakdown point. A Festschrift for Erich L. Lehmann, 157–184.