
Robust statistics

Techniques to make your estimates fail more gracefully when your model assumptions are violated. Surprisingly rarely used despite being fairly straightforward.

This is more-or-less a frequentist project.

Bayesians seem to claim to achieve robustness largely by choosing heavy-tailed priors where they might have chosen light-tailed ones, e.g. Laplacian priors instead of Gaussian ones. Such priors have arbitrary parameters, but they are no more arbitrary than is usual in Bayesian statistics, and therefore attract less need to rationalise away the guilt.


Corruption models

  1. (Adversarial) total variation \(\epsilon\)-corruption.
  2. Random (mixture) corruption, e.g. Huber-style \(\epsilon\)-contamination (sketched after this list)
  3. other?
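As a concrete illustration of the mixture-corruption model, here is a minimal Python simulation of Huber-style \(\epsilon\)-contamination, in which each observation comes from the nominal model with probability \(1-\epsilon\) and from an arbitrary corrupting distribution otherwise. All parameters are made up; the point is only that the sample mean breaks down while the median barely moves.

```python
import numpy as np

rng = np.random.default_rng(42)
n, eps = 1000, 0.05   # sample size and contamination fraction (arbitrary)

# epsilon-contamination: draw from (1 - eps) * F + eps * G,
# with F the nominal model and G an arbitrary corrupting distribution.
clean = rng.normal(loc=0.0, scale=1.0, size=n)
gross = rng.normal(loc=50.0, scale=1.0, size=n)      # wildly wrong observations
corrupted = np.where(rng.random(n) < eps, gross, clean)

print("mean:  ", corrupted.mean())      # dragged well away from the true location 0
print("median:", np.median(corrupted))  # stays near 0
```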

M-estimation with robust loss

The one that I, at least, would think of when considering robust estimation.

In M-estimation, instead of hunting for a maximum of the likelihood function, as you do in maximum likelihood, or a minimum of the sum of squared residuals, as you do in least-squares estimation, you minimise a specifically chosen loss function of those residuals. In particular, you may select an objective function that is more robust to deviations between your model and reality. Credited to Huber (Hube64).

See M-estimation for the details.

Aside: AFAICT, the definition of M-estimation includes the possibility that you could in principle select a less robust loss function than the sum of squared residuals or the negative log likelihood, but I have not seen this done in the literature. Generally, some robustified loss is presumed.

For M-estimation as robust estimation, various complications ensue, such as the difference between noise in your predictors and noise in your responses, whether the “true” model is included in your model class, and which of these difficulties you have actually resolved.

Loosely speaking, no, a robust loss has not solved the problem of noise in your predictors, only the problem of noise in your responses.

And the cost is that you now have a loss function with some extra arbitrary parameters that you have to justify, which is anathema to frequentists, who like to claim to be less arbitrary than Bayesians: you must now explain why you chose that loss function and its particular parameterisation. There are various procedures for choosing these parameters, typically based on robust scale estimation.
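For concreteness, here is a minimal sketch of robust-loss M-estimation for a linear model, using the Huber loss via statsmodels’ RLM, which handles the accompanying scale estimation internally. The data are simulated, and the tuning constant 1.345 is just the conventional choice giving roughly 95% efficiency at the Gaussian model; treat this as a sketch rather than a recipe.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)
y[::20] += 30.0                      # contaminate some responses with gross errors

X = sm.add_constant(x)               # design matrix with intercept

ols = sm.OLS(y, X).fit()             # least squares: dragged around by the outliers

# M-estimation with the Huber loss; scale is re-estimated robustly (MAD) by default.
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT(t=1.345)).fit()

print("OLS   coefficients:", ols.params)
print("Huber coefficients:", rlm.params)
```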


TBD. Don’t know

Median-based estimators

Rousseeuw and Yohai’s school. (RoYo84)

Many permutations on the theme here, but it rapidly gets complex. The only members of these families I have looked into are the near-trivial cases of Least Median of Squares and Least Trimmed Squares estimation (Rous84); a naive sketch follows.
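To make the Least Median of Squares idea concrete, here is a deliberately naive sketch for simple linear regression: fit candidate lines through random pairs of points and keep the one with the smallest median squared residual. The function name and toy data are invented for illustration; Rousseeuw’s actual algorithms (and the later FAST-LTS) are much more careful about the subsampling and refinement.

```python
import numpy as np

def least_median_of_squares(x, y, n_trials=500, rng=None):
    """Naive LMS for simple regression: lines through random pairs of
    points, keeping the fit that minimises the median squared residual."""
    if rng is None:
        rng = np.random.default_rng()
    best, best_crit = None, np.inf
    n = len(x)
    for _ in range(n_trials):
        i, j = rng.choice(n, size=2, replace=False)
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        crit = np.median((y - (intercept + slope * x)) ** 2)
        if crit < best_crit:
            best_crit, best = crit, (intercept, slope)
    return best

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)
x[:10], y[:10] = 9.5, -20.0          # a clump of bad leverage points
print(least_median_of_squares(x, y, rng=rng))  # should stay close to (1, 2)
```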

More broadly, we should also consider S-estimators, which do something with… robustly estimating scale and then using that to do robust estimation of location? TBD.

Theil-Sen(-Oja) estimators: take the slope estimate to be the median of the pairwise slopes between data points (the Oja version, I gather, generalises this to multiple regression via multivariate medians). Sketched below; details TBD.
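In the simple-regression case the construction is easy to spell out; here is a minimal hand-rolled version on made-up data (scikit-learn’s TheilSenRegressor covers the multivariate generalisation):

```python
import numpy as np
from itertools import combinations

def theil_sen(x, y):
    """Theil-Sen for simple regression: slope = median of pairwise slopes,
    intercept = median of the implied offsets."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[i] != x[j]]
    slope = np.median(slopes)
    intercept = np.median(y - slope * x)
    return intercept, slope

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 50)
y = 3.0 - 1.0 * x + rng.normal(scale=0.3, size=50)
y[:5] += 15.0                       # a few vertical outliers barely move the estimate
print(theil_sen(x, y))              # roughly (3, -1)
```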

Tukey median, and why no one uses it, what with computing it being NP-hard.


RANSAC: a randomised consensus estimator; repeatedly fit the model to small random subsets of the data and keep the candidate whose set of inliers (the consensus set) is largest. Sketched below; details TBD.
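A minimal sketch with scikit-learn’s RANSACRegressor (default linear base model; the data are made up), showing the consensus set it settles on:

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(200, 1))
y = 1.0 + 4.0 * X.ravel() + rng.normal(scale=0.5, size=200)
y[:40] = rng.uniform(-50, 50, size=40)     # 20% arbitrary junk responses

# Repeatedly fit on random minimal subsets, score by the size of the inlier set.
ransac = RANSACRegressor(random_state=0).fit(X, y)

print("slope:", ransac.estimator_.coef_, "intercept:", ransac.estimator_.intercept_)
print("inliers kept:", ransac.inlier_mask_.sum(), "of", len(y))
```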


Barndorff-Nielsen, O. (1983) On a formula for the distribution of the maximum likelihood estimator. Biometrika, 70(2), 343–365. DOI.
Beran, R. (1981) Efficient robust estimates in parametric models. Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete, 55(1), 91–108. DOI.
Beran, R. (1982) Robust Estimation in Models for Independent Non-Identically Distributed Data. The Annals of Statistics, 10(2), 415–428. DOI.
Bickel, P. J.(1975) One-Step Huber Estimates in the Linear Model. Journal of the American Statistical Association, 70(350), 428–434. DOI.
Bondell, H. D., Krishna, A., & Ghosh, S. K.(2010) Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models. Biometrics, 66(4), 1069–1077. DOI.
Bühlmann, P. (2014) Robust Statistics. In J. Fan, Y. Ritov, & C. F. J. Wu (Eds.), Selected Works of Peter J. Bickel (pp. 51–98). Springer New York
Buja, A., Berk, R., Brown, L., George, E., Pitkin, E., Traskin, M., … Zhao, L. (2014) Models as Approximations: How Random Predictors and Model Violations Invalidate Classical Inference in Regression. arXiv:1404.1578 [stat].
Burman, P., & Nolan, D. (1995) A general Akaike-type criterion for model selection in robust regression. Biometrika, 82(4), 877–886. DOI.
Cantoni, E., & Ronchetti, E. (2001) Robust Inference for Generalized Linear Models. Journal of the American Statistical Association, 96(455), 1022–1030. DOI.
Cox, D. R.(1983) Some remarks on overdispersion. Biometrika, 70(1), 269–274. DOI.
Czellar, V., & Ronchetti, E. (2010) Accurate and robust tests for indirect inference. Biometrika, 97(3), 621–630. DOI.
Dang, X., Peng, H., Wang, X., & Zhang, H. (2008) Theil-Sen Estimators in a Multiple Linear Regression Model. Citeseer
Donoho, D. L., & Huber, P. J.(1983) The notion of breakdown point. A Festschrift for Erich L. Lehmann, 157–184.
Donoho, D. L., & Liu, R. C.(1988) The “Automatic” Robustness of Minimum Distance Functionals. The Annals of Statistics, 16(2), 552–586. DOI.
Donoho, D., & Montanari, A. (2013) High Dimensional Robust M-Estimation: Asymptotic Variance via Approximate Message Passing. arXiv:1310.7320 [cs, Math, Stat].
Genton, M. G., & Ronchetti, E. (2003) Robust Indirect Inference. Journal of the American Statistical Association, 98(461), 67–76. DOI.
Golubev, G. K., & Nussbaum, M. (1990) A Risk Bound in Sobolev Class Regression. The Annals of Statistics, 18(2), 758–778. DOI.
Hampel, F. R.(1974) The Influence Curve and its Role in Robust Estimation. Journal of the American Statistical Association, 69(346), 383–393. DOI.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., & Stahel, W. A.(2011) Robust Statistics: The Approach Based on Influence Functions. John Wiley & Sons
Hosseinian, S. (2009) Robust inference for generalized linear models: binary and Poisson regression. École Polytechnique Fédérale de Lausanne
Huber, P. J.(1964) Robust Estimation of a Location Parameter. The Annals of Mathematical Statistics, 35(1), 73–101. DOI.
Huber, P. J.(2009) Robust statistics. (2nd ed.). Hoboken, N.J: Wiley
Janková, J., & van de Geer, S. (2016) Confidence regions for high-dimensional generalized linear models under sparsity. arXiv:1610.01353 [math, Stat].
Karoui, N. E.(2013) Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators : rigorous results. arXiv:1311.2445 [math, Stat].
Konishi, S., & Kitagawa, G. (1996) Generalised information criteria in model selection. Biometrika, 83(4), 875–890. DOI.
Konishi, S., & Kitagawa, G. (2003) Asymptotic theory for information criteria in model selection—functional approach. Journal of Statistical Planning and Inference, 114(1–2), 45–61. DOI.
Konishi, S., & Kitagawa, G. (2008) Information criteria and statistical modeling. New York: Springer
Lu, W., Goldberg, Y., & Fine, J. P.(2012) On the robustness of the adaptive lasso to model misspecification. Biometrika, 99(3), 717–731. DOI.
Machado, J. A. F.(1993) Robust Model Selection and M-Estimation. Econometric Theory, 9(03), 478–493. DOI.
Markatou, M., & Ronchetti, E. (1997) Robust inference: The approach based on influence functions. In Handbook of Statistics (Vol. 15, pp. 49–75). Elsevier
Maronna, R. A.(1976) Robust M-Estimators of Multivariate Location and Scatter. The Annals of Statistics, 4(1), 51–67.
Maronna, R. A., & Yohai, V. J.(1995) The Behavior of the Stahel-Donoho Robust Multivariate Estimator. Journal of the American Statistical Association, 90(429), 330–341. DOI.
Maronna, R. A., & Yohai, V. J.(2014) Robust Estimation of Multivariate Location and Scatter. In Wiley StatsRef: Statistics Reference Online. John Wiley & Sons, Ltd
Maronna, R. A., & Zamar, R. H.(2002) Robust Estimates of Location and Dispersion for High-Dimensional Datasets. Technometrics, 44(4), 307–317.
Maronna, R., Martin, D., & Yohai, V. (n.d.) Robust statistics.
Massart, D. L., Kaufman, L., Rousseeuw, P. J., & Leroy, A. (1986) Least median of squares: a robust method for outlier and model error detection in regression and calibration. Analytica Chimica Acta, 187, 171–179. DOI.
Oja, H. (1983) Descriptive statistics for multivariate distributions. Statistics & Probability Letters, 1(6), 327–332. DOI.
Petersen, M. R., & Deddens, J. A.(2008) A comparison of two methods for estimating prevalence ratios. BMC Medical Research Methodology, 8, 9. DOI.
Qian, G. (1996) On model selection in robust linear regression.
Qian, G., & Künsch, H. R.(1996) Some notes on Rissanen’s stochastic complexity.
Qian, G., & Künsch, H. R.(1998) On model selection via stochastic complexity in robust linear regression. Journal of Statistical Planning and Inference, 75(1), 91–116. DOI.
Ronchetti, E. (1985) Robust model selection in regression. Statistics & Probability Letters, 3(1), 21–23. DOI.
Ronchetti, E. (1997) Robust inference by influence functions. Journal of Statistical Planning and Inference, 57(1), 59–72. DOI.
Ronchetti, E. (2000) Robust Regression Methods and Model Selection. In A. Bab-Hadiashar & D. Suter (Eds.), Data Segmentation and Model Selection for Computer Vision (pp. 31–40). Springer New York
Ronchetti, E., & Trojani, F. (2001) Robust inference with GMM estimators. Journal of Econometrics, 101(1), 37–69. DOI.
Rousseeuw, P. J.(1984) Least Median of Squares Regression. Journal of the American Statistical Association, 79(388), 871–880. DOI.
Rousseeuw, P. J., & Leroy, A. M.(1987) Robust regression and outlier detection. New York: Wiley
Rousseeuw, P., & Yohai, V. (1984) Robust Regression by Means of S-Estimators. In J. Franke, W. Härdle, & D. Martin (Eds.), Robust and Nonlinear Time Series Analysis (pp. 256–272). Springer US
Royall, R. M.(1986) Model Robust Confidence Intervals Using Maximum Likelihood Estimators. International Statistical Review / Revue Internationale de Statistique, 54(2), 221–226. DOI.
Stigler, S. M.(2010) The Changing History of Robustness. The American Statistician, 64(4), 277–281. DOI.
Tharmaratnam, K., & Claeskens, G. (2013) A comparison of robust versions of the AIC based on M-, S- and MM-estimators. Statistics, 47(1), 216–235. DOI.
Theil, H. (1992) A Rank-Invariant Method of Linear and Polynomial Regression Analysis. In B. Raj & J. Koerts (Eds.), Henri Theil’s Contributions to Economics and Econometrics (pp. 345–381). Springer Netherlands
Tsou, T.-S. (2006) Robust Poisson regression. Journal of Statistical Planning and Inference, 136(9), 3173–3186. DOI.
Wedderburn, R. W. M.(1974) Quasi-likelihood functions, generalized linear models, and the Gauss—Newton method. Biometrika, 61(3), 439–447. DOI.
Xu, H., Caramanis, C., & Mannor, S. (2010) Robust Regression and Lasso. IEEE Transactions on Information Theory, 56(7), 3561–3574. DOI.
Yang, W., & Xu, H. (2013) A Unified Robust Regression Model for Lasso-like Algorithms. In ICML (3) (pp. 585–593).