# Robust statistics

Techniques to make your estimates fail more gracefully when the data violate your model assumptions, e.g. through outliers or misspecification. Surprisingly rarely used despite being fairly straightforward.

This is more-or-less a frequentist project.

Bayesians seem to claim to achieve robustness largely by choosing heavy-tailed priors where they might have chosen light-tailed ones, e.g. Laplacian priors instead of Gaussian ones. Such priors may have arbitrary parameters, but not more arbitrary than usual in Bayesian statistics and therefore do not attract so much need to rationalise away the guilt.

## TODO

• relation to penalized regression.
• connection with Lasso.
• Beran’s Hellinger-ball contamination model, which I also don’t yet understand.
• Breakdown point explanation.
• GLM connection.

## Corruption models

1. (Adversarial) total variation $\epsilon$-corruption.
2. Random (mixture) corruption
3. other?
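
One common way to write down the first two models, as a sketch. The symbols $P$ (the nominal distribution), $Q$ (the distribution you actually sample from) and $H$ (an arbitrary contaminating distribution) are notation assumed here for illustration.

$$
\text{total variation: } d_{\mathrm{TV}}(P, Q) = \sup_A |P(A) - Q(A)| \le \epsilon
$$

$$
\text{mixture: } Q = (1 - \epsilon) P + \epsilon H
$$

Every mixture corruption is also a total variation corruption, since $d_{\mathrm{TV}}(P, (1-\epsilon)P + \epsilon H) = \epsilon \, d_{\mathrm{TV}}(P, H) \le \epsilon$, but the converse does not hold in general.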

## M-estimation with robust loss

The one that I, at least, would think of when considering robust estimation.

In M-estimation, instead of hunting a maximum of the likelihood function as you do in maximum likelihood, or a minimum of the sum of squared residuals as you do in least-squares estimation, you minimise a specifically chosen loss function of those residuals. You may select an objective function more robust to deviations between your model and reality. Credited to Huber (Hube64).

See M-estimation for the details.
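
A minimal sketch of the idea in the simplest setting, estimating a location parameter. It assumes the Huber loss (quadratic for small residuals, linear for large ones), a fixed scale taken from the median absolute deviation, and the conventional tuning constant 1.345. The function name `huber_location` and the toy data are hypothetical and only for illustration.

```python
import numpy as np

def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location by iteratively reweighted averaging.

    Assumption: the scale is held fixed at the rescaled MAD rather than
    being estimated jointly with the location.
    """
    x = np.asarray(x, dtype=float)
    mu = np.median(x)  # robust starting point
    scale = 1.4826 * np.median(np.abs(x - mu))  # MAD, Gaussian-consistent
    if scale == 0:
        return mu
    for _ in range(max_iter):
        r = (x - mu) / scale
        # Huber weights: 1 for |r| <= c, c/|r| beyond that
        w = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))
        mu_new = np.sum(w * x) / np.sum(w)
        converged = abs(mu_new - mu) < tol * scale
        mu = mu_new
        if converged:
            break
    return mu

# Toy data: 95 standard normal points plus 5 gross outliers at 50.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, size=95), np.full(5, 50.0)])
mean_est = x.mean()             # dragged towards the outliers
huber_est = huber_location(x)   # stays near the bulk of the data
```

The sample mean is the special case where every weight is 1, which is why a handful of large residuals can move it arbitrarily far.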

Aside: AFAICT, the definition of M-estimation includes the possibility that you could in principle select a less-robust loss function than least sum-of-squares or negative log likelihood, but I have not seen this in the literature. Generally, some robustified approach is presumed.

For M-estimation as robust estimation, various complications ensue, such as the difference between noise in your predictors and noise in your responses, whether the “true” model is included in your class, and which of these difficulties you have resolved or not.

Loosely speaking, no, you haven’t solved problems of noise in your predictors, only the problem of noise in your responses.

And the cost is that you now have a loss function with some extra arbitrary parameters, which is anathema to frequentists, who like to claim to be less arbitrary than Bayesians. You then have to justify why you chose that loss function and its particular parameterisation. There are various procedures to choose these parameters, based on scale estimation.

TBD. Don’t know

## Median-based estimators

Rousseeuw and Yohai’s school. (RoYo84)

Many permutations on the theme here, but it rapidly gets complex. The only members of these families I have looked into are the near-trivial cases of Least Median of Squares and Least Trimmed Squares estimation (Rous84).
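
For orientation, the two objectives are usually stated as follows for a linear model. The notation is assumed here for illustration: residuals $r_i(\theta) = y_i - x_i^\top \theta$ for $i = 1, \dots, n$, ordered squared residuals $r^2_{(1)}(\theta) \le \dots \le r^2_{(n)}(\theta)$, and a trimming count $h$ somewhere between $n/2$ and $n$.

$$
\hat{\theta}_{\mathrm{LMS}} = \arg\min_\theta \; \operatorname{med}_i \, r_i(\theta)^2
$$

$$
\hat{\theta}_{\mathrm{LTS}} = \arg\min_\theta \; \sum_{i=1}^{h} r^2_{(i)}(\theta)
$$

Ordinary least squares is the case $h = n$ of the second objective. Both replace the sum over all squared residuals with something that ignores the largest ones.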

More broadly we should also consider S-estimators, which do something with… robust estimation of scale and using this to do robust estimation of location? TBD.

Theil-Sen-(Oja) estimators: Something about medians of inferred regression slopes. TBD.
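
A minimal sketch of the single-predictor Theil-Sen estimator, assuming the usual definition: the slope is the median of the slopes through all pairs of points, and the intercept is the median of the residuals left after removing that slope. The function name `theil_sen` and the toy data are hypothetical; the brute-force loop over pairs is quadratic in the sample size and only meant for small examples.

```python
import numpy as np
from itertools import combinations

def theil_sen(x, y):
    """Median-of-pairwise-slopes line fit for one predictor."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Slopes of the lines through every pair of points with distinct x.
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = np.median(slopes)
    intercept = np.median(y - slope * x)
    return slope, intercept

# Toy data: an exact line y = 2x + 1 with three gross outliers.
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
y[[3, 7, 15]] = [100.0, -80.0, 250.0]

ts_slope, ts_intercept = theil_sen(x, y)
ols_slope = np.polyfit(x, y, 1)[0]  # least squares, for comparison
```

Because most pairs involve only clean points, the median slope ignores the corrupted pairs, while the least-squares slope is pulled noticeably away from 2.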

Tukey median, and why no-one uses it what with it being NP-Hard.

## Others

RANSAC - some kind of randomised outlier detection estimator. TBD.
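
A rough sketch of the basic RANSAC loop for fitting a line, under the usual description of the method: repeatedly fit the model to a random minimal subset (two points for a line), count how many observations lie within a tolerance of that fit, keep the candidate with the most inliers, then refit on those inliers. The function name `ransac_line`, the number of trials, the threshold and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def ransac_line(x, y, n_trials=200, threshold=1.0, seed=0):
    """Fit y = slope * x + intercept by a basic RANSAC loop."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_inliers = None
    for _ in range(n_trials):
        # Minimal sample: two distinct points define a candidate line.
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # vertical candidate, skip
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        # Consensus set: points within the tolerance of the candidate.
        inliers = np.abs(y - (slope * x + intercept)) < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers is None:
        raise ValueError("no valid candidate line found")
    # Final least-squares refit on the largest consensus set.
    slope, intercept = np.polyfit(x[best_inliers], y[best_inliers], 1)
    return slope, intercept, best_inliers

# Toy data: y = 3x - 2 with small noise; every fifth point shifted by +30.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 3.0 * x - 2.0 + rng.normal(0.0, 0.1, size=50)
y[::5] += 30.0

slope, intercept, inliers = ransac_line(x, y)
```

The threshold plays the same awkward role as the tuning constant in a robust loss: it has to be chosen relative to the noise scale of the uncorrupted data.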

## Refs

Barn83
Barndorff-Nielsen, O. (1983) On a formula for the distribution of the maximum likelihood estimator. Biometrika, 70(2), 343–365.
Bera81
Beran, R. (1981) Efficient robust estimates in parametric models. Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete, 55(1), 91–108.
Bera82
Beran, R. (1982) Robust Estimation in Models for Independent Non-Identically Distributed Data. The Annals of Statistics, 10(2), 415–428.
Bick75
Bickel, P. J. (1975) One-Step Huber Estimates in the Linear Model. Journal of the American Statistical Association, 70(350), 428–434.
BoKG10
Bondell, H. D., Krishna, A., & Ghosh, S. K. (2010) Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models. Biometrics, 66(4), 1069–1077.
Bühl14
Bühlmann, P. (2014) Robust Statistics. In J. Fan, Y. Ritov, & C. F. J. Wu (Eds.), Selected Works of Peter J. Bickel (pp. 51–98). Springer New York.
BBBG14
Buja, A., Berk, R., Brown, L., George, E., Pitkin, E., Traskin, M., … Zhao, L. (2014) Models as Approximations: How Random Predictors and Model Violations Invalidate Classical Inference in Regression. arXiv:1404.1578 [stat].
BuNo95
Burman, P., & Nolan, D. (1995) A general Akaike-type criterion for model selection in robust regression. Biometrika, 82(4), 877–886.
CaRo01
Cantoni, E., & Ronchetti, E. (2001) Robust Inference for Generalized Linear Models. Journal of the American Statistical Association, 96(455), 1022–1030.
Cox83
Cox, D. R. (1983) Some remarks on overdispersion. Biometrika, 70(1), 269–274.
CzRo10
Czellar, V., & Ronchetti, E. (2010) Accurate and robust tests for indirect inference. Biometrika, 97(3), 621–630.
DPWZ08
Dang, X., Peng, H., Wang, X., & Zhang, H. (2008) Theil-Sen Estimators in a Multiple Linear Regression Model. Citeseer.
DoHu83
Donoho, D. L., & Huber, P. J. (1983) The notion of breakdown point. A Festschrift for Erich L. Lehmann, 157–184.
DoLi88
Donoho, D. L., & Liu, R. C. (1988) The “Automatic” Robustness of Minimum Distance Functionals. The Annals of Statistics, 16(2), 552–586.
DoMo13
Donoho, D., & Montanari, A. (2013) High Dimensional Robust M-Estimation: Asymptotic Variance via Approximate Message Passing. arXiv:1310.7320 [cs, Math, Stat].
GeRo03
Genton, M. G., & Ronchetti, E. (2003) Robust Indirect Inference. Journal of the American Statistical Association, 98(461), 67–76.
GoNu90
Golubev, G. K., & Nussbaum, M. (1990) A Risk Bound in Sobolev Class Regression. The Annals of Statistics, 18(2), 758–778.
Hamp74
Hampel, F. R. (1974) The Influence Curve and its Role in Robust Estimation. Journal of the American Statistical Association, 69(346), 383–393.
HRRS11
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., & Stahel, W. A. (2011) Robust Statistics: The Approach Based on Influence Functions. John Wiley & Sons.
Hoss09
Hosseinian, S. (2009) Robust inference for generalized linear models: binary and Poisson regression. École Polytechnique Fédérale de Lausanne.
Hube64
Huber, P. J. (1964) Robust Estimation of a Location Parameter. The Annals of Mathematical Statistics, 35(1), 73–101.
Hube09
Huber, P. J. (2009) Robust statistics (2nd ed.). Hoboken, N.J.: Wiley.
JaGe16
Janková, J., & van de Geer, S. (2016) Confidence regions for high-dimensional generalized linear models under sparsity. arXiv:1610.01353 [math, Stat].
Karo13
Karoui, N. E. (2013) Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: rigorous results. arXiv:1311.2445 [math, Stat].
KoKi96
Konishi, S., & Kitagawa, G. (1996) Generalised information criteria in model selection. Biometrika, 83(4), 875–890.
KoKi03
Konishi, S., & Kitagawa, G. (2003) Asymptotic theory for information criteria in model selection – functional approach. Journal of Statistical Planning and Inference, 114(1–2), 45–61.
KoKi08
Konishi, S., & Kitagawa, G. (2008) Information criteria and statistical modeling. New York: Springer.
LuGF12
Lu, W., Goldberg, Y., & Fine, J. P. (2012) On the robustness of the adaptive lasso to model misspecification. Biometrika, 99(3), 717–731.
Mach93
Machado, J. A. F. (1993) Robust Model Selection and M-Estimation. Econometric Theory, 9(03), 478–493.
MaRo97
Markatou, M., & Ronchetti, E. (1997) Robust inference: The approach based on influence functions. In Handbook of Statistics (Vol. 15, pp. 49–75). Elsevier.
Maro76
Maronna, R. A. (1976) Robust M-Estimators of Multivariate Location and Scatter. The Annals of Statistics, 4(1), 51–67.
MaYo95
Maronna, R. A., & Yohai, V. J. (1995) The Behavior of the Stahel-Donoho Robust Multivariate Estimator. Journal of the American Statistical Association, 90(429), 330–341.
MaYo14
Maronna, R. A., & Yohai, V. J. (2014) Robust Estimation of Multivariate Location and Scatter. In Wiley StatsRef: Statistics Reference Online. John Wiley & Sons, Ltd.
MaZa02
Maronna, R. A., & Zamar, R. H. (2002) Robust Estimates of Location and Dispersion for High-Dimensional Datasets. Technometrics, 44(4), 307–317.
MaMY00
Maronna, R., Martin, D., & Yohai, V. (n.d.) Robust statistics.
MKRL86
Massart, D. L., Kaufman, L., Rousseeuw, P. J., & Leroy, A. (1986) Least median of squares: a robust method for outlier and model error detection in regression and calibration. Analytica Chimica Acta, 187, 171–179.
Oja83
Oja, H. (1983) Descriptive statistics for multivariate distributions. Statistics & Probability Letters, 1(6), 327–332.
PeDe08
Petersen, M. R., & Deddens, J. A. (2008) A comparison of two methods for estimating prevalence ratios. BMC Medical Research Methodology, 8, 9.
Qian96
Qian, G. (1996) On model selection in robust linear regression.
QiHa96
Qian, G., & Hans, R. K. (1996) Some notes on Rissanen’s stochastic complexity.
QiKü98
Qian, G., & Künsch, H. R. (1998) On model selection via stochastic complexity in robust linear regression. Journal of Statistical Planning and Inference, 75(1), 91–116.
Ronc85
Ronchetti, E. (1985) Robust model selection in regression. Statistics & Probability Letters, 3(1), 21–23.
Ronc97
Ronchetti, E. (1997) Robust inference by influence functions. Journal of Statistical Planning and Inference, 57(1), 59–72.
Ronc00
Ronchetti, E. (2000) Robust Regression Methods and Model Selection. In A. Bab-Hadiashar & D. Suter (Eds.), Data Segmentation and Model Selection for Computer Vision (pp. 31–40). Springer New York.
RoTr01
Ronchetti, E., & Trojani, F. (2001) Robust inference with GMM estimators. Journal of Econometrics, 101(1), 37–69.
Rous84
Rousseeuw, P. J. (1984) Least Median of Squares Regression. Journal of the American Statistical Association, 79(388), 871–880.
RoLe87
Rousseeuw, P. J., & Leroy, A. M. (1987) Robust regression and outlier detection. New York: Wiley.
RoYo84
Rousseeuw, P., & Yohai, V. (1984) Robust Regression by Means of S-Estimators. In J. Franke, W. Härdle, & D. Martin (Eds.), Robust and Nonlinear Time Series Analysis (pp. 256–272). Springer US.
Roya86
Royall, R. M. (1986) Model Robust Confidence Intervals Using Maximum Likelihood Estimators. International Statistical Review / Revue Internationale de Statistique, 54(2), 221–226.
Stig10
Stigler, S. M. (2010) The Changing History of Robustness. The American Statistician, 64(4), 277–281.
ThCl13
Tharmaratnam, K., & Claeskens, G. (2013) A comparison of robust versions of the AIC based on M-, S- and MM-estimators. Statistics, 47(1), 216–235.
Thei92
Theil, H. (1992) A Rank-Invariant Method of Linear and Polynomial Regression Analysis. In B. Raj & J. Koerts (Eds.), Henri Theil’s Contributions to Economics and Econometrics (pp. 345–381). Springer Netherlands.
Tsou06
Tsou, T.-S. (2006) Robust Poisson regression. Journal of Statistical Planning and Inference, 136(9), 3173–3186.
Wedd74
Wedderburn, R. W. M. (1974) Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61(3), 439–447.
XuCM10
Xu, H., Caramanis, C., & Mannor, S. (2010) Robust Regression and Lasso. IEEE Transactions on Information Theory, 56(7), 3561–3574.
YaXu13
Yang, W., & Xu, H. (2013) A Unified Robust Regression Model for Lasso-like Algorithms. In ICML (3) (pp. 585–593).