(Outlier) robust statistics

November 25, 2014 — January 21, 2022

functional analysis

likelihood

optimization

statistics

Terminology note: I mean robust statistics in the sense of Huber, which is, informally, outlier robustness.

There are also robust estimators in econometrics, where the term means something about good behaviour under heteroskedastic and/or correlated errors. Robust Bayes means something about inference that is robust to the choice of prior (which could overlap, but has a rather different emphasis).

Outlier robustness is AFAICT more-or-less a frequentist project. Bayesian approaches seem to achieve robustness largely by choosing heavy-tailed priors or heavy-tailed noise distributions where they might have chosen light-tailed ones, e.g. Laplacian distributions instead of Gaussian ones. Such heavy-tailed distributions may have arbitrary prior parameters, but no more arbitrary than is usual in Bayesian statistics, and so they do not provoke the same need to wash away the guilt that frequentists seem to feel.

One can of course use heavy-tailed noise distributions in frequentist inference as well, and that will buy a kind of robustness. That seems to be unpopular because it makes frequentist inference as difficult as Bayesian inference.
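To make the heavy-tailed trick concrete, here is the textbook location case, in notation that is mine rather than from any particular reference (residual \(r\), Gaussian scale \(\sigma\), Laplace scale \(b\)). Up to additive constants, the negative log-densities are

```latex
\begin{aligned}
\text{Gaussian:}\quad -\log p(r) &= \frac{r^2}{2\sigma^2} + \text{const},\\
\text{Laplace:}\quad -\log p(r) &= \frac{|r|}{b} + \text{const}.
\end{aligned}
```

So maximum likelihood under Laplace noise minimises the sum of absolute residuals, and for a pure location model that minimiser is the sample median rather than the sample mean.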

1 Corruption models

Random (mixture) corruption
(Adversarial) total variation \(\epsilon\)-corruption.
Wasserstein corruption models (does one usually assume adversarial or random corruption here?) as seen in “distributionally robust” models.
Other?
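As a minimal sketch of the first item, random mixture corruption, the following replaces each observation independently with probability `eps` by a draw from some outlier distribution. The function name `contaminate`, the Gaussian choices and the value `eps=0.1` are all illustrative assumptions of mine, not anything canonical.

```python
import numpy as np

def contaminate(clean, eps, outlier_sampler, rng):
    """Replace each point independently, with probability eps, by an outlier draw."""
    clean = np.asarray(clean, dtype=float)
    mask = rng.random(clean.shape[0]) < eps   # which points get corrupted
    corrupted = clean.copy()
    corrupted[mask] = outlier_sampler(int(mask.sum()))
    return corrupted, mask

rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=1.0, size=2000)

# Hypothetical outlier distribution: a far-away Gaussian blob.
corrupted, mask = contaminate(
    clean, eps=0.1,
    outlier_sampler=lambda n: rng.normal(50.0, 1.0, size=n),
    rng=rng,
)

# The mean is dragged towards the blob; the median barely moves.
print("mean:  ", np.mean(corrupted))
print("median:", np.median(corrupted))
```

The adversarial total-variation model differs in that the corrupted points may be chosen after looking at the clean sample, which this sketch does not attempt to simulate.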

2 M-estimation with robust loss

The one that I, at least, would think of when considering robust estimation.

In M-estimation, instead of hunting a maximum of the likelihood function as you do in maximum likelihood, or a minimum of the sum of squared residuals, as you do in least-squares estimation, you minimise a specifically chosen loss function for those residuals. You may select an objective function more robust to deviations between your model and reality. Credited to Huber (1964).
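In symbols, and with notation that is my assumption rather than anything fixed above (residuals \(r_i(\theta)\), loss \(\rho\), regression function \(f\)), the estimator is roughly

```latex
\hat{\theta} = \operatorname*{arg\,min}_{\theta} \sum_{i=1}^{n} \rho\bigl(r_i(\theta)\bigr),
\qquad r_i(\theta) = y_i - f(x_i; \theta).
```

Taking \(\rho(r) = r^2\) recovers least squares, and taking \(\rho\) to be the negative log-density of the assumed noise recovers maximum likelihood.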

See M-estimation for some details.

AFAICT, the definition of M-estimation includes the possibility that you could in principle select a less robust loss function than the sum of squares, but I have not seen this in the literature. Generally, some robustified approach is presumed, which penalises outliers less severely than least-squares.

For M-estimation as robust estimation, various complications ensue, such as the difference between noise in your predictors and noise in your responses, whether the “true” model is included in your class, and which of these difficulties you have resolved or not.

Loosely speaking, no, you haven’t solved problems of noise in your predictors, only the problem of noise in your responses.

And the cost is that you now have a loss function with some extra arbitrary parameters which you have to justify, which is anathema to frequentists, who like to claim to be less arbitrary than Bayesians.

2.1 Huber loss
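The Huber loss is quadratic for small residuals and linear for large ones, switching at a threshold \(\delta\). Below is a minimal sketch of the loss plus a location estimate computed by iteratively reweighted least squares. The function names are mine, the scale is assumed known and equal to 1 (a real implementation would estimate it robustly), and `delta=1.345` is the conventional tuning constant for roughly 95% efficiency under Gaussian noise.

```python
import numpy as np

def huber_loss(r, delta=1.345):
    """Quadratic for |r| <= delta, linear beyond that."""
    r = np.asarray(r, dtype=float)
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def huber_location(y, delta=1.345, n_iter=100, tol=1e-10):
    """Huber M-estimate of location via IRLS, assuming unit scale."""
    y = np.asarray(y, dtype=float)
    mu = np.median(y)  # robust starting point
    for _ in range(n_iter):
        a = np.abs(y - mu)
        # IRLS weights psi(r)/r = min(1, delta/|r|)
        w = np.where(a <= delta, 1.0, delta / np.maximum(a, 1e-12))
        mu_new = np.sum(w * y) / np.sum(w)
        converged = abs(mu_new - mu) < tol
        mu = mu_new
        if converged:
            break
    return mu

y = np.array([-0.2, -0.1, 0.05, 0.1, 0.3, 100.0])  # one gross outlier
mu_hat = huber_location(y)
print("mean:", y.mean(), "Huber location:", mu_hat)
```

The outlier still exerts a bounded pull of size `delta` on the estimate, which is the sense in which Huber's loss limits, but does not remove, the influence of any single point.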

2.2 Tukey loss

What is the Tukey loss function?
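As far as I can tell, this usually means Tukey's biweight (bisquare) loss, which, unlike Huber's, flattens out completely beyond a cutoff \(c\), so sufficiently large residuals have zero influence. A minimal sketch follows; the function names are mine, and `c=4.685` is the conventional constant for roughly 95% Gaussian efficiency, assuming residuals are already scaled to unit scale.

```python
import numpy as np

def tukey_biweight_loss(r, c=4.685):
    """rho(r) = c^2/6 * (1 - (1 - (r/c)^2)^3) for |r| <= c, else c^2/6."""
    r = np.asarray(r, dtype=float)
    u = np.clip(np.abs(r) / c, 0.0, 1.0)  # clipping makes the loss constant past c
    return (c**2 / 6.0) * (1.0 - (1.0 - u**2) ** 3)

def tukey_biweight_weight(r, c=4.685):
    """IRLS weight psi(r)/r = (1 - (r/c)^2)^2 for |r| < c, else 0."""
    r = np.asarray(r, dtype=float)
    u = np.abs(r) / c
    return np.where(u < 1.0, (1.0 - u**2) ** 2, 0.0)

residuals = np.array([0.0, 1.0, 4.0, 10.0, 100.0])
print(tukey_biweight_loss(residuals))
print(tukey_biweight_weight(residuals))
```

Because the loss is bounded it is also non-convex, so the minimiser can depend on the starting point, which is one reason to start from a robust initial estimate.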

3 MM-estimation

🏗 Don’t know

4 Median-based estimators

Rousseeuw and Yohai’s idea (P. Rousseeuw and Yohai 1984)

Many permutations on the theme here, but it rapidly gets complex. The only members of these families I have looked into are the near-trivial cases of Least Median of Squares and Least Trimmed Squares estimation (P. J. Rousseeuw 1984). More broadly we should also consider S-estimators, which do something with… robust estimation of scale and using this to do robust estimation of location? 🏗
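For the two near-trivial cases, the objectives are easy to write down even though minimising them exactly is not. The sketch below is a crude approximation under my own assumptions: it only tries lines passing through pairs of data points, uses `np.median` (which averages the two middle values when the sample size is even) in place of a specific order statistic, and fixes the trimming count `h` by hand. All names are hypothetical.

```python
import itertools
import numpy as np

def lms_objective(residuals):
    """Least Median of Squares: median of the squared residuals."""
    return np.median(np.square(residuals))

def lts_objective(residuals, h):
    """Least Trimmed Squares: sum of the h smallest squared residuals."""
    return np.sum(np.sort(np.square(residuals))[:h])

def fit_line_by_pairs(x, y, objective):
    """Try every line through two data points; keep the one with the best objective."""
    best = None
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, skip
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        score = objective(y - (intercept + slope * x))
        if best is None or score < best[0]:
            best = (score, intercept, slope)
    return best[1], best[2]

rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)
y[15:] = -50.0  # five gross outliers among twenty points

lms_intercept, lms_slope = fit_line_by_pairs(x, y, lms_objective)
lts_intercept, lts_slope = fit_line_by_pairs(x, y, lambda r: lts_objective(r, h=15))
ols_slope, ols_intercept = np.polyfit(x, y, deg=1)  # non-robust baseline
print("LMS slope:", lms_slope, "LTS slope:", lts_slope, "OLS slope:", ols_slope)
```

The exhaustive pair search costs on the order of \(n^2\) objective evaluations, so it is only sensible for toy problems like this one.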

Theil-Sen-(Oja) estimators: Something about medians of inferred regression slopes. 🏗
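For the simple-regression case, the "medians of inferred regression slopes" idea can be sketched in a few lines: take the slope between every pair of points and use the median. The function name and the choice of intercept (median of \(y - \text{slope}\cdot x\)) are my assumptions; other intercept conventions exist.

```python
import itertools
import numpy as np

def theil_sen(x, y):
    """Median of pairwise slopes, with a median-based intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in itertools.combinations(range(len(x)), 2)
        if x[i] != x[j]  # skip pairs with no defined slope
    ]
    slope = np.median(slopes)
    intercept = np.median(y - slope * x)
    return intercept, slope

x = np.arange(10, dtype=float)
y = 3.0 + 0.5 * x
y[9] = 1000.0  # one gross outlier in the response

ts_intercept, ts_slope = theil_sen(x, y)
print("intercept:", ts_intercept, "slope:", ts_slope)
```

Only 9 of the 45 pairwise slopes involve the corrupted point, so the median slope is unaffected by it.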

Tukey median, and why no one uses it, what with it being NP-hard to compute.

5 Others

RANSAC — some kind of randomised outlier detection estimator. 🏗
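A minimal sketch of the RANSAC idea for fitting a line, under assumptions of my own (minimal samples of two points, a fixed inlier threshold, a fixed number of trials, and a final ordinary least-squares refit on the consensus set): repeatedly fit a candidate model to a tiny random subset, count how many points agree with it, and keep the candidate with the largest consensus.

```python
import numpy as np

def ransac_line(x, y, n_trials=200, threshold=1.0, rng=None):
    """Fit y = intercept + slope * x by random sampling and consensus."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_inliers = None
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)  # minimal sample
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        inliers = np.abs(y - (intercept + slope * x)) < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers is None:
        raise ValueError("no valid candidate line was sampled")
    # Refit by ordinary least squares on the consensus set only.
    slope, intercept = np.polyfit(x[best_inliers], y[best_inliers], deg=1)
    return intercept, slope, best_inliers

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 - 1.0 * x + rng.normal(scale=0.2, size=x.size)
y[::5] += 30.0  # every fifth point becomes a gross outlier

intercept, slope, inliers = ransac_line(x, y, rng=rng)
print("intercept:", intercept, "slope:", slope, "inliers:", int(inliers.sum()))
```

The threshold and the number of trials are exactly the kind of extra arbitrary parameters one has to justify; here they are simply picked to suit the simulated noise level.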

6 Incoming

Relation to penalized regression.
Connection with Lasso.
Beran’s Hellinger-ball contamination model, which I also don’t yet understand.
Breakdown point explanation
Yet Another Math Programming Consultant: Huber regression: different formulations

7 References

Barndorff-Nielsen. 1983. “On a Formula for the Distribution of the Maximum Likelihood Estimator.” Biometrika.

Beran. 1981. “Efficient Robust Estimates in Parametric Models.” Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete.

———. 1982. “Robust Estimation in Models for Independent Non-Identically Distributed Data.” The Annals of Statistics.

Bickel. 1975. “One-Step Huber Estimates in the Linear Model.” Journal of the American Statistical Association.

Bondell, Krishna, and Ghosh. 2010. “Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models.” Biometrics.

Bühlmann. 2014. “Robust Statistics.” In Selected Works of Peter J. Bickel. Selected Works in Probability and Statistics 13.

Burman, and Nolan. 1995. “A General Akaike-Type Criterion for Model Selection in Robust Regression.” Biometrika.

Cantoni, and Ronchetti. 2001. “Robust Inference for Generalized Linear Models.” Journal of the American Statistical Association.

Charikar, Steinhardt, and Valiant. 2016. “Learning from Untrusted Data.” arXiv:1611.02315 [Cs, Math, Stat].

Cox. 1983. “Some Remarks on Overdispersion.” Biometrika.

Czellar, and Ronchetti. 2010. “Accurate and Robust Tests for Indirect Inference.” Biometrika.

Davison, and Ortiz. 2019. “FutureMapping 2: Gaussian Belief Propagation for Spatial AI.” arXiv:1910.14139 [Cs].

Diakonikolas, Kamath, Kane, et al. 2016. “Robust Estimators in High Dimensions Without the Computational Intractability.” arXiv:1604.06443 [Cs, Math, Stat].

Diakonikolas, Kamath, Kane, et al. 2017. “Being Robust (in High Dimensions) Can Be Practical.” arXiv:1703.00893 [Cs, Math, Stat].

Donoho, and Huber. 1983. “The Notion of Breakdown Point.” A Festschrift for Erich L. Lehmann.

Donoho, and Liu. 1988. “The ‘Automatic’ Robustness of Minimum Distance Functionals.” The Annals of Statistics.

Donoho, and Montanari. 2013. “High Dimensional Robust M-Estimation: Asymptotic Variance via Approximate Message Passing.” arXiv:1310.7320 [Cs, Math, Stat].

Duchi, Glynn, and Namkoong. 2016. “Statistics of Robust Optimization: A Generalized Empirical Likelihood Approach.” arXiv:1610.03425 [Stat].

Genton, and Ronchetti. 2003. “Robust Indirect Inference.” Journal of the American Statistical Association.

Ghosh, and Basu. 2016. “General Model Adequacy Tests and Robust Statistical Inference Based on A New Family of Divergences.” arXiv:1611.05224 [Math, Stat].

Golubev, and Nussbaum. 1990. “A Risk Bound in Sobolev Class Regression.” The Annals of Statistics.

Hampel. 1974. “The Influence Curve and Its Role in Robust Estimation.” Journal of the American Statistical Association.

Hampel, Ronchetti, Rousseeuw, et al. 2011. Robust Statistics: The Approach Based on Influence Functions.

Holland, and Welsch. 1977. “Robust Regression Using Iteratively Reweighted Least-Squares.” Communications in Statistics - Theory and Methods.

Huang, and Lederer. 2022. “DeepMoM: Robust Deep Learning With Median-of-Means.” Journal of Computational and Graphical Statistics.

Huber. 1964. “Robust Estimation of a Location Parameter.” The Annals of Mathematical Statistics.

———. 2009. Robust Statistics. Wiley Series in Probability and Statistics.

Janková, and van de Geer. 2016. “Confidence Regions for High-Dimensional Generalized Linear Models Under Sparsity.” arXiv:1610.01353 [Math, Stat].

Konishi, and Kitagawa. 1996. “Generalised Information Criteria in Model Selection.” Biometrika.

———. 2003. “Asymptotic Theory for Information Criteria in Model Selection—Functional Approach.” Journal of Statistical Planning and Inference, C.R. Rao 80th Birthday Felicitation Volume, Part IV.

Konishi, and Kitagawa. 2008. Information Criteria and Statistical Modeling. Springer Series in Statistics.

Krzakala, Moore, Mossel, et al. 2013. “Spectral Redemption in Clustering Sparse Networks.” Proceedings of the National Academy of Sciences.

Li. 2017. “Robust Sparse Estimation Tasks in High Dimensions.” arXiv:1702.05860 [Cs].

Lu, Goldberg, and Fine. 2012. “On the Robustness of the Adaptive Lasso to Model Misspecification.” Biometrika.

Machado. 1993. “Robust Model Selection and M-Estimation.” Econometric Theory.

Manton, Krishnamurthy, and Poor. 1998. “James-Stein State Filtering Algorithms.” IEEE Transactions on Signal Processing.

Markatou, Marianthi, Karlis, and Ding. 2021. “Distance-Based Statistical Inference.” Annual Review of Statistics and Its Application.

Markatou, M., and Ronchetti. 1997. “3 Robust Inference: The Approach Based on Influence Functions.” In Handbook of Statistics. Robust Inference.

Maronna, Ricardo Antonio. 1976. “Robust M-Estimators of Multivariate Location and Scatter.” The Annals of Statistics.

Maronna, Ricardo A., Martin, and Yohai. 2006. Robust Statistics: Theory and Methods. Wiley Series in Probability and Statistics.

Maronna, Ricardo A., and Yohai. 1995. “The Behavior of the Stahel-Donoho Robust Multivariate Estimator.” Journal of the American Statistical Association.

———. 2014. “Robust Estimation of Multivariate Location and Scatter.” In Wiley StatsRef: Statistics Reference Online.

Maronna, Ricardo A., and Zamar. 2002. “Robust Estimates of Location and Dispersion for High-Dimensional Datasets.” Technometrics.

Massart, Kaufman, Rousseeuw, et al. 1986. “Least Median of Squares: A Robust Method for Outlier and Model Error Detection in Regression and Calibration.” Analytica Chimica Acta.

Mossel, Neeman, and Sly. 2013. “A Proof Of The Block Model Threshold Conjecture.” arXiv:1311.4115 [Cs, Math].

———. 2016. “Belief Propagation, Robust Reconstruction and Optimal Recovery of Block Models.” The Annals of Applied Probability.

Oja. 1983. “Descriptive Statistics for Multivariate Distributions.” Statistics & Probability Letters.

Ortiz, Evans, and Davison. 2021. “A Visual Introduction to Gaussian Belief Propagation.” arXiv:2107.02308 [Cs].

Qian. 1996. “On Model Selection in Robust Linear Regression.”

Qian, and Hans. 1996. “Some Notes on Rissanen’s Stochastic Complexity.”

Qian, and Künsch. 1998. “On Model Selection via Stochastic Complexity in Robust Linear Regression.” Journal of Statistical Planning and Inference.

Ronchetti, Elvezio. 1985. “Robust Model Selection in Regression.” Statistics & Probability Letters.

———. 1997. “Robust Inference by Influence Functions.” Journal of Statistical Planning and Inference, Robust Statistics and Data Analysis, Part I.

Ronchetti, E. 2000. “Robust Regression Methods and Model Selection.” In Data Segmentation and Model Selection for Computer Vision.

Ronchetti, Elvezio, and Trojani. 2001. “Robust Inference with GMM Estimators.” Journal of Econometrics.

Rousseeuw, Peter J. 1984. “Least Median of Squares Regression.” Journal of the American Statistical Association.

Rousseeuw, Peter J., and Leroy. 1987. Robust Regression and Outlier Detection. Wiley Series in Probability and Mathematical Statistics.

Rousseeuw, P., and Yohai. 1984. “Robust Regression by Means of S-Estimators.” In Robust and Nonlinear Time Series Analysis. Lecture Notes in Statistics 26.

Royall. 1986. “Model Robust Confidence Intervals Using Maximum Likelihood Estimators.” International Statistical Review / Revue Internationale de Statistique.

Stigler. 2010. “The Changing History of Robustness.” The American Statistician.

Street, Carroll, and Ruppert. 1988. “A Note on Computing Robust Regression Estimates via Iteratively Reweighted Least Squares.” The American Statistician.

Tharmaratnam, and Claeskens. 2013. “A Comparison of Robust Versions of the AIC Based on M-, S- and MM-Estimators.” Statistics.

Theil. 1992. “A Rank-Invariant Method of Linear and Polynomial Regression Analysis.” In Henri Theil’s Contributions to Economics and Econometrics. Advanced Studies in Theoretical and Applied Econometrics 23.

Tsou. 2006. “Robust Poisson Regression.” Journal of Statistical Planning and Inference.

Wedderburn. 1974. “Quasi-Likelihood Functions, Generalized Linear Models, and the Gauss—Newton Method.” Biometrika.

Xu, Caramanis, and Mannor. 2010. “Robust Regression and Lasso.” IEEE Transactions on Information Theory.

Yang, Tao, Gallagher, and McMahan. 2019. “A Robust Regression Methodology via M-Estimation.” Communications in Statistics - Theory and Methods.

Yang, Wenzhuo, and Xu. 2013. “A Unified Robust Regression Model for Lasso-Like Algorithms.” In ICML (3).