M-estimation: estimating a quantity by choosing it to be an extremum of a function or, if that function is well-behaved enough, a zero of its derivative.
Very popular in machine learning, where loss-function-based methods are ubiquitous. In statistics we see this implicitly in maximum likelihood estimation, robust estimation, and least squares, for all of which M-estimation provides a unifying formalism grounded in asymptotic theory.
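A minimal sketch of the "zero of the derivative" view, for a location parameter only. The helper names (`huber_psi`, `m_estimate_location`), the toy data and the threshold `delta=1.0` are illustrative assumptions, not taken from any library. With psi(r) = r the estimating equation recovers the sample mean (least squares); with the clipped Huber psi a single gross outlier has bounded pull.

```python
from statistics import mean, median


def huber_psi(r, delta):
    """Derivative of the Huber loss: identity near zero, clipped at +/- delta."""
    return max(-delta, min(delta, r))


def m_estimate_location(xs, psi, tol=1e-10):
    """Solve sum_i psi(x_i - theta) = 0 for theta by bisection.

    Assumes psi is nondecreasing, so the summed score is nonincreasing in
    theta and changes sign somewhere between min(xs) and max(xs).
    """
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        score = sum(psi(x - mid) for x in xs)
        if score > 0:
            lo = mid  # theta too small: the residuals are net positive
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy data (hypothetical): five clean points and one gross outlier.
data = [9.8, 10.1, 10.3, 9.9, 10.0, 55.0]

theta_ls = m_estimate_location(data, lambda r: r)  # psi(r) = r
theta_huber = m_estimate_location(data, lambda r: huber_psi(r, 1.0))

print("mean      ", mean(data))
print("median    ", median(data))
print("psi(r) = r", theta_ls)
print("Huber psi ", theta_huber)
```

On this toy data the least-squares solution is dragged to about 17.5 by the outlier, while the Huber solution stays near 10.2, close to the median.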
TODO: Discuss large-sample theory and the influence-function motivation.
Robust Loss functions
Discuss representation (and implementation) in terms of weight functions for least-squares loss.
Mallows, Schweppe etc.
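One way the weight-function representation might look in practice is iteratively reweighted least squares (IRLS): write the robust loss's psi as w(r) * r, then repeatedly solve a weighted least-squares problem with weights computed from the current residuals. The sketch below does this for a straight-line fit with Huber weights. All function names, the toy data and the fixed `delta` (in raw data units, with no scale estimation) are assumptions made for illustration, and it does not attempt the leverage-based Mallows or Schweppe weightings.

```python
def huber_weight(r, delta):
    """Weight w(r) = psi(r) / r for the Huber loss: 1 inside the band, delta/|r| outside."""
    a = abs(r)
    return 1.0 if a <= delta else delta / a


def weighted_line_fit(xs, ys, ws):
    """Closed-form weighted least squares for the line y = a + b * x."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


def irls_huber_line(xs, ys, delta=1.0, n_iter=50):
    """Alternate between a weighted fit and recomputing Huber weights."""
    ws = [1.0] * len(xs)  # the first pass is ordinary least squares
    for _ in range(n_iter):
        a, b = weighted_line_fit(xs, ys, ws)
        ws = [huber_weight(y - (a + b * x), delta) for x, y in zip(xs, ys)]
    return a, b, ws


# Toy data (hypothetical): exact line y = 1 + 2x with one corrupted response.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0 + 2.0 * x for x in xs]
ys[6] = 40.0

a_ols, b_ols = weighted_line_fit(xs, ys, [1.0] * len(xs))
a_rob, b_rob, weights = irls_huber_line(xs, ys)

print("OLS slope   ", b_ols)
print("Huber slope ", b_rob)
print("final weights", [round(w, 3) for w in weights])
```

On this data the ordinary least-squares slope is pulled to roughly 3.6, while the reweighted fit ends close to the true slope of 2 and the corrupted point receives a small weight. In real use the residuals would normally be standardised by a robust scale estimate before the weights are computed; that step is omitted here to keep the sketch short.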
- Barndorff-Nielsen, O. (1983) On a formula for the distribution of the maximum likelihood estimator. Biometrika, 70(2), 343–365.
- Bühlmann, P. (2014) Robust Statistics. In J. Fan, Y. Ritov, & C. F. J. Wu (Eds.), Selected Works of Peter J. Bickel (pp. 51–98). Springer New York.
- Donoho, D., & Montanari, A. (2013) High Dimensional Robust M-Estimation: Asymptotic Variance via Approximate Message Passing. arXiv:1310.7320 [Cs, Math, Stat].
- Hampel, F. R. (1974) The Influence Curve and its Role in Robust Estimation. Journal of the American Statistical Association, 69(346), 383–393.
- Huber, P. J. (1964) Robust Estimation of a Location Parameter. The Annals of Mathematical Statistics, 35(1), 73–101.
- Mondal, D., & Percival, D. B. (2010) M-estimation of wavelet variance. Annals of the Institute of Statistical Mathematics, 64(1), 27–53.
- Ronchetti, E. (2000) Robust Regression Methods and Model Selection. In A. Bab-Hadiashar & D. Suter (Eds.), Data Segmentation and Model Selection for Computer Vision (pp. 31–40). Springer New York.
- Tharmaratnam, K., & Claeskens, G. (2013) A comparison of robust versions of the AIC based on M-, S- and MM-estimators. Statistics, 47(1), 216–235.
- van de Geer, S. (2014) Worst possible sub-directions in high-dimensional models. In arXiv:1403.7023 [math, stat] (Vol. 131).