# Robust statistics

Techniques to improve the failure modes of your estimates. Surprisingly rarely used despite being fairly straightforward.

This is more-or-less a frequentist project.

Bayesians seem to claim to achieve robustness largely by choosing heavy-tailed priors where they might have chosen light-tailed ones, e.g. Laplacian priors instead of Gaussian ones. Such priors may have arbitrary parameters, but not more arbitrary than usual in Bayesian statistics and therefore do not attract so much need to rationalise away the guilt.

## TODO

• relation to penalized regression.

• connection with Lasso.

• Beran's Hellinger-ball contamination model, which I also don't yet understand.

• Breakdown point explanation

• glm connection.

## Corruption models

• (Adversarial) total variation $\epsilon$-corruption.

• Random (mixture) corruption

• other?
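To make the random (mixture) case concrete, here is a minimal sketch of sampling from a contaminated distribution of the form $(1-\epsilon)P + \epsilon Q$. The choices $P = N(0,1)$, $Q = N(50,1)$ and $\epsilon = 0.1$ are assumptions picked purely for illustration, as is the function name `sample_mixture_corrupted`.

```python
import numpy as np

def sample_mixture_corrupted(n, eps, rng):
    """Draw n points from (1 - eps) * N(0, 1) + eps * N(50, 1).

    Both component distributions are illustrative assumptions.
    """
    clean = rng.normal(0.0, 1.0, size=n)
    outliers = rng.normal(50.0, 1.0, size=n)
    # Each point is independently replaced by an outlier with probability eps.
    is_outlier = rng.random(n) < eps
    return np.where(is_outlier, outliers, clean)

rng = np.random.default_rng(0)
x = sample_mixture_corrupted(10_000, eps=0.1, rng=rng)

corrupted_mean = float(np.mean(x))      # dragged towards the outlier component
corrupted_median = float(np.median(x))  # stays near the bulk of the clean data
```

The mean moves roughly in proportion to $\epsilon$ times the outlier location, while the median barely shifts. The adversarial case differs in that the corrupted points may be chosen after looking at the clean sample.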

## M-estimation with robust loss

The one that I, at least, would think of when considering robust estimation.

In M-estimation, instead of hunting a maximum of the likelihood function as you do in maximum likelihood, or a minimum of the sum of squared residuals as you do in least-squares estimation, you minimise a specifically chosen loss function of those residuals. You may select an objective function more robust to deviations between your model and reality. Credited to Huber (Hube64).
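As a rough illustration, here is a sketch of a location M-estimate under the Huber loss, fitted by iteratively reweighted averaging. The threshold `delta=1.345` is a conventional tuning constant that presumes residuals of roughly unit scale; that, the function names and the toy data are all assumptions of this sketch.

```python
import numpy as np

def huber_loss(r, delta):
    """Quadratic for |r| <= delta, linear beyond it."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def huber_location(x, delta=1.345, n_iter=100, tol=1e-10):
    """Minimise sum(huber_loss(x - mu)) over mu by iterative reweighting."""
    mu = float(np.median(x))  # robust starting point
    for _ in range(n_iter):
        r = x - mu
        # Weight is psi(r) / r: 1 inside the threshold, delta / |r| outside.
        w = np.minimum(1.0, delta / np.maximum(np.abs(r), 1e-12))
        new_mu = float(np.sum(w * x) / np.sum(w))
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

# Five well-behaved points and one gross outlier (illustrative data).
data = np.array([0.1, -0.3, 0.2, 0.0, -0.1, 100.0])
plain_mean = float(np.mean(data))
huber_estimate = huber_location(data)
```

The outlier's influence on the Huber estimate is capped at `delta`, so the estimate stays near the five well-behaved points while the plain mean does not.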

See M-estimation for the details.

Aside: AFAICT, the definition of M-estimation includes the possibility that you could in principle select a less-robust loss function than least sum-of-squares or negative log likelihood, but I have not seen this in the literature. Generally, some robustified approach is presumed.

For M-estimation as robust estimation, various complications ensue, such as the difference between noise in your predictors and noise in your responses, whether the “true” model is included in your class, and which of these difficulties you have resolved.

Loosely speaking, no, you haven't solved problems of noise in your predictors, only the problem of noise in your responses.

And the cost is that you now have a loss function with some extra arbitrary parameters, which is anathema to frequentists, who like to claim to be less arbitrary than Bayesians. You then have to justify why you chose that loss function and its particular parameterisation. There are various procedures to choose these parameters, based on scale estimation.
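One scale-based recipe, sketched here under assumptions: estimate the residual scale with the median absolute deviation (MAD), rescaled by 1.4826 so that it matches the standard deviation for Gaussian data, then set the loss threshold to a fixed multiple of that scale. The multiplier 1.345 and the toy residuals are illustrative choices, not something the procedure dictates.

```python
import numpy as np

def mad_scale(r):
    """Robust scale estimate: 1.4826 * median(|r - median(r)|)."""
    med = np.median(r)
    return float(1.4826 * np.median(np.abs(r - med)))

# Illustrative residuals with one gross outlier.
residuals = np.array([-1.2, -0.5, -0.1, 0.0, 0.3, 0.8, 1.1, 40.0])

robust_scale = mad_scale(residuals)
naive_scale = float(np.std(residuals))  # inflated by the outlier

# Hypothetical tuning rule: threshold proportional to the robust scale.
huber_threshold = 1.345 * robust_scale
```

This only relocates the arbitrariness into the multiplier, but at least the threshold now adapts to the units of the data.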

TBD. Don't know

## Median-based estimators

Rousseeuw and Yohai's school. (RoYo84)

Many permutations on the theme here, but it rapidly gets complex. The only members of these families I have looked into are the near-trivial cases of the Least Median of Squares and Least Trimmed Squares estimators. (Rous84)
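For orientation, Least Trimmed Squares minimises the sum of the $h$ smallest squared residuals rather than all of them. Below is a heuristic sketch using random two-point starts followed by "concentration" steps (refit on the $h$ best-fitting points, repeat). It is an approximate search, not an exact solver, and the function name, the toy data and the choice $h = 38$ out of 50 are assumptions.

```python
import numpy as np

def lts_fit(X, y, h, n_starts=50, n_csteps=20, seed=0):
    """Approximate least trimmed squares via random starts + concentration steps."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        subset = rng.choice(n, size=p, replace=False)  # minimal starting subset
        for _ in range(n_csteps):
            beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
            sq_res = (y - X @ beta) ** 2
            subset = np.argsort(sq_res)[:h]  # keep the h best-fitting points
        obj = float(np.sort(sq_res)[:h].sum())  # trimmed objective for beta
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta, best_obj

# Illustrative data: y = 1 + 2x plus small noise, first 10 responses shifted by +30.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=50)
y[:10] += 30.0
X = np.column_stack([np.ones_like(x), x])

ols_beta, *_ = np.linalg.lstsq(X, y, rcond=None)
lts_beta, lts_obj = lts_fit(X, y, h=38)
```

With 10 of 50 responses corrupted, any $h \le 40$ lets the trimmed fit ignore all the outliers, whereas ordinary least squares is pulled well away from the true slope.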

More broadly we should also consider S-estimators, which do something with… robust estimation of scale and using this to do robust estimation of location? TBD.

Theil-Sen-(Oja) estimators: Something about medians of inferred regression slopes. TBD.
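A minimal sketch of the basic Theil-Sen idea for simple regression: take the median of the slopes through all pairs of points, then the median of the implied intercepts. This is the naive $O(n^2)$ version; the function name and toy data are assumptions.

```python
from itertools import combinations

import numpy as np

def theil_sen(x, y):
    """Median of pairwise slopes, then median of the implied intercepts."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs with undefined slope
    ]
    slope = float(np.median(slopes))
    intercept = float(np.median(y - slope * x))
    return slope, intercept

# Illustrative data: y = 3x - 2 exactly, with two responses grossly corrupted.
x = np.arange(10.0)
y = 3.0 * x - 2.0
y[8] = -40.0
y[9] = 100.0

ts_slope, ts_intercept = theil_sen(x, y)
```

Here 28 of the 45 pairwise slopes come from uncorrupted pairs, so the median slope is unaffected by the two bad points.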

Tukey median, and why no-one uses it what with it being NP-Hard.

## Others

RANSAC – some kind of randomised outlier detection estimator. TBD.
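A rough sketch of the RANSAC loop for fitting a line, as I understand it: repeatedly fit to a random minimal sample, count how many points land within a tolerance of that fit, keep the largest such consensus set, and refit on it. The function name, the inlier `threshold`, the number of trials and the toy data are all assumptions here.

```python
import numpy as np

def ransac_line(x, y, threshold, n_trials=200, seed=0):
    """Fit y = slope * x + intercept using a basic RANSAC loop."""
    rng = np.random.default_rng(seed)
    n = len(x)
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(n_trials):
        i, j = rng.choice(n, size=2, replace=False)  # minimal sample: 2 points
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        inliers = np.abs(y - (slope * x + intercept)) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final least-squares refit on the largest consensus set found.
    slope, intercept = np.polyfit(x[best_inliers], y[best_inliers], deg=1)
    return float(slope), float(intercept), best_inliers

# Illustrative data: y = 4 - 1.5x plus small noise, 12 of 40 responses replaced.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 40)
y = 4.0 - 1.5 * x + rng.normal(0.0, 0.1, size=40)
bad = rng.choice(40, size=12, replace=False)
y[bad] = rng.uniform(-20.0, 20.0, size=12)

r_slope, r_intercept, r_inliers = ransac_line(x, y, threshold=0.5)
```

Note that the threshold plays much the same role as the tuning constant in a robust loss, so the same questions about justifying it apply.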