# Large sample theory

Delta methods, influence functions, and so on. A convenient feature of M-estimation, and especially of maximum likelihood estimation, is the simple behaviour of estimators in the asymptotic large-sample limit, which can give you, e.g., variance estimates, or motivate information criteria, robust statistics, optimisation methods, etc.
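For instance, a minimal simulation sketch of the variance-estimate claim (my own toy example, not from any particular reference): the MLE of an exponential rate $\lambda$ is $1/\bar{x}$, and asymptotic theory says its variance is approximately $\lambda^2/n$. We can check that against the empirical variance over many replications:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, reps = 2.0, 2000, 4000

# Draw `reps` independent datasets of size n from Exponential(rate=lam).
samples = rng.exponential(scale=1 / lam, size=(reps, n))

# MLE of the rate is 1/sample-mean; asymptotic theory gives
# sqrt(n) (lam_hat - lam) -> N(0, lam^2), i.e. Var(lam_hat) ~ lam^2 / n.
lam_hat = 1.0 / samples.mean(axis=1)

empirical_var = lam_hat.var()
asymptotic_var = lam**2 / n
print(empirical_var, asymptotic_var)  # these should be close
```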

The convolution theorem fits here:

Suppose $\hat{\theta}$ is an efficient estimator of $\theta$ and $\tilde{\theta}$ is another, not fully efficient, estimator. The convolution theorem says that, if you rule out stupid exceptions, asymptotically $\tilde{\theta} = \hat{\theta} + \varepsilon$ where $\varepsilon$ is pure noise, independent of $\hat{\theta}.$

The reason that’s almost obvious: if it weren’t true, there would be some information about $\theta$ left in $\tilde{\theta}-\hat{\theta}$, and you could use that information to get a better estimator than $\hat{\theta}$, which (by assumption) can’t happen. The stupid exceptions are things like the Hodges superefficient estimator, which does better at a few values of $\theta$ but much worse at neighbouring values.
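A quick simulation sketch of the convolution theorem (my own illustration, assuming Gaussian data): for a Gaussian location model the sample mean is efficient and the sample median is consistent but inefficient, so the gap $\tilde{\theta} - \hat{\theta}$ should be approximately uncorrelated with the mean, and the variances should add:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 1000, 20000

# For Gaussian data: sample mean = efficient estimator of location,
# sample median = consistent but inefficient alternative.
x = rng.normal(size=(reps, n))
mean_hat = x.mean(axis=1)
median_hat = np.median(x, axis=1)

# The convolution theorem says median ~ mean + eps with eps pure
# noise, asymptotically independent of the mean.
eps = median_hat - mean_hat
corr = np.corrcoef(mean_hat, eps)[0, 1]
print(corr)                                  # near zero
print(median_hat.var(), mean_hat.var() + eps.var())  # variances add
```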

## Fisher Information

Used in ML theory and kinda-sorta in robust estimation. A matrix that tells you how much a new datum affects your parameter estimates. (It is related, I am told, to garden-variety Shannon information, and when that non-obvious fact is clearer to me I shall expand on how precisely this is so.)
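As a concrete toy example (my own, for a single Bernoulli$(p)$ observation, where the "matrix" is a scalar): the Fisher information can be computed as the variance of the score, and it matches the textbook formula $I(p) = 1/(p(1-p))$ exactly, since the expectation is a sum over only two outcomes:

```python
# Fisher information for one Bernoulli(p) observation, two ways.
p = 0.3

def score(x, p):
    # d/dp log[ p^x (1-p)^(1-x) ] = x/p - (1-x)/(1-p)
    return x / p - (1 - x) / (1 - p)

# E[score] = 0, and Var[score] = Fisher information; the expectation
# over the two outcomes {0, 1} is exact, no simulation needed.
info_score = p * score(1, p) ** 2 + (1 - p) * score(0, p) ** 2
info_analytic = 1 / (p * (1 - p))
print(info_score, info_analytic)  # identical up to float rounding
```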