The Living Thing / Notebooks : Large sample theory

Delta methods, influence functions, and so on. A convenient feature of M-estimation, and especially of maximum likelihood estimation, is the simple behaviour of estimators in the asymptotic large-sample-size limit. This can give you, e.g., variance estimates, or motivate information criteria, robust statistics, optimisation and so on.
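To make the "variance estimates" part concrete, here is a minimal sketch of the delta method checked by simulation. The setup is my own choice for illustration, not anything canonical: exponential data with mean `mu`, the estimator \(\log \bar{X}\), and a hypothetical helper named `delta_method_check`. The delta method predicts a variance of roughly \(g'(\mu)^2 \sigma^2 / n\), which for this case works out to \(1/n\).

```python
import numpy as np


def delta_method_check(mu=2.0, n=200, reps=5_000, seed=0):
    """Compare Monte Carlo and delta-method variances of log(sample mean)."""
    rng = np.random.default_rng(seed)
    # reps independent samples of size n from an exponential with mean mu.
    samples = rng.exponential(scale=mu, size=(reps, n))
    estimates = np.log(samples.mean(axis=1))
    mc_var = estimates.var(ddof=1)
    # Delta method: Var g(xbar) ~= g'(mu)^2 * Var(X) / n,
    # with g = log, g'(mu) = 1 / mu and Var(X) = mu^2 for the exponential.
    delta_var = (1.0 / mu) ** 2 * mu**2 / n
    return mc_var, delta_var


mc_var, delta_var = delta_method_check()
print(f"Monte Carlo: {mc_var:.5f}, delta method: {delta_var:.5f}")
```

The two numbers should agree up to Monte Carlo error and a finite-sample correction of higher order in \(1/n\); the agreement should tighten as `n` grows.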

The convolution theorem fits here:

Suppose \(\hat{\theta}\) is an efficient estimator of \(\theta\) and \(\tilde{\theta}\) is another, not fully efficient, estimator. The convolution theorem says that, if you rule out stupid exceptions, asymptotically \(\tilde{\theta} = \hat{\theta} + \varepsilon\) where \(\varepsilon\) is pure noise, independent of \(\hat{\theta}.\)

The reason that’s almost obvious is that if it weren’t true, there would be some information about \(\theta\) in \(\tilde{\theta}-\hat{\theta}\), and you could use this information to get a better estimator than \(\hat{\theta}\), which (by assumption) can’t happen. The stupid exceptions are things like the Hodges superefficient estimator, which do better at a few values of \(\theta\) but much worse at neighbouring values.
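Both claims above can be poked at by simulation. The following is a sketch under assumptions I have picked for convenience: Gaussian data with unit variance, the sample mean as the efficient estimator, the sample median as the inefficient one, and the textbook Hodges estimator that snaps the sample mean to zero when it falls below \(n^{-1/4}\) in absolute value. The function names are hypothetical.

```python
import numpy as np


def mean_median_decomposition(theta=1.0, n=201, reps=10_000, seed=0):
    """Split the median into mean + eps and check eps is uncorrelated noise."""
    rng = np.random.default_rng(seed)
    x = theta + rng.standard_normal((reps, n))
    efficient = x.mean(axis=1)          # sample mean (efficient here)
    inefficient = np.median(x, axis=1)  # sample median (not efficient)
    eps = inefficient - efficient
    corr = np.corrcoef(efficient, eps)[0, 1]
    return corr, efficient.var(), eps.var(), inefficient.var()


def hodges(xbar, n):
    """Hodges estimator: shrink the sample mean to 0 inside |xbar| < n^(-1/4)."""
    return np.where(np.abs(xbar) >= n ** -0.25, xbar, 0.0)


def hodges_scaled_mse(theta, n=10_000, reps=50_000, seed=0):
    """Return n * MSE for the sample mean and for the Hodges estimator."""
    rng = np.random.default_rng(seed)
    # For N(theta, 1) data the sample mean is exactly N(theta, 1/n),
    # so it is drawn directly rather than simulating raw observations.
    xbar = theta + rng.standard_normal(reps) / np.sqrt(n)
    mse_mean = np.mean((xbar - theta) ** 2)
    mse_hodges = np.mean((hodges(xbar, n) - theta) ** 2)
    return n * mse_mean, n * mse_hodges


corr, var_mean, var_eps, var_median = mean_median_decomposition()
print(f"corr(mean, eps) = {corr:.3f}")
print(f"var(mean) + var(eps) = {var_mean + var_eps:.5f}, var(median) = {var_median:.5f}")
for theta in (0.0, 0.1, 1.0):
    print(theta, hodges_scaled_mse(theta))
```

The correlation between the mean and \(\varepsilon\) should hover near zero, so the variances approximately add. For the Hodges estimator, the scaled risk should collapse at \(\theta = 0\) and blow up near the threshold (here \(\theta = 0.1\) with \(n = 10^4\)), which is the "much worse at neighbouring values" behaviour.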

Fisher Information

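Used in maximum likelihood theory and kinda-sorta in robust estimation. A matrix that tells you how much information the data carry about your parameters: it is the covariance of the score (the gradient of the log-likelihood) and, under the usual regularity conditions, also the negative expected Hessian of the log-likelihood. Its inverse gives the asymptotic covariance of the maximum likelihood estimator. (It is related, I am told, to garden variety Shannon information, and when that non-obvious fact is clearer to me I shall expand on how precisely this is so.)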
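As a sanity check on what the Fisher information measures, here is a small sketch for a Bernoulli(\(p\)) model, chosen only because everything is available in closed form: \(I(p) = 1/(p(1-p))\). It estimates the information two ways, as the variance of the score and as the mean negative second derivative of the log-likelihood, and compares the variance of the maximum likelihood estimate with \(1/(n I(p))\). The function name and default values are illustrative assumptions.

```python
import numpy as np


def bernoulli_fisher_check(p=0.3, n=200, reps=5_000, seed=0):
    """Estimate the Bernoulli Fisher information by simulation."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, p, size=(reps, n)).astype(float)
    # Per-observation score and negative second derivative of
    # log f(x; p) = x log p + (1 - x) log(1 - p), evaluated at the true p.
    score = x / p - (1 - x) / (1 - p)
    neg_hessian = x / p**2 + (1 - x) / (1 - p) ** 2
    info_exact = 1.0 / (p * (1 - p))
    info_score = score.var()
    info_curvature = neg_hessian.mean()
    # The MLE of p is the sample mean of each size-n sample;
    # its variance should be close to 1 / (n * I(p)).
    mle_var = x.mean(axis=1).var(ddof=1)
    return info_exact, info_score, info_curvature, mle_var


info_exact, info_score, info_curvature, mle_var = bernoulli_fisher_check()
print(f"exact I(p) = {info_exact:.4f}")
print(f"variance of score = {info_score:.4f}, mean curvature = {info_curvature:.4f}")
print(f"n * I(p) * var(MLE) = {200 * info_exact * mle_var:.4f}")
```

All three estimates of the information should agree up to Monte Carlo error, and the last printed quantity should sit near 1.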