The Living Thing / Notebooks : Effective sample size

Statistics

When your observations are highly correlated (e.g. because they form a time series, or because your design samples non-randomly), your data may give you less information than you hope, or than you would expect from the uncorrelated case.

This is a kind of dual to effective degrees of freedom: it tells you how far your sample size can actually get you.

Effective sample size turns out to be important in, e.g., the LASSO and, circularly, covariance estimation.

If you maximize effective sample size with your design, you are doing importance sampling, I guess? Sebastian Nowozin makes this clear in an excellent blog post.

Huber’s (1981) “equivalent number of observations” is probably the same?
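Here is one textbook case of the correlated-observations problem from the start of this section. It assumes we are estimating the mean of a stationary series whose lag-\(k\) autocorrelation is \(\rho_k\).

The variance of the sample mean is \(\operatorname{Var}(\bar x) = \frac{\sigma^2}{n}\left[1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho_k\right]\). The effective sample size is the number of independent observations that would give the same variance: \(n_{\text{eff}} = n \big/ \left[1 + 2\sum_{k=1}^{n-1}(1 - k/n)\rho_k\right]\).

Under an AR(1) model, which is an assumption here, \(\rho_k = \rho^k\). For large \(n\) this tends to \(n(1-\rho)/(1+\rho)\). The function names below are hypothetical and make up a minimal sketch, not a general estimator; in practice \(\rho_k\) must itself be estimated.

```python
def ess_mean(n, rho):
    """ESS of the sample mean of n stationary observations.

    rho(k) is the lag-k autocorrelation for k = 1, ..., n - 1 (assumed known).
    """
    # Variance inflation factor relative to n independent observations.
    inflation = 1.0 + 2.0 * sum((1.0 - k / n) * rho(k) for k in range(1, n))
    return n / inflation


def ess_mean_ar1(n, phi):
    """Exact finite-n ESS of the mean under an assumed AR(1) model, rho(k) = phi**k."""
    return ess_mean(n, lambda k: phi ** k)


def ess_mean_ar1_asymptotic(n, phi):
    """Large-n AR(1) approximation n * (1 - phi) / (1 + phi)."""
    return n * (1.0 - phi) / (1.0 + phi)


if __name__ == "__main__":
    n = 1000
    for phi in (0.0, 0.5, 0.9):
        exact = ess_mean_ar1(n, phi)
        approx = ess_mean_ar1_asymptotic(n, phi)
        print(f"phi={phi}: exact ESS {exact:.1f}, asymptotic {approx:.1f}")
```

With \(n = 1000\) and \(\rho = 0.5\), both versions give roughly 333, a third of the nominal sample. At \(\rho = 0.9\) the ESS drops to roughly 53.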

Monte Carlo estimation

A related, but different, definition. It is not about experimental samples but about the number of simulations in simulation-based inference that uses importance sampling. See, e.g., Sebastian Nowozin, Effective Sample Size in Importance Sampling.

[The effective sample size] can be used after or during importance sampling to provide a quantitative measure of the quality of the estimated mean. Even better, the estimate is provided on a natural scale of worth in samples from p, that is, if we use \(n=1000\) samples \(X_i\sim q\) and obtain an ESS of say 350 then this indicates that the quality of our estimate is about the same as if we would have used 350 direct samples[…]
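A common estimate of the quantity in the quote uses the importance weights \(w_i = p(X_i)/q(X_i)\): \(\widehat{\text{ESS}} = (\sum_i w_i)^2 / \sum_i w_i^2\). This equals \(n\) when all weights are equal and approaches 1 when a single weight dominates.

The sketch below is a minimal illustration, not necessarily the exact estimator used in the quoted post. It assumes NumPy, a standard normal target \(p\), and Gaussian proposals \(q\) of varying width; all names are hypothetical. It works in log-weights to avoid overflow. Since the estimate is invariant to rescaling the weights, unnormalized densities are fine.

```python
import numpy as np


def importance_ess(log_w):
    """Estimate ESS = (sum w)^2 / sum(w^2) from log importance weights."""
    log_w = np.asarray(log_w, dtype=float)
    # Subtract the max before exponentiating; the ratio is scale-invariant.
    w = np.exp(log_w - log_w.max())
    return w.sum() ** 2 / np.sum(w ** 2)


def log_normal_pdf(x, scale):
    """Log density of N(0, scale^2) without the -0.5*log(2*pi) constant,
    which cancels in log p - log q."""
    return -0.5 * (x / scale) ** 2 - np.log(scale)


def demo(n=1000, seed=0):
    rng = np.random.default_rng(seed)
    for q_scale in (1.0, 1.5, 3.0):
        x = rng.normal(0.0, q_scale, size=n)  # X_i ~ q = N(0, q_scale^2)
        log_w = log_normal_pdf(x, 1.0) - log_normal_pdf(x, q_scale)  # log p - log q
        print(f"q scale {q_scale}: ESS = {importance_ess(log_w):.0f} of {n}")


if __name__ == "__main__":
    demo()
```

When \(q = p\), all weights are equal and the ESS is exactly \(n\). For this Gaussian pair, the large-\(n\) limit for proposal scale \(s\) is \(n\sqrt{2s^2 - 1}/s^2\). That gives roughly 830 and 460 of 1000 for \(s = 1.5\) and \(s = 3\); sampled values will scatter around these.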
