# Frequentist properties of Bayesian methods

## Bayesian consistency

Life is short. You want to use some tasty tool, such as a hierarchical model, without anyone getting cross at you for apostasy? Why not use whatever estimator works, and then show that it works on both frequentist and Bayesian grounds?

• Shalizi’s overview

There is a basic result here, due to Doob, which essentially says that the Bayesian learner is consistent, except on a set of data of prior probability zero. That is, the Bayesian is subjectively certain they will converge on the truth. This is not as reassuring as one might wish, and showing Bayesian consistency under the true distribution is harder. In fact, it usually involves assumptions under which non-Bayes procedures will also converge. […]

Concentration of the posterior around the truth is only a preliminary. One would also want to know that, say, the posterior mean converges, or even better that the predictive distribution converges. For many finite-dimensional problems, what’s called the “Bernstein-von Mises theorem” basically says that the posterior mean and the maximum likelihood estimate converge, so if one works the other will too. This breaks down for infinite-dimensional problems.
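To make the two claims above concrete, here is a minimal sketch for a coin-flip model. It is an illustration under assumptions of my own choosing, not anything from Shalizi's text: the Beta(2, 2) prior, the true parameter 0.8, and the function names are all hypothetical. The first part compares the conjugate posterior mean with the maximum likelihood estimate as the sample grows. The second part uses a prior that gives zero mass to a neighbourhood of the truth, which is the kind of exceptional set that Doob's result allows.

```python
import math
import random


def simulate_successes(theta_true, n, seed=0):
    """Count successes in n Bernoulli(theta_true) draws."""
    rng = random.Random(seed)
    return sum(rng.random() < theta_true for _ in range(n))


def beta_posterior_mean(k, n, a=2.0, b=2.0):
    """Posterior mean under a Beta(a, b) prior (assumed prior, conjugate update)."""
    return (a + k) / (a + b + n)


def grid_posterior_mean(k, n, lo, hi, points=2001):
    """Posterior mean under a uniform prior restricted to [lo, hi], on a grid."""
    grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    grid = [t for t in grid if 0.0 < t < 1.0]  # keep the log-likelihood finite
    log_w = [k * math.log(t) + (n - k) * math.log(1.0 - t) for t in grid]
    shift = max(log_w)  # subtract the max before exponentiating, for stability
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(t * wi for t, wi in zip(grid, w)) / sum(w)


theta_true = 0.8  # assumed "truth" for the simulation
for n in (10, 100, 1000, 10000):
    k = simulate_successes(theta_true, n, seed=n)
    mle = k / n
    full = beta_posterior_mean(k, n)
    # This prior has support [0, 0.5] only, so it gives the truth zero mass.
    truncated = grid_posterior_mean(k, n, 0.0, 0.5)
    print(f"n={n:6d}  mle={mle:.4f}  beta-post-mean={full:.4f}  "
          f"truncated-prior-post-mean={truncated:.4f}")
```

With the full-support prior, the gap between the posterior mean and the MLE is at most 2/(n + 4) in this model, so if one converges the other does too. With the truncated prior, the posterior mean piles up against the boundary at 0.5 however much data arrives.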

## Regularisation and priors

An excellent answer by Tymoteusz Wołodźko must be in the running for punchiest summary ever, made precise by Andrew Milne.
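For orientation, here is the standard correspondence between penalties and priors, written as a sketch in my own notation rather than as a quotation of either answer. The symbols $\beta$ (parameters), $y$ (data), $\pi$ (prior density), $\tau$ and $b$ (prior scales) are assumptions introduced for this illustration.

$$
\hat\beta_{\mathrm{MAP}}
= \arg\max_\beta \left[ \log p(y \mid \beta) + \log \pi(\beta) \right]
= \arg\min_\beta \left[ -\log p(y \mid \beta) + \operatorname{pen}(\beta) \right],
\qquad \operatorname{pen}(\beta) = -\log \pi(\beta).
$$

$$
\beta_j \sim \mathcal{N}(0, \tau^2) \;\Rightarrow\; \operatorname{pen}(\beta) = \frac{1}{2\tau^2} \sum_j \beta_j^2 + \text{const},
\qquad
\beta_j \sim \mathrm{Laplace}(0, b) \;\Rightarrow\; \operatorname{pen}(\beta) = \frac{1}{b} \sum_j |\beta_j| + \text{const}.
$$

So a Gaussian prior gives a ridge penalty and a Laplace (double exponential) prior gives the lasso penalty, with the penalised optimum being the posterior mode rather than the whole posterior.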

Question: What do nonconvex regularizers look like in a Bayesian context, and are they an argument for Bayesian sampling from the posterior rather than the frequentist’s NP-hard optimum search? And what does, e.g., the Cauchy prior recommended as an alternative in GJPS08 look like?
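As a partial answer to the last part, here is a minimal sketch that treats each prior as the penalty it induces, i.e. its negative log density up to an additive constant. The function names are hypothetical. The Cauchy scale of 2.5 is the value GJPS08 recommend for (rescaled) regression coefficients; everything else is an assumption for illustration. A finite-difference second derivative is used as a crude convexity probe.

```python
import math


def gaussian_penalty(beta, scale=1.0):
    """Negative log density of N(0, scale^2), up to a constant: a ridge penalty."""
    return 0.5 * (beta / scale) ** 2


def laplace_penalty(beta, scale=1.0):
    """Negative log density of Laplace(0, scale), up to a constant: a lasso penalty."""
    return abs(beta) / scale


def cauchy_penalty(beta, scale=2.5):
    """Negative log density of Cauchy(0, scale), up to a constant."""
    return math.log1p((beta / scale) ** 2)


def second_difference(f, x, h=1e-3):
    """Central finite-difference estimate of the second derivative of f at x."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2


for beta in (0.5, 1.0, 2.0, 3.0, 5.0, 10.0):
    print(f"beta={beta:5.1f}  "
          f"gaussian''={second_difference(gaussian_penalty, beta):+.4f}  "
          f"laplace''={second_difference(laplace_penalty, beta):+.4f}  "
          f"cauchy''={second_difference(cauchy_penalty, beta):+.4f}")
```

The Cauchy penalty is log(1 + (beta/scale)^2), whose second derivative is 2(scale^2 - beta^2)/(scale^2 + beta^2)^2. That is positive near zero and negative once |beta| exceeds the scale, so the penalty is nonconvex: it shrinks small coefficients much as a ridge penalty would, but grows only logarithmically for large ones.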

## Refs

BaBe04
Bayarri, M. J., & Berger, J. O. (2004) The Interplay of Bayesian and Frequentist Analysis. Statistical Science, 19(1), 58–80.
DiFr86
Diaconis, P., & Freedman, D. (1986) On the Consistency of Bayes Estimates. The Annals of Statistics, 14(1), 1–26.
Doob49
Doob, J. L. (1949) Application of the theory of martingales. In Le Calcul des Probabilités et ses Applications (pp. 23–27). Centre National de la Recherche Scientifique, Paris.
Efro12
Efron, B. (2012) Bayesian inference and the parametric bootstrap. The Annals of Applied Statistics, 6(4), 1971–1997.
Efro15
Efron, B. (2015) Frequentist accuracy of Bayesian estimates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(3), 617–646.
Free99
Freedman, D. (1999) Wald Lecture: On the Bernstein-von Mises theorem with infinite-dimensional parameters. The Annals of Statistics, 27(4), 1119–1141.
GJPS08
Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y.-S. (2008) A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360–1383.
Nick14
Nickl, R. (2014) Discussion of: “Frequentist coverage of adaptive nonparametric Bayesian credible sets”. arXiv:1410.7600 [Math, Stat].
Nort84
Norton, R. M. (1984) The Double Exponential Distribution: Using Calculus to Find a Maximum Likelihood Estimator. The American Statistician, 38(2), 135–136.
Shal09
Shalizi, C. R. (2009) Dynamics of Bayesian updating with dependent data and misspecified models. Electronic Journal of Statistics, 3, 1039–1074.
Sims10
Sims, C. (2010) Understanding non-Bayesians. Unpublished Chapter, Department of Economics, Princeton University.
SzVZ13
Szabo, B., van der Vaart, A., & van Zanten, H. (2013) Frequentist coverage of adaptive nonparametric Bayesian credible sets. arXiv:1310.4489 [Math, Stat].
Tibs96
Tibshirani, R. (1996) Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267–288.
Valp11
Valpine, P. de. (2011) Frequentist analysis of hierarchical models for population dynamics and demographic data. Journal of Ornithology, 152(2), 393–408.