Frequentist model selection is not the only kind, but I know less about Bayesian model selection. What does model selection even mean in a Bayesian context? Surely no model ever ends up with exactly zero posterior probability, unless its prior probability or its likelihood is zero? In my intro Bayesian classes I learned that one simply keeps all the models, weighting each by its posterior probability when making predictions (Bayesian model averaging). But sometimes we wish to get rid of some models. When does this work, and when not? Typically this seems to be done by comparing the models' marginal likelihoods, a.k.a. their evidence.
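To make the averaging-versus-selecting question concrete, here is a minimal sketch of posterior model probabilities and a Bayesian-model-averaged prediction, assuming a toy coin-flip problem where each model's evidence has a closed form. The data (7 heads in 10 flips), the three candidate models and the equal prior model probabilities are all hypothetical choices for illustration. Posterior model probabilities are computed as p(M | y) ∝ p(y | M) p(M), normalised in log space with the log-sum-exp trick.

```python
import math

# Hypothetical data: K heads in N coin flips (one specific 0/1 sequence).
K, N = 7, 10

def log_beta(a, b):
    """log B(a, b) via log-gamma, to avoid overflow."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_evidence_fixed(theta, k, n):
    """log p(y | M) when M fixes theta, so there is nothing to integrate out."""
    return k * math.log(theta) + (n - k) * math.log1p(-theta)

def log_evidence_beta(a, b, k, n):
    """log p(y | M) for Bernoulli data with theta integrated against a Beta(a, b) prior."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)

def posterior_model_probs(log_evidences, log_priors):
    """p(M | y) proportional to p(y | M) p(M), normalised with log-sum-exp."""
    scores = [le + lp for le, lp in zip(log_evidences, log_priors)]
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return [math.exp(s - log_norm) for s in scores]

# Hypothetical candidates: (name, log evidence, predictive P(next flip = 1 | y, M)).
models = [
    ("fair coin, theta = 0.5", log_evidence_fixed(0.5, K, N), 0.5),
    ("Beta(1, 1) prior", log_evidence_beta(1, 1, K, N), (1 + K) / (2 + N)),
    ("Beta(30, 3) prior", log_evidence_beta(30, 3, K, N), (30 + K) / (33 + N)),
]

names = [m[0] for m in models]
log_evs = [m[1] for m in models]
preds = [m[2] for m in models]
log_priors = [math.log(1 / len(models))] * len(models)  # equal prior model probabilities

probs = posterior_model_probs(log_evs, log_priors)

# Bayesian model averaging: mix the per-model predictives by posterior model probability.
bma = sum(w * p for w, p in zip(probs, preds))

for name, w in zip(names, probs):
    print(f"{name:>24}: p(M | y) = {w:.3f}")
print(f"BMA predictive P(next = 1 | y) = {bma:.3f}")
```

Every model with positive prior probability and positive likelihood keeps a strictly positive posterior weight here; "getting rid of" a model, e.g. by dropping those whose weight falls below some threshold, is a separate decision layered on top of these numbers. In practice, weights of very poor models can also underflow to zero in floating point, which is a numerical artefact rather than a posterior of exactly zero.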
Interesting special case: Bayesian sparsity.
Cross-validation and Bayes
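There is a relation between cross-validation and the Bayes evidence, a.k.a. marginal likelihood: see (Claeskens and Hjort 2008; Fong and Holmes 2019). Fong and Holmes show that the log marginal likelihood is formally equivalent to exhaustive leave-p-out cross-validation, scored by the log posterior predictive and aggregated over all values of p and all held-out sets.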
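The identity follows from the chain rule: for any ordering of the data, log p(y) = Σ_i log p(y_i | y_1, …, y_{i-1}). Averaging over all orderings turns the i-th term into the average log predictive of one held-out point given a random training set of size i - 1, which is the leave-p-out score with p = n - i + 1, so log p(y) = Σ_{p=1}^{n} S_CV(p). The sketch below checks this numerically for a conjugate Beta-Bernoulli model, where both sides have closed forms. The Beta(2, 2) prior and the six data points are hypothetical choices made only for illustration.

```python
import math
from itertools import combinations

A, B = 2.0, 2.0          # hypothetical Beta(A, B) prior on the success probability
y = [1, 0, 1, 1, 0, 1]   # hypothetical binary data

def log_marginal_likelihood(data, a=A, b=B):
    """Closed-form log p(y) for Bernoulli data under a Beta(a, b) prior."""
    k, n = sum(data), len(data)
    log_b_post = math.lgamma(a + k) + math.lgamma(b + n - k) - math.lgamma(a + b + n)
    log_b_prior = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_b_post - log_b_prior

def log_predictive(y_new, train, a=A, b=B):
    """log p(y_new | train) under the Beta-Bernoulli posterior predictive."""
    p1 = (a + sum(train)) / (a + b + len(train))
    return math.log(p1 if y_new == 1 else 1.0 - p1)

def leave_p_out_score(data, p):
    """S_CV(p): mean over all size-p test sets of the mean log predictive of each held-out point."""
    n = len(data)
    total, count = 0.0, 0
    for test in combinations(range(n), p):
        held_out = set(test)
        train = [data[i] for i in range(n) if i not in held_out]
        total += sum(log_predictive(data[j], train) for j in test) / p
        count += 1
    return total / count

lml = log_marginal_likelihood(y)
cv_sum = sum(leave_p_out_score(y, p) for p in range(1, len(y) + 1))
print(f"log marginal likelihood:      {lml:.10f}")
print(f"sum over p of leave-p-out CV: {cv_sum:.10f}")
```

The two printed numbers agree up to floating-point error. Note that the enumeration visits every subset of the data, so its cost grows like 2^n; this is a check of the identity, not a practical way to compute the evidence. Ordinary leave-one-out cross-validation corresponds to the single p = 1 term.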
Chipman, Hugh, Edward I. George, and Robert E. McCulloch. 2001. “The Practical Implementation of Bayesian Model Selection.” In Model Selection, edited by P. Lahiri. Vol. 38. IMS Lecture Notes - Monograph Series. Beachwood, OH: Institute of Mathematical Statistics. https://doi.org/10.1214/lnms/1215540964.
Claeskens, Gerda, and Nils Lid Hjort. 2008. Model Selection and Model Averaging. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge; New York: Cambridge University Press.
Efron, Bradley. 2012. “Bayesian Inference and the Parametric Bootstrap.” The Annals of Applied Statistics 6 (4): 1971–97. https://doi.org/10.1214/12-AOAS571.
Fong, Edwin, and Chris Holmes. 2019. “On the Marginal Likelihood and Cross-Validation,” May. http://arxiv.org/abs/1905.08737.
Ishwaran, Hemant, and J. Sunil Rao. 2005. “Spike and Slab Variable Selection: Frequentist and Bayesian Strategies.” The Annals of Statistics 33 (2): 730–73. https://doi.org/10.1214/009053604000001147.
Kadane, Joseph B., and Nicole A. Lazar. 2004. “Methods and Criteria for Model Selection.” Journal of the American Statistical Association 99 (465): 279–90. https://doi.org/10.1198/016214504000000269.
Li, Meng, and David B. Dunson. 2016. “A Framework for Probabilistic Inferences from Imperfect Models,” November. http://arxiv.org/abs/1611.01241.
MacKay, David J. C. 1999. “Comparison of Approximate Methods for Handling Hyperparameters.” Neural Computation 11 (5): 1035–68. https://doi.org/10.1162/089976699300016331.
MacKay, David J. C. 1995. “Probable Networks and Plausible Predictions — a Review of Practical Bayesian Methods for Supervised Neural Networks.” Network: Computation in Neural Systems 6 (3): 469–505. https://doi.org/10.1088/0954-898X_6_3_011.
Ormerod, John T., Michael Stewart, Weichang Yu, and Sarah E. Romanes. 2017. “Bayesian Hypothesis Tests with Diffuse Priors: Can We Have Our Cake and Eat It Too?” October. http://arxiv.org/abs/1710.09146.
Piironen, Juho, and Aki Vehtari. 2017. “Comparison of Bayesian Predictive Methods for Model Selection.” Statistics and Computing 27 (3): 711–35. https://doi.org/10.1007/s11222-016-9649-y.
Ročková, Veronika, and Edward I. George. 2018. “The Spike-and-Slab LASSO.” Journal of the American Statistical Association 113 (521): 431–44. https://doi.org/10.1080/01621459.2016.1260469.
Stein, Michael L. 2008. “A Modeling Approach for Large Spatial Datasets.” Journal of the Korean Statistical Society 37 (1): 3–10. https://doi.org/10.1016/j.jkss.2007.09.001.
Vehtari, Aki, and Janne Ojanen. 2012. “A Survey of Bayesian Predictive Methods for Model Assessment, Selection and Comparison.” Statistics Surveys 6: 142–228. https://doi.org/10.1214/12-SS102.