The Living Thing / Notebooks : Model/hyperparameter selection

When your model has a discrete number of predictors or terms, you need to choose how many to use, based on how much data you have and how well each candidate model explains it. This is a kind of dual to statistical learning theory, where you hope to quantify how complicated a model you should bother fitting to a given amount of data.

If your predictors are discrete and few in number, you can do this in the traditional fashion, by stepwise model selection, and you might discuss the degrees of freedom of the model and the data. If you are in the luxurious position of having a small, tractable number of parameters and the ability to perform controlled trials, then you do ANOVA.
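To make the traditional recipe concrete, here is a minimal sketch of forward stepwise selection scored by AIC for a Gaussian linear model. The function names (`gaussian_aic`, `forward_stepwise`) and the synthetic data are my own assumptions for illustration, not anything from the papers below, and the AIC is only computed up to an additive constant.

```python
import numpy as np

def gaussian_aic(X, y):
    """AIC (up to an additive constant) of a least-squares fit with Gaussian errors."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = X.shape[1] + 1  # regression coefficients plus the noise variance
    return n * np.log(rss / n) + 2 * k

def forward_stepwise(X, y):
    """Greedily add the predictor that lowers AIC most; stop when none does."""
    n, p = X.shape
    intercept = np.ones((n, 1))
    selected = []
    best = gaussian_aic(intercept, y)  # start from the intercept-only model
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = {
            j: gaussian_aic(np.hstack([intercept, X[:, selected + [j]]]), y)
            for j in candidates
        }
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no candidate improves the criterion
        selected.append(j_best)
        best = scores[j_best]
    return selected, best

# Synthetic example (assumed): only columns 1 and 4 carry signal.
rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + rng.normal(size=n)

selected, aic = forward_stepwise(X, y)
print("selected columns:", sorted(selected), "AIC:", round(aic, 2))
```

The two real predictors should be picked up easily here; AIC may also let in the odd spurious column, which is rather the point of worrying about consistency further down.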

When you have regularisation parameters, you tend to phrase this as smoothing and talk about smoothing parameter selection, which you can do in various ways. I’m fond of generalised information criteria because they aren’t worse than cross-validation, but are much quicker. However, I’m not yet sure how to make that work in sparse regression.
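A rough sketch of the speed argument, for ridge regression. This is not the generalised information criterion of KoKi96 proper; it is the simplest cousin I can think of, an AIC-type score with the trace of the hat matrix standing in for the parameter count, set beside plain 5-fold cross-validation. The helper names, the penalty grid and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def ridge_hat(X, lam):
    """Hat matrix H such that fitted values are H @ y under ridge penalty lam."""
    p = X.shape[1]
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

def aic_effective_df(X, y, lam):
    """AIC-type score using trace(H) as the effective degrees of freedom."""
    n = len(y)
    H = ridge_hat(X, lam)
    rss = float(np.sum((y - H @ y) ** 2))
    return n * np.log(rss / n) + 2.0 * np.trace(H)

def kfold_cv_error(X, y, lam, k=5):
    """Mean squared prediction error over k contiguous folds (one refit per fold)."""
    n, p = X.shape
    folds = np.array_split(np.arange(n), k)
    err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        A = X[train].T @ X[train] + lam * np.eye(p)
        beta = np.linalg.solve(A, X[train].T @ y[train])
        err += float(np.sum((y[test] - X[test] @ beta) ** 2))
    return err / n

# Synthetic data (assumed): many weak effects, no intercept.
rng = np.random.default_rng(1)
n, p = 80, 30
X = rng.normal(size=(n, p))
beta_true = rng.normal(scale=0.3, size=p)
y = X @ beta_true + rng.normal(size=n)

lams = np.logspace(-2, 3, 30)
aic_scores = np.array([aic_effective_df(X, y, lam) for lam in lams])
cv_scores = np.array([kfold_cv_error(X, y, lam) for lam in lams])

lam_aic = lams[np.argmin(aic_scores)]
lam_cv = lams[np.argmin(cv_scores)]
print("penalty chosen by AIC-type score:", lam_aic)
print("penalty chosen by 5-fold CV:    ", lam_cv)
```

The information criterion needs one fit per candidate penalty, where k-fold cross-validation needs k. For ridge specifically there are closed-form shortcuts for leave-one-out, so the gap matters more for models without such tricks.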

Multiple testing is model selection writ large, where you are considering many hypothesis tests, possibly effectively infinitely many, or you have a combinatorial explosion of possible predictors to include.
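As a small illustration of what handling many tests at once looks like, here is a sketch of the Benjamini-Hochberg step-up procedure for controlling the false discovery rate, the flavour of error control that BeGa09 build their forward selection on. The function name and the p-values are made up for the example.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up rule at level q."""
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    order = np.argsort(pvals)
    thresholds = q * np.arange(1, m + 1) / m  # q * i / m for the i-th smallest p-value
    below = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank that clears its threshold
        reject[order[: k + 1]] = True     # reject everything up to that rank
    return reject

# Made-up p-values from ten hypothetical tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
reject = benjamini_hochberg(pvals, q=0.05)
print("rejected indices:", np.nonzero(reject)[0].tolist())
```

On these numbers the procedure rejects the two smallest p-values, whereas a Bonferroni cut-off of 0.05/10 would reject only the first.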

TODO: document connection with graphical models and thus conditional independence tests.

Consistency

If the model order itself is the parameter of interest, how do you do consistent inference of that?

An exhausting, exhaustive review of various model selection procedures, with an eye to consistency, is given in RaWu01.
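For orientation, the textbook picture as I understand it, under the usual regularity conditions and assuming the true model sits in a fixed, finite set of candidates. The notation here ($\ell_n$, $d_k$, $c_n$) is mine, not taken from RaWu01. Many procedures pick the order by minimising a penalised log-likelihood:

```latex
\hat{k}_n = \operatorname*{arg\,min}_{k \in \{1, \dots, K\}}
  \left\{ -2\, \ell_n(\hat{\theta}_k) + c_n\, d_k \right\}
% \ell_n(\hat{\theta}_k): maximised log-likelihood of candidate k on n observations
% d_k: number of free parameters of candidate k
% AIC: c_n = 2 \qquad BIC: c_n = \log n
% Typical requirement for consistent order selection:
%   c_n \to \infty \quad \text{and} \quad c_n / n \to 0 \quad \text{as } n \to \infty
```

BIC's penalty grows with $n$ and satisfies both conditions, so it tends to recover the true order. AIC's penalty stays fixed at 2, so it keeps a positive asymptotic probability of choosing too large a model, even though it can be the better choice when the goal is prediction.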

Cross validation

See cross validation.

Under sparsity

Fiddly. See sparse model selection.

Reads

BeGa09
Benjamini, Y., & Gavrilov, Y. (2009) A simple forward selection procedure based on false discovery rate control. The Annals of Applied Statistics, 3(1), 179–198.
BLZS15
Bloniarz, A., Liu, H., Zhang, C.-H., Sekhon, J., & Yu, B. (2015) Lasso adjustments of treatment effect estimates in randomized experiments. arXiv:1507.03652 [math, Stat].
BüKü99
Bühlmann, P., & Künsch, H. R. (1999) Block length selection in the bootstrap for time series. Computational Statistics & Data Analysis, 31(3), 295–310.
BuNo95
Burman, P., & Nolan, D. (1995) A general Akaike-type criterion for model selection in robust regression. Biometrika, 82(4), 877–886.
BuAn02
Burnham, K. P., & Anderson, D. R. (2002) Model selection and multimodel inference: a practical information-theoretic approach (2nd ed.). New York: Springer.
CaTa10
Cawley, G. C., & Talbot, N. L. C. (2010) On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation. Journal of Machine Learning Research, 11, 2079–2107.
ClHj08
Claeskens, G., & Hjort, N. L. (2008) Model selection and model averaging. Cambridge; New York: Cambridge University Press.
GeHw82
Geman, S., & Hwang, C.-R. (1982) Nonparametric Maximum Likelihood Estimation by the Method of Sieves. The Annals of Statistics, 10(2), 401–414.
GuEl03
Guyon, I., & Elisseeff, A. (2003) An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, 3(Mar), 1157–1182.
HMCH08
Hong, X., Mitchell, R. J., Chen, S., Harris, C. J., Li, K., & Irwin, G. W. (2008) Model selection approaches for non-linear system identification: a review. International Journal of Systems Science, 39(10), 925–946.
JaTa15
Jamieson, K., & Talwalkar, A. (2015) Non-stochastic Best Arm Identification and Hyperparameter Optimization. arXiv:1502.07943 [cs, Stat].
JaFH13
Janson, L., Fithian, W., & Hastie, T. (2013) Effective Degrees of Freedom: A Flawed Metaphor. arXiv:1312.7851 [stat].
KlRB10
Kloft, M., Rückert, U., & Bartlett, P. L. (2010) A Unifying View of Multiple Kernel Learning. In J. L. Balcázar, F. Bonchi, A. Gionis, & M. Sebag (Eds.), Machine Learning and Knowledge Discovery in Databases (pp. 66–81). Springer Berlin Heidelberg.
KoKi96
Konishi, S., & Kitagawa, G. (1996) Generalised information criteria in model selection. Biometrika, 83(4), 875–890.
KoKi08
Konishi, S., & Kitagawa, G. (2008) Information criteria and statistical modeling. New York: Springer.
LJDR16
Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2016) Efficient Hyperparameter Optimization and Infinitely Many Armed Bandits. arXiv:1603.06560 [cs, Stat].
PaSa14
Paparoditis, E., & Sapatinas, T. (2014) Bootstrap-based testing for functional data. arXiv:1409.4317 [math, Stat].
Qian96
Qian, G. (1996) On model selection in robust linear regression.
QiHa96
Qian, G., & Hans, R. K. (1996) Some notes on Rissanen’s stochastic complexity.
QiKü98
Qian, G., & Künsch, H. R. (1998) On model selection via stochastic complexity in robust linear regression. Journal of Statistical Planning and Inference, 75(1), 91–116.
RaWu01
Rao, C. R., & Wu, Y. (2001) On model selection. In Institute of Mathematical Statistics Lecture Notes - Monograph Series (Vol. 38, pp. 1–57). Beachwood, OH: Institute of Mathematical Statistics.
RaWu89
Rao, R., & Wu, Y. (1989) A strongly consistent procedure for model selection in a regression problem. Biometrika, 76(2), 369–374.
Ronc00
Ronchetti, E. (2000) Robust Regression Methods and Model Selection. In A. Bab-Hadiashar & D. Suter (Eds.), Data Segmentation and Model Selection for Computer Vision (pp. 31–40). Springer New York.
Roya86
Royall, R. M. (1986) Model Robust Confidence Intervals Using Maximum Likelihood Estimators. International Statistical Review / Revue Internationale de Statistique, 54(2), 221–226.
Shao96
Shao, J. (1996) Bootstrap Model Selection. Journal of the American Statistical Association, 91(434), 655–665.
Shib89
Shibata, R. (1989) Statistical Aspects of Model Selection. In P. J. C. Willems (Ed.), From Data to Model (pp. 215–240). Springer Berlin Heidelberg.
Ston77
Stone, M. (1977) An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike’s Criterion. Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 44–47.
Take76
Takeuchi, K. (1976) Distribution of informational statistics and a criterion of model fitting. Suri-Kagaku (Mathematical Sciences), 153(1), 12–18.
TLTT14
Taylor, J., Lockhart, R., Tibshirani, R. J., & Tibshirani, R. (2014) Exact Post-selection Inference for Forward Stepwise and Least Angle Regression. arXiv:1401.3889 [stat].
ThCl13
Tharmaratnam, K., & Claeskens, G. (2013) A comparison of robust versions of the AIC based on M-, S- and MM-estimators. Statistics, 47(1), 216–235.
TRTW15
Tibshirani, R. J., Rinaldo, A., Tibshirani, R., & Wasserman, L. (2015) Uniform Asymptotic Inference and the Bootstrap After Model Selection. arXiv:1506.06266 [math, Stat].
VaBC12
Vansteelandt, S., Bekaert, M., & Claeskens, G. (2012) On model selection and model misspecification in causal inference. Statistical Methods in Medical Research, 21(1), 7–30.