The Living Thing / Notebooks :

Model/hyperparameter selection

Choosing which of an ensemble of models to use, or, which is the same things more or less, the number of predictors, or the regularisation. This is a kind of complement to statistical learning theory where you hope to quantify how complicated a model you should bother fitting to a given amount of data.

If your predictors are discrete and small in number, you can do this in the traditional fashion, by stepwise model selection, and you might discuss the degrees of freedom of the model and the data. If you are in the luxurious position of having a small tractable number of parameters and the ability to perform controlled trials, then you do ANOVA.

When you have penalisation parameters, we sometimes phrase this as regularisation and talk about regularisation parameter selection, or hyperparameter selection, which we can do in various ways. Methods for this include degrees-of-freedom penalties, cross-validation etc.. However, I’m not yet sure how to make that work in sparse regression.

Multiple testing is model selection writ large, where you can considering many hypothesis tests, possible effectively infinitely many hypothesis tests, or you have a combinatorial explosion of possible predictors to include.

TODO: document connection with graphical models and thus conditional independence tests.


Bayesian model selection is also a thing, although the framing is a little different. In the classic Bayesian methodI keep all my models about although some might become very unlikely. But apparently I can also throw some out entirely? Presumably for reasons of computational tractability or what-have-you.


If the model order itself is the parameter of interest, how do you do consistent inference on that? AIC, for example, is derived for optimising prediction loss, not model selection. (Doesn’t BIC do better?)

An exhausting, exhaustive review of various model selection procedures with an eye to consistency, is given in RaWu01.

Cross validation

See cross validation.

For densities

See density model selection.

Under sparsity

See sparse model selection.

How do you choose your hyperparameters? NB hyperparameters might not always be about model selection per se; there are also ones that are about, e.g. convergence rate. Anyway. Also one could well regard hyperparameters as normal parameters.

Turns out you can cast this as a bandit problem, or a sequential Bayesian optimisation problem.

For time series

See model secltion in time series.