The Living Thing / Notebooks :

Model/hyperparameter selection

When you have a number of predictors or regularisation terms in your model, you need to choose how many to use, based on the data you have and on how well the various candidate models fit it. This is a kind of complement to statistical learning theory, where you hope to quantify how complicated a model you should bother fitting to a given amount of data.

If your predictors are discrete and few in number, you can do this in the traditional fashion, by stepwise model selection, reasoning about the degrees of freedom of the model and of the data. If you are in the luxurious position of having a small, tractable number of parameters and the ability to perform controlled trials, then you can do ANOVA.
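As a concrete sketch of the stepwise idea, here is a greedy forward-selection loop for ordinary least squares scored by AIC. The helper `forward_select` and its stopping rule are my own illustrative choices, not a canonical recipe:

```python
import numpy as np

def forward_select(X, y, max_terms=None):
    """Greedy forward stepwise selection by AIC for OLS (illustrative sketch)."""
    n, p = X.shape
    max_terms = max_terms or p
    chosen, remaining = [], list(range(p))

    def aic(cols):
        # OLS fit with intercept; AIC = n log(RSS/n) + 2k
        Z = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        rss = np.sum((y - Z @ beta) ** 2)
        return n * np.log(rss / n) + 2 * (len(cols) + 1)

    best = aic(chosen)
    while remaining and len(chosen) < max_terms:
        # Add whichever remaining predictor most reduces AIC
        score, c = min((aic(chosen + [c]), c) for c in remaining)
        if score >= best:
            break  # no predictor improves AIC; stop
        best = score
        chosen.append(c)
        remaining.remove(c)
    return chosen
```

With strong signal in a couple of columns, the loop picks those out and then stops once the AIC penalty outweighs the reduction in residual sum of squares.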

When the model has regularisation parameters, we tend to phrase this as smoothing and talk about smoothing parameter selection, which we can do in various ways. I’m fond of degrees-of-freedom penalties because they are no worse than cross-validation, but much quicker. However, I’m not yet sure how to make that work in sparse regression.
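One way a degrees-of-freedom penalty gets used for smoothing parameter selection is generalised cross-validation (GCV) for ridge regression, where the effective degrees of freedom is the trace of the hat matrix. A minimal sketch, assuming a sensibly centred and scaled design matrix:

```python
import numpy as np

def gcv_ridge(X, y, lambdas):
    """Pick a ridge penalty by generalised cross-validation (sketch).

    GCV(lam) = n * RSS / (n - df)^2, with df = trace of the hat matrix,
    i.e. the effective degrees of freedom of the smoother.
    """
    n = X.shape[0]
    best_score, best_lam = np.inf, None
    for lam in lambdas:
        # Hat matrix H = X (X'X + lam I)^{-1} X'
        A = X.T @ X + lam * np.eye(X.shape[1])
        H = X @ np.linalg.solve(A, X.T)
        df = np.trace(H)
        rss = np.sum((y - H @ y) ** 2)
        score = n * rss / (n - df) ** 2
        if score < best_score:
            best_score, best_lam = score, lam
    return best_lam
```

The appeal is exactly as stated above: one pass over candidate penalties, no refitting on held-out folds.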

Multiple testing is model selection writ large: you are considering many hypothesis tests, possibly effectively infinitely many, or you have a combinatorial explosion of possible predictors to include.
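For instance, the Benjamini–Hochberg step-up procedure controls the false discovery rate across many simultaneous tests. A minimal sketch of the standard algorithm:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure, controlling FDR at level q.

    Returns indices of rejected hypotheses: find the largest rank k with
    p_(k) <= q*k/m, and reject the k smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    return sorted(order[:k])
```

Note the step-up structure: a p-value can be rejected even if it misses its own threshold, so long as some larger p-value passes its threshold further down the list.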

TODO: document connection with graphical models and thus conditional independence tests.

Bayesian model selection is also a thing, although the framing must be a little different: in the Bayesian method, in principle, I keep all my models about and weight them; but we might still wish to discard some models for reasons of computational tractability or what-have-you.
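One cheap version of the keep-and-weight idea assigns each model a weight proportional to exp(−BIC/2), a rough stand-in for the marginal likelihood under equal model priors. A sketch (the `bic_weights` helper is my own illustrative name):

```python
import math

def bic_weights(bics):
    """Approximate posterior model probabilities from BIC values (sketch).

    Under equal model priors, w_i is proportional to exp(-BIC_i / 2);
    subtracting the minimum BIC first keeps the exponentials stable.
    """
    best = min(bics)
    raw = [math.exp(-(b - best) / 2) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]
```

Models whose weights fall below some threshold can then be discarded, which is one pragmatic answer to the tractability concern above.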


If the model order itself is the parameter of interest, how do you do consistent inference of that?

An exhausting, exhaustive review of various model selection procedures with an eye to consistency, is given in RaWu01.

Cross validation

See cross validation.

For densities

See density model selection.

Under sparsity

See sparse model selection.

Hyperparameter selection

How do you choose your hyperparameters? NB hyperparameters are not always about model selection per se; some govern, e.g., convergence rate of the fitting algorithm. Anyway. One could equally well regard hyperparameters as ordinary parameters.

Turns out you can cast this as a bandit problem, or a sequential Bayesian optimisation problem.
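A simple bandit-flavoured example is successive halving: evaluate every candidate configuration on a small budget, keep the best half, double the budget, and repeat. A sketch, where the `evaluate(config, budget)` signature is an assumption of this illustration:

```python
def successive_halving(configs, evaluate, budget=1):
    """Successive halving: a simple bandit-style hyperparameter search (sketch).

    `evaluate(config, budget)` returns a loss (lower is better); e.g. the
    validation loss after `budget` epochs of training with that config.
    """
    survivors = list(configs)
    while len(survivors) > 1:
        losses = [(evaluate(c, budget), i) for i, c in enumerate(survivors)]
        losses.sort(key=lambda t: t[0])
        # Keep the best half (at least one survivor), double the budget
        survivors = [survivors[i] for _, i in losses[: max(1, len(survivors) // 2)]]
        budget *= 2
    return survivors[0]
```

The bandit framing is that each configuration is an arm, and the point is to spend most of the evaluation budget on the promising arms rather than evaluating every arm fully.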


Note that Bayesian optimisation means not quite the same thing as Bayesian model selection.