# Penalised/regularised regression

Regression estimation with penalties on the model parameters. I am especially interested in sparsifying penalties, and I have more notes on sparse regression.

Here I consider general penalties, ridge and friends, at least in principle; at the moment I have no active projects using penalties without sparsifying them.

Why might I use such penalties? One reason is that $$L_2$$ penalties have simple closed forms for their information criteria, as shown by Konishi and Kitagawa (KoKi08, §5.2.4).
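As a concrete reminder of what the $$L_2$$ penalty buys us computationally, here is a minimal sketch of ridge regression via its closed form, using only NumPy; the data and the penalty weight `lam` are purely illustrative.

```python
# Ridge regression: argmin_b ||y - X b||^2 + lam ||b||^2
# has the closed-form solution (X'X + lam I)^{-1} X'y.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

lam = 1.0  # penalty strength; lam = 0 recovers ordinary least squares
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Note that the penalised problem is still a linear solve, just with `lam * np.eye(p)` added to the Gram matrix; the penalty shrinks the coefficient vector towards zero relative to the OLS solution.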

To discuss:

Ridge penalties; relationships with robust regression, statistical learning theory, etc.

In nonparametric statistics we might simultaneously estimate what looks like a great many parameters, constrained in some clever fashion that usually boils down to something we can interpret as a “penalty” on the parameters.

“Penalisation” has a genealogy unknown to me, but it is probably the least abstruse term for common, general usage.

The “regularisation” nomenclature claims descent from Tikhonov (e.g. TiGl65), who wanted to solve ill-conditioned integral and differential equations, so it is somewhat more general. “Smoothing” seems to be common in the spline and kernel-estimate communities of Wahba (Wahb90), Silverman (Silv82) et al., who usually do actually want to smooth curves. When you say “smoothing” you usually mean that you can express your predictions as a “linear smoother”, i.e. via a hat matrix, which has certain nice properties under generalised cross-validation.
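The linear-smoother observation can be sketched directly: ridge predictions are $$\hat{y} = Hy$$ for a hat matrix $$H$$, and the generalised cross-validation score then only needs $$H$$ and the residuals. The data and penalty weight below are illustrative, and the GCV formula is the standard $$n\,\|y - Hy\|^2/(n - \operatorname{tr} H)^2$$.

```python
# Ridge as a linear smoother: predictions are linear in y via a hat matrix H,
# whose trace gives the effective degrees of freedom used by GCV.
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 5
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.2 * rng.standard_normal(n)

lam = 0.5
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # hat matrix
y_hat = H @ y                 # predictions are a linear map of y
edf = np.trace(H)             # effective degrees of freedom
gcv = n * np.sum((y - y_hat) ** 2) / (n - edf) ** 2  # generalised CV score
```

In practice one would scan `lam` over a grid and pick the value minimising `gcv`, which is cheap precisely because the smoother is linear.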

“Smoothing” is not a great general term, though, since penalisation does not necessarily produce smoothness: some penalties drive the coefficients to become sparse, which, viewed as a vector of coefficients, is distinctly non-smooth.

In every case, you wish to solve an ill-conditioned inverse problem, so you tame it by attaching a penalty to solutions you feel one should be reluctant to accept.
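The taming effect is easy to demonstrate numerically: adding an $$L_2$$ penalty bounds the smallest eigenvalue of the system away from zero, so the condition number of the penalised solve is far better than the raw one. The near-collinear design below is cooked up for illustration.

```python
# An L2 penalty regularises an ill-conditioned normal-equations solve:
# the condition number of X'X + lam I is much smaller than that of X'X.
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 4))
# Append a nearly duplicate column to make X'X close to singular.
X = np.column_stack([X, X[:, 0] + 1e-6 * rng.standard_normal(30)])

A = X.T @ X
lam = 1.0
cond_raw = np.linalg.cond(A)
cond_pen = np.linalg.cond(A + lam * np.eye(A.shape[1]))
print(cond_raw, cond_pen)  # the penalised system is far better conditioned
```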

TODO: specifics