The Living Thing / Notebooks : Regularising neural networks and other overfitting hacks

Q: Which of these tricks can I apply outside of deep-learning settings?

Early stopping
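Stop training when performance on held-out data stops improving, rather than when the training loss bottoms out. A minimal sketch of the usual patience-based recipe, here in PyTorch; the toy data, architecture and patience value are illustrative placeholders, not a recommendation:

    import copy

    import torch
    from torch import nn

    torch.manual_seed(0)
    # toy regression data, split into training and validation sets
    X = torch.randn(200, 10)
    y = X @ torch.randn(10, 1) + 0.1 * torch.randn(200, 1)
    X_train, y_train, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()

    best_val, best_state = float("inf"), None
    patience, stall = 10, 0          # give up after 10 epochs without improvement

    for epoch in range(500):
        model.train()
        optimiser.zero_grad()
        loss_fn(model(X_train), y_train).backward()
        optimiser.step()

        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(X_val), y_val).item()
        if val_loss < best_val:
            best_val, best_state, stall = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            stall += 1
            if stall >= patience:
                break                # validation loss has stopped improving

    model.load_state_dict(best_state)   # keep the best-on-validation weights, not the last ones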

Noise layers

Dropout

Another type of noise layer: hidden units are randomly zeroed out during training, and the noise is switched off (or averaged over) at test time.
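A minimal sketch using PyTorch's standard torch.nn.Dropout layer; the layer sizes and drop probability are arbitrary:

    import torch
    from torch import nn

    model = nn.Sequential(
        nn.Linear(10, 64),
        nn.ReLU(),
        nn.Dropout(p=0.5),   # each hidden unit zeroed with probability 0.5 during training
        nn.Linear(64, 1),
    )

    x = torch.randn(8, 10)
    model.train()            # dropout active: random mask, surviving units rescaled by 1/(1-p)
    noisy_out = model(x)
    model.eval()             # dropout disabled: deterministic forward pass at test time
    clean_out = model(x)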

Input perturbation

A parametric noise layer applied to the inputs rather than to hidden activations. If you are hip you will take this further and do it by…
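A minimal sketch of the plain (non-learned) version in PyTorch: corrupt the inputs with Gaussian noise during training, pass them through untouched at test time. The GaussianInputNoise module and the fixed sigma are my own illustrative choices, not a built-in layer:

    import torch
    from torch import nn

    class GaussianInputNoise(nn.Module):
        """Add zero-mean Gaussian noise to the input, during training only."""

        def __init__(self, sigma=0.1):
            super().__init__()
            self.sigma = sigma   # noise scale; fixed here, though it could be learned

        def forward(self, x):
            if self.training:
                return x + self.sigma * torch.randn_like(x)
            return x             # no corruption at evaluation time

    model = nn.Sequential(
        GaussianInputNoise(sigma=0.1),
        nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1),
    )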

Adversarial training

See adversarial learning.

Regularisation penalties

L_1, L_2, dropout… These penalties are usually applied to individual weights, but rarely to whole neurons.

See Compressing neural networks for that.
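A minimal PyTorch sketch of the two classic weight penalties: L_2 comes for free as the optimiser's weight decay, while L_1 has to be added to the loss by hand. The penalty strengths and toy data are arbitrary:

    import torch
    from torch import nn

    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

    # L_2 on the weights, via the optimiser's built-in weight decay
    optimiser = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)

    # L_1 added to the loss explicitly
    def l1_penalty(model, lam=1e-4):
        return lam * sum(p.abs().sum() for p in model.parameters())

    x, y = torch.randn(8, 10), torch.randn(8, 1)
    loss = nn.functional.mse_loss(model(x), y) + l1_penalty(model)
    loss.backward()
    optimiser.step()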
