The Living Thing / Notebooks : Neural network activation functions

There is a whole cottage industry in showing that neural networks are universal function approximators for fairly general nonlinear activations, under fairly general conditions. Nonetheless, you might like to play with the precise form of the nonlinearity, or even make it directly learnable, because some function shapes might have better approximation properties in a sense I will not trouble to make rigorous here, vague hand-waving arguments being the whole point of deep learning.
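
For concreteness, here is a minimal sketch of one such learnable nonlinearity. It is written in PyTorch purely for illustration (this notebook does not assume any particular framework): a leaky rectifier whose negative-side slope is itself a trainable parameter, in the spirit of the learnable activations of AHSB15 and the rectifier variants of MaHN13.

```python
# A minimal sketch, assuming PyTorch; the notebook itself prescribes no framework.
import torch
from torch import nn


class LearnableLeakyReLU(nn.Module):
    """Leaky ReLU whose negative-side slope is learned by gradient descent."""

    def __init__(self, init_slope: float = 0.25):
        super().__init__()
        # One scalar slope shared across all units; it could equally be per-channel.
        self.slope = nn.Parameter(torch.tensor(init_slope))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity for x > 0, a learned linear slope for x < 0:
        # max(x, 0) + slope * min(x, 0).
        return torch.clamp(x, min=0.0) + self.slope * torch.clamp(x, max=0.0)


# Drop it into a network like any other activation layer.
net = nn.Sequential(
    nn.Linear(10, 32),
    LearnableLeakyReLU(),
    nn.Linear(32, 1),
)
```

PyTorch already ships this particular shape as nn.PReLU; the point of rolling your own is that any differentiable parametric family of functions will do, and its parameters train alongside the weights.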

In any case, here are some handy references.

Refs

AHSB15
Agostinelli, F., Hoffman, M., Sadowski, P., & Baldi, P. (2015) Learning Activation Functions to Improve Deep Neural Networks. In Proceedings of International Conference on Learning Representations (ICLR) 2015.
GlBe10
Glorot, X., & Bengio, Y. (2010) Understanding the difficulty of training deep feedforward neural networks. In AISTATS (Vol. 9, pp. 249–256).
GlBB11
Glorot, X., Bordes, A., & Bengio, Y. (2011) Deep Sparse Rectifier Neural Networks. In AISTATS (Vol. 15, p. 275).
GWMC13
Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013) Maxout Networks. In ICML (3) (Vol. 28, pp. 1319–1327).
MaHN13
Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013) Rectifier nonlinearities improve neural network acoustic models. In Proceedings of ICML (Vol. 30).