There is a whole cottage industry devoted to showing that neural networks are reasonably universal function approximators for fairly general nonlinear activation functions, under fairly general conditions. Nonetheless, you might like to play with the precise form of the nonlinearities, even making them directly learnable, because some function shapes might have better approximation properties, in a sense I will not trouble to make rigorous now, vague hand-waving arguments being the whole point of deep learning.
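As a minimal sketch of what a directly learnable nonlinearity could look like, assume a leaky rectifier whose negative-side slope `a` is a trainable parameter, fitted here by plain gradient descent on a toy target. The names `leaky_relu`, `grad_wrt_slope` and `fit_slope` are made up for illustration, and this is not claimed to be the parameterization used in any of the papers listed in this note.

```python
import random


def leaky_relu(x, a):
    """Rectifier with slope `a` on the non-positive side, identity elsewhere."""
    return x if x > 0 else a * x


def grad_wrt_slope(x, a):
    """Derivative of leaky_relu(x, a) with respect to `a`."""
    return 0.0 if x > 0 else x


def fit_slope(xs, ys, a=0.0, lr=0.1, steps=200):
    """Fit the slope `a` by gradient descent on mean squared error."""
    n = len(xs)
    for _ in range(steps):
        grad = sum(
            2.0 * (leaky_relu(x, a) - y) * grad_wrt_slope(x, a)
            for x, y in zip(xs, ys)
        ) / n
        a -= lr * grad
    return a


# Toy data (an assumption for illustration): targets come from a leaky
# rectifier with a known slope, so the fit should recover that slope.
random.seed(0)
xs = [random.uniform(-2.0, 2.0) for _ in range(200)]
true_slope = 0.25
ys = [leaky_relu(x, true_slope) for x in xs]

learned_slope = fit_slope(xs, ys)
print(learned_slope)
```

In a real network the slope would simply be one more parameter updated by backpropagation alongside the weights; the scalar loop here only illustrates that the shape of the activation is differentiable in its own parameter.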
Still, here are some handy references.
- Agostinelli, F., Hoffman, M., Sadowski, P., & Baldi, P. (2015) Learning Activation Functions to Improve Deep Neural Networks. In Proceedings of International Conference on Learning Representations (ICLR) 2015.
- Glorot, X., & Bengio, Y. (2010) Understanding the difficulty of training deep feedforward neural networks. In AISTATS (Vol. 9, pp. 249–256).
- Glorot, X., Bordes, A., & Bengio, Y. (2011) Deep Sparse Rectifier Neural Networks. In AISTATS (Vol. 15, p. 275).
- Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013) Maxout Networks. In ICML (3) (Vol. 28, pp. 1319–1327).
- Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013) Rectifier nonlinearities improve neural network acoustic models. In Proceedings of ICML (Vol. 30).