Second-order optimisation that does not require the Hessian matrix to be given explicitly; curvature enters only through Hessian-vector products.
Notes here iff I need 'em.
Andrew Gibiansky's example for coders.
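A minimal sketch of the core trick, not any particular paper's implementation: the Hessian-vector product H·v can be approximated by finite differences of the gradient, H v ≈ (∇f(x + εv) − ∇f(x − εv)) / 2ε, and conjugate gradients then solves the Newton system H p = −g using only such products — the Hessian is never formed. The function names (`hvp`, `newton_step_cg`) and the quadratic test problem are my own for illustration.

```python
import numpy as np

def hvp(grad_f, x, v, eps=1e-5):
    # Finite-difference Hessian-vector product:
    #   H v ≈ (∇f(x + εv) − ∇f(x − εv)) / (2ε)
    # Two gradient evaluations; no Hessian is ever materialised.
    return (grad_f(x + eps * v) - grad_f(x - eps * v)) / (2 * eps)

def newton_step_cg(grad_f, x, iters=50, tol=1e-10):
    # Solve H p = -g by conjugate gradients, touching H only via hvp().
    g = grad_f(x)
    p = np.zeros_like(x)
    r = -g.copy()            # residual of H p = -g at p = 0
    d = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hd = hvp(grad_f, x, d)
        alpha = rs / (d @ Hd)
        p += alpha * d
        r -= alpha * Hd
        rs_new = r @ r
        if rs_new < tol:
            break
        d = r + (rs_new / rs) * d
        rs = rs_new
    return p

# Toy quadratic f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b
# and the exact Hessian is A; one Newton step from 0 hits the minimiser.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad_f = lambda x: A @ x - b
x0 = np.zeros(2)
step = newton_step_cg(grad_f, x0)
```

On a quadratic the finite-difference product is exact, so `x0 + step` solves `A x = b`; in the papers below the finite difference is replaced by the exact Gauss-Newton or Hessian product (Pearlmutter's R-operator, cf. Schr02), and CG is truncated and damped.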
- LeJH15: (2015) A Simple Way to Initialize Recurrent Networks of Rectified Linear Units. arXiv:1504.00941 [cs].
- Mart10: (2010) Deep Learning via Hessian-free Optimization. In Proceedings of the 27th International Conference on International Conference on Machine Learning (pp. 735–742). USA: Omnipress.
- BaGM16: (2016) Distributed Second-Order Optimization using Kronecker-Factored Approximations.
- Schr02: (2002) Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent. Neural Computation, 14(7), 1723–1738.
- ChDL15: (2015) Hessian-free Optimization for Learning Deep Multidimensional Recurrent Neural Networks. In Advances In Neural Information Processing Systems.
- MaSu11: (2011) Learning Recurrent Neural Networks with Hessian-free Optimization. In Proceedings of the 28th International Conference on International Conference on Machine Learning (pp. 1033–1040). USA: Omnipress.
- BoBV12: (2012) Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. In 29th International Conference on Machine Learning.
- BoCN16: (2016) Optimization Methods for Large-Scale Machine Learning. arXiv:1606.04838 [cs, math, stat].
- MaSu12: (2012) Training Deep and Recurrent Networks with Hessian-free Optimization. In Neural Networks: Tricks of the Trade (pp. 479–535). Springer.
- Suts13: (2013) Training Recurrent Neural Networks.