
Deep learning as a dynamical system

Image credit: Donnie Darko

A recurring movement within deep learning research tries to render the learning of prediction functions tractable by treating them as dynamical systems, and using the theory of stability, Hamiltonian dynamics, optimal control and/or ODE solvers to make it all work.

I’ve been interested in this since seeing the Haber and Ruthotto paper, but it got a real kick recently when the Vector Institute team’s paper on learning the ODEs themselves won a best paper award at NeurIPS.

Stability of training

A related, but not quite identical, notion of stability to data-stability in learning. The argument is that neural networks are, in the limit, approximants to quadrature solutions of certain ODEs, so we can gain insights and new tricks for neural nets by borrowing from the theory of ODE solvers. This is mostly what Haber and Ruthotto et al. do. ([Haber, Ruthotto, Holtham, & Jun, 2017][#HRHJ17], [Haber, Lucka, & Ruthotto, 2018][#HaLR18], [Ruthotto & Haber, 2018][#RuHa18])
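For concreteness, here is a minimal numpy sketch of the residual-network-as-forward-Euler reading, with an antisymmetric weight parametrisation in the spirit of Haber and Ruthotto's stable architectures. The tanh activation, step size and damping constant are illustrative choices of mine, not a faithful reproduction of their setup.

```python
import numpy as np

def antisymmetric(W, gamma=0.01):
    # (W - W.T)/2 has purely imaginary eigenvalues, so the underlying ODE
    # dx/dt = tanh(A x + b) neither explodes nor forgets its input too fast.
    # The small -gamma*I damping term is one of the stabilisations discussed
    # by Haber & Ruthotto; the value 0.01 is an arbitrary illustrative choice.
    return 0.5 * (W - W.T) - gamma * np.eye(W.shape[0])

def resnet_forward(x, weights, biases, h=0.1):
    # Forward Euler integration of dx/dt = tanh(A_k x + b_k):
    # each residual block is one Euler step of size h.
    for W, b in zip(weights, biases):
        x = x + h * np.tanh(antisymmetric(W) @ x + b)
    return x

rng = np.random.default_rng(0)
d, depth = 4, 20
weights = [rng.normal(size=(d, d)) for _ in range(depth)]
biases = [rng.normal(size=d) for _ in range(depth)]
print(resnet_forward(rng.normal(size=d), weights, biases))
```

Read this way, a deeper network is simply a finer discretisation of the same continuous-time flow, which is what licenses importing stability analysis from the ODE world.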

Can it work on time series?

Good question. It looks like it should, since there is an implicit time variable inside the ODE solver; but the problems tackled so far have used non-time-series data.

Neural ODE regression

By which I mean learning an ODE whose solution solves the regression problem. This is what e.g. the famous Vector Institute paper did, although I’m not sure it’s quite as novel as they imply. There are various laypersons’ introductions to this, including the simple and practical magical take in Julia.
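To make that concrete, here is a deliberately tiny sketch of the idea, nothing like the Vector Institute implementation and with none of their adjoint machinery: a three-parameter scalar vector field, integrated from each training input for unit time, fitted by gradient-free least squares. The toy target and parameter names are mine.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

# Toy data: try to learn the map x -> 2x (roughly representable by a scalar flow).
x_train = np.linspace(-1.0, 1.0, 8)
y_train = 2.0 * x_train

def flow(x0, theta):
    # Integrate dx/dt = a*tanh(w*x + b) from t=0 to t=1; the prediction is x(1).
    a, w, b = theta
    sol = solve_ivp(lambda t, x: a * np.tanh(w * x + b), (0.0, 1.0), [x0])
    return sol.y[0, -1]

def loss(theta):
    preds = np.array([flow(x0, theta) for x0 in x_train])
    return np.mean((preds - y_train) ** 2)

# Gradient-free optimisation, to dodge the backprop-through-the-solver /
# adjoint question entirely; fine for three parameters, hopeless at
# neural-net scale, where the adjoint method earns its keep.
result = minimize(loss, x0=np.array([1.0, 1.0, 0.0]), method="Nelder-Mead")
print(result.x, loss(result.x))
```

The whole contribution of the neural ODE line of work is, in effect, doing this at scale: a neural network for the vector field, and adjoint sensitivity analysis instead of Nelder-Mead to get the gradients.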

There are some syntheses of these approaches that try to do everything with ODEs, all the time. [Niu, Horesh, & Chuang, 2019][#NiHC19], [Rackauckas et al., 2018][#RMDG18], and even some tutorial implementations by the indefatigable Chris Rackauckas.

I’m particularly interested in jump ODE regression.

Random stuff

My question: How can this be made Bayesian? Priors on dynamics, posterior uncertainties etc.

TBC. Lyapunov analysis, Hamiltonian dynamics.

Refs

Anil, Cem, James Lucas, and Roger Grosse. 2018. “Sorting Out Lipschitz Function Approximation,” November. https://arxiv.org/abs/1811.05381v1.

Arjovsky, Martin, Amar Shah, and Yoshua Bengio. 2016. “Unitary Evolution Recurrent Neural Networks.” In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, 1120–8. ICML’16. New York, NY, USA: JMLR.org. http://arxiv.org/abs/1511.06464.

Babtie, Ann C., Paul Kirk, and Michael P. H. Stumpf. 2014. “Topological Sensitivity Analysis for Systems Biology.” Proceedings of the National Academy of Sciences 111 (52): 18507–12. https://doi.org/10.1073/pnas.1414026112.

Chang, Bo, Minmin Chen, Eldad Haber, and Ed H. Chi. 2019. “AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks.” In Proceedings of ICLR. http://arxiv.org/abs/1902.09689.

Chang, Bo, Lili Meng, Eldad Haber, Lars Ruthotto, David Begert, and Elliot Holtham. 2018. “Reversible Architectures for Arbitrarily Deep Residual Neural Networks.” In. http://arxiv.org/abs/1709.03698.

Chang, Bo, Lili Meng, Eldad Haber, Frederick Tung, and David Begert. 2018. “Multi-Level Residual Networks from Dynamical Systems View.” In Proceedings of ICLR. http://arxiv.org/abs/1710.10348.

Chen, Tianqi, Ian Goodfellow, and Jonathon Shlens. 2015. “Net2Net: Accelerating Learning via Knowledge Transfer,” November. http://arxiv.org/abs/1511.05641.

Chen, Tian Qi, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. 2018. “Neural Ordinary Differential Equations.” In Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 6572–83. Curran Associates, Inc. http://papers.nips.cc/paper/7892-neural-ordinary-differential-equations.pdf.

E, Weinan. 2017. “A Proposal on Machine Learning via Dynamical Systems.” Communications in Mathematics and Statistics 5 (1): 1–11. https://doi.org/10.1007/s40304-017-0103-z.

E, Weinan, Jiequn Han, and Qianxiao Li. 2018. “A Mean-Field Optimal Control Formulation of Deep Learning,” July. http://arxiv.org/abs/1807.01083.

Gholami, Amir, Kurt Keutzer, and George Biros. 2019. “ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs,” February. http://arxiv.org/abs/1902.10298.

Haber, Eldad, Felix Lucka, and Lars Ruthotto. 2018. “Never Look Back - A Modified EnKF Method and Its Application to the Training of Neural Networks Without Back Propagation,” May. http://arxiv.org/abs/1805.08034.

Haber, Eldad, and Lars Ruthotto. 2018. “Stable Architectures for Deep Neural Networks.” Inverse Problems 34 (1): 014004. https://doi.org/10.1088/1361-6420/aa9a90.

Haber, Eldad, Lars Ruthotto, Elliot Holtham, and Seong-Hwan Jun. 2017. “Learning Across Scales - A Multiscale Method for Convolution Neural Networks,” March. http://arxiv.org/abs/1703.02009.

Han, Jiequn, Arnulf Jentzen, and Weinan E. 2018. “Solving High-Dimensional Partial Differential Equations Using Deep Learning.” Proceedings of the National Academy of Sciences 115 (34): 8505–10. https://doi.org/10.1073/pnas.1718942115.

Hardt, Moritz, Benjamin Recht, and Yoram Singer. 2015. “Train Faster, Generalize Better: Stability of Stochastic Gradient Descent,” September. http://arxiv.org/abs/1509.01240.

Haro, A. 2008. “Automatic Differentiation Methods in Computational Dynamical Systems: Invariant Manifolds and Normal Forms of Vector Fields at Fixed Points.” IMA Note. http://www.maia.ub.es/~alex/admcds/admcds.pdf.

He, Junxian, Daniel Spokoyny, Graham Neubig, and Taylor Berg-Kirkpatrick. 2019. “Lagging Inference Networks and Posterior Collapse in Variational Autoencoders.” In Proceedings of ICLR. http://arxiv.org/abs/1901.05534.

Jing, Li, Yichen Shen, Tena Dubcek, John Peurifoy, Scott Skirlo, Yann LeCun, Max Tegmark, and Marin Soljačić. 2017. “Tunable Efficient Unitary Neural Networks (EUNN) and Their Application to RNNs.” In PMLR, 1733–41. http://proceedings.mlr.press/v70/jing17a.html.

Liu, Hanxiao, Karen Simonyan, and Yiming Yang. 2018. “DARTS: Differentiable Architecture Search,” June. http://arxiv.org/abs/1806.09055.

Meng, Qi, Yue Wang, Wei Chen, Taifeng Wang, Zhi-Ming Ma, and Tie-Yan Liu. 2016. “Generalization Error Bounds for Optimization Algorithms via Stability.” In, 10:441–74. http://arxiv.org/abs/1609.08397.

Mhammedi, Zakaria, Andrew Hellicar, Ashfaqur Rahman, and James Bailey. 2017. “Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections.” In PMLR, 2401–9. http://proceedings.mlr.press/v70/mhammedi17a.html.

Niu, Murphy Yuezhen, Lior Horesh, and Isaac Chuang. 2019. “Recurrent Neural Networks in the Eye of Differential Equations,” April. http://arxiv.org/abs/1904.12933.

Rackauckas, Christopher. 2019. “The Essential Tools of Scientific Machine Learning (Scientific ML).” The Winnower, August. https://doi.org/10.15200/winn.156631.13064.

Rackauckas, Christopher, Yingbo Ma, Vaibhav Dixit, Xingjian Guo, Mike Innes, Jarrett Revels, Joakim Nyberg, and Vijay Ivaturi. 2018. “A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation Solutions,” December. http://arxiv.org/abs/1812.01892.

Ruthotto, Lars, and Eldad Haber. 2018. “Deep Neural Networks Motivated by Partial Differential Equations,” April. http://arxiv.org/abs/1804.04272.

Wiatowski, Thomas, and Helmut Bölcskei. 2015. “A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction.” In Proceedings of IEEE International Symposium on Information Theory. http://arxiv.org/abs/1512.06293.

Wiatowski, Thomas, Philipp Grohs, and Helmut Bölcskei. 2018. “Energy Propagation in Deep Convolutional Neural Networks.” IEEE Transactions on Information Theory 64 (7): 1–1. https://doi.org/10.1109/TIT.2017.2756880.