A recurring movement in neural network research tries to make the learning of prediction functions tractable by treating them as dynamical systems, then drawing on the theory of stability, Hamiltonians, optimal control and/or ODE solvers to make it all work.

I've been interested in this since seeing the (Haber and Ruthotto 2018) paper, but it has gained a real kick recently since (Chen et al. 2018) won the prize at NeurIPS for learning the ODEs themselves.

Coming from the ODE side, Chris Rackauckas' lecture notes christen this development "scientific machine learning".

## Convnets/Resnets as discrete ODE approximations

The argument is that neural networks are, in the limit, approximants to quadrature solutions of certain ODEs, so one can gain insights and new tricks for neural nets by borrowing ODE tricks. This is mostly what Haber, Ruthotto et al. do. "Stability of training" is a useful outcome here. It is a related, but not quite the same, notion of stability as input-stability in learning. (Haber and Ruthotto 2018; Haber et al. 2017; Chang, Meng, Haber, Ruthotto, et al. 2018; Ruthotto and Haber 2018)
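To make the correspondence concrete, here is a minimal sketch of the observation underlying this line of work: a stack of residual blocks computing `x ← x + h·f(x, W)` is exactly a forward-Euler discretisation of the ODE `dx/dt = f(x, W(t))`. The toy layer `f`, the dimensions, and the step size `h` below are my own illustrative choices, not anything from the cited papers.

```python
import numpy as np

def f(x, W):
    # A toy residual branch: one tanh layer with weight matrix W.
    return np.tanh(W @ x)

def resnet_forward(x, weights, h=0.1):
    # Each residual block performs x <- x + h * f(x, W_t),
    # i.e. one forward-Euler step of the ODE dx/dt = f(x, W(t)),
    # with depth playing the role of integration time.
    for W in weights:
        x = x + h * f(x, W)
    return x

rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=(4, 4)) for _ in range(10)]
x0 = rng.normal(size=4)
xT = resnet_forward(x0, weights)
```

Shrinking `h` while adding blocks takes this to the continuous limit, which is where the stability analysis of the ODE literature becomes available.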

## Neural ODE regression

By which I mean *learning an ODE/SDE whose solution is the regression problem*. This is what e.g. the famous Vector Institute paper (Chen et al. 2018) did, although I'm not sure it's quite as novel as they imply, since it does *look* like earlier work at first glance. There are various laypersons' introductions/tutorials in this area, including the simple and practical magical take in Julia.
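A minimal sketch of what "the solution is the regression problem" means: fit the parameter of a vector field so that the ODE's trajectory matches observed data. Everything here is an illustrative assumption of mine, not the method of (Chen et al. 2018): a scalar linear field instead of a neural network, a fixed-step Euler solver instead of an adaptive one, and finite-difference gradients instead of the adjoint ODE.

```python
import numpy as np

def odeint_euler(f, x0, ts, theta):
    # Fixed-step forward-Euler solver; the actual paper uses an adaptive
    # solver and backpropagates via the adjoint sensitivity ODE.
    x = float(x0)
    xs = [x]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * f(x, theta)
        xs.append(x)
    return np.array(xs)

def field(x, theta):
    # Toy parametric vector field dx/dt = theta * x; a neural ODE
    # replaces this with a small network.
    return theta * x

# Synthetic data from x(t) = exp(0.5 t), so the true rate is 0.5.
ts = np.linspace(0.0, 2.0, 41)
data = np.exp(0.5 * ts)

def loss(theta):
    xs = odeint_euler(field, 1.0, ts, theta)
    return np.mean((xs - data) ** 2)

# Crude finite-difference gradient descent on the solver output.
theta, lr, eps = 0.0, 0.05, 1e-5
for _ in range(500):
    g = (loss(theta + eps) - loss(theta - eps)) / (2 * eps)
    theta -= lr * g
```

The point of the differentiable-solver machinery is precisely to replace the finite-difference step here with exact gradients through the integrator.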

I'm particularly interested in jump ODE regression.

There are syntheses of these approaches that try to do everything with ODEs, all the time (Rackauckas et al. 2018; Niu, Horesh, and Chuang 2019), as well as some tutorial implementations by the indefatigable Chris Rackauckas, and a whole MIT course.

# Refs

Anil, Cem, James Lucas, and Roger Grosse. 2018. "Sorting Out Lipschitz Function Approximation," November. https://arxiv.org/abs/1811.05381v1.

Arjovsky, Martin, Amar Shah, and Yoshua Bengio. 2016. "Unitary Evolution Recurrent Neural Networks." In *Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48*, 1120–8. ICML'16. New York, NY, USA: JMLR.org. http://arxiv.org/abs/1511.06464.

Babtie, Ann C., Paul Kirk, and Michael P. H. Stumpf. 2014. "Topological Sensitivity Analysis for Systems Biology." *Proceedings of the National Academy of Sciences* 111 (52): 18507–12. https://doi.org/10.1073/pnas.1414026112.

Chang, Bo, Minmin Chen, Eldad Haber, and Ed H. Chi. 2019. "AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks." In *Proceedings of ICLR*. http://arxiv.org/abs/1902.09689.

Chang, Bo, Lili Meng, Eldad Haber, Lars Ruthotto, David Begert, and Elliot Holtham. 2018. "Reversible Architectures for Arbitrarily Deep Residual Neural Networks." In. http://arxiv.org/abs/1709.03698.

Chang, Bo, Lili Meng, Eldad Haber, Frederick Tung, and David Begert. 2018. "Multi-Level Residual Networks from Dynamical Systems View." In *Proceedings of ICLR*. http://arxiv.org/abs/1710.10348.

Chen, Tianqi, Ian Goodfellow, and Jonathon Shlens. 2015. "Net2Net: Accelerating Learning via Knowledge Transfer," November. http://arxiv.org/abs/1511.05641.

Chen, Tian Qi, Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. 2018. "Neural Ordinary Differential Equations." In *Advances in Neural Information Processing Systems 31*, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 6572–83. Curran Associates, Inc. http://papers.nips.cc/paper/7892-neural-ordinary-differential-equations.pdf.

Dupont, Emilien, Arnaud Doucet, and Yee Whye Teh. 2019. "Augmented Neural ODEs," April. http://arxiv.org/abs/1904.01681.

E, Weinan. 2017. "A Proposal on Machine Learning via Dynamical Systems." *Communications in Mathematics and Statistics* 5 (1): 1–11. https://doi.org/10.1007/s40304-017-0103-z.

E, Weinan, Jiequn Han, and Qianxiao Li. 2018. "A Mean-Field Optimal Control Formulation of Deep Learning," July. http://arxiv.org/abs/1807.01083.

Garnelo, Marta, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, and S. M. Ali Eslami. 2018. "Conditional Neural Processes," July, 10. https://arxiv.org/abs/1807.01613v1.

Garnelo, Marta, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, and Yee Whye Teh. 2018. "Neural Processes," July. https://arxiv.org/abs/1807.01622v1.

Gholami, Amir, Kurt Keutzer, and George Biros. 2019. "ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs," February. http://arxiv.org/abs/1902.10298.

Grathwohl, Will, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. 2018. "FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models," October. http://arxiv.org/abs/1810.01367.

Haber, Eldad, Felix Lucka, and Lars Ruthotto. 2018. "Never Look Back - A Modified EnKF Method and Its Application to the Training of Neural Networks Without Back Propagation," May. http://arxiv.org/abs/1805.08034.

Haber, Eldad, and Lars Ruthotto. 2018. "Stable Architectures for Deep Neural Networks." *Inverse Problems* 34 (1): 014004. https://doi.org/10.1088/1361-6420/aa9a90.

Haber, Eldad, Lars Ruthotto, Elliot Holtham, and Seong-Hwan Jun. 2017. "Learning Across Scales - A Multiscale Method for Convolution Neural Networks," March. http://arxiv.org/abs/1703.02009.

Han, Jiequn, Arnulf Jentzen, and Weinan E. 2018. "Solving High-Dimensional Partial Differential Equations Using Deep Learning." *Proceedings of the National Academy of Sciences* 115 (34): 8505–10. https://doi.org/10.1073/pnas.1718942115.

Hardt, Moritz, Benjamin Recht, and Yoram Singer. 2015. "Train Faster, Generalize Better: Stability of Stochastic Gradient Descent," September. http://arxiv.org/abs/1509.01240.

Haro, A. 2008. "Automatic Differentiation Methods in Computational Dynamical Systems: Invariant Manifolds and Normal Forms of Vector Fields at Fixed Points." *IMA Note*. http://www.maia.ub.es/~alex/admcds/admcds.pdf.

He, Junxian, Daniel Spokoyny, Graham Neubig, and Taylor Berg-Kirkpatrick. 2019. "Lagging Inference Networks and Posterior Collapse in Variational Autoencoders." In *Proceedings of ICLR*. http://arxiv.org/abs/1901.05534.

Jing, Li, Yichen Shen, Tena Dubcek, John Peurifoy, Scott Skirlo, Yann LeCun, Max Tegmark, and Marin Soljačić. 2017. "Tunable Efficient Unitary Neural Networks (EUNN) and Their Application to RNNs." In *PMLR*, 1733–41. http://proceedings.mlr.press/v70/jing17a.html.

Liu, Hanxiao, Karen Simonyan, and Yiming Yang. 2018. "DARTS: Differentiable Architecture Search," June. http://arxiv.org/abs/1806.09055.

Meng, Qi, Yue Wang, Wei Chen, Taifeng Wang, Zhi-Ming Ma, and Tie-Yan Liu. 2016. "Generalization Error Bounds for Optimization Algorithms via Stability." In, 10:441–74. http://arxiv.org/abs/1609.08397.

Mhammedi, Zakaria, Andrew Hellicar, Ashfaqur Rahman, and James Bailey. 2017. "Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections." In *PMLR*, 2401–9. http://proceedings.mlr.press/v70/mhammedi17a.html.

Niu, Murphy Yuezhen, Lior Horesh, and Isaac Chuang. 2019. "Recurrent Neural Networks in the Eye of Differential Equations," April. http://arxiv.org/abs/1904.12933.

Rackauckas, Christopher. 2019. "The Essential Tools of Scientific Machine Learning (Scientific ML)." *The Winnower*, August. https://doi.org/10.15200/winn.156631.13064.

Rackauckas, Christopher, Yingbo Ma, Vaibhav Dixit, Xingjian Guo, Mike Innes, Jarrett Revels, Joakim Nyberg, and Vijay Ivaturi. 2018. "A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation Solutions," December. http://arxiv.org/abs/1812.01892.

Roeder, Geoffrey, Paul K. Grant, Andrew Phillips, Neil Dalchau, and Edward Meeds. 2019. "Efficient Amortised Bayesian Inference for Hierarchical and Nonlinear Dynamical Systems," May. http://arxiv.org/abs/1905.12090.

Ruthotto, Lars, and Eldad Haber. 2018. "Deep Neural Networks Motivated by Partial Differential Equations," April. http://arxiv.org/abs/1804.04272.

Vorontsov, Eugene, Chiheb Trabelsi, Samuel Kadoury, and Chris Pal. 2017. "On Orthogonality and Learning Recurrent Networks with Long Term Dependencies." In *PMLR*, 3570–8. http://proceedings.mlr.press/v70/vorontsov17a.html.

Wiatowski, Thomas, and Helmut Bölcskei. 2015. "A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction." In *Proceedings of IEEE International Symposium on Information Theory*. http://arxiv.org/abs/1512.06293.

Wiatowski, Thomas, Philipp Grohs, and Helmut Bölcskei. 2018. "Energy Propagation in Deep Convolutional Neural Networks." *IEEE Transactions on Information Theory* 64 (7): 1–1. https://doi.org/10.1109/TIT.2017.2756880.

Yıldız, Çağatay, Markus Heinonen, and Harri Lähdesmäki. 2019. "ODE$^2$VAE: Deep Generative Second Order ODEs with Bayesian Neural Networks," October. http://arxiv.org/abs/1905.10994.