A recurring movement within deep learning research that tries to make the learning of prediction functions tractable by treating them as dynamical systems. It then uses the theory of stability, in the context of Hamiltonian systems, optimal control and/or ODE solvers, to make it all work.

I’ve been interested in this ever since seeing the Haber and Ruthotto paper. It got a real boost recently when the Vector Institute team’s ODE formulation of the problem (CRBD18) won a best paper award at NeurIPS.
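Most of this work starts from one correspondence, which HaRu18 and CRBD18 both use. A residual block x_{k+1} = x_k + h f(x_k) is one forward-Euler step of the ODE dx/dt = f(x). Below is a minimal sketch of that correspondence. It assumes a scalar state and a single shared, hypothetical layer f(x) = tanh(w x + b) with made-up parameters `w` and `b`. Stacking more blocks with proportionally smaller steps h = T/depth approaches the ODE solution x(T).

```python
import math

def residual_net(x0, depth, T=1.0, w=1.0, b=0.5):
    """Hypothetical scalar 'ResNet' with one shared residual layer
    f(x) = tanh(w * x + b). Each block adds h * f(x) with h = T / depth,
    which is exactly one forward-Euler step of dx/dt = f(x)."""
    h = T / depth
    x = x0
    for _ in range(depth):
        x = x + h * math.tanh(w * x + b)  # residual block == Euler step
    return x

# Deeper nets with proportionally smaller steps approach the ODE solution x(T);
# a very deep net serves as the reference here.
reference = residual_net(0.0, depth=4000)
for depth in (10, 100, 1000):
    print(depth, abs(residual_net(0.0, depth) - reference))
```

Sharing weights across blocks corresponds to an autonomous ODE. Letting the weights vary with the block index corresponds to a time-dependent vector field.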

# Stability of training

A related, but not identical, notion of stability is data-stability in learning: how much the learned function changes when the training data is perturbed (e.g. HaRS15, MWCW16).
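Replace-one stability makes "data-stability" concrete. It asks how far the trained parameters move when a single training example is swapped. The sketch below is minimal and rests on several assumptions:

- 1-D linear least squares
- plain SGD with a fixed learning rate
- a *fixed* visiting order, whereas HaRS15 analyse randomly sampled orders
- synthetic data

`sgd_linear` and `replace_one_gap` are hypothetical helpers.

```python
import random

def sgd_linear(xs, ys, lr=0.1, epochs=5):
    """SGD on the squared loss 0.5 * (w * x - y)**2 for a 1-D linear model,
    visiting the examples in the same fixed order every epoch."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            w -= lr * (w * x - y) * x  # gradient step on one example
    return w

def replace_one_gap(xs, ys, index, new_y, **kwargs):
    """|w_S - w_S'|, where S' replaces the target of example `index` with new_y."""
    ys_replaced = list(ys)
    ys_replaced[index] = new_y
    return abs(sgd_linear(xs, ys, **kwargs) - sgd_linear(xs, ys_replaced, **kwargs))

rng = random.Random(0)
xs = [rng.uniform(0.5, 1.0) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]  # synthetic data

print(replace_one_gap(xs, ys, 0, ys[0] + 1.0))    # perturb the first example visited
print(replace_one_gap(xs, ys, 99, ys[99] + 1.0))  # perturb the last example visited
```

For this loss each SGD step multiplies the gap between the two runs by 1 - lr * x**2. That factor lies in [0.9, 1) here, so a perturbation injected early is damped by the steps that follow. Perturbing the last example therefore moves the final weight more. As far as I understand it, non-expansiveness of gradient steps of this kind is one ingredient of the HaRS15 argument.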

## Can it work on time series?

Good question. It looks like it should, since there is an implicit time series in the ODE solver. But the problems tackled so far have used non-time-series data.
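One way to read the "implicit time series" is that an ODE solver already produces a trajectory. A time-series loss could then compare that trajectory with observations at arbitrary, even irregular, times. Below is a minimal sketch under these assumptions:

- a known scalar model dx/dt = -k x
- synthetic noise-free data generated with k = 0.5
- a hand-rolled classical RK4 integrator
- a crude grid search over k in place of gradient-based training

All names are hypothetical.

```python
import math

def rk4_step(f, t, x, h):
    """One classical 4th-order Runge-Kutta step for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h * k1 / 2)
    k3 = f(t + h / 2, x + h * k2 / 2)
    k4 = f(t + h, x + h * k3)
    return x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def solve_at(f, x0, t0, obs_times, max_step=0.05):
    """Integrate from t0 and return the state at each (sorted) observation time."""
    x, t, out = x0, t0, []
    for t_obs in obs_times:
        n = max(1, math.ceil((t_obs - t) / max_step))
        h = (t_obs - t) / n
        for _ in range(n):
            x = rk4_step(f, t, x, h)
            t += h
        t = t_obs  # snap to the observation time to avoid rounding drift
        out.append(x)
    return out

def trajectory_loss(k, obs_times, obs_values, x0=1.0):
    """Squared error between the solution of dx/dt = -k x and the observations."""
    preds = solve_at(lambda t, x: -k * x, x0, 0.0, obs_times)
    return sum((p - y) ** 2 for p, y in zip(preds, obs_values))

obs_times = [0.5, 1.0, 2.0, 3.5]  # irregularly spaced observation times
obs_values = [math.exp(-0.5 * t) for t in obs_times]  # synthetic data, true k = 0.5
best_k = min([0.25, 0.5, 1.0], key=lambda k: trajectory_loss(k, obs_times, obs_values))
print(best_k)
```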

## Neural ODE regressors

By which I mean *learning an ODE whose solution is the regression problem*, which is a particular case of learning an ODE. There are various laypersons’ introductions to this, including a simple, practical and rather magical take in Julia.
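Here is a minimal sketch of an ODE regressor. The input u is the initial state x(0), and the prediction is x(T). It rests on loud assumptions:

- The vector field is linear, dx/dt = a x + b, a stand-in for a neural network.
- It is solved with fixed-step forward Euler.
- Gradients come from the forward sensitivity equations, not the adjoint method CRBD18 use.
- The helper names are hypothetical.

Integrating the sensitivities with the same Euler scheme gives exactly the gradient of the discretised solver. RMDG18 compare this kind of choice (automatic differentiation versus continuous sensitivity analysis) in more depth.

```python
def solve_with_sensitivities(u, a, b, T=1.0, steps=20):
    """Forward-Euler solve of dx/dt = a*x + b from x(0) = u, carried together with
    the forward sensitivities s_a = dx/da and s_b = dx/db, which obey
    ds_a/dt = a*s_a + x and ds_b/dt = a*s_b + 1 (both start at 0)."""
    h = T / steps
    x, s_a, s_b = u, 0.0, 0.0
    for _ in range(steps):
        # Tuple assignment: every right-hand side uses the old x, s_a, s_b.
        x, s_a, s_b = (
            x + h * (a * x + b),
            s_a + h * (a * s_a + x),
            s_b + h * (a * s_b + 1.0),
        )
    return x, s_a, s_b

def fit(us, ys, lr=0.1, iters=500):
    """Gradient descent on the mean squared error between x(T) and the targets."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        grad_a = grad_b = 0.0
        for u, y in zip(us, ys):
            x, s_a, s_b = solve_with_sensitivities(u, a, b)
            r = x - y
            grad_a += 2 * r * s_a / len(us)
            grad_b += 2 * r * s_b / len(us)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

def mse(us, ys, a, b):
    return sum((solve_with_sensitivities(u, a, b)[0] - y) ** 2 for u, y in zip(us, ys)) / len(us)

us = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [2.0 * u + 1.0 for u in us]  # synthetic target map y = 2u + 1
a, b = fit(us, ys)
print(a, b, mse(us, ys, a, b))
```

With a linear field, x(T) is affine in u with a positive slope, so this toy can only fit such maps. A neural ODE replaces a x + b with a network and computes its gradients by backpropagation or adjoints.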

## Random stuff

My question: How can this be made Bayesian?

Is this a bit overwrought? Improving Neural Models by Compensating for Discrete Rather Than Continuous Filter Dynamics when Simulating on Digital Systems.

TBC. Lyapunov analysis, Hamiltonian dynamics.

## Refs

- AnLG18: Cem Anil, James Lucas, Roger Grosse (2018) Sorting out Lipschitz function approximation.
- ArSB16: Martin Arjovsky, Amar Shah, Yoshua Bengio (2016) Unitary Evolution Recurrent Neural Networks. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48 (pp. 1120–1128). New York, NY, USA: JMLR.org
- ChGS15: Tianqi Chen, Ian Goodfellow, Jonathon Shlens (2015) Net2Net: Accelerating Learning via Knowledge Transfer. *ArXiv:1511.05641 [Cs]*.
- CMHR18: Bo Chang, Lili Meng, Eldad Haber, Lars Ruthotto, David Begert, Elliot Holtham (2018) Reversible Architectures for Arbitrarily Deep Residual Neural Networks. In arXiv:1709.03698 [cs, stat].
- CMHT18: Bo Chang, Lili Meng, Eldad Haber, Frederick Tung, David Begert (2018) Multi-level Residual Networks from Dynamical Systems View. In Proceedings of ICLR.
- CRBD18: Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, David K Duvenaud (2018) Neural Ordinary Differential Equations. In Advances in Neural Information Processing Systems 31 (pp. 6572–6583). Curran Associates, Inc.
- E17: Weinan E (2017) A Proposal on Machine Learning via Dynamical Systems. *Communications in Mathematics and Statistics*, 5(1), 1–11.
- EHL18: Weinan E, Jiequn Han, Qianxiao Li (2018) A Mean-Field Optimal Control Formulation of Deep Learning. *ArXiv:1807.01083 [Cs, Math]*.
- HaLR18: Eldad Haber, Felix Lucka, Lars Ruthotto (2018) Never look back - A modified EnKF method and its application to the training of neural networks without back propagation. *ArXiv:1805.08034 [Cs, Math]*.
- HaRS15: Moritz Hardt, Benjamin Recht, Yoram Singer (2015) Train faster, generalize better: Stability of stochastic gradient descent. *ArXiv:1509.01240 [Cs, Math, Stat]*.
- HaRu18: Eldad Haber, Lars Ruthotto (2018) Stable architectures for deep neural networks. *Inverse Problems*, 34(1), 014004.
- HRHJ17: Eldad Haber, Lars Ruthotto, Elliot Holtham, Seong-Hwan Jun (2017) Learning across scales - A multiscale method for Convolution Neural Networks. *ArXiv:1703.02009 [Cs]*.
- HSNB19: Junxian He, Daniel Spokoyny, Graham Neubig, Taylor Berg-Kirkpatrick (2019) Lagging Inference Networks and Posterior Collapse in Variational Autoencoders. In Proceedings of ICLR.
- JSDP17: Li Jing, Yichen Shen, Tena Dubcek, John Peurifoy, Scott Skirlo, Yann LeCun, … Marin Soljačić (2017) Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNNs. In PMLR (pp. 1733–1741).
- LiSY18: Hanxiao Liu, Karen Simonyan, Yiming Yang (2018) DARTS: Differentiable Architecture Search. *ArXiv:1806.09055 [Cs, Stat]*.
- MHRB17: Zakaria Mhammedi, Andrew Hellicar, Ashfaqur Rahman, James Bailey (2017) Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections. In PMLR (pp. 2401–2409).
- MWCW16: Qi Meng, Yue Wang, Wei Chen, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu (2016) Generalization Error Bounds for Optimization Algorithms via Stability. In arXiv:1609.08397 [stat] (Vol. 10, pp. 441–474).
- RMDG18: Christopher Rackauckas, Yingbo Ma, Vaibhav Dixit, Xingjian Guo, Mike Innes, Jarrett Revels, … Vijay Ivaturi (2018) A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation Solutions. *ArXiv:1812.01892 [Cs]*.
- RuHa18: Lars Ruthotto, Eldad Haber (2018) Deep Neural Networks motivated by Partial Differential Equations. *ArXiv:1804.04272 [Cs, Math, Stat]*.
- WiBö15: Thomas Wiatowski, Helmut Bölcskei (2015) A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction. In Proceedings of IEEE International Symposium on Information Theory.
- WiGB17: Thomas Wiatowski, Philipp Grohs, Helmut Bölcskei (2017) Energy Propagation in Deep Convolutional Neural Networks. *IEEE Transactions on Information Theory*, 1–1.