Creating neural networks which infer whole probability densities for their predictions (usually approximately) rather than point estimates. Or at least part of the density-estimation problem, accomplished with neural nets in a Bayesian setting. Prediction uncertainties, approximate model averaging etc. would all fit in this category.

AFAICT this usually boils down to doing variational inference, in which case the neural network is a big approximate PDGM. Apparently you can also do simulation-based inference here, somehow using gradients? Must look into that. Also, Gaussian processes can be made to fit into this framing.
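For concreteness, the weight-space version of the variational approach maximises the evidence lower bound (ELBO). Writing $w$ for the network weights, $p(w)$ for the prior, $\mathcal{D}$ for the data and $q_\phi(w)$ for an approximate posterior with variational parameters $\phi$ (notation chosen here for illustration, not taken from a specific reference below), the objective is

$$
\log p(\mathcal{D}) \;\geq\; \mathcal{L}(\phi) = \mathbb{E}_{q_\phi(w)}\left[\log p(\mathcal{D}\mid w)\right] - \mathrm{KL}\left(q_\phi(w)\,\Vert\,p(w)\right).
$$

The slack in the bound is exactly $\mathrm{KL}\left(q_\phi(w)\,\Vert\,p(w\mid\mathcal{D})\right)$, so maximising $\mathcal{L}(\phi)$ over $\phi$ is equivalent to minimising that divergence to the true posterior.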

To learn:

- natural gradient
- how does this work outside of KL-divergence?
- marginal likelihood in model selection: how does it work with many optima?

## Backgrounders

Radford Neal’s thesis (Neal96) is a foundational asymptotically-Bayesian use of neural networks. Yarin Gal’s PhD thesis (Gal16) summarizes some implicit approximate approaches (e.g. the Bayesian interpretation of dropout). Diederik P. Kingma’s thesis is the latest blockbuster in this tradition.
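In practice, the dropout interpretation (GaGh15a) amounts to leaving dropout switched on at prediction time and averaging several stochastic forward passes, reading the spread across passes as approximate predictive uncertainty. Below is a minimal NumPy sketch, assuming a toy one-hidden-layer regression net with random untrained weights; `forward`, `mc_dropout_predict` and the weight names are hypothetical, for illustration only. It drops only hidden units and omits the observation-noise term that Gal's full treatment adds to the predictive variance.

```python
import numpy as np

def forward(x, W1, b1, W2, b2, p_drop, rng):
    """One stochastic forward pass with dropout kept ON (inverted dropout)."""
    h = np.maximum(0.0, x @ W1 + b1)        # ReLU hidden layer
    mask = rng.random(h.shape) >= p_drop     # keep each unit with prob 1 - p_drop
    h = h * mask / (1.0 - p_drop)            # rescale so E[h] matches no-dropout h
    return h @ W2 + b2

def mc_dropout_predict(x, params, p_drop=0.5, n_samples=200, seed=0):
    """Monte Carlo dropout: mean and std over n_samples stochastic passes."""
    rng = np.random.default_rng(seed)
    samples = np.stack([forward(x, *params, p_drop, rng) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

# Toy untrained weights, purely to show the mechanics (hypothetical values).
rng = np.random.default_rng(42)
params = (rng.normal(size=(1, 50)), np.zeros(50), rng.normal(size=(50, 1)), np.zeros(1))
x = np.linspace(-3, 3, 7).reshape(-1, 1)
mean, std = mc_dropout_predict(x, params)
print(mean.shape, std.shape)  # (7, 1) (7, 1)
```

Setting `p_drop=0.0` makes every pass identical, so the reported spread collapses to zero; that is a quick sanity check that the uncertainty really comes from the dropout masks.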

Alex Graves did a nice poster of his paper (Grav11) on the simplest prior-uncertainty scheme for recurrent nets (diagonal Gaussian weight uncertainty). There is a half-arsed implementation.
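The two ingredients of a diagonal Gaussian weight posterior are easy to sketch: each weight gets its own mean and standard deviation, predictions average over weights sampled as $w = \mu + \sigma \odot \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$, and the KL term against a Gaussian prior has a closed form. This is a hedged NumPy sketch for a single linear layer, assuming a zero-mean isotropic Gaussian prior with standard deviation `prior_sigma` and a softplus parameterisation of $\sigma$ (both my choices; all names are hypothetical). It uses the reparameterised sampler, which is not necessarily the gradient estimator Graves used, so treat it as illustrative rather than a reproduction of Grav11.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def sample_weights(mu, rho, rng):
    """Reparameterised draw w = mu + sigma * eps, with sigma = softplus(rho) > 0."""
    sigma = softplus(rho)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

def kl_diag_gaussian(mu, rho, prior_sigma=1.0):
    """Closed-form KL( N(mu, sigma^2) || N(0, prior_sigma^2) ), summed over weights."""
    sigma = softplus(rho)
    return np.sum(
        np.log(prior_sigma / sigma)
        + (sigma**2 + mu**2) / (2.0 * prior_sigma**2)
        - 0.5
    )

rng = np.random.default_rng(0)
mu = rng.normal(scale=0.1, size=(3, 2))  # variational means: 3 inputs -> 2 outputs
rho = np.full((3, 2), -3.0)              # softplus(-3) ~= 0.049: fairly tight posterior
x = np.ones((4, 3))

# Predictive distribution by Monte Carlo over sampled weight matrices.
preds = np.stack([x @ sample_weights(mu, rho, rng) for _ in range(500)])
print(preds.mean(axis=0).shape, preds.std(axis=0).shape)  # (4, 2) (4, 2)
print(kl_diag_gaussian(mu, rho) > 0)                       # True
```

In a full model one would minimise `kl_diag_gaussian(mu, rho)` minus a Monte Carlo estimate of the log-likelihood (the negative ELBO) with an autodiff framework; the reparameterisation is what lets gradients flow through the sampled weights to `mu` and `rho`.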

## Reparameterisation

## Practicalities

Blei Lab’s software tool: Edward (source). TensorFlow also comes with a contributed Bayesian library called BayesFlow (which is not the same as the cytometry library of the same name), which by contrast has documentation so perfunctory that I can’t imagine it not being easier to reimplement it than to understand it.

Thomas Wiecki, Bayesian Deep Learning, shows how to fit some variants with PyMC3.

Christopher Bonnett: Mixture Density Networks with Edward, Keras and TensorFlow.
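A mixture density network (Bish94) has the net output, for each input, the parameters of a mixture (here $K$ univariate Gaussians) and trains by minimising the mixture negative log-likelihood. This is a minimal NumPy sketch of the output head and loss, assuming the final layer has already produced raw `logits`, `means` and `log_sigmas` arrays of shape `(N, K)`; these names are hypothetical, and the tutorial above builds the same idea with Edward/Keras using their own APIs.

```python
import numpy as np

def logsumexp(a, axis=-1):
    """Numerically stable log(sum(exp(a))) along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def mdn_nll(y, logits, means, log_sigmas):
    """Mean negative log-likelihood of targets y (N,) under a K-component
    Gaussian mixture whose parameters are given per example, shape (N, K)."""
    log_pi = logits - logsumexp(logits, axis=1)[:, None]     # log-softmax mixing weights
    sigmas = np.exp(log_sigmas)                               # exp keeps std devs positive
    log_norm = (-0.5 * np.log(2 * np.pi) - log_sigmas
                - 0.5 * ((y[:, None] - means) / sigmas) ** 2)  # log N(y | mean_k, sigma_k^2)
    return -np.mean(logsumexp(log_pi + log_norm, axis=1))

# Two equally weighted unit-variance components at -1 and +1.
y = np.array([0.0, 5.0])
logits = np.zeros((2, 2))
means = np.array([[-1.0, 1.0], [-1.0, 1.0]])
log_sigmas = np.zeros((2, 2))
print(mdn_nll(y, logits, means, log_sigmas))
```

Working in log space (log-softmax weights, `logsumexp` over components) avoids underflow when a target sits far from every component, which is the usual failure mode of a naive MDN loss.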

## Refs

- GaGh16a: Yarin Gal, Zoubin Ghahramani (2016a) A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. In arXiv:1512.05287 [stat].
- WaJo05: M. Wainwright, M. Jordan (2005) A variational principle for graphical models. In New Directions in Statistical Signal Processing (Vol. 155). MIT Press
- LSLW15: Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, Ole Winther (2015) Autoencoding beyond pixels using a learned similarity metric. *ArXiv:1512.09300 [Cs, Stat]*.
- LIJR17: Tuan Anh Le, Maximilian Igl, Tom Jin, Tom Rainforth, Frank Wood (2017) Auto-Encoding Sequential Monte Carlo. *ArXiv Preprint ArXiv:1705.10306*.
- KiWe14: Diederik P. Kingma, Max Welling (2014) Auto-Encoding Variational Bayes. In ICLR 2014 conference.
- KBCF16: Karl Krauth, Edwin V. Bonilla, Kurt Cutajar, Maurizio Filippone (2016) AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models. In UAI17.
- GaGh16b: Yarin Gal, Zoubin Ghahramani (2016b) Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference. In 4th International Conference on Learning Representations (ICLR) workshop track.
- Neal96: Radford M. Neal (1996) *Bayesian Learning for Neural Networks* (Vol. 118). Secaucus, NJ, USA: Springer-Verlag New York, Inc.
- LoCV17: Ekaterina Lobacheva, Nadezhda Chirkova, Dmitry Vetrov (2017) Bayesian Sparsification of Recurrent Neural Networks. In Workshop on Learning to Generate Natural Language.
- APBC15: Evan Archer, Il Memming Park, Lars Buesing, John Cunningham, Liam Paninski (2015) Black box variational inference for state space models. *ArXiv:1511.07367 [Stat]*.
- RGMP18: Thomas Ryder, Andrew Golightly, A. Stephen McGough, Dennis Prangle (2018) Black-box Variational Inference for Stochastic Differential Equations. *ArXiv:1802.03335 [Stat]*.
- LSMS17: Christos Louizos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, Max Welling (2017) Causal Effect Inference with Deep Latent-Variable Models. In Advances in Neural Information Processing Systems 30 (pp. 6446–6456). Curran Associates, Inc.
- JDWD16: Matthew J. Johnson, David Duvenaud, Alexander B. Wiltschko, Sandeep R. Datta, Ryan P. Adams (2016) Composing graphical models with neural networks for structured representations and fast inference. *ArXiv:1603.06277 [Stat]*.
- BJPD17: Ashish Bora, Ajil Jalal, Eric Price, Alexandros G. Dimakis (2017) Compressed Sensing using Generative Models. In International Conference on Machine Learning (pp. 537–546).
- KrSS15: Rahul G. Krishnan, Uri Shalit, David Sontag (2015) Deep Kalman Filters. *ArXiv Preprint ArXiv:1511.05121*.
- THSB17: Dustin Tran, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, David M. Blei (2017) Deep Probabilistic Programming. In ICLR.
- KSBS16: Maximilian Karl, Maximilian Soelch, Justin Bayer, Patrick van der Smagt (2016) Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data. In Proceedings of ICLR.
- FlSG17: Valentin Flunkert, David Salinas, Jan Gasthaus (2017) DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. *ArXiv:1704.04110 [Cs, Stat]*.
- GDGR15: Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra (2015) DRAW: A Recurrent Neural Network For Image Generation. *ArXiv:1502.04623 [Cs]*.
- GaGh15a: Yarin Gal, Zoubin Ghahramani (2015a) Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).
- TKDR16: Dustin Tran, Alp Kucukelbir, Adji B. Dieng, Maja Rudolph, Dawen Liang, David M. Blei (2016) Edward: A library for probabilistic modeling, inference, and criticism. *ArXiv:1610.09787 [Cs, Stat]*.
- MLTH17: Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, … Yee Whye Teh (2017) Filtering Variational Objectives. *ArXiv Preprint ArXiv:1705.09279*.
- RaWi06: Carl Edward Rasmussen, Christopher K. I. Williams (2006) *Gaussian processes for machine learning*. Cambridge, Mass: MIT Press.
- MWNF16: Alexander G. de G. Matthews, Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, … James Hensman (2016) GPflow: A Gaussian process library using TensorFlow. *ArXiv:1610.08733 [Stat]*.
- KSJC16: Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling (2016) Improving Variational Inference with Inverse Autoregressive Flow. In Advances in Neural Information Processing Systems 29. Curran Associates, Inc.
- AbDH16: Ehsan Abbasnejad, Anthony Dick, Anton van den Hengel (2016) Infinite Variational Autoencoder for Semi-Supervised Learning. In Advances in Neural Information Processing Systems 29.
- Mack02: David J C MacKay (2002) *Information Theory, Inference & Learning Algorithms*. Cambridge University Press.
- RaDi16: Sujith Ravi, Qiming Diao (2016) Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation. In PMLR (pp. 519–528).
- NCKN11: Jiquan Ngiam, Zhenghao Chen, Pang W. Koh, Andrew Y. Ng (2011) Learning deep energy models. In Proceedings of the 28th International Conference on Machine Learning (ICML-11) (pp. 1105–1112).
- Bish94: Christopher Bishop (1994) Mixture Density Networks. *Microsoft Research*.
- LoWe17: Christos Louizos, Max Welling (2017) Multiplicative Normalizing Flows for Variational Bayesian Neural Networks. In PMLR (pp. 2218–2227).
- GLSM16: Shixiang Gu, Sergey Levine, Ilya Sutskever, Andriy Mnih (2016) MuProp: Unbiased Backpropagation for Stochastic Neural Networks. In Proceedings of ICLR.
- GuGT15: Shixiang Gu, Zoubin Ghahramani, Richard E Turner (2015) Neural Adaptive Sequential Monte Carlo. In Advances in Neural Information Processing Systems 28 (pp. 2629–2637). Curran Associates, Inc.
- BuRR17: Thang D. Bui, Sujith Ravi, Vivek Ramavajjala (2017) Neural Graph Machines: Learning Neural Networks Using Graphs. *ArXiv:1703.04818 [Cs]*.
- GaGh15b: Yarin Gal, Zoubin Ghahramani (2015b) On Modern Deep Learning and Variational Inference. In Advances in Approximate Bayesian Inference workshop, NIPS.
- Grav11: Alex Graves (2011) Practical Variational Inference for Neural Networks. In Proceedings of the 24th International Conference on Neural Information Processing Systems (pp. 2348–2356). USA: Curran Associates Inc.
- DDSN18: Andreas Doerr, Christian Daniel, Martin Schiegg, Duy Nguyen-Tuong, Stefan Schaal, Marc Toussaint, Sebastian Trimpe (2018) Probabilistic Recurrent State-Space Models. *ArXiv:1801.10395 [Stat]*.
- CBMF17: Kurt Cutajar, Edwin V. Bonilla, Pietro Michiardi, Maurizio Filippone (2017) Random Feature Expansions for Deep Gaussian Processes. In PMLR.
- Gal15: Yarin Gal (2015) Rapid Prototyping of Probabilistic Models: Emerging Challenges in Variational Inference. In Advances in Approximate Bayesian Inference workshop, NIPS.
- GrMH13: Alex Graves, Abdel-rahman Mohamed, Geoffrey Hinton (2013) Speech Recognition with Deep Recurrent Neural Networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
- HoBl15: Matthew Hoffman, David Blei (2015) Stochastic Structured Variational Inference. In PMLR (pp. 361–369).
- LoWe16: Christos Louizos, Max Welling (2016) Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors. In arXiv preprint arXiv:1603.04733 (pp. 1708–1716).
- MoAV17: Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov (2017) Variational Dropout Sparsifies Deep Neural Networks. In Proceedings of ICML.
- FaAm14: Otto Fabius, Joost R. van Amersfoort (2014) Variational Recurrent Auto-Encoders. In Proceedings of ICLR.