Creating neural networks which infer whole probability densities for their predictions (usually approximately), rather than point estimates. Or at least some part of the density-estimation problem, accomplished with neural nets in a Bayesian setting. Prediction uncertainties, approximate model averaging etc. would all fit in this category.

AFAICT this usually boils down to doing variational inference, in which case the neural network is a big approximate probabilistic graphical model. Apparently you can also do simulation-based inference here, somehow using gradients? Must look into that.
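As a note to self, the core VI computation looks roughly like this: a Monte Carlo estimate of the ELBO for a factorised Gaussian posterior against a standard normal prior. All the names and the toy log-likelihood here are my own placeholders, not any particular library's API:

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo_estimate(x, mu, log_sigma, log_lik, n_samples=1000):
    """Monte Carlo ELBO: E_q[log p(x|z)] - KL(q(z) || p(z)),
    with q(z) = N(mu, sigma^2) factorised and p(z) = N(0, 1)."""
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal((n_samples, mu.size))
    z = mu + sigma * eps  # reparameterised draws from q
    expected_ll = np.mean([log_lik(x, zi) for zi in z])
    # KL between a diagonal Gaussian and the standard normal, in closed form
    kl = 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * log_sigma)
    return expected_ll - kl
```

Maximising this over `mu` and `log_sigma` (by stochastic gradient, in practice) is what "the neural network as approximate posterior" amounts to.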

To learn:

- reparameterisation trick
- natural gradient
- how does this work outside of KL-divergence?
- marginal likelihood in model selection: how does it work with many optima?
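On the first of those, the reparameterisation trick in one line: write z ~ N(mu, sigma²) as z = mu + sigma·eps with eps ~ N(0, 1), so gradients with respect to (mu, sigma) pass through the sample. A toy sanity check (function names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)

def grad_mu_reparam(mu, sigma, n_samples=200_000):
    """Estimate d/dmu E[x^2] for x ~ N(mu, sigma^2) by reparameterising
    x = mu + sigma * eps; per-sample gradient is then d(x^2)/dmu = 2x."""
    eps = rng.standard_normal(n_samples)
    x = mu + sigma * eps
    return np.mean(2 * x)  # analytic answer: d/dmu (mu^2 + sigma^2) = 2*mu
```

Because the randomness lives in `eps` rather than in a `mu`-dependent density, the estimator is unbiased and typically much lower-variance than score-function (REINFORCE) alternatives.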


## Backgrounders

Radford Neal’s thesis (Neal96) is a foundational, asymptotically Bayesian use of neural networks. Yarin Gal’s PhD thesis (Gal16) summarizes some implicit approximate approaches (e.g. the Bayesian interpretation of dropout).

Alex Graves did a nice poster for his paper (Grav11) on the simplest of these prior-uncertainty schemes for recurrent nets (diagonal Gaussian weight uncertainty). There is a half-arsed implementation.
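A minimal sketch of what diagonal Gaussian weight uncertainty buys you: Monte Carlo predictive mean and spread, by sampling weights from the variational posterior and pushing each draw through the network. The "network" here is trivially linear and the variational parameters are made-up placeholders, not Graves' actual setup:

```python
import numpy as np

rng = np.random.default_rng(42)

# Factorised Gaussian variational posterior over each weight, q(w) = N(mu, sigma^2).
# These values are illustrative placeholders.
w_mu = np.array([0.5, -1.0])
w_log_sigma = np.array([-2.0, -2.0])

def predict_with_uncertainty(x, n_samples=20_000):
    """Predictive mean and std: sample weight vectors from q(w) and
    make one prediction per draw through a (here, linear) network."""
    sigma = np.exp(w_log_sigma)
    ws = w_mu + sigma * rng.standard_normal((n_samples, w_mu.size))
    preds = ws @ x
    return preds.mean(), preds.std()
```

The spread of the predictions is exactly the "prediction uncertainty" the intro talks about; training fits `w_mu` and `w_log_sigma` by maximising an ELBO over the weights.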

## Practicalities

Blei Lab’s software tool: Edward (source). TensorFlow also comes with a contributed Bayesian library called BayesFlow (not to be confused with the cytometry library of the same name), whose documentation is so perfunctory that I can’t imagine it not being easier to reimplement it than to understand it.

Thomas Wiecki, Bayesian Deep Learning, shows how to implement some variants with PyMC3.

Christopher Bonnett: Mixture Density Networks with Edward, Keras and TensorFlow.
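For reference, the MDN loss itself is simple: the negative log-likelihood of the target under a Gaussian mixture whose parameters the network emits (Bish94). A framework-free sketch, with hand-supplied parameters standing in for network outputs:

```python
import numpy as np

def _logsumexp(a):
    """Numerically stable log(sum(exp(a)))."""
    m = a.max()
    return m + np.log(np.sum(np.exp(a - m)))

def mdn_nll(y, logits, mus, sigmas):
    """Negative log-likelihood of scalar target y under a 1-D Gaussian
    mixture: mixing logits, component means and scales would all be
    outputs of the network."""
    log_pi = logits - _logsumexp(logits)  # log softmax over mixture weights
    log_comp = -0.5 * np.log(2 * np.pi * sigmas**2) - 0.5 * ((y - mus) / sigmas) ** 2
    return -_logsumexp(log_pi + log_comp)
```

With one component this reduces to an ordinary Gaussian log-density; with several, minimising it lets the net represent multimodal predictive densities.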

## Refs

- AbDH16
- Abbasnejad, E., Dick, A., & Hengel, A. van den. (2016) Infinite Variational Autoencoder for Semi-Supervised Learning. In Advances in Neural Information Processing Systems 29.
- APBC15
- Archer, E., Park, I. M., Buesing, L., Cunningham, J., & Paninski, L. (2015) Black box variational inference for state space models. *ArXiv:1511.07367 [Stat]*.
- Bish94
- Bishop, C. (1994) Mixture Density Networks. *Microsoft Research*.
- BJPD17
- Bora, A., Jalal, A., Price, E., & Dimakis, A. G. (2017) Compressed Sensing using Generative Models. *ArXiv:1703.03208 [Cs, Math, Stat]*.
- BuRR17
- Bui, T. D., Ravi, S., & Ramavajjala, V. (2017) Neural Graph Machines: Learning Neural Networks Using Graphs. *ArXiv:1703.04818 [Cs]*.
- CBMF17
- Cutajar, K., Bonilla, E. V., Michiardi, P., & Filippone, M. (2017) Random Feature Expansions for Deep Gaussian Processes. In PMLR.
- DDSN18
- Doerr, A., Daniel, C., Schiegg, M., Nguyen-Tuong, D., Schaal, S., Toussaint, M., & Trimpe, S. (2018) Probabilistic Recurrent State-Space Models. *ArXiv:1801.10395 [Stat]*.
- FaAm14
- Fabius, O., & van Amersfoort, J. R.(2014) Variational Recurrent Auto-Encoders. In Proceedings of ICLR.
- FlSG17
- Flunkert, V., Salinas, D., & Gasthaus, J. (2017) DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. *ArXiv:1704.04110 [Cs, Stat]*.
- Gal15
- Gal, Y. (2015) Rapid Prototyping of Probabilistic Models: Emerging Challenges in Variational Inference. In Advances in Approximate Bayesian Inference workshop, NIPS.
- Gal16
- Gal, Y. (2016) Uncertainty in Deep Learning (PhD thesis). University of Cambridge.
- GaGh15a
- Gal, Y., & Ghahramani, Z. (2015a) Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).
- GaGh15b
- Gal, Y., & Ghahramani, Z. (2015b) On Modern Deep Learning and Variational Inference. In Advances in Approximate Bayesian Inference workshop, NIPS.
- GaGh16a
- Gal, Y., & Ghahramani, Z. (2016a) A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. In arXiv:1512.05287 [stat].
- GaGh16b
- Gal, Y., & Ghahramani, Z. (2016b) Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference. In 4th International Conference on Learning Representations (ICLR) workshop track.
- Grav11
- Graves, A. (2011) Practical Variational Inference for Neural Networks. In Proceedings of the 24th International Conference on Neural Information Processing Systems (pp. 2348–2356). USA: Curran Associates Inc.
- GrMH13
- Graves, A., Mohamed, A., & Hinton, G. (2013) Speech Recognition with Deep Recurrent Neural Networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. DOI.
- GDGR15
- Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015) DRAW: A Recurrent Neural Network For Image Generation. *ArXiv:1502.04623 [Cs]*.
- GuGT15
- Gu, S., Ghahramani, Z., & Turner, R. E.(2015) Neural Adaptive Sequential Monte Carlo. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 28 (pp. 2629–2637). Curran Associates, Inc.
- GLSM15
- Gu, S., Levine, S., Sutskever, I., & Mnih, A. (2015) MuProp: Unbiased Backpropagation for Stochastic Neural Networks.
- HoBl15
- Hoffman, M., & Blei, D. (2015) Stochastic Structured Variational Inference. In PMLR (pp. 361–369).
- JDWD16
- Johnson, M. J., Duvenaud, D., Wiltschko, A. B., Datta, S. R., & Adams, R. P. (2016) Composing graphical models with neural networks for structured representations and fast inference. *ArXiv:1603.06277 [Stat]*.
- KSBS16
- Karl, M., Soelch, M., Bayer, J., & van der Smagt, P. (2016) Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data. In Proceedings of ICLR.
- KSJC16
- Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016) Improving Variational Inference with Inverse Autoregressive Flow. In Advances in Neural Information Processing Systems 29. Curran Associates, Inc.
- KiWe14
- Kingma, D. P., & Welling, M. (2014) Auto-Encoding Variational Bayes. In ICLR 2014 conference.
- KBCF16
- Krauth, K., Bonilla, E. V., Cutajar, K., & Filippone, M. (2016) AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models. In UAI17.
- KrSS15
- Krishnan, R. G., Shalit, U., & Sontag, D. (2015) Deep Kalman Filters. *ArXiv Preprint ArXiv:1511.05121*.
- LSLW15
- Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2015) Autoencoding beyond pixels using a learned similarity metric. *ArXiv:1512.09300 [Cs, Stat]*.
- LIJR17
- Le, T. A., Igl, M., Jin, T., Rainforth, T., & Wood, F. (2017) Auto-Encoding Sequential Monte Carlo. *ArXiv Preprint ArXiv:1705.10306*.
- LoCV17
- Lobacheva, E., Chirkova, N., & Vetrov, D. (2017) Bayesian Sparsification of Recurrent Neural Networks. In Workshop on Learning to Generate Natural Language.
- LoWe16
- Louizos, C., & Welling, M. (2016) Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors. In arXiv preprint arXiv:1603.04733 (pp. 1708–1716).
- LoWe17
- Louizos, C., & Welling, M. (2017) Multiplicative Normalizing Flows for Variational Bayesian Neural Networks. In PMLR (pp. 2218–2227).
- Mack02a
- MacKay, D. J. C. (2002a) Gaussian Processes. In Information Theory, Inference & Learning Algorithms (Chapter 45). Cambridge University Press.
- Mack02b
- MacKay, D. J. C. (2002b) Information Theory, Inference & Learning Algorithms. Cambridge University Press.
- MLTH17
- Maddison, C. J., Lawson, D., Tucker, G., Heess, N., Norouzi, M., Mnih, A., … Teh, Y. W. (2017) Filtering Variational Objectives. *ArXiv Preprint ArXiv:1705.09279*.
- MWNF16
- Matthews, A. G. de G., van der Wilk, M., Nickson, T., Fujii, K., Boukouvalas, A., León-Villagrá, P., … Hensman, J. (2016) GPflow: A Gaussian process library using TensorFlow. *ArXiv:1610.08733 [Stat]*.
- MoAV17
- Molchanov, D., Ashukha, A., & Vetrov, D. (2017) Variational Dropout Sparsifies Deep Neural Networks. In Proceedings of ICML.
- Neal96
- Neal, R. M.(1996) Bayesian Learning for Neural Networks. (Vol. 118). Secaucus, NJ, USA: Springer-Verlag New York, Inc.
- NCKN11
- Ngiam, J., Chen, Z., Koh, P. W., & Ng, A. Y.(2011) Learning deep energy models. In Proceedings of the 28th International Conference on Machine Learning (ICML-11) (pp. 1105–1112).
- RaWi06
- Rasmussen, C. E., & Williams, C. K. I. (2006) Gaussian Processes for Machine Learning. Cambridge, Mass.: MIT Press.
- RaDi16
- Ravi, S., & Diao, Q. (2016) Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation. In PMLR (pp. 519–528).
- RGMP18
- Ryder, T., Golightly, A., McGough, A. S., & Prangle, D. (2018) Black-box Variational Inference for Stochastic Differential Equations. *ArXiv:1802.03335 [Stat]*.
- THSB17
- Tran, D., Hoffman, M. D., Saurous, R. A., Brevdo, E., Murphy, K., & Blei, D. M.(2017) Deep Probabilistic Programming. In ICLR.
- TKDR16
- Tran, D., Kucukelbir, A., Dieng, A. B., Rudolph, M., Liang, D., & Blei, D. M. (2016) Edward: A library for probabilistic modeling, inference, and criticism. *ArXiv:1610.09787 [Cs, Stat]*.
- WaJo05
- Wainwright, M., & Jordan, M. (2005) A variational principle for graphical models. In New Directions in Statistical Signal Processing (Vol. 155). MIT Press