The Living Thing / Notebooks :

Probabilistic deep learning

Creating neural networks that infer whole probability densities for their predictions (usually approximately), rather than point estimates. Or at least some part of the density-estimation problem, done with neural nets in a Bayesian setting. Prediction uncertainties, approximate model averaging and so on would all fit in this category.

AFAICT this usually boils down to doing variational inference, in which case the neural network is one big approximate probabilistic graphical model. Apparently you can also do simulation-based inference here, somehow using gradients? Must look into that.
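To make "boils down to variational inference" concrete, here is a minimal NumPy sketch of the quantity everything optimizes: a Monte Carlo estimate of the ELBO for a toy model (prior N(0, 1), Gaussian likelihood, diagonal Gaussian variational posterior). The model, data and variational parameters are made up for illustration; in practice a neural net would output `mu_q` and `sigma_q`.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(x, mu, sigma):
    """Log-density of a diagonal Gaussian, summed over the last axis."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - 0.5 * ((x - mu) / sigma) ** 2, axis=-1)

# Toy model: prior p(w) = N(0, 1), likelihood p(D | w) = N(D; w, 1)
# for one observed scalar D. Variational posterior q(w) = N(mu_q, sigma_q).
D = np.array([1.5])
mu_q, sigma_q = np.array([0.7]), np.array([0.5])

# Monte Carlo estimate of
#   ELBO = E_q[ log p(D | w) + log p(w) - log q(w) ]  <=  log p(D)
n_samples = 10_000
eps = rng.standard_normal((n_samples, 1))
w = mu_q + sigma_q * eps                         # reparameterised samples from q
elbo = np.mean(log_normal(D, w, 1.0)             # expected log-likelihood
               + log_normal(w, 0.0, 1.0)         # log-prior
               - log_normal(w, mu_q, sigma_q))   # minus log q (entropy term)
```

Training a variational neural net is then just gradient ascent on this estimate with respect to the variational parameters, which the reparameterisation trick makes differentiable.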

To learn:

Radford Neal’s thesis (Neal96) is a foundational asymptotically-Bayesian use of neural networks. Yarin Gal’s PhD thesis (Gal16) summarizes some implicit approximate approaches (e.g. the Bayesian interpretation of dropout).
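The dropout trick from Gal’s thesis is easy to sketch: keep dropout switched on at test time and treat repeated stochastic forward passes as approximate posterior samples. A minimal NumPy version, with made-up weights standing in for a trained network:

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny fixed two-layer net; in practice these weights come from training.
W1, b1 = rng.standard_normal((3, 16)), np.zeros(16)
W2, b2 = rng.standard_normal((16, 1)), np.zeros(1)

def forward(x, p_drop=0.5):
    """One stochastic forward pass with dropout kept ON at test time."""
    h = np.maximum(x @ W1 + b1, 0.0)           # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop        # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)              # inverted-dropout scaling
    return h @ W2 + b2

x = np.array([[0.1, -0.4, 1.2]])
preds = np.stack([forward(x) for _ in range(200)])  # 200 stochastic passes
mean, std = preds.mean(axis=0), preds.std(axis=0)   # predictive mean and spread
```

The spread of the sampled predictions is read as (approximate) model uncertainty, for close to zero extra implementation cost over plain dropout.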

Alex Graves made a nice poster of his paper (Grav11) on one of the simplest prior-uncertainty schemes for recurrent nets: diagonal Gaussian weight uncertainty. There is a half-arsed implementation.
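The core of that scheme is small enough to sketch: each weight gets a variational mean and log-std, weights are sampled by reparameterisation, and a KL penalty against a standard-normal prior is added to the training loss. A NumPy sketch with illustrative shapes and initial values:

```python
import numpy as np

rng = np.random.default_rng(2)

# Variational parameters: a mean and a log-std per weight (diagonal Gaussian).
n_in, n_out = 3, 2
mu = rng.standard_normal((n_in, n_out)) * 0.1
log_sigma = np.full((n_in, n_out), -2.0)

def sample_weights():
    """Reparameterised sample W = mu + sigma * eps, so gradients reach mu/sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps

def kl_to_standard_normal():
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over all weights: the
    complexity penalty added to the data loss during training."""
    sigma2 = np.exp(2 * log_sigma)
    return 0.5 * np.sum(sigma2 + mu**2 - 1.0 - 2 * log_sigma)

W = sample_weights()       # use these sampled weights in the forward pass
penalty = kl_to_standard_normal()
```

Training minimizes (data negative log-likelihood + penalty), averaged over weight samples; the same recipe applies to recurrent weight matrices.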


Blei Lab’s software tool: Edward (source). TensorFlow also ships a contributed Bayesian library called BayesFlow (not the same as the cytometry library of the same name), whose documentation is by contrast so perfunctory that it may be easier to reimplement it than to understand it.

Thomas Wiecki, Bayesian Deep Learning, shows how to do some variants with PyMC3.

Christopher Bonnett: Mixture Density Networks with Edward, Keras and TensorFlow.
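The mixture density network idea in that post (and Bishop 1994) is that the network’s last layer parameterises a Gaussian mixture over the target, and training minimizes its negative log-likelihood. A NumPy sketch of the output head for a 1-D target and 3 components; the 9-unit output layout (logits, means, log-stds) is one common convention, not the only one:

```python
import numpy as np

def mdn_nll(net_out, y):
    """Negative log-likelihood of y under a 1-D Gaussian mixture whose
    parameters are read off the network output: [logits | means | log-stds].
    (No log-sum-exp stabilisation, for clarity.)"""
    logits, mu, log_sigma = np.split(net_out, 3)
    log_pi = logits - np.log(np.sum(np.exp(logits)))   # log-softmax weights
    sigma = np.exp(log_sigma)
    log_comp = (-0.5 * np.log(2 * np.pi) - log_sigma
                - 0.5 * ((y - mu) / sigma) ** 2)       # per-component log N(y)
    return -np.log(np.sum(np.exp(log_pi + log_comp)))  # -log sum_k pi_k N_k(y)

# e.g. a 9-unit output layer parameterises 3 mixture components:
out = np.array([0.0, 0.0, 0.0,    # mixture logits (equal weights here)
                -1.0, 0.0, 1.0,   # component means
                0.0, 0.0, 0.0])   # component log-stds
loss = mdn_nll(out, y=0.2)
```

Unlike a plain regression head, the predicted density can be multimodal and heteroskedastic, which is the whole point of the construction.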


Abbasnejad, E., Dick, A., & Hengel, A. van den. (2016) Infinite Variational Autoencoder for Semi-Supervised Learning. In Advances in Neural Information Processing Systems 29.
Archer, E., Park, I. M., Buesing, L., Cunningham, J., & Paninski, L. (2015) Black box variational inference for state space models. ArXiv:1511.07367 [Stat].
Bishop, C. (1994) Mixture Density Networks. Microsoft Research.
Bora, A., Jalal, A., Price, E., & Dimakis, A. G.(2017) Compressed Sensing using Generative Models. ArXiv:1703.03208 [Cs, Math, Stat].
Bui, T. D., Ravi, S., & Ramavajjala, V. (2017) Neural Graph Machines: Learning Neural Networks Using Graphs. ArXiv:1703.04818 [Cs].
Cutajar, K., Bonilla, E. V., Michiardi, P., & Filippone, M. (2017) Random Feature Expansions for Deep Gaussian Processes. In PMLR.
Doerr, A., Daniel, C., Schiegg, M., Nguyen-Tuong, D., Schaal, S., Toussaint, M., & Trimpe, S. (2018) Probabilistic Recurrent State-Space Models. ArXiv:1801.10395 [Stat].
Fabius, O., & van Amersfoort, J. R.(2014) Variational Recurrent Auto-Encoders. In Proceedings of ICLR.
Flunkert, V., Salinas, D., & Gasthaus, J. (2017) DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. ArXiv:1704.04110 [Cs, Stat].
Gal, Y. (2015) Rapid Prototyping of Probabilistic Models: Emerging Challenges in Variational Inference. In Advances in Approximate Bayesian Inference workshop, NIPS.
Gal, Y. (2016) Uncertainty in Deep Learning (PhD thesis). University of Cambridge.
Gal, Y., & Ghahramani, Z. (2015a) Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).
Gal, Y., & Ghahramani, Z. (2015b) On Modern Deep Learning and Variational Inference. In Advances in Approximate Bayesian Inference workshop, NIPS.
Gal, Y., & Ghahramani, Z. (2016a) A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. In arXiv:1512.05287 [stat].
Gal, Y., & Ghahramani, Z. (2016b) Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference. In 4th International Conference on Learning Representations (ICLR) workshop track.
Graves, A. (2011) Practical Variational Inference for Neural Networks. In Proceedings of the 24th International Conference on Neural Information Processing Systems (pp. 2348–2356). USA: Curran Associates Inc.
Graves, A., Mohamed, A., & Hinton, G. (2013) Speech Recognition with Deep Recurrent Neural Networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. DOI.
Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015) DRAW: A Recurrent Neural Network For Image Generation. ArXiv:1502.04623 [Cs].
Gu, S., Ghahramani, Z., & Turner, R. E.(2015) Neural Adaptive Sequential Monte Carlo. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 28 (pp. 2629–2637). Curran Associates, Inc.
Gu, S., Levine, S., Sutskever, I., & Mnih, A. (2015) MuProp: Unbiased Backpropagation for Stochastic Neural Networks.
Hoffman, M., & Blei, D. (2015) Stochastic Structured Variational Inference. In PMLR (pp. 361–369).
Johnson, M. J., Duvenaud, D., Wiltschko, A. B., Datta, S. R., & Adams, R. P.(2016) Composing graphical models with neural networks for structured representations and fast inference. ArXiv:1603.06277 [Stat].
Karl, M., Soelch, M., Bayer, J., & van der Smagt, P. (2016) Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data. In Proceedings of ICLR.
Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016) Improving Variational Inference with Inverse Autoregressive Flow. In Advances in Neural Information Processing Systems 29. Curran Associates, Inc.
Kingma, D. P., & Welling, M. (2014) Auto-Encoding Variational Bayes. In ICLR 2014 conference.
Krauth, K., Bonilla, E. V., Cutajar, K., & Filippone, M. (2016) AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models. In UAI17.
Krishnan, R. G., Shalit, U., & Sontag, D. (2015) Deep kalman filters. ArXiv Preprint ArXiv:1511.05121.
Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2015) Autoencoding beyond pixels using a learned similarity metric. ArXiv:1512.09300 [Cs, Stat].
Le, T. A., Igl, M., Jin, T., Rainforth, T., & Wood, F. (2017) Auto-Encoding Sequential Monte Carlo. ArXiv Preprint ArXiv:1705.10306.
Lobacheva, E., Chirkova, N., & Vetrov, D. (2017) Bayesian Sparsification of Recurrent Neural Networks. In Workshop on Learning to Generate Natural Language.
Louizos, C., & Welling, M. (2016) Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors. In arXiv preprint arXiv:1603.04733 (pp. 1708–1716).
Louizos, C., & Welling, M. (2017) Multiplicative Normalizing Flows for Variational Bayesian Neural Networks. In PMLR (pp. 2218–2227).
MacKay, D. J. C.(2002a) Gaussian Processes. In Information Theory, Inference & Learning Algorithms (chapter 45). Cambridge University Press.
MacKay, D. J. C.(2002b) Information Theory, Inference & Learning Algorithms. Cambridge University Press.
Maddison, C. J., Lawson, D., Tucker, G., Heess, N., Norouzi, M., Mnih, A., … Teh, Y. W.(2017) Filtering Variational Objectives. ArXiv Preprint ArXiv:1705.09279.
Matthews, A. G. de G., van der Wilk, M., Nickson, T., Fujii, K., Boukouvalas, A., León-Villagrá, P., … Hensman, J. (2016) GPflow: A Gaussian process library using TensorFlow. ArXiv:1610.08733 [Stat].
Molchanov, D., Ashukha, A., & Vetrov, D. (2017) Variational Dropout Sparsifies Deep Neural Networks. In Proceedings of ICML.
Neal, R. M.(1996) Bayesian Learning for Neural Networks. (Vol. 118). Secaucus, NJ, USA: Springer-Verlag New York, Inc.
Ngiam, J., Chen, Z., Koh, P. W., & Ng, A. Y.(2011) Learning deep energy models. In Proceedings of the 28th International Conference on Machine Learning (ICML-11) (pp. 1105–1112).
Rasmussen, C. E., & Williams, C. K. I.(2006) Gaussian processes for machine learning. Cambridge, Mass: MIT Press.
Ravi, S., & Diao, Q. (2016) Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation. In PMLR (pp. 519–528).
Ryder, T., Golightly, A., McGough, A. S., & Prangle, D. (2018) Black-box Variational Inference for Stochastic Differential Equations. ArXiv:1802.03335 [Stat].
Tran, D., Hoffman, M. D., Saurous, R. A., Brevdo, E., Murphy, K., & Blei, D. M.(2017) Deep Probabilistic Programming. In ICLR.
Tran, D., Kucukelbir, A., Dieng, A. B., Rudolph, M., Liang, D., & Blei, D. M.(2016) Edward: A library for probabilistic modeling, inference, and criticism. ArXiv:1610.09787 [Cs, Stat].
Wainwright, M., & Jordan, M. (2005) A variational principle for graphical models. In New Directions in Statistical Signal Processing (Vol. 155). MIT Press