The Living Thing / Notebooks :

Probabilistic deep learning

Creating neural networks which infer whole probability densities or certainties for their predictions, rather than point estimates.

In Bayesian terms this is about estimating a posterior distribution given prior distributions on the parameters; in frequentist terms, it's an approximate model averaging.

AFAICT this usually boils down to doing variational inference, in which case the neural network is one big approximate probabilistic directed graphical model. Apparently you can also do simulation-based inference here, somehow using gradients? Must look into that.

Yarin Gal’s PhD thesis (Gal16) summarizes some implicit approximate approaches (especially the Bayesian interpretation of dropout), and Radford Neal’s thesis (Neal96) works out the connection between Bayesian neural networks and Gaussian processes.
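The dropout-as-Bayesian-approximation trick is easy to sketch: leave dropout switched on at test time, run many stochastic forward passes, and read model uncertainty off the spread of the predictions. A minimal numpy sketch (the toy network, weights, and dropout rate are made up for illustration, not from Gal16):

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy one-hidden-layer regression net with fixed (pretend-trained) weights.
W1, b1 = rng.normal(size=(1, 50)), np.zeros(50)
W2, b2 = rng.normal(size=(50, 1)), np.zeros(1)

def forward(x, p_drop=0.5):
    """One stochastic forward pass with dropout left ON at test time."""
    h = np.maximum(0, x @ W1 + b1)
    mask = rng.random(h.shape) > p_drop
    h = h * mask / (1 - p_drop)          # inverted dropout scaling
    return h @ W2 + b2

x = np.array([[0.3]])
samples = np.stack([forward(x) for _ in range(200)])

# Predictive mean and a crude uncertainty estimate from the MC samples.
mean, std = samples.mean(axis=0), samples.std(axis=0)
```

In a real network you would do exactly the same thing, just with the trained model's own dropout layers.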

Alex Graves did a nice poster for his paper (Grav11) on about the simplest prior-uncertainty scheme for recurrent nets (diagonal Gaussian weight uncertainty). There is a half-arsed implementation.
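The core of the diagonal-Gaussian scheme can be sketched in a few lines: each weight gets a variational mean and log-std, weights are sampled via the reparameterisation trick, and a KL penalty against the prior is added to the loss. A sketch under my own naming, not Graves's code, with a standard-normal prior assumed for simplicity:

```python
import numpy as np

rng = np.random.default_rng(1)

# Variational parameters for one weight matrix: a mean and a log-std
# per weight, i.e. a fully factorized (diagonal) Gaussian posterior.
mu = rng.normal(scale=0.1, size=(3, 2))
log_sigma = np.full((3, 2), -3.0)

def sample_weights():
    """Reparameterisation trick: w = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(log_sigma) * eps

def kl_to_standard_normal(mu, log_sigma):
    """KL(q(w) || N(0, I)) for a diagonal Gaussian q, summed over weights.
    This is the complexity term added to the data-fit loss."""
    sigma2 = np.exp(2 * log_sigma)
    return 0.5 * np.sum(sigma2 + mu**2 - 1.0 - 2 * log_sigma)

w = sample_weights()                      # use w in the forward pass
kl = kl_to_standard_normal(mu, log_sigma)  # add (scaled) to the loss
```

Training then minimises the expected negative log-likelihood plus this KL term, with gradients flowing through `mu` and `log_sigma` thanks to the reparameterisation.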


Blei Lab’s software tool: Edward (source). TensorFlow also ships a contributed Bayesian library called BayesFlow (not the same as the cytometry library of the same name), whose documentation is so perfunctory that it would probably be easier to reimplement it than to understand it.

Thomas Wiecki, Bayesian Deep Learning, shows how to do a variant with PyMC3.

Christopher Bonnett: Mixture Density Networks with Edward, Keras and TensorFlow.
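A mixture density network just makes the net emit the parameters of a mixture (mixing logits, component means, log-stds) and trains by minimising the negative log-likelihood of the targets under that mixture. A numpy sketch of the loss for a 1-D Gaussian mixture, with invented example outputs standing in for what a trained network would produce:

```python
import numpy as np

def mdn_neg_log_lik(logits, mu, log_sigma, y):
    """Negative log-likelihood of scalar y under a 1-D Gaussian mixture
    parameterised by mixing logits, component means, and log-stds."""
    log_pi = logits - np.log(np.sum(np.exp(logits)))   # log-softmax weights
    log_comp = (-0.5 * ((y - mu) / np.exp(log_sigma)) ** 2
                - log_sigma - 0.5 * np.log(2 * np.pi))  # per-component log N
    # log-sum-exp over components, done stably.
    return -np.logaddexp.reduce(log_pi + log_comp)

# Pretend the network produced these for one input:
logits = np.array([0.0, 1.0, -1.0])
mu = np.array([-2.0, 0.0, 2.0])
log_sigma = np.zeros(3)

nll = mdn_neg_log_lik(logits, mu, log_sigma, y=0.1)
```

In Keras/TensorFlow this same expression becomes the custom loss, with the three parameter vectors sliced out of the network's final layer.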


Abbasnejad, E., Dick, A., & Hengel, A. van den. (2016) Infinite Variational Autoencoder for Semi-Supervised Learning. In Advances in Neural Information Processing Systems 29.
Bishop, C. (1994) Mixture Density Networks. Microsoft Research.
Bora, A., Jalal, A., Price, E., & Dimakis, A. G.(2017) Compressed Sensing using Generative Models. ArXiv:1703.03208 [Cs, Math, Stat].
Bui, T. D., Ravi, S., & Ramavajjala, V. (2017) Neural Graph Machines: Learning Neural Networks Using Graphs. ArXiv:1703.04818 [Cs].
Cutajar, K., Bonilla, E. V., Michiardi, P., & Filippone, M. (2016) Practical Learning of Deep Gaussian Processes via Random Fourier Features. ArXiv:1610.04386 [Stat].
Cutajar, K., Bonilla, E. V., Michiardi, P., & Filippone, M. (2017) Random Feature Expansions for Deep Gaussian Processes. In PMLR.
Gal, Y. (2015) Rapid Prototyping of Probabilistic Models: Emerging Challenges in Variational Inference. In Advances in Approximate Bayesian Inference workshop, NIPS.
Gal, Y. (2016) Uncertainty in Deep Learning (PhD thesis). University of Cambridge.
Gal, Y., & Ghahramani, Z. (2015a) Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).
Gal, Y., & Ghahramani, Z. (2015b) On Modern Deep Learning and Variational Inference. In Advances in Approximate Bayesian Inference workshop, NIPS.
Gal, Y., & Ghahramani, Z. (2016a) A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. In arXiv:1512.05287 [stat].
Gal, Y., & Ghahramani, Z. (2016b) Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference. In 4th International Conference on Learning Representations (ICLR) workshop track.
Graves, A. (2011) Practical Variational Inference for Neural Networks. In Proceedings of the 24th International Conference on Neural Information Processing Systems (pp. 2348–2356). USA: Curran Associates Inc.
Graves, A., Mohamed, A., & Hinton, G. (2013) Speech Recognition with Deep Recurrent Neural Networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. DOI.
Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015) DRAW: A Recurrent Neural Network For Image Generation. ArXiv:1502.04623 [Cs].
Hoffman, M., & Blei, D. (2015) Stochastic Structured Variational Inference. In PMLR (pp. 361–369).
Johnson, M. J., Duvenaud, D., Wiltschko, A. B., Datta, S. R., & Adams, R. P.(2016) Composing graphical models with neural networks for structured representations and fast inference. ArXiv:1603.06277 [Stat].
Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016) Improving Variational Inference with Inverse Autoregressive Flow. ArXiv:1606.04934 [Cs, Stat].
Kingma, D. P., & Welling, M. (2013) Auto-Encoding Variational Bayes. ArXiv:1312.6114 [Cs, Stat].
Krishnan, R. G., Shalit, U., & Sontag, D. (2015) Deep Kalman Filters. ArXiv Preprint ArXiv:1511.05121.
Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2015) Autoencoding beyond pixels using a learned similarity metric. ArXiv:1512.09300 [Cs, Stat].
Lobacheva, E., Chirkova, N., & Vetrov, D. (2017) Bayesian Sparsification of Recurrent Neural Networks. In Workshop on Learning to Generate Natural Language.
Louizos, C., & Welling, M. (2016) Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors. ArXiv Preprint ArXiv:1603.04733.
MacKay, D. J. C. (2002a) Gaussian Processes. In Information Theory, Inference & Learning Algorithms (Chapter 45). Cambridge University Press.
MacKay, D. J. C. (2002b) Information Theory, Inference & Learning Algorithms. Cambridge University Press.
Molchanov, D., Ashukha, A., & Vetrov, D. (2017) Variational Dropout Sparsifies Deep Neural Networks. In Proceedings of ICML.
Neal, R. M. (1996) Bayesian Learning for Neural Networks (Vol. 118). Secaucus, NJ, USA: Springer-Verlag New York, Inc.
Rasmussen, C. E., & Williams, C. K. I. (2006) Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press.
Ravi, S., & Diao, Q. (2016) Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation. In PMLR (pp. 519–528).
Tran, D., Hoffman, M. D., Saurous, R. A., Brevdo, E., Murphy, K., & Blei, D. M.(2017) Deep Probabilistic Programming. In ICLR.
Tran, D., Kucukelbir, A., Dieng, A. B., Rudolph, M., Liang, D., & Blei, D. M.(2016) Edward: A library for probabilistic modeling, inference, and criticism. ArXiv:1610.09787 [Cs, Stat].
Wainwright, M., & Jordan, M. (2005) A variational principle for graphical models. In New Directions in Statistical Signal Processing (Vol. 155). MIT Press