Creating neural networks which infer whole probability densities, or at least uncertainties, for their predictions, rather than point estimates.

In Bayesian terms this is about estimating a posterior distribution,
and in frequentist terms… uh… What *is* a pithy frequentist phrasing?

Anyway, AFAICT this usually boils down to doing variational inference, in which case the neural network is a big approximate PDGM. Apparently you can also do simulation-based inference here, somehow using gradients? Must look into that.
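To pin down what "doing variational inference" means, here is a minimal sketch of the objective for the smallest possible "network", a single linear layer. It assumes a standard normal prior on the weights, a mean-field Gaussian approximate posterior and Gaussian observation noise; the names `gaussian_kl`, `elbo_estimate` and `noise_sd` are mine, not from any library. It only evaluates a Monte Carlo estimate of the evidence lower bound (ELBO); in practice you would hand this to an autodiff framework and maximise it over `mu_q` and `sigma_q`.

```python
import numpy as np


def gaussian_kl(mu_q, sigma_q, mu_p=0.0, sigma_p=1.0):
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    return np.sum(
        np.log(sigma_p / sigma_q)
        + (sigma_q**2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p**2)
        - 0.5
    )


def elbo_estimate(x, y, mu_q, sigma_q, noise_sd=0.1, n_samples=64, rng=None):
    """Monte Carlo ELBO for y = x @ w + noise with q(w) = N(mu_q, diag(sigma_q^2)).

    x has shape (n, d), y has shape (n,), mu_q and sigma_q have shape (d,).
    The prior on w is assumed to be N(0, I).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Reparameterisation trick: w = mu + sigma * eps with eps ~ N(0, I),
    # so the samples are a differentiable function of (mu_q, sigma_q).
    eps = rng.standard_normal((n_samples, mu_q.shape[0]))
    w = mu_q + sigma_q * eps                      # (n_samples, d)
    pred = w @ x.T                                # (n_samples, n)
    # Gaussian log-likelihood of the data under each sampled weight vector.
    log_lik = -0.5 * np.sum(
        ((y - pred) / noise_sd) ** 2 + np.log(2.0 * np.pi * noise_sd**2),
        axis=1,
    )
    # ELBO = E_q[log p(y | x, w)] - KL(q(w) || p(w))
    return log_lik.mean() - gaussian_kl(mu_q, sigma_q)


# Toy usage: a q centred on the true weights should score a higher ELBO
# than one centred on zero.
rng = np.random.default_rng(1)
x = rng.standard_normal((50, 2))
w_true = np.array([2.0, -1.0])
y = x @ w_true + 0.1 * rng.standard_normal(50)
elbo_good = elbo_estimate(x, y, w_true, np.full(2, 0.05))
elbo_bad = elbo_estimate(x, y, np.zeros(2), np.full(2, 0.05))
```

The same structure carries over to deeper networks: replace `w @ x.T` with a forward pass through sampled weights, and keep the likelihood and KL terms.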

Yarin Gal’s PhD thesis, Uncertainty in Deep Learning (Gal16), summarises a lot of this.

## Practicalities

Blei Lab’s software tool: Edward (source).

TensorFlow also comes with a contributed Bayesian library called BayesFlow (which is not the same as the cytometry library of the same name), which by contrast has documentation so perfunctory that I can’t imagine it not being easier to reimplement it.

Thomas Wiecki’s Bayesian Deep Learning shows how to do it with PyMC3.

Christopher Bonnett: Mixture Density Networks with Edward, Keras and TensorFlow.
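For orientation, a mixture density network (Bish94) has the network emit the parameters of a mixture distribution over the target, and trains it on the negative log-likelihood. Below is a framework-free sketch of just that output layer and loss, assuming a one-dimensional target and Gaussian components; `mdn_params`, `mdn_nll` and the layout of `raw` as `[logits | means | log std devs]` are my own illustrative choices, not the API of Edward, Keras or TensorFlow.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))


def mdn_params(raw):
    """Split raw network outputs of shape (n, 3K) into mixture parameters.

    Returns log mixture weights, component means and component std devs,
    each of shape (n, K).
    """
    logits, mu, log_sigma = np.split(raw, 3, axis=1)
    log_pi = logits - _logsumexp(logits, axis=1)[:, None]  # log-softmax
    return log_pi, mu, np.exp(log_sigma)                   # exp keeps sigma > 0


def mdn_nll(raw, y):
    """Mean negative log-likelihood of targets y (shape (n,)) under the mixture."""
    log_pi, mu, sigma = mdn_params(raw)
    # Log-density of y under each Gaussian component.
    log_comp = (
        -0.5 * ((y[:, None] - mu) / sigma) ** 2
        - np.log(sigma)
        - 0.5 * np.log(2.0 * np.pi)
    )
    # log p(y) = logsumexp_k(log pi_k + log N(y | mu_k, sigma_k))
    return -_logsumexp(log_pi + log_comp, axis=1).mean()


# Toy usage: all-zero outputs give K identical N(0, 1) components,
# so the mixture is just a standard normal.
nll_one = mdn_nll(np.zeros((4, 3)), np.zeros(4))
nll_two = mdn_nll(np.zeros((4, 6)), np.zeros(4))
```

In a real model `raw` would be the final layer of the network evaluated on the inputs, and `mdn_nll` would be minimised with whatever optimiser the framework provides.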

## Refs

- AbDH16: Abbasnejad, E., Dick, A., & Hengel, A. van den. (2016) Infinite Variational Autoencoder for Semi-Supervised Learning. In Advances in Neural Information Processing Systems 29.
- Bish94: Bishop, C. (1994) Mixture Density Networks. *Microsoft Research*.
- CBMF16: Cutajar, K., Bonilla, E. V., Michiardi, P., & Filippone, M. (2016) Practical Learning of Deep Gaussian Processes via Random Fourier Features. *arXiv:1610.04386 [Stat]*.
- Gal15a: Gal, Y. (2015a) A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. *arXiv:1512.05287 [Stat]*.
- Gal15b: Gal, Y. (2015b) Rapid Prototyping of Probabilistic Models: Emerging Challenges in Variational Inference. In Advances in Approximate Bayesian Inference workshop, NIPS.
- Gal16: Gal, Y. (2016) Uncertainty in Deep Learning (PhD thesis). University of Cambridge.
- GaGh15a: Gal, Y., & Ghahramani, Z. (2015a) Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).
- GaGh15b: Gal, Y., & Ghahramani, Z. (2015b) On Modern Deep Learning and Variational Inference. In Advances in Approximate Bayesian Inference workshop, NIPS.
- GaGh16: Gal, Y., & Ghahramani, Z. (2016) Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference. In 4th International Conference on Learning Representations (ICLR) workshop track.
- GDGR15: Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015) DRAW: A Recurrent Neural Network For Image Generation. *arXiv:1502.04623 [Cs]*.
- KiSW16: Kingma, D. P., Salimans, T., & Welling, M. (2016) Improving Variational Inference with Inverse Autoregressive Flow. *arXiv:1606.04934 [Cs, Stat]*.
- LSLW15: Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2015) Autoencoding beyond pixels using a learned similarity metric. *arXiv:1512.09300 [Cs, Stat]*.
- LoWe16: Louizos, C., & Welling, M. (2016) Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors. *arXiv Preprint arXiv:1603.04733*.
- Mack02a: MacKay, D. J. C. (2002a) Gaussian Processes. In Information Theory, Inference & Learning Algorithms (Chapter 45). Cambridge University Press.
- Mack02b: MacKay, D. J. C. (2002b) Information Theory, Inference & Learning Algorithms. Cambridge University Press.
- MoAV17: Molchanov, D., Ashukha, A., & Vetrov, D. (2017) Variational Dropout Sparsifies Deep Neural Networks. *arXiv:1701.05369 [Cs, Stat]*.
- RaWi06: Rasmussen, C. E., & Williams, C. K. I. (2006) Gaussian processes for machine learning. Cambridge, Mass: MIT Press.
- THSB17: Tran, D., Hoffman, M. D., Saurous, R. A., Brevdo, E., Murphy, K., & Blei, D. M. (2017) Deep Probabilistic Programming. *arXiv:1701.03757 [Cs, Stat]*.
- TKDR16: Tran, D., Kucukelbir, A., Dieng, A. B., Rudolph, M., Liang, D., & Blei, D. M. (2016) Edward: A library for probabilistic modeling, inference, and criticism. *arXiv:1610.09787 [Cs, Stat]*.
- WaJo05: Wainwright, M., & Jordan, M. (2005) A variational principle for graphical models. In New Directions in Statistical Signal Processing (Vol. 155). MIT Press.