The Living Thing / Notebooks : Probabilistic deep learning

Creating neural networks which infer whole probability densities (or at least uncertainty estimates) for their predictions, rather than point estimates.

In Bayesian terms this is about estimating a posterior distribution, and in frequentist terms… uh… what is a pithy frequentist phrasing? Conditional density estimation, perhaps, or constructing prediction intervals.

Anyway, AFAICT this usually boils down to doing variational inference, in which case the neural network is effectively one big approximate probabilistic graphical model. Apparently you can also do simulation-based inference here, somehow exploiting gradients? Must look into that.
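
To make "boils down to variational inference" concrete, here is a minimal sketch in plain TensorFlow of the simplest possible case: a mean-field Gaussian variational posterior over the weight of a Bayesian linear regression, trained by maximising a one-sample reparameterised ELBO estimate. The prior scale, noise level, step count and toy data are all illustrative assumptions, not anything prescribed by the references below.

```python
import numpy as np
import tensorflow as tf

# toy data: y = 2x + noise
x = tf.constant(np.random.randn(200, 1), dtype=tf.float32)
y = 2.0 * x + 0.1 * tf.random.normal(x.shape)

prior_std, noise_std = 1.0, 0.1
mu = tf.Variable(tf.zeros([1, 1]))           # variational mean of q(w)
rho = tf.Variable(-3.0 * tf.ones([1, 1]))    # unconstrained scale; sigma = softplus(rho)
opt = tf.keras.optimizers.Adam(0.05)

for step in range(2000):
    with tf.GradientTape() as tape:
        sigma = tf.nn.softplus(rho)
        eps = tf.random.normal(mu.shape)
        w = mu + sigma * eps                               # reparameterised sample w ~ q(w)
        # Gaussian log-likelihood of the data given this sampled weight (up to a constant)
        log_lik = -0.5 * tf.reduce_sum(tf.square((y - x @ w) / noise_std))
        # analytic KL( q(w) || N(0, prior_std^2) ) for a diagonal Gaussian
        kl = tf.reduce_sum(
            tf.math.log(prior_std / sigma)
            + (tf.square(sigma) + tf.square(mu)) / (2.0 * prior_std ** 2)
            - 0.5)
        loss = kl - log_lik                                # negative ELBO
    opt.apply_gradients(zip(tape.gradient(loss, [mu, rho]), [mu, rho]))
```

The deep-learning versions replace the single weight with all the network's weights and the analytic KL with whatever the chosen posterior family permits, but the objective is the same negative ELBO.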

Yarin Gal’s PhD Thesis summarises a lot of stuff here: Uncertainty in Deep Learning.
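
The cheapest trick from that line of work is Monte Carlo dropout (GaGh15a): leave dropout switched on at prediction time and treat the spread of repeated stochastic forward passes as an approximate predictive distribution. A minimal sketch, assuming tf.keras; the layer sizes, dropout rate and toy data are my own illustrative choices.

```python
import numpy as np
import tensorflow as tf

# Small regression net with dropout between hidden layers.
inputs = tf.keras.Input(shape=(1,))
h = tf.keras.layers.Dense(64, activation="relu")(inputs)
h = tf.keras.layers.Dropout(0.2)(h)
h = tf.keras.layers.Dense(64, activation="relu")(h)
h = tf.keras.layers.Dropout(0.2)(h)
outputs = tf.keras.layers.Dense(1)(h)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")

# Toy data: noisy sine curve.
x = np.random.uniform(-3, 3, size=(500, 1)).astype("float32")
y = np.sin(x) + 0.1 * np.random.randn(500, 1).astype("float32")
model.fit(x, y, epochs=50, verbose=0)

# Keep dropout stochastic at prediction time (training=True) and take
# repeated forward passes; their spread is the uncertainty estimate.
x_test = np.linspace(-4, 4, 100).reshape(-1, 1).astype("float32")
samples = np.stack([model(x_test, training=True).numpy() for _ in range(100)])
pred_mean, pred_std = samples.mean(axis=0), samples.std(axis=0)
```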

Practicalities

Blei Lab’s software tool: Edward (source). TensorFlow also ships a contributed Bayesian library called BayesFlow (not the same as the cytometry library of the same name), whose documentation is so perfunctory that it might be easier to reimplement it than to learn it from the docs.

Thomas Wiecki, Bayesian Deep Learning shows how to do it with PyMC3.

Christopher Bonnett: Mixture Density Networks with Edward, Keras and TensorFlow.
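
The mixture density network idea (Bish94) underlying that post is easy to state: the network emits the parameters of a Gaussian mixture over the target, and you train by minimising the mixture negative log-likelihood. A hedged sketch in tf.keras rather than Edward; the number of components and layer sizes are arbitrary choices here.

```python
import numpy as np
import tensorflow as tf

K = 5  # number of mixture components (illustrative)

def mdn_nll(y_true, params):
    """Per-sample negative log-likelihood of y_true under the predicted Gaussian mixture."""
    logits, mu, log_sigma = tf.split(params, 3, axis=-1)   # each (batch, K)
    sigma = tf.exp(log_sigma)
    log_pi = tf.nn.log_softmax(logits, axis=-1)
    # log N(y | mu_k, sigma_k) for every component
    log_norm = (-0.5 * tf.square((y_true - mu) / sigma)
                - tf.math.log(sigma) - 0.5 * np.log(2.0 * np.pi))
    return -tf.reduce_logsumexp(log_pi + log_norm, axis=-1)

inputs = tf.keras.Input(shape=(1,))
h = tf.keras.layers.Dense(64, activation="tanh")(inputs)
params = tf.keras.layers.Dense(3 * K)(h)   # mixture logits, means, log-sigmas
model = tf.keras.Model(inputs, params)
model.compile(optimizer="adam", loss=mdn_nll)
```

At prediction time one forward pass gives the full mixture, so you can read off a whole conditional density for each input rather than a single number.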

Refs

AbDH16
Abbasnejad, E., Dick, A., & Hengel, A. van den. (2016) Infinite Variational Autoencoder for Semi-Supervised Learning. In Advances in Neural Information Processing Systems 29.
Bish94
Bishop, C. (1994) Mixture Density Networks. Microsoft Research.
CBMF16
Cutajar, K., Bonilla, E. V., Michiardi, P., & Filippone, M. (2016) Practical Learning of Deep Gaussian Processes via Random Fourier Features. arXiv:1610.04386 [Stat].
Gal15a
Gal, Y. (2015a) A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. arXiv:1512.05287 [Stat].
Gal15b
Gal, Y. (2015b) Rapid Prototyping of Probabilistic Models: Emerging Challenges in Variational Inference. In Advances in Approximate Bayesian Inference workshop, NIPS.
Gal16
Gal, Y. (2016) Uncertainty in Deep Learning (PhD thesis). University of Cambridge.
GaGh15a
Gal, Y., & Ghahramani, Z. (2015a) Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).
GaGh15b
Gal, Y., & Ghahramani, Z. (2015b) On Modern Deep Learning and Variational Inference. In Advances in Approximate Bayesian Inference workshop, NIPS.
GaGh16
Gal, Y., & Ghahramani, Z. (2016) Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference. In 4th International Conference on Learning Representations (ICLR) workshop track.
GDGR15
Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015) DRAW: A Recurrent Neural Network For Image Generation. arXiv:1502.04623 [Cs].
KiSW16
Kingma, D. P., Salimans, T., & Welling, M. (2016) Improving Variational Inference with Inverse Autoregressive Flow. arXiv:1606.04934 [Cs, Stat].
LSLW15
Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2015) Autoencoding beyond pixels using a learned similarity metric. arXiv:1512.09300 [Cs, Stat].
LoWe16
Louizos, C., & Welling, M. (2016) Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors. arXiv Preprint arXiv:1603.04733.
Mack02a
MacKay, D. J. C. (2002a) Gaussian Processes. In Information Theory, Inference & Learning Algorithms (Chapter 45). Cambridge University Press.
Mack02b
MacKay, D. J. C. (2002b) Information Theory, Inference & Learning Algorithms. Cambridge University Press.
MoAV17
Molchanov, D., Ashukha, A., & Vetrov, D. (2017) Variational Dropout Sparsifies Deep Neural Networks. arXiv:1701.05369 [Cs, Stat].
RaWi06
Rasmussen, C. E., & Williams, C. K. I. (2006) Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press.
THSB17
Tran, D., Hoffman, M. D., Saurous, R. A., Brevdo, E., Murphy, K., & Blei, D. M. (2017) Deep Probabilistic Programming. arXiv:1701.03757 [Cs, Stat].
TKDR16
Tran, D., Kucukelbir, A., Dieng, A. B., Rudolph, M., Liang, D., & Blei, D. M. (2016) Edward: A library for probabilistic modeling, inference, and criticism. arXiv:1610.09787 [Cs, Stat].
WaJo05
Wainwright, M., & Jordan, M. (2005) A variational principle for graphical models. In New Directions in Statistical Signal Processing (Vol. 155). MIT Press