
Probabilistic deep learning

Creating neural networks which infer whole probability densities (or at least calibrated uncertainties) for their predictions, rather than point estimates.

In Bayesian terms this is about estimating a posterior distribution given prior distributions on the parameters; in frequentist terms, it’s an approximate model averaging.

AFAICT this usually boils down to doing variational inference, in which case the neural network becomes one big approximate probabilistic directed graphical model. Apparently you can also do simulation-based inference here, somehow exploiting gradients? Must look into that.
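
For reference, the variational objective in question is the usual evidence lower bound (ELBO): posit a tractable family $q_\phi(w)$ over the network weights $w$ and maximize

$$
\mathcal{L}(\phi) = \mathbb{E}_{q_\phi(w)}\bigl[\log p(y \mid x, w)\bigr] - \operatorname{KL}\bigl(q_\phi(w)\,\|\,p(w)\bigr) \le \log p(y \mid x),
$$

typically via Monte Carlo gradient estimates of the expectation.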

Yarin Gal’s PhD thesis (Gal16) summarizes some implicit approximate approaches (especially the Bayesian interpretation of dropout); Radford Neal’s thesis (Neal96) is the classic earlier treatment, sampling the weight posterior by MCMC.
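
To make the dropout interpretation concrete, here is a minimal NumPy sketch of MC-dropout prediction in the spirit of GaGh15a: leave dropout switched on at test time and summarize many stochastic forward passes. The toy two-layer net and all sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fixed weights for a two-layer net (made-up sizes for illustration).
W1, b1 = rng.standard_normal((8, 16)), np.zeros(16)
W2, b2 = rng.standard_normal((16, 1)), np.zeros(1)

def forward(x, p_drop=0.5):
    """One stochastic forward pass, dropout kept ON at test time."""
    h = np.maximum(x @ W1 + b1, 0.0)                  # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop               # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)                     # inverted-dropout scaling
    return h @ W2 + b2

x = rng.standard_normal((1, 8))
samples = np.stack([forward(x) for _ in range(100)])  # T stochastic passes

# Predictive mean and variance as a crude posterior summary.
print("mean:", samples.mean(axis=0), "var:", samples.var(axis=0))
```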

Alex Graves did a nice poster for his paper (Grav11) on about the simplest weight-uncertainty scheme for recurrent nets (a diagonal Gaussian variational posterior over the weights). There is a half-arsed implementation floating around.
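
For flavour, a minimal sketch of a linear layer with a diagonal Gaussian variational posterior over its weights. I use PyTorch and the reparameterization trick here, which is not Graves’ exact gradient estimator (Grav11 predates reparameterization); all names and sizes are invented.

```python
import torch

class BayesLinear(torch.nn.Module):
    """Linear layer with a diagonal Gaussian posterior over its weights."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.zeros(d_in, d_out))
        self.rho = torch.nn.Parameter(torch.full((d_in, d_out), -3.0))

    def forward(self, x):
        sigma = torch.nn.functional.softplus(self.rho)  # ensure sigma > 0
        w = self.mu + sigma * torch.randn_like(sigma)   # reparameterized sample
        # Closed-form KL(q || N(0, I)) for a diagonal Gaussian.
        self.kl = (0.5 * (self.mu**2 + sigma**2 - 1) - torch.log(sigma)).sum()
        return x @ w

layer = BayesLinear(4, 1)
opt = torch.optim.Adam(layer.parameters(), lr=1e-2)
x, y = torch.randn(32, 4), torch.randn(32, 1)
for _ in range(200):
    pred = layer(x)                                     # fresh weight sample per pass
    nll = ((pred - y) ** 2).mean()                      # squared error ~ Gaussian NLL
    loss = nll + layer.kl / x.shape[0]                  # negative ELBO, KL per datum
    opt.zero_grad(); loss.backward(); opt.step()
```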

Practicalities

Blei Lab’s software tool: Edward (source). TensorFlow also ships a contributed Bayesian library called BayesFlow (which is not the same as the cytometry library of the same name), whose documentation is so perfunctory that it would likely be easier to reimplement it than to understand it.
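
Edward’s API is pleasantly terse; here is (approximately) its canonical Bayesian linear regression example, written from memory of the Edward 1.x / TensorFlow 1.x docs, so treat the details as a sketch rather than gospel:

```python
import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Normal

N, D = 40, 5
X_train = np.random.randn(N, D).astype(np.float32)
y_train = (X_train.sum(axis=1) + 0.1 * np.random.randn(N)).astype(np.float32)

# Model: Gaussian priors on weights, Gaussian likelihood.
X = tf.placeholder(tf.float32, [N, D])
w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
y = Normal(loc=ed.dot(X, w) + b, scale=tf.ones(N))

# Variational posterior: diagonal Gaussians with free parameters.
qw = Normal(loc=tf.get_variable("qw/loc", [D]),
            scale=tf.nn.softplus(tf.get_variable("qw/scale", [D])))
qb = Normal(loc=tf.get_variable("qb/loc", [1]),
            scale=tf.nn.softplus(tf.get_variable("qb/scale", [1])))

inference = ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train})
inference.run(n_samples=5, n_iter=250)
```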

Thomas Wiecki, Bayesian Deep Learning, shows how to implement a variant in PyMC3.
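
A compressed sketch of the kind of thing Wiecki does there: a tiny Bayesian neural-network classifier fit with ADVI. Sizes, priors, and data are made up; see his post for the real thing.

```python
import numpy as np
import pymc3 as pm

rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = (X[:, 0] * X[:, 1] > 0).astype("float64")       # toy XOR-ish labels

n_hidden = 5
with pm.Model() as bnn:
    # Gaussian priors on both weight layers.
    w_in = pm.Normal("w_in", 0.0, sigma=1.0, shape=(2, n_hidden))
    w_out = pm.Normal("w_out", 0.0, sigma=1.0, shape=(n_hidden,))
    act = pm.math.tanh(pm.math.dot(X, w_in))
    p = pm.math.sigmoid(pm.math.dot(act, w_out))
    pm.Bernoulli("obs", p=p, observed=y)

    approx = pm.fit(n=20000, method="advi")         # mean-field VI
    trace = approx.sample(500)                      # draws from the approximation
```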

Christopher Bonnett: Mixture Density Networks with Edward, Keras and TensorFlow.
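
The core of a mixture density network (Bish94) is small enough to sketch: the net emits mixture weights, means, and scales of a Gaussian mixture, and trains by negative log-likelihood. A hedged PyTorch version (Bonnett uses Keras/Edward; everything here is invented for illustration):

```python
import math
import torch

K, H = 3, 16                                        # mixture components, hidden units

class MDN(torch.nn.Module):
    """Map x to a K-component Gaussian mixture over scalar y (after Bish94)."""
    def __init__(self):
        super().__init__()
        self.hidden = torch.nn.Sequential(torch.nn.Linear(1, H), torch.nn.Tanh())
        self.params = torch.nn.Linear(H, 3 * K)     # logits, means, log-scales

    def forward(self, x):
        return self.params(self.hidden(x)).split(K, dim=-1)

def mdn_nll(logits, mu, log_sigma, y):
    """Negative log-likelihood of y under the predicted mixture."""
    log_pi = torch.log_softmax(logits, dim=-1)
    log_comp = (-0.5 * ((y - mu) / log_sigma.exp()) ** 2
                - log_sigma - 0.5 * math.log(2 * math.pi))
    return -torch.logsumexp(log_pi + log_comp, dim=-1).mean()

model = MDN()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x = torch.rand(256, 1) * 2 - 1
y = torch.sin(3 * x) + 0.1 * torch.randn_like(x)    # toy 1-D regression data
for _ in range(200):
    loss = mdn_nll(*model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```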

Refs

AbDH16
Abbasnejad, E., Dick, A., & Hengel, A. van den. (2016) Infinite Variational Autoencoder for Semi-Supervised Learning. In Advances in Neural Information Processing Systems 29.
Bish94
Bishop, C. (1994) Mixture Density Networks. Microsoft Research.
BJPD17
Bora, A., Jalal, A., Price, E., & Dimakis, A. G. (2017) Compressed Sensing using Generative Models. ArXiv:1703.03208 [Cs, Math, Stat].
BuRR17
Bui, T. D., Ravi, S., & Ramavajjala, V. (2017) Neural Graph Machines: Learning Neural Networks Using Graphs. ArXiv:1703.04818 [Cs].
CBMF16
Cutajar, K., Bonilla, E. V., Michiardi, P., & Filippone, M. (2016) Practical Learning of Deep Gaussian Processes via Random Fourier Features. ArXiv:1610.04386 [Stat].
CBMF17
Cutajar, K., Bonilla, E. V., Michiardi, P., & Filippone, M. (2017) Random Feature Expansions for Deep Gaussian Processes. In PMLR.
Gal15
Gal, Y. (2015) Rapid Prototyping of Probabilistic Models: Emerging Challenges in Variational Inference. In Advances in Approximate Bayesian Inference workshop, NIPS.
Gal16
Gal, Y. (2016) Uncertainty in Deep Learning (PhD thesis). University of Cambridge.
GaGh15a
Gal, Y., & Ghahramani, Z. (2015a) Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).
GaGh15b
Gal, Y., & Ghahramani, Z. (2015b) On Modern Deep Learning and Variational Inference. In Advances in Approximate Bayesian Inference workshop, NIPS.
GaGh16a
Gal, Y., & Ghahramani, Z. (2016a) A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. In arXiv:1512.05287 [stat].
GaGh16b
Gal, Y., & Ghahramani, Z. (2016b) Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference. In 4th International Conference on Learning Representations (ICLR) workshop track.
Grav11
Graves, A. (2011) Practical Variational Inference for Neural Networks. In Proceedings of the 24th International Conference on Neural Information Processing Systems (pp. 2348–2356). USA: Curran Associates Inc.
GrMH13
Graves, A., Mohamed, A., & Hinton, G. (2013) Speech Recognition with Deep Recurrent Neural Networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. DOI.
GDGR15
Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015) DRAW: A Recurrent Neural Network For Image Generation. ArXiv:1502.04623 [Cs].
HoBl15
Hoffman, M., & Blei, D. (2015) Stochastic Structured Variational Inference. In PMLR (pp. 361–369).
JDWD16
Johnson, M. J., Duvenaud, D., Wiltschko, A. B., Datta, S. R., & Adams, R. P. (2016) Composing graphical models with neural networks for structured representations and fast inference. ArXiv:1603.06277 [Stat].
KSJC16
Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016) Improving Variational Inference with Inverse Autoregressive Flow. ArXiv:1606.04934 [Cs, Stat].
KiWe13
Kingma, D. P., & Welling, M. (2013) Auto-Encoding Variational Bayes. ArXiv:1312.6114 [Cs, Stat].
KrSS15
Krishnan, R. G., Shalit, U., & Sontag, D. (2015) Deep kalman filters. ArXiv Preprint ArXiv:1511.05121.
LSLW15
Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2015) Autoencoding beyond pixels using a learned similarity metric. ArXiv:1512.09300 [Cs, Stat].
LoCV17
Lobacheva, E., Chirkova, N., & Vetrov, D. (2017) Bayesian Sparsification of Recurrent Neural Networks. In Workshop on Learning to Generate Natural Language.
LoWe16
Louizos, C., & Welling, M. (2016) Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors. ArXiv Preprint ArXiv:1603.04733.
Mack02a
MacKay, D. J. C. (2002a) Gaussian Processes. In Information Theory, Inference & Learning Algorithms (Chapter 45). Cambridge University Press.
Mack02b
MacKay, D. J. C. (2002b) Information Theory, Inference & Learning Algorithms. Cambridge University Press.
MoAV17
Molchanov, D., Ashukha, A., & Vetrov, D. (2017) Variational Dropout Sparsifies Deep Neural Networks. In Proceedings of ICML.
Neal96
Neal, R. M. (1996) Bayesian Learning for Neural Networks (Vol. 118). Secaucus, NJ, USA: Springer-Verlag New York, Inc.
RaWi06
Rasmussen, C. E., & Williams, C. K. I. (2006) Gaussian Processes for Machine Learning. Cambridge, Mass.: MIT Press.
RaDi16
Ravi, S., & Diao, Q. (2016) Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation. In PMLR (pp. 519–528).
THSB17
Tran, D., Hoffman, M. D., Saurous, R. A., Brevdo, E., Murphy, K., & Blei, D. M. (2017) Deep Probabilistic Programming. In ICLR.
TKDR16
Tran, D., Kucukelbir, A., Dieng, A. B., Rudolph, M., Liang, D., & Blei, D. M. (2016) Edward: A library for probabilistic modeling, inference, and criticism. ArXiv:1610.09787 [Cs, Stat].
WaJo05
Wainwright, M., & Jordan, M. (2005) A variational principle for graphical models. In New Directions in Statistical Signal Processing (Vol. 155). MIT Press.