
Probabilistic deep learning

Creating neural networks which infer whole probability densities for their predictions (usually approximately), rather than point estimates. Or, at least, handling some part of the density estimation problem with neural nets, in a Bayesian setting. Prediction uncertainties, approximate model averaging etc. would all fit in this category.

AFAICT this usually boils down to doing variational inference, in which case the neural network is a big approximate probabilistic graphical model. Apparently you can also do simulation-based inference here, somehow using gradients? Must look into that.
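Concretely, the variational objective is the usual evidence lower bound (ELBO): pick a tractable family $q_\phi(w)$ over the network weights $w$ and maximise

$$\mathcal{L}(\phi) = \mathbb{E}_{q_\phi(w)}\left[\log p(\mathcal{D}\mid w)\right] - \operatorname{KL}\left(q_\phi(w)\,\|\,p(w)\right) \leq \log p(\mathcal{D}),$$

so that $q_\phi(w)$ comes to approximate the weight posterior $p(w\mid\mathcal{D})$.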


Backgrounders

Radford Neal’s thesis (Neal96) is a foundational asymptotically-Bayesian use of neural networks. Yarin Gal’s PhD thesis (Gal16) summarizes some implicit approximate approaches (e.g. the Bayesian interpretation of dropout).
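To make the dropout-as-Bayes point concrete, here is a minimal NumPy sketch of the MC dropout recipe (GaGh15a): leave dropout switched on at prediction time and treat the spread of repeated stochastic forward passes as approximate predictive uncertainty. The toy network and weights are my own illustration, not Gal's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy two-layer net with fixed (pretend-trained) weights.
W1, b1 = rng.standard_normal((1, 50)), np.zeros(50)
W2, b2 = rng.standard_normal((50, 1)), np.zeros(1)

def predict(x, p_drop=0.5):
    """One stochastic forward pass, with dropout left ON."""
    h = np.maximum(0, x @ W1 + b1)          # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop     # Bernoulli dropout mask
    h = h * mask / (1 - p_drop)             # inverted-dropout rescaling
    return h @ W2 + b2

x = np.array([[0.3]])
samples = np.stack([predict(x) for _ in range(100)])
# Predictive mean and a crude uncertainty estimate:
mean, std = samples.mean(axis=0), samples.std(axis=0)
```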

Alex Graves made a nice poster for his paper (Grav11) on the simplest prior-uncertainty trick for recurrent nets (diagonal Gaussian weight uncertainty). There is a half-arsed implementation.
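The core of that approach, sketched in NumPy with my own naming: each weight gets a variational mean and scale, weights are resampled per forward pass, and the training loss gains a KL penalty pulling the posterior towards the prior. (I sample via the reparameterisation trick for brevity, which is the later Bayes-by-backprop treatment rather than Graves' original gradient estimator.)

```python
import numpy as np

rng = np.random.default_rng(1)

# One variational (mean, scale) pair per network weight.
mu = 0.1 * rng.standard_normal(10)
rho = np.full(10, -3.0)                  # sigma = softplus(rho), kept positive

def sample_weights():
    """Draw one weight vector as w = mu + sigma * eps, eps ~ N(0, 1)."""
    sigma = np.log1p(np.exp(rho))        # softplus
    return mu + sigma * rng.standard_normal(mu.shape)

def kl_to_standard_normal_prior():
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over weights; add this to the loss."""
    sigma = np.log1p(np.exp(rho))
    return np.sum(-np.log(sigma) + 0.5 * (sigma ** 2 + mu ** 2) - 0.5)
```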

Practicalities

Blei Lab’s software tool: Edward (source). TensorFlow also comes with a contributed Bayesian library called BayesFlow (not the same as the cytometry library of the same name), whose documentation is so perfunctory that I can’t imagine it not being easier to reimplement it than to understand it.
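For flavour, here is a tiny Bayesian linear regression in Edward's API, as I understand it from the docs (TKDR16); the toy data and priors are mine.

```python
import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Normal

# Toy data: y = X . w_true + noise.
X_train = np.random.randn(50, 3).astype(np.float32)
y_train = X_train.dot(np.array([1., -2., 0.5], np.float32)) \
    + 0.1 * np.random.randn(50).astype(np.float32)

X = tf.placeholder(tf.float32, [50, 3])
w = Normal(loc=tf.zeros(3), scale=tf.ones(3))            # prior on weights
y = Normal(loc=ed.dot(X, w), scale=0.1 * tf.ones(50))    # likelihood

# Variational posterior: a diagonal Gaussian with free parameters.
qw = Normal(loc=tf.Variable(tf.zeros(3)),
            scale=tf.nn.softplus(tf.Variable(tf.zeros(3))))

inference = ed.KLqp({w: qw}, data={X: X_train, y: y_train})
inference.run(n_iter=500)  # stochastic-gradient variational inference
```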

Thomas Wiecki, Bayesian Deep Learning, shows how to implement some variants with PyMC3.
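In the same spirit as Wiecki's post, a compact PyMC3 Bayesian neural network fit with ADVI; this is my own toy version, not his exact model.

```python
import numpy as np
import pymc3 as pm

# Toy XOR-ish binary classification data.
X = np.random.randn(200, 2)
y = (X[:, 0] * X[:, 1] > 0).astype(float)

n_hidden = 5
with pm.Model() as bnn:
    # Gaussian priors over both weight layers.
    w_in = pm.Normal('w_in', mu=0., sd=1., shape=(2, n_hidden))
    w_out = pm.Normal('w_out', mu=0., sd=1., shape=(n_hidden,))
    act = pm.math.tanh(pm.math.dot(X, w_in))
    p = pm.math.sigmoid(pm.math.dot(act, w_out))
    pm.Bernoulli('obs', p=p, observed=y)

    approx = pm.fit(n=20000, method='advi')  # mean-field variational fit
    trace = approx.sample(500)               # draws from the weight posterior
```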

Christopher Bonnett: Mixture Density Networks with Edward, Keras and TensorFlow.
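The mixture density network idea (Bish94) in one function: the network's output head parameterises a Gaussian mixture, and training minimises the mixture's negative log-likelihood. A NumPy sketch of that loss, with names of my own choosing:

```python
import numpy as np
from scipy.special import logsumexp

def mdn_nll(y, log_pi, mu, sigma):
    """Negative log-likelihood of targets y under a 1-D Gaussian mixture.

    y: (batch, 1) targets; log_pi, mu, sigma: (batch, K) network outputs
    (log_pi from a log-softmax head, sigma from e.g. an exp or softplus head).
    """
    log_comp = (-0.5 * ((y - mu) / sigma) ** 2
                - np.log(sigma) - 0.5 * np.log(2 * np.pi))
    # Log-sum-exp over the K mixture components, then average over the batch.
    return -np.mean(logsumexp(log_pi + log_comp, axis=1))
```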

Refs

AbDH16
Abbasnejad, E., Dick, A., & Hengel, A. van den. (2016) Infinite Variational Autoencoder for Semi-Supervised Learning. In Advances in Neural Information Processing Systems 29.
APBC15
Archer, E., Park, I. M., Buesing, L., Cunningham, J., & Paninski, L. (2015) Black box variational inference for state space models. ArXiv:1511.07367 [Stat].
Bish94
Bishop, C. (1994) Mixture Density Networks. Microsoft Research.
BJPD17
Bora, A., Jalal, A., Price, E., & Dimakis, A. G. (2017) Compressed Sensing using Generative Models. ArXiv:1703.03208 [Cs, Math, Stat].
BuRR17
Bui, T. D., Ravi, S., & Ramavajjala, V. (2017) Neural Graph Machines: Learning Neural Networks Using Graphs. ArXiv:1703.04818 [Cs].
CBMF17
Cutajar, K., Bonilla, E. V., Michiardi, P., & Filippone, M. (2017) Random Feature Expansions for Deep Gaussian Processes. In PMLR.
DDSN18
Doerr, A., Daniel, C., Schiegg, M., Nguyen-Tuong, D., Schaal, S., Toussaint, M., & Trimpe, S. (2018) Probabilistic Recurrent State-Space Models. ArXiv:1801.10395 [Stat].
FaAm14
Fabius, O., & van Amersfoort, J. R. (2014) Variational Recurrent Auto-Encoders. In Proceedings of ICLR.
FlSG17
Flunkert, V., Salinas, D., & Gasthaus, J. (2017) DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. ArXiv:1704.04110 [Cs, Stat].
Gal15
Gal, Y. (2015) Rapid Prototyping of Probabilistic Models: Emerging Challenges in Variational Inference. In Advances in Approximate Bayesian Inference workshop, NIPS.
Gal16
Gal, Y. (2016) Uncertainty in Deep Learning (PhD thesis). University of Cambridge.
GaGh15a
Gal, Y., & Ghahramani, Z. (2015a) Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).
GaGh15b
Gal, Y., & Ghahramani, Z. (2015b) On Modern Deep Learning and Variational Inference. In Advances in Approximate Bayesian Inference workshop, NIPS.
GaGh16a
Gal, Y., & Ghahramani, Z. (2016a) A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. In arXiv:1512.05287 [stat].
GaGh16b
Gal, Y., & Ghahramani, Z. (2016b) Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference. In 4th International Conference on Learning Representations (ICLR) workshop track.
Grav11
Graves, A. (2011) Practical Variational Inference for Neural Networks. In Proceedings of the 24th International Conference on Neural Information Processing Systems (pp. 2348–2356). USA: Curran Associates Inc.
GrMH13
Graves, A., Mohamed, A., & Hinton, G. (2013) Speech Recognition with Deep Recurrent Neural Networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. DOI.
GDGR15
Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015) DRAW: A Recurrent Neural Network For Image Generation. ArXiv:1502.04623 [Cs].
GuGT15
Gu, S., Ghahramani, Z., & Turner, R. E. (2015) Neural Adaptive Sequential Monte Carlo. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 28 (pp. 2629–2637). Curran Associates, Inc.
GLSM15
Gu, S., Levine, S., Sutskever, I., & Mnih, A. (2015) MuProp: Unbiased Backpropagation for Stochastic Neural Networks.
HoBl15
Hoffman, M., & Blei, D. (2015) Stochastic Structured Variational Inference. In PMLR (pp. 361–369).
JDWD16
Johnson, M. J., Duvenaud, D., Wiltschko, A. B., Datta, S. R., & Adams, R. P. (2016) Composing graphical models with neural networks for structured representations and fast inference. ArXiv:1603.06277 [Stat].
KSBS16
Karl, M., Soelch, M., Bayer, J., & van der Smagt, P. (2016) Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data. In Proceedings of ICLR.
KSJC16
Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016) Improving Variational Inference with Inverse Autoregressive Flow. In Advances in Neural Information Processing Systems 29. Curran Associates, Inc.
KiWe14
Kingma, D. P., & Welling, M. (2014) Auto-Encoding Variational Bayes. In ICLR 2014 conference.
KBCF16
Krauth, K., Bonilla, E. V., Cutajar, K., & Filippone, M. (2016) AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models. In UAI17.
KrSS15
Krishnan, R. G., Shalit, U., & Sontag, D. (2015) Deep kalman filters. ArXiv Preprint ArXiv:1511.05121.
LSLW15
Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2015) Autoencoding beyond pixels using a learned similarity metric. ArXiv:1512.09300 [Cs, Stat].
LIJR17
Le, T. A., Igl, M., Jin, T., Rainforth, T., & Wood, F. (2017) Auto-Encoding Sequential Monte Carlo. ArXiv Preprint ArXiv:1705.10306.
LoCV17
Lobacheva, E., Chirkova, N., & Vetrov, D. (2017) Bayesian Sparsification of Recurrent Neural Networks. In Workshop on Learning to Generate Natural Language.
LoWe16
Louizos, C., & Welling, M. (2016) Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors. In arXiv preprint arXiv:1603.04733 (pp. 1708–1716).
LoWe17
Louizos, C., & Welling, M. (2017) Multiplicative Normalizing Flows for Variational Bayesian Neural Networks. In PMLR (pp. 2218–2227).
Mack02a
MacKay, D. J. C. (2002a) Gaussian Processes. In Information Theory, Inference & Learning Algorithms (Chapter 45). Cambridge University Press.
Mack02b
MacKay, D. J. C. (2002b) Information Theory, Inference & Learning Algorithms. Cambridge University Press.
MLTH17
Maddison, C. J., Lawson, D., Tucker, G., Heess, N., Norouzi, M., Mnih, A., … Teh, Y. W. (2017) Filtering Variational Objectives. ArXiv Preprint ArXiv:1705.09279.
MWNF16
Matthews, A. G. de G., van der Wilk, M., Nickson, T., Fujii, K., Boukouvalas, A., León-Villagrá, P., … Hensman, J. (2016) GPflow: A Gaussian process library using TensorFlow. ArXiv:1610.08733 [Stat].
MoAV17
Molchanov, D., Ashukha, A., & Vetrov, D. (2017) Variational Dropout Sparsifies Deep Neural Networks. In Proceedings of ICML.
Neal96
Neal, R. M. (1996) Bayesian Learning for Neural Networks (Vol. 118). Secaucus, NJ, USA: Springer-Verlag New York, Inc.
NCKN11
Ngiam, J., Chen, Z., Koh, P. W., & Ng, A. Y. (2011) Learning deep energy models. In Proceedings of the 28th International Conference on Machine Learning (ICML-11) (pp. 1105–1112).
RaWi06
Rasmussen, C. E., & Williams, C. K. I. (2006) Gaussian processes for machine learning. Cambridge, Mass.: MIT Press.
RaDi16
Ravi, S., & Diao, Q. (2016) Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation. In PMLR (pp. 519–528).
RGMP18
Ryder, T., Golightly, A., McGough, A. S., & Prangle, D. (2018) Black-box Variational Inference for Stochastic Differential Equations. ArXiv:1802.03335 [Stat].
THSB17
Tran, D., Hoffman, M. D., Saurous, R. A., Brevdo, E., Murphy, K., & Blei, D. M. (2017) Deep Probabilistic Programming. In ICLR.
TKDR16
Tran, D., Kucukelbir, A., Dieng, A. B., Rudolph, M., Liang, D., & Blei, D. M. (2016) Edward: A library for probabilistic modeling, inference, and criticism. ArXiv:1610.09787 [Cs, Stat].
WaJo05
Wainwright, M., & Jordan, M. (2005) A variational principle for graphical models. In New Directions in Statistical Signal Processing (Vol. 155). MIT Press